These forums have been archived and are now read-only.

The new forums are live and can be found at https://forums.eveonline.com/

EVE Technology Lab


How to deal with huge amount of market data?

Aleksey Rzhegov
Aliastra
Gallente Federation
#1 - 2017-02-16 15:14:48 UTC  |  Edited by: Aleksey Rzhegov
About two months ago I started developing an application that analyzes market data for a set of regions: something like the "Sell->Buy Order Tool" on EVE-Central, with blackjack and lambdas.
When I first tried downloading the order lists for all regions, it took about 9 minutes and the whole data set was about 300-400 MB. Even a month later it was much the same.
But now it can take hours to download all available pages for "The Forge" region alone, and the data goes past 3.5 GB (I stopped downloading at that point; I'm pretty sure there is much more).
Was it all just fake stub data back then?
Now I really don't know what to do with that much data.
The biggest problem for me is that some orders may be outdated by the time I finish downloading them, and there is no way to refresh a particular set of orders. As far as I understand, each page holds a dynamic list of orders that isn't bound to anything.
I can see that CREST provides pretty much the same market API as ESI, so there are probably established ways to solve this problem already.
Could someone please give me some advice on this?

(BTW, I know EVE-Central is open source, but I'm not into Scala.)
Steve Ronuken
Fuzzwork Enterprises
Vote Steve Ronuken for CSM
#2 - 2017-02-16 17:12:04 UTC
That's interesting. I _think_ you've got a bug.

I'm downloading the entire market and it's around the 170MB mark. And that's for _everything_.

If you can read Python, https://github.com/fuzzysteve/FuzzMarket/blob/master/scripts/aggloader-esi.py may be of interest.

It's how I'm downloading everything, including public citadels. (There's a config file, which I haven't included, with some details in it.)

Woo! CSM XI!

Fuzzwork Enterprises

Twitter: @fuzzysteve on Twitter

Aleksey Rzhegov
Aliastra
Gallente Federation
#3 - 2017-02-16 17:34:37 UTC  |  Edited by: Aleksey Rzhegov
Well, it's pretty hard for me to understand PHP, but the algorithm I use to fetch the data is simple:
I load page after page for a region until a page comes back with 0 orders.
Is that the correct way to do it? At least it worked before.
Edit: I tried one of the URLs from my logs that was supposed to return a page of 10,000 orders, and it returned an empty array (for example: https://esi.tech.ccp.is/latest/markets/10000002/orders?datasource=tranquility&page=4809&order_type=all).
There is definitely some kind of bug here...

Thanks, Steve.
Aleksey Rzhegov
Aliastra
Gallente Federation
#4 - 2017-02-16 17:49:07 UTC  |  Edited by: Aleksey Rzhegov
Steve Ronuken
Fuzzwork Enterprises
Vote Steve Ronuken for CSM
#5 - 2017-02-16 18:40:16 UTC
there's no orders on page 4809. That's why it's an empty array.

Ask for each page in turn, checking the size of the returned array. When it's empty, stop asking.
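A minimal Python sketch of that loop, assuming the 2017-era ESI URL quoted in this thread (long since retired; note the trailing slash on `orders/`). `fetch_page_http`, `iter_region_orders` and the `max_pages` cap are hypothetical names for illustration; the fetch function is a parameter so the paging logic can be tried without the network.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

# Assumption: the 2017-era ESI endpoint quoted in this thread (since retired).
# Note the trailing slash after "orders".
ORDERS_URL = ("https://esi.tech.ccp.is/latest/markets/{region_id}/orders/"
              "?datasource=tranquility&order_type=all&page={page}")


def fetch_page_http(region_id: int, page: int) -> List[dict]:
    """Fetch one page of orders and decode the JSON array it returns."""
    url = ORDERS_URL.format(region_id=region_id, page=page)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.loads(resp.read())


def iter_region_orders(region_id: int,
                       fetch_page: Callable[[int, int], List[dict]] = fetch_page_http,
                       max_pages: int = 1000) -> Iterator[dict]:
    """Request pages 1, 2, 3, ... and stop at the first empty array."""
    for page in range(1, max_pages + 1):
        orders = fetch_page(region_id, page)
        if not orders:  # empty array: we ran past the last page
            return
        yield from orders
    # Hypothetical safety cap: a sane region should never need this many pages.
    raise RuntimeError(f"no empty page after {max_pages} pages; check the request URL")
```

The cap is a design choice rather than part of the API: without it, a request URL that never yields an empty page keeps the loop running until something on the server side gives up.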

Aleksey Rzhegov
Aliastra
Gallente Federation
#6 - 2017-02-16 20:10:39 UTC  |  Edited by: Aleksey Rzhegov
Steve Ronuken wrote:
there's no orders on page 4809. That's why it's an empty array.

Ask for each page in turn, checking the size of the returned array. When it's empty, stop asking.

That is exactly what I did, but the URL was wrong all along (as I described in my previous comment) and was actually returning the first page every time, so the loop reached page 4809 before failing with a server-side error.

Thanks again.
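A cheap guard against this failure mode, sketched in Python: key orders by `order_id` (field name assumed from the ESI order schema) and abort as soon as a page adds nothing new. `download_region` and `fetch_page` are hypothetical names, and the check is a heuristic of this sketch, not something the API promises.

```python
from typing import Callable, Dict, List


def download_region(region_id: int,
                    fetch_page: Callable[[int, int], List[dict]],
                    max_pages: int = 1000) -> Dict[int, dict]:
    """Page through a region's orders, keyed by order_id.

    Raises if a page contains no order_id we have not already seen, which is
    what happens when the server keeps serving page 1 because the page
    parameter was lost on the way.
    """
    orders: Dict[int, dict] = {}
    for page in range(1, max_pages + 1):
        batch = fetch_page(region_id, page)
        if not batch:  # empty array: done
            return orders
        new_ids = {o["order_id"] for o in batch if o["order_id"] not in orders}
        if not new_ids:
            raise RuntimeError(f"page {page} only repeats orders already seen; "
                               "is the page parameter being ignored?")
        for o in batch:
            # Orders can shift between pages while downloading, so keying by
            # order_id also drops the occasional duplicate (last copy wins).
            orders[o["order_id"]] = o
    raise RuntimeError(f"no empty page after {max_pages} pages")
```

A page with *some* repeated orders is tolerated on purpose, since orders move between pages while a large region is being downloaded; only a page made up entirely of repeats is treated as a broken request.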
Tonto Auri
Vhero' Multipurpose Corp
#7 - 2017-02-16 23:44:22 UTC

Because "https://esi.tech.ccp.is/latest/markets/10000002/orders" is not the same address as "https://esi.tech.ccp.is/latest/markets/10000002/orders/"…

It's that simple.
Just because you were redirected from one to another on some website doesn't mean the next one would do the same. It is not obliged to.
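One way to avoid depending on the redirect is to build the exact URL, trailing slash included, and to check which page the final URL actually asks for. A Python sketch using only `urllib.parse`; the path is the 2017-era one from this thread, and `orders_url` / `page_param` are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the 2017-era ESI path from this thread, written with the
# trailing slash so no redirect is needed to reach it.
ORDERS_PATH = "https://esi.tech.ccp.is/latest/markets/{region_id}/orders/"


def orders_url(region_id: int, page: int) -> str:
    """Build the exact URL to request, query string included."""
    query = urlencode({"datasource": "tranquility", "order_type": "all", "page": page})
    return ORDERS_PATH.format(region_id=region_id) + "?" + query


def page_param(url: str) -> Optional[int]:
    """Return the page number a URL actually asks for, or None if it has none."""
    values = parse_qs(urlsplit(url).query).get("page")
    return int(values[0]) if values else None
```

After each request, compare `page_param(final_url)` with the page you asked for, where `final_url` is the URL your HTTP client reports after following redirects (for example `resp.url` on the object returned by `urllib.request.urlopen` in Python 3.9+). A redirect that drops the query string shows up as `None` instead of the expected page.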

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Aleksey Rzhegov
Aliastra
Gallente Federation
#8 - 2017-02-17 04:39:29 UTC  |  Edited by: Aleksey Rzhegov
Tonto Auri wrote:

Because "https://esi.tech.ccp.is/latest/markets/10000002/orders" is not the same address as "https://esi.tech.ccp.is/latest/markets/10000002/orders/"…

It's that simple.
Just because you were redirected from one to another on some website doesn't mean the next one would do the same. It is not obliged to.

Yes, it's not the same and it's not obliged to, but it doesn't look like good practice to me.
Especially when it just redirects to the first page instead of returning an error.
Tonto Auri
Vhero' Multipurpose Corp
#9 - 2017-02-19 12:45:05 UTC
Not good practice? Aye.
Could it be handled better? I concur.
But as in any other complicated system, there's a point where you have to choose between features and performance.
In this case, of course, I'd rather get a 400 or 404 than a redirect.
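On the client side you can get close to that: refuse to follow redirects at all, so a URL like the one that bit Aleksey fails on the first request instead of quietly serving page 1. A Python sketch built on the standard `urllib.request.HTTPRedirectHandler` hook; `RefuseRedirects`, `strict_opener` and `fetch_json_strict` are illustrative names.

```python
import json
import urllib.request


class RefuseRedirects(urllib.request.HTTPRedirectHandler):
    """Make every 3xx response surface as urllib.error.HTTPError.

    Returning None from redirect_request tells urllib not to follow the
    redirect, so the default error handler raises HTTPError instead.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


# build_opener skips its default HTTPRedirectHandler because ours subclasses it.
strict_opener = urllib.request.build_opener(RefuseRedirects)


def fetch_json_strict(url: str, timeout: float = 60.0):
    """GET a URL and decode JSON; a redirect raises HTTPError with its 3xx code."""
    with strict_opener.open(url, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Callers can catch `urllib.error.HTTPError` and check `300 <= err.code < 400` to tell "the URL is wrong" apart from real server errors, which is roughly the 400/404 behaviour wished for above.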