How to deal with huge amount of market data?

Aleksey Rzhegov

Aliastra

Gallente Federation

Likes received: 0

#1 - 2017-02-16 15:14:48 UTC | Edited by: Aleksey Rzhegov

About two months ago I started developing an application that analyzes market data for a certain set of regions. Something like the "Sell->Buy Order Tool" on EVE-Central, with blackjack and lambdas.
When I first tried to download the list of orders for all regions, it took about 9 minutes and the total volume of data was about 300-400 MB. A month later it was pretty much the same.
But now it can take hours to download all available pages for "The Forge" region alone, and the amount of data goes over 3.5 GB (I just stopped downloading at that point; I'm pretty sure there is much more data).
Was it all just fake stub data back then?
Now I really don't know what to do with such an amount of data.
The biggest problem for me is that some orders may become outdated by the time I finish downloading them, and there is no way to refresh a specific set of orders. As far as I understand, each page holds a dynamic list of orders that is not bound to anything.
I can see that CREST provides pretty much the same market API as ESI, so there are probably already established ways to solve this problem.
Could someone please give me some advice on this issue?

(BTW I know that EVE-Central is open source, but I'm not into Scala.)

Steve Ronuken

Fuzzwork Enterprises

Vote Steve Ronuken for CSM

Likes received: 6,759

#2 - 2017-02-16 17:12:04 UTC

That's interesting. I _think_ you've got a bug.

I'm downloading the entire market and it's around the 170MB mark. And that's for _everything_

If you can read Python, https://github.com/fuzzysteve/FuzzMarket/blob/master/scripts/aggloader-esi.py may be of interest.

It's how I'm downloading everything (including public citadels). There's a config file I've not included which has some details in it.

Woo! CSM XI!

Fuzzwork Enterprises

Twitter: @fuzzysteve on Twitter

Aleksey Rzhegov

Aliastra

Gallente Federation

Likes received: 0

#3 - 2017-02-16 17:34:37 UTC | Edited by: Aleksey Rzhegov

Well, it's pretty hard for me to understand PHP, but the algorithm I use to receive data is pretty simple:
I load page after page for a certain region until there are 0 orders on the page.
Is that the correct way to do it? At least it worked before.

Edit: I tried one of the links from my logs that was supposed to return 10,000 orders on an ESI page, and it returned an empty array (for example: https://esi.tech.ccp.is/latest/markets/10000002/orders?datasource=tranquility&page=4809&order_type=all ).
There is definitely some kind of bug here...

Thanks, Steve.

Aleksey Rzhegov

Aliastra

Gallente Federation

Likes received: 0

#4 - 2017-02-16 17:49:07 UTC | Edited by: Aleksey Rzhegov

Nice... it was all because there was no / after "orders" in the link:
Correct:
https://esi.tech.ccp.is/latest/markets/10000002/orders/?datasource=tranquility&page=4809&order_type=all
Incorrect:
https://esi.tech.ccp.is/latest/markets/10000002/orders?datasource=tranquility&page=4809&order_type=all
which behaves the same as
https://esi.tech.ccp.is/latest/markets/10000002/orders/
Just why? Ugh
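
A minimal sketch of building the request URL so the slash after "orders" cannot be forgotten. The host, path and query parameters are copied from the URLs above (the host may have changed since 2017); the helper names `orders_url` and `has_trailing_slash` are made up for illustration.

```python
from urllib.parse import urlencode, urlsplit

# Base URL as quoted in this thread; treat it as an assumption, not a current endpoint.
ESI_BASE = "https://esi.tech.ccp.is/latest"


def orders_url(region_id, page, order_type="all", datasource="tranquility"):
    """Build the orders URL with the trailing slash placed before the query string."""
    query = urlencode({"datasource": datasource, "page": page, "order_type": order_type})
    return f"{ESI_BASE}/markets/{region_id}/orders/?{query}"


def has_trailing_slash(url):
    """Return True if the path part of the URL (without the query) ends with '/'."""
    return urlsplit(url).path.endswith("/")
```

Checking `has_trailing_slash(url)` before sending a request is a cheap way to catch this mistake in hand-written URLs.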


Steve Ronuken

Fuzzwork Enterprises

Vote Steve Ronuken for CSM

Likes received: 6,759

#5 - 2017-02-16 18:40:16 UTC

there's no orders on page 4809. That's why it's an empty array.

Ask for each page in turn, checking the size of the returned array. When it's empty, stop asking.
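
The loop described above might look roughly like this. It is only a sketch, not the code from aggloader-esi.py: `fetch_page` and `fetch_all_orders` are hypothetical names, the URL pattern is the one quoted earlier in this thread, and there is no error handling or rate limiting.

```python
import json
from urllib.request import urlopen


def fetch_page(region_id, page):
    """Fetch one page of orders as a list (URL pattern copied from this thread)."""
    url = (f"https://esi.tech.ccp.is/latest/markets/{region_id}/orders/"
           f"?datasource=tranquility&page={page}&order_type=all")
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_all_orders(region_id, fetch=fetch_page):
    """Request pages 1, 2, 3, ... in turn and stop at the first empty array."""
    orders = []
    page = 1
    while True:
        batch = fetch(region_id, page)
        if not batch:  # an empty array means there are no more pages
            break
        orders.extend(batch)
        page += 1
    return orders
```

The `fetch` parameter is there so the loop can be tried against a fake page source before pointing it at the real API.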

Aleksey Rzhegov

Aliastra

Gallente Federation

Likes received: 0

#6 - 2017-02-16 20:10:39 UTC | Edited by: Aleksey Rzhegov

Steve Ronuken wrote:

there's no orders on page 4809. That's why it's an empty array.

Ask for each page in turn, checking the size of the returned array. When it's empty, stop asking.

That is exactly what I did, but the URL was wrong all along (as I described in my previous comment) and was actually returning the first page every time, so the loop reached page 4809 before failing with some server-side error.
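
One way to catch this kind of failure earlier is to add two guards to the paging loop: a cap on the number of pages, and a check that a page is not identical to the one before it. This is a sketch under assumptions; `fetch_all_orders_guarded` and the default `max_pages` value are invented for illustration, and since orders change constantly the identical-page check may not always fire, which is why the cap is there as well.

```python
def fetch_all_orders_guarded(fetch, max_pages=1000):
    """Page through results via fetch(page); stop on an empty page.

    Raises RuntimeError if a page repeats the previous one or if no
    empty page is seen within max_pages requests.
    """
    orders = []
    previous = None
    for page in range(1, max_pages + 1):
        batch = fetch(page)
        if not batch:
            return orders
        if batch == previous:
            # Same content twice in a row: the page parameter is probably ignored.
            raise RuntimeError(
                f"page {page} is identical to page {page - 1}; check the request URL")
        orders.extend(batch)
        previous = batch
    raise RuntimeError(f"no empty page after {max_pages} pages; check the request URL")
```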

Thanks again.

Tonto Auri

Vhero' Multipurpose Corp

Likes received: 300

#7 - 2017-02-16 23:44:22 UTC

Aleksey Rzhegov wrote:

Because "https://esi.tech.ccp.is/latest/markets/10000002/orders" is not the same address as "https://esi.tech.ccp.is/latest/markets/10000002/orders/"…

It's that simple.
Just because you were redirected from one to another on some website doesn't mean the next one would do the same. It is not obliged to.

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Aleksey Rzhegov

Aliastra

Gallente Federation

Likes received: 0

#8 - 2017-02-17 04:39:29 UTC | Edited by: Aleksey Rzhegov

Tonto Auri wrote:

Yes, it's not the same and it's not obliged to, but it does not look like good practice to me.
Especially when it just redirects to the first page instead of returning an error.


Tonto Auri

Vhero' Multipurpose Corp

Likes received: 300

#9 - 2017-02-19 12:45:05 UTC

Not a good practice? Aye. Can be handled better? I concur.
But as in any other complicated system, there's a point where you have to make a choice between feature and performance.
In this case, of course, I'd rather accept 400 or 404 than a redirect.