
 

Why are we not combining market data, player devs?

Ilyk Halibut
Deep Core Mining Inc.
Caldari State
#221 - 2012-04-20 21:02:13 UTC
This isn't related to the spec, but as a side note, please set User-Agent strings for the uploader! I don't think the EVE Marketeer uploader does this, and that makes nginx access logs sad.
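For uploader authors, here is a minimal sketch of setting a User-Agent on an upload request using only the Python standard library. The uploader name, version, endpoint URL, and the JSON POST body are all placeholders assumed for illustration; they are not taken from any real uploader or endpoint.

```python
import json
import urllib.request

# Hypothetical uploader identity and endpoint, for illustration only.
USER_AGENT = "ExampleUploader/0.1"
ENDPOINT = "https://example.invalid/upload"


def build_upload_request(message: dict) -> urllib.request.Request:
    """Build a POST request that identifies the uploader via User-Agent."""
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"User-Agent": USER_AGENT, "Content-Type": "application/json"},
        method="POST",
    )


# Building the request does not touch the network; sending it would be
# urllib.request.urlopen(req), which is left out here on purpose.
req = build_upload_request({"currentTime": "2012-04-20T21:02:13+00:00"})
```

With the header set, the receiving server's access log can tell uploaders and versions apart.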

EVE Market Data Relay - A real-time feed of EVE Market data http://www.eve-emdr.com

Desmont McCallock
#222 - 2012-04-20 21:38:13 UTC
Something else: is there any point in uploading a message for an item that doesn't have any orders?
Is there any valuable data in that?
Dragonaire
Here there be Dragons
#223 - 2012-04-21 14:56:29 UTC
Desmont McCallock - You are correct that adding a 'Z' for UTC is allowed, even preferred, but having no zone offset would mean the date/time is in the local zone, which isn't useful in this application. What you are probably running into is that the server is in UTC, so .NET doesn't think it needs to add it. I had to do this in PHP to make it do what I needed, so you might need to do something similar:
gmdate('c', strtotime($row[$k] . '+00:00'));
You could append just 'Z', but I felt it would make it easier for anyone who has to parse it manually to always work with numbers.

Nothing jumps out at me as invalid according to the format in your example, other than the date/time issue.

As to empty order messages, I guess it depends on how the receiving site uses that info. If a site takes it to mean there aren't any orders while it still shows some, that could be useful information, but it'll depend on what each site decides, which is outside of what the format covers, of course.

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Desmont McCallock
#224 - 2012-04-21 15:28:05 UTC
Thanks Dragonaire for your input.

I noticed, though, that EMDR is feeding data that doesn't have the "+00:00" suffix. Shouldn't that be corrected?

As for the empty orders, the only useful data I can think of is the "generatedAt" info, as this plays a part in indicating how fresh the data is. So I decided to include the empty orders in the feed.
Ilyk Halibut
Deep Core Mining Inc.
Caldari State
#225 - 2012-04-21 16:41:11 UTC
Desmont McCallock wrote:
Thanks Dragonaire for your input.

I noticed, though, that EMDR is feeding data that doesn't have the "+00:00" suffix. Shouldn't that be corrected?

I noticed that ISO 8601 has a few different ways to denote timezone offsets. I've seen Z00:00, T00:00, and +00:00. I'm not really sure which way to go here. Can we all agree on the +00:00?

Desmont McCallock wrote:

As for the empty orders, the only useful data I can think of is the "generatedAt" info, as this plays a part in indicating how fresh the data is. So I decided to include the empty orders in the feed.

EMDR is currently filtering out empty orders, since I was concerned about that being a really easy way to wipe data items off of badly written sites, but I guess that may be the price to pay for having a badly written site. For anyone consuming the market data, would you like these empty records? I'm cool with whatever.
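For consumers worried about empty records wiping their data, one possible guard is to apply a record only when its "generatedAt" is newer than what is already stored. The sketch below assumes each record carries "regionID", "typeID", "generatedAt", and a "rows" list; only "generatedAt" is named in this thread, so the other field names and the in-memory store are assumptions for illustration.

```python
from datetime import datetime


def apply_rowset(store: dict, rowset: dict) -> bool:
    """Apply a record to an in-memory store; return True if the store changed.

    `store` maps (regionID, typeID) -> {"generatedAt": datetime, "rows": list}.
    An empty "rows" list replaces stored orders only when it is newer.
    """
    key = (rowset["regionID"], rowset["typeID"])
    generated_at = datetime.fromisoformat(rowset["generatedAt"])
    current = store.get(key)
    if current is not None and current["generatedAt"] >= generated_at:
        return False  # stored data is as new or newer; ignore this record
    store[key] = {"generatedAt": generated_at, "rows": list(rowset["rows"])}
    return True
```

Under this rule an old empty record cannot erase newer orders, while a newer empty record still updates the freshness timestamp.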


Dragonaire
Here there be Dragons
#226 - 2012-04-21 17:00:08 UTC  |  Edited by: Dragonaire
I've added some more info to the format page to help make it clearer what is expected with the dates.

Edit: read the Time zone designators section, as it makes clear what is allowed. To directly answer your question, the +00:00 format is preferred. The +00 form is also allowed by the standard but is not recommended.

Something that was never directly said is that the extended format is also preferred, but the condensed format without '-' and ':' can be used as well. Just remember that whichever you use, the whole date/time must use the same format, including the offset.

Remember one of the ideas in developing this format is that someone should be able to read the data and understand it without ever having seen anything about the unified uploader format before. So a date/time that looks like this 2012-05-12T10:50:00+00:00 is understandable but 120512T105000+00 is not.
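As a rough Python counterpart to the PHP one-liner earlier in the thread, the sketch below turns a naive timestamp string into the extended format with an explicit +00:00 offset. The input layout ("YYYY-MM-DD HH:MM:SS") and the assumption that the value is already in UTC are illustrative; adjust them to whatever your data source actually returns.

```python
from datetime import datetime, timezone


def to_utc_iso(naive_utc: str) -> str:
    """Format 'YYYY-MM-DD HH:MM:SS' (assumed to already be UTC) with +00:00."""
    parsed = datetime.strptime(naive_utc, "%Y-%m-%d %H:%M:%S")
    # Attach the UTC zone explicitly so isoformat() emits the numeric offset.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_utc_iso("2012-05-12 10:50:00"))  # 2012-05-12T10:50:00+00:00
```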


Ilyk Halibut
Deep Core Mining Inc.
Caldari State
#227 - 2012-04-22 00:27:38 UTC
I have corrected this in EMDR:

2012-04-21T20:25:55+00:00


Ilyk Halibut
Deep Core Mining Inc.
Caldari State
#228 - 2012-04-23 19:24:49 UTC
I need clarification on a bit here:

Quote:
Forwarders should not modify anything outside of “uploadKeys” and “currentTime” in the message. If they do they should remove any array elements for endpoints in “uploadKeys” where they don't understand how the “key” is formed to prevent a possibly bad message from being passed to that endpoint which can cause unnecessary extra work for them.


EMDR is currently changing currentTime, and, like the spec says, may eventually have its hands in uploadKeys. To do that, however, I have to re-generate the JSON.

The problem is, I am then technically the JSON generator, and the "generator" field in the spec says:

Quote:
An object with a “name” and “version” pairs to identify the JSON message generator


Given that I'm changing currentTime with each relayed message, am I also going to need to change generator, since I am technically generating the updated JSON? The first quote above seems to forbid that, but the second one seems to say I should.


Desmont McCallock
#229 - 2012-04-23 20:11:54 UTC  |  Edited by: Desmont McCallock
This also means that any forwarder that re-generates the JSON message has to change the generator field as well.

If this is allowed, it gives an additional tool for duplicate detection.
Dragonaire
Here there be Dragons
#230 - 2012-04-24 05:04:35 UTC
Ilyk Halibut - You have some good points, and it does need to be clearer what a forwarder, generator, etc. is. I'll take a stab at it here and get some input from everyone.

Endpoint - Something that receives a message from a generator and/or a forwarder and uses it in some way. An endpoint may also be a forwarder but is more commonly some kind of market research site, etc.

Forwarder - Something that forwards a message from a generator to something else and may only modify the "currentTime" and "uploadKeys" as stated in the existing docs.

Generator - Something that gets data directly or indirectly from the Eve database, either from cache files of the Eve client or from the MarketOrder API, and first sends the data as a message to an endpoint. Any endpoint that receives messages from generators and/or forwarders, merges the messages or their data from multiple sources, and then sends the combined data as a new message would also be considered a generator.

So let me know if that makes things clearer or if anyone has anything else to add to it. Everyone's input on this is welcome as we try to better define stuff.
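To make the proposed forwarder definition concrete, here is a small sketch in which a forwarder refreshes "currentTime" and leaves everything else, including "generator", exactly as received. It reflects the definitions proposed in this post rather than a settled part of the spec, and the sample message content is made up.

```python
import copy
from datetime import datetime, timezone


def forward(message: dict, now: datetime) -> dict:
    """Return a copy with only "currentTime" refreshed.

    `now` must be timezone-aware. The "generator" object is left untouched,
    since under the proposed definition a forwarder is not a generator.
    """
    forwarded = copy.deepcopy(message)
    forwarded["currentTime"] = now.astimezone(timezone.utc).isoformat(timespec="seconds")
    return forwarded


received = {
    "generator": {"name": "ExampleUploader", "version": "0.1"},  # made-up values
    "currentTime": "2012-04-23T19:00:00+00:00",
    "uploadKeys": [{"name": "Example Endpoint", "key": "abc"}],
}
relayed = forward(received, datetime(2012, 4, 23, 19, 24, 49, tzinfo=timezone.utc))
```

A service that merges data from several sources into a new message would instead count as a generator under the same definitions and would set its own "generator" values.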


Desmont McCallock
#231 - 2012-04-24 07:11:02 UTC  |  Edited by: Desmont McCallock
Slightly off-topic, but this may affect all cache scraper libraries, so it's kind of relevant.

Escalation patch notes wrote:
Technical

New file format for objects now result in faster loading and more efficient caching.

Edit: After doing some testing on SiSi, nothing seems to be affected. Everything works normally. I'll evaluate it again after the patch goes live.

Edit2: Nothing seems to be affected. Everything works normally.
Kaladr
Viziam
Amarr Empire
#232 - 2012-04-25 04:31:24 UTC
Desmont McCallock wrote:

Edit2: Nothing seems to be affected. Everything works normally.


Good...

Creator of EVE-Central.com, the longest running EVE Market Aggregator

Desmont McCallock
#233 - 2012-04-25 15:29:31 UTC  |  Edited by: Desmont McCallock
This is my attempt to explain how the market data network should be structured and work, and how a message in Unified Format (from now on called UF) should be formed and used.

The network currently consists of the following entities:

  • A: Data Uploader.
  • B: Data Aggregator.
  • C: Data Relay.

The traffic between entities is described below.

  1. A can send data to B and C.
  2. B can send data to C. B can gather data by subscribing to C or another B via API.
  3. C broadcasts the data it’s receiving, without targeting anyone specifically.

Let’s examine the roles of each entity in the network.

Data Uploader

Let’s explain what A should be doing.
A generates a UF message by gathering market data from either an EVE cache scraper or the EVE API. The message follows the specs described at: http://dev.eve-central.com/unifieduploader/start . Special attention should be given when constructing the “uploadKeys” field. This field’s purpose is to give a B type entity info that is relevant to it, like a specific ‘key’ used to identify a user (bound to the entity’s DB) and give credit for uploading data. Although the use of such a function is optional for a B, the field has been added to the specs to make it possible.
Quote:
Example: "uploadKeys": [
{ "name" : "Probably the name of a B", "key" : "A key provided by a B"},
{ "name" : "Probably the name of another B", "key" : "A key provided by another B"},
{ "name" : "Probably the name of C", "key" : "Something irrelevant as C doesn't provide keys or blank"}
]
A special issue came up when I was constructing EMUU: the exposure of the ‘key’ values as a potential security risk.

There are two ways to construct a UF message.

  1. Constructing the message specifically for each endpoint, with the appropriate info in the “uploadKeys” field, and sending it only to that endpoint.
  2. Constructing the message with info for all endpoints in the “uploadKeys” field and sending the same message to all endpoints.

Case 1 is counterproductive but safe.
Case 2 is the most efficient but exposes user info to every receiver.

As I want EMUU to be as efficient as possible, I thought about case 2 and came to the conclusion that a user’s ‘key’ value has no particular value to others. If someone decides to use another user’s key as their own, they will only end up giving credit to the owner of the key.
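The two cases can be sketched side by side. The function names and the endpoint names and keys in the usage below are hypothetical; only the "uploadKeys" layout with "name" and "key" comes from the example above.

```python
def per_endpoint_messages(base: dict, keys: dict) -> dict:
    """Case 1: one message per endpoint, each carrying only that endpoint's key."""
    return {
        name: {**base, "uploadKeys": [{"name": name, "key": key}]}
        for name, key in keys.items()
    }


def shared_message(base: dict, keys: dict) -> dict:
    """Case 2: a single message listing every endpoint's key."""
    return {**base, "uploadKeys": [{"name": n, "key": k} for n, k in keys.items()]}


# Made-up endpoint names and keys, for illustration only.
keys = {"Aggregator One": "key-1", "Aggregator Two": "key-2", "Relay": ""}
base = {"currentTime": "2012-04-25T15:29:31+00:00"}
case1 = per_endpoint_messages(base, keys)
case2 = shared_message(base, keys)
```

Case 1 builds and serializes one message per endpoint, while case 2 builds a single message that reveals every key to every receiver, which is the trade-off described above.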

continued...
Desmont McCallock
#234 - 2012-04-25 15:30:01 UTC
Data Aggregator

Let’s explain what B should be doing.
B can receive data from three different sources: A, B and C. Usually data from another B is provided via API and should not be forwarded to C, as this is not the role of B in the network. Data from A is received in the form of a UF message (or another format compatible with B, which is outside this document’s scope). Data from C is received solely in the form of a UF message.
Usually a B will have two different scripts for processing UF messages from A and C, so let’s look at each one separately. Processing data from another B is outside this document’s scope.

  1. Processing UF from A (script 1)
    Usually this script’s job is to validate the UF message (check that its format follows the UF specs), process and store the provided data if need be, and optionally send the message to C.
    Note here that if a reference to C is found in the “uploadKeys” field, there is no point in sending the UF message to C, as it has already been sent. This can also be used as a duplicate detection criterion, as the same message is going to be received and processed by script 2, so the script can skip that message (if C is still in the network).
    Special attention should be given when B decides to send the received UF message to C (acting as a forwarder). According to the UF specs, a B is allowed to modify only the “currentTime” and “uploadKeys” fields. In the case of the “uploadKeys” field, B should erase all ‘key’ values as there is no point in forwarding them to C. They are irrelevant to anyone else.
    Note here that if a key for another B is present in the “uploadKeys” field, this UF message has already been sent to that B.
    Additionally, a reference to C should be added to the “uploadKeys” field.
    Quote:
    Example: "uploadKeys": [
    { "name" : "Probably the name of C", "key" : "Something irrelevant as C doesn't provide keys or blank"},
    { "name" : "Probably the name of the B", "key" : ""},
    { "name" : "Probably the name of another B", "key" : ""}
    ]

    Now, if the question “why should a B retain info other than C in that field?” pops up in your head, the answer is given further below.

  2. Processing UF from C (script 2)
    Usually this script’s job is to validate the UF message (check that its format follows the UF specs) and, if need be, process and store the provided data. A special job for this script is to detect whether the received UF message has already been processed by script 1, as the message may have been sent to C by B itself or by another B.

    Now here comes the tricky part, which the UF specs need to address: the specs should be modified to provide a way to easily detect duplicate messages.
    The ‘generator’ field can’t be used because that field would usually identify an A or, in some special cases, C (see section “Data Relay”). B could be a possible generator, but that is outside of its role in the network.
    Parsing the info in the “uploadKeys” field could yield possible detection candidates. The ‘key’ value can’t be used, as a B may have sent that UF message to C after erasing the ‘key’ values. If A has sent that UF message, the ‘key’ value may be there, but it is still not a reliable means of detection.
    The ‘name’ value could be a possible detection candidate. In an ideal world the value of ‘name’ in the “uploadKeys” field would be the name of B. But because this value is created by an A, and A is forced to use user-entered text for it (as it can’t determine the name of B by any other means), that value is unreliable as a detection criterion. Imagine the user entering as name: “I don’t like that site but I’m sending anyway”.
    The only value that is partially constant is the gateway URL (I say partially because a B could change its gateway URL without an A being able to know). From my point of view the URL is a good candidate as a detection criterion and could also provide some routing info.
    Another possible solution would be for C to offer an API that provides info about the Bs participating in the network, such as “official name”, “URL”, and “server status”. In that case an A could implement an info retrieval mechanism (like ‘online endpoints’; Ilyk and Callean know what I’m talking about) and just have the user select which endpoints to upload to by ticking check boxes. This would also provide standardized info that can be used in the “uploadKeys” field and then as a duplicate detection criterion.
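The forwarding step of script 1 described above could look roughly like the sketch below: skip the message if C is already listed, otherwise blank every ‘key’ value and add a reference to C. The relay name is a placeholder, and matching on ‘name’ is used only for illustration, given the reliability concerns about that value raised in this post.

```python
import copy

RELAY_NAME = "Example Relay"  # hypothetical name for C


def prepare_for_relay(message: dict):
    """Return a copy ready to send to C, or None if C is already listed."""
    entries = message.get("uploadKeys", [])
    if any(entry.get("name") == RELAY_NAME for entry in entries):
        return None  # the uploader already addressed C directly
    forwarded = copy.deepcopy(message)
    # Keep the endpoint names but erase the key values before forwarding.
    blanked = [{"name": entry.get("name", ""), "key": ""} for entry in entries]
    forwarded["uploadKeys"] = [{"name": RELAY_NAME, "key": ""}] + blanked
    return forwarded
```

A standardized endpoint identifier, such as the gateway URL suggested above, could replace the name comparison without changing the structure of the function.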



Data Relay

Let’s explain what C should be doing.
Basically the job of C is to receive UF messages from A and B, validate the UF message (check its format is according to the UF specs) and broadcast it back to the subscribers. During this process, according to UF specs, it may update the “currentTime” field, although I don’t see it being necessary as the time difference will only be a matter of seconds.
Additionally C could act as a data collector from sources that are not participating in the network but provide a way of getting their data (via API, or SMTP for example). In this case C acts as an A.
C has the easiest job of all.
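As a rough idea of the validation step, the sketch below checks only the three fields discussed in this thread ("generator", "currentTime", "uploadKeys"). It is not a full check against the UF specs, which define more fields than are covered here.

```python
def validate(message) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    if not isinstance(message, dict):
        return ["message is not a JSON object"]
    problems = []
    generator = message.get("generator")
    if not (isinstance(generator, dict) and "name" in generator and "version" in generator):
        problems.append('"generator" must be an object with "name" and "version"')
    if not isinstance(message.get("currentTime"), str):
        problems.append('"currentTime" must be a string')
    keys = message.get("uploadKeys")
    if not isinstance(keys, list) or not all(
        isinstance(k, dict) and "name" in k and "key" in k for k in keys
    ):
        problems.append('"uploadKeys" must be a list of objects with "name" and "key"')
    return problems
```

A relay could broadcast only messages for which the returned list is empty and log the problems for the rest.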


Brainstorming session now open.
Dragonaire
Here there be Dragons
#235 - 2012-04-25 15:30:02 UTC  |  Edited by: Dragonaire
I'm not sure if the lack of comments on my last post is a good thing or not. OK, so you posted your stuff at the same time I did.


Desmont McCallock
#236 - 2012-04-25 15:32:06 UTC
I just posted my comments. lol
Dragonaire
Here there be Dragons
#237 - 2012-04-25 16:01:16 UTC
Quote:
Note here that if a reference to C is found in the “uploadKeys” field, there is no point in sending the UF message to C, as it has already been sent.
That's assuming that C actually received it which might not have happened.

Quote:
Special attention should be given when B decides to send the received UF message to C (acting as a forwarder). According to the UF specs, a B is allowed to modify only the “currentTime” and “uploadKeys” fields. In the case of the “uploadKeys” field, B should erase all ‘key’ values as there is no point in forwarding them to C. They are irrelevant to anyone else.
I can see cases where someone might want the uploadKeys for further processing, so just dropping them without reason is IMHO a bad idea. They should only be dropped in cases where either something has changed that makes them invalid or where it is unknown if they are still valid.

As to detecting duplicates, it seems everyone is ignoring the simplest thing: the orderID and when it was generated and received. If you are receiving something from a source that matches existing data, you only need to determine if it's an update or outdated. You may also want to weigh how trusted the source is, etc., but that's really out of scope of what UF's purpose is.
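A minimal sketch of that idea: track the newest "generatedAt" seen per orderID and classify each incoming snapshot against it. The in-memory dict stands in for whatever storage a site really uses, and the labels "new", "update", and "outdated" are just names chosen for this example.

```python
from datetime import datetime


def classify(known: dict, order_id: int, generated_at: str) -> str:
    """Return "new", "update", or "outdated" for an incoming order snapshot.

    `known` maps orderID -> the newest generatedAt seen so far (as datetime).
    A snapshot with the same timestamp as the stored one counts as outdated.
    """
    incoming = datetime.fromisoformat(generated_at)
    seen = known.get(order_id)
    if seen is None:
        known[order_id] = incoming
        return "new"
    if incoming > seen:
        known[order_id] = incoming
        return "update"
    return "outdated"
```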


Desmont McCallock
#238 - 2012-04-25 16:08:33 UTC  |  Edited by: Desmont McCallock
Dragonaire wrote:
That's assuming that C actually received it which might not have happened.
The whole message is:
Quote:
Note here that if a reference to C is found in the “uploadKeys” field, there is no point in sending the UF message to C, as it has already been sent. This can also be used as a duplicate detection criterion, as the same message is going to be received and processed by script 2, so the script can skip that message (if C is still in the network).
Dragonaire wrote:
I can see cases where someone might want the uploadKeys for further processing, so just dropping them without reason is IMHO a bad idea. They should only be dropped in cases where either something has changed that makes them invalid or where it is unknown if they are still valid.
I'm talking about the 'key' value there. I'm quite specific. I'm not talking about dropping the entire content of the "uploadKeys" field.
Dragonaire wrote:
As to detecting duplicates, it seems everyone is ignoring the simplest thing: the orderID and when it was generated and received. If you are receiving something from a source that matches existing data, you only need to determine if it's an update or outdated. You may also want to weigh how trusted the source is, etc., but that's really out of scope of what UF's purpose is.
And what about trips to the DB? We are looking for an efficient way here, not hammering the DB.
Snarf Aldes
University of Caille
Gallente Federation
#239 - 2012-04-25 18:20:41 UTC
Dragonaire wrote:

As to detecting duplicates, it seems everyone is ignoring the simplest thing: the orderID and when it was generated and received. If you are receiving something from a source that matches existing data, you only need to determine if it's an update or outdated. You may also want to weigh how trusted the source is, etc., but that's really out of scope of what UF's purpose is.

The question is, can we discard incoming data by just looking at its origin and the data itself?
If we can, we don't need to do any further processing, which is a big plus for aggregating sites.

At the moment I have to accept all data and process it later. If the data I have in the DB is newer, I discard the incoming data; if not, I update the DB.
But this requires costly calls to the database.

At the moment, only about 33% of the uploads I am seeing are unique.
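One way to discard exact repeats before touching the database is to fingerprint the message content while ignoring the two fields a forwarder is allowed to change. This is only a sketch of one possible approach, not something the UF specs define, and it assumes forwarders leave every other field byte-for-byte equivalent after JSON parsing.

```python
import hashlib
import json

# Fields a forwarder may rewrite, per the spec text quoted earlier in the thread.
MUTABLE_FIELDS = ("currentTime", "uploadKeys")


def fingerprint(message: dict) -> str:
    """Hash the message content, ignoring fields a forwarder may change."""
    stable = {k: v for k, v in message.items() if k not in MUTABLE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_duplicate(seen: set, message: dict) -> bool:
    """Record the fingerprint; return True if it was already seen."""
    digest = fingerprint(message)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

In a long-running service the `seen` set would need some bound, such as expiring old fingerprints, and a forwarder that rewrites "generator" would defeat the comparison.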

Creator of Eve Addicts

Ilyk Halibut
Deep Core Mining Inc.
Caldari State
#240 - 2012-04-25 18:57:09 UTC
Snarf Aldes wrote:

At the moment, only about 33% of the uploads I am seeing are unique.

This is a very important number in this discussion, and makes a lot of sense given how EMDR currently gets its data:

  • EVE Central
  • EVE Marketeer
  • EVE Marketdata
  • EVE Addicts
  • Direct user uploads

The top three are all defaults in the EVE Marketeer Upload client, which is probably one of the most (if not the most) widely used uploader.

If EVEMon starts uploading market data by default, and sends it to multiple endpoints, we're looking at a ton of duplicate data, unless EMDR stops accepting relayed data from all of the other endpoints included in EVEMon's default settings.

So Dragonaire, I think it's important to look at this in terms of the volume of data and duplicates that we are looking at in early June or so when EVEMon's uploader may go active. This concern is being voiced by quite a few now, so we do need to consider any ways to help pick dupes out, even if it just makes it easier to cross-check data.
