These forums have been archived and are now read-only.

The new forums are live and can be found at https://forums.eveonline.com/

EVE Technology Lab

 
  • Topic is locked indefinitely.
 

Why are we not combining market data, player devs?

Author
Dragonaire
Here there be Dragons
#121 - 2011-12-07 00:00:54 UTC
The first problem we had was coming up with a list of fields that would be used to form the hash. If we used UUID version 3 instead, clients would have to know their own URL, which is hard to come by with most providers, and since it's usually based on an IP handed out by DHCP it's also worthless. What if a neighbor also plays Eve, later gets the same IP, and uploads from a different region? The UUID would be the same, and there are more problems like that. Try coming up with one scheme that works for everyone. The methods that could actually solve these kinds of problems ran into trouble in some parts of the world because they use too high a grade of encryption, and those are just the issues I've thought of.

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Dragonaire
Here there be Dragons
#122 - 2011-12-07 06:18:32 UTC
Ok, sorry for the short reply earlier, but I was trying to write it in 5 minutes at the end of my lunch Blink Let me go over in more detail why IMHO it's a waste of time and bandwidth to add anything like an ID or any of the other proposed ideas. I'm going to use as my example a SHA1 of the complete package, except of course the hash itself Smile

So the client computes the SHA1 and adds it into the JSON message before sending it. Notice first that it has to build the message without the hash, compute the SHA1, then add it to the message somewhere, which is a couple of extra steps already, but since they happen on the client the extra work isn't really a problem. Now it sends the message to, say, EMK Blink

The EMK server receives the message and now has to decide what to do with it. One thing it can do, as you said, is check the SHA1 and decide if it's a dup. This is a bad idea, and I'll show why. Let's say the sender actually is into metagaming and has decided to start sending fake market information to sites like EMK to game the Eve market. He has a script that generates huge amounts of fake market data based on the actual in-game data, but all the real data from everyone else keeps getting in his way, so he decides to compute the SHA1 of the real data while putting his fake data in the message instead, so when EMK later receives the real data it ignores it as a dup. After a little while you, as the developer of EMK, figure out what is happening and start computing the SHA1 of the data yourself to detect the fakes. Let's look at what that takes.

First you need to pull the SHA1 out of the message so it's not in your way when you compute your own from the message, then you compare yours with the one you received to try to decide whether this one is the fake.

Now the question I have is: if you're going to have to compute the SHA1 yourself on the server anyway, why would you want to waste EMK's incoming bandwidth on something you're just going to end up doing regardless, plus the extra work of splitting out the sent SHA1 and doing the comparisons?
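To make the argument concrete, here is a minimal sketch of what the server ends up doing either way. This is illustrative only: the message shape, the "SHA1" field name, and the in-memory `seen` set are my assumptions, not any agreed format. The point is that the dedup check never needs the client's claimed hash at all.

```python
import hashlib
import json

def server_digest(message: dict) -> str:
    """Digest the message exactly as the server sees it, minus any
    client-supplied hash field (field name 'SHA1' is hypothetical)."""
    body = {k: v for k, v in message.items() if k != "SHA1"}
    # Canonical serialization so identical data always hashes the same.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

seen = set()  # a real site would use persistent storage, not a set

def is_duplicate(message: dict) -> bool:
    """Dedup purely on the server's own digest of the data."""
    digest = server_digest(message)
    if digest in seen:
        return True
    seen.add(digest)
    return False
```

Whatever hash the client puts in the message is simply ignored, which is exactly why sending it is wasted bandwidth.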

I could go on to show other examples with all the other methods but in the end all of them fail because they are based on trusting the data received from the client that you have no control over.

Now that I've pointed out why none of the simple ideas work, let's look at one that can work, and why it has issues as well.

Something that would let you be somewhat reassured that everything is on the up and up would be a full digital-signing system, where uploads are signed using the site's public key and the uploader's personal private key. But that requires every site and every uploader to get keys from a trusted party and keep them secure, usually costs some RL money for one or both parties, and adds overhead in both computation and bandwidth. I don't think anyone is going to do this; it just doesn't make sense to work that hard for a little in-game data that isn't even real.
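As an aside, a shared-secret HMAC is a lighter-weight cousin of the full signing scheme described above. This sketch is purely illustrative (the key store and uploader IDs are made up), and it carries the same core cost being objected to: every uploader still has to be issued a secret out of band and keep it safe.

```python
import hashlib
import hmac

# Hypothetical key store: each uploader gets a secret issued out of band.
SECRETS = {"uploader-42": b"issued-out-of-band"}

def sign(uploader_id: str, payload: bytes) -> str:
    """Uploader side: tag the payload with an HMAC-SHA256."""
    return hmac.new(SECRETS[uploader_id], payload, hashlib.sha256).hexdigest()

def verify(uploader_id: str, payload: bytes, tag: str) -> bool:
    """Site side: recompute the tag and compare in constant time."""
    secret = SECRETS.get(uploader_id)
    if secret is None:
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```

Unlike a bare hash or UUID, a forged payload fails verification because the attacker lacks the secret; but the key-distribution burden is exactly the overhead the post argues nobody will accept.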

So to summarize: it's a waste of time having the client compute anything, since it doesn't decrease the server load, and any ID can be faked, so it's useless. In addition it wastes bandwidth and computing resources that could be better used for other things on the site server.

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Callean Drevus
Perkone
Caldari State
#123 - 2011-12-07 09:35:04 UTC
http://en.wikipedia.org/wiki/Universally_unique_identifier
Version 4

Seems like exactly the thing that solves all the trouble. I'm not actually interested in a hash of the data of any sort. I'm just interested in a semi-random number that could identify that specific upload, like the auto-incrementing ID column in a database. I'm not interested in whether it actually identifies the data in that upload, just that I haven't seen it before.

For all I care this could be a random number in the 64 bit range, but a random UUID seems like a more standardized way to do it.

If some faker would like to try and guess the same totally random UUID with fake data, he can be my guest :P
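What's being asked for here is nothing more than this (a sketch; the field name `uploadID` is my placeholder, not an agreed format): a random version-4 UUID attached per message, with no relationship to the data at all.

```python
import json
import uuid

# Generate a purely random per-message ID, client-side.
message = {
    "regionID": 10000002,        # illustrative payload fields
    "typeID": 34,
    "uploadID": str(uuid.uuid4()),  # hypothetical field name
}
payload = json.dumps(message)
```

The receiver would just keep a set of seen `uploadID` values and skip any message whose ID it already has.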

Developer/Creator of EVE Marketeer

Dragonaire
Here there be Dragons
#124 - 2011-12-07 15:37:00 UTC
Ok, my point is you've already received the whole message before you can even check the UUID, and at that point it only takes a couple of microseconds at worst to make a SHA1 yourself to check if it's a dup. It's going to take as long to receive the UUID as it does to calculate a SHA1 or an MD5 of the whole message, as small as the messages are. Part of my point is that if you really want something like that, you can simply put it in 'uploadKeys' and it won't interfere with anyone else doing something different. Both ways get you to the same place, but one lets the method change later, while the other doesn't allow changes without causing problems for everyone who doesn't want them or has a different idea of how to do things, kind of like what's going on now Blink

Just to make it clear, I have no problem with the idea of a UUID or anything else, but from the start no one has shown where the extra bandwidth used has any real benefit. People keep thinking, in one way or another, that it'll make their life easier, but in the end you'll still have to decide if it's a dup or not, and probably validate the data on top of that, so you might as well just do it yourself on the server. For what you actually want to do, you might as well just look at whether it's the same regionID and typeID generated at about the same time. Maybe it's just me, but how I see it is that the actual orderIDs and data are what's important here, and you're going to have to look at them to figure out whether you really have something new or not; nothing stops a client from sending the same random UUID with different data, or different UUIDs with the same data. Once again, there's nothing you can trust from the client that will do what you want.

I know you've posted in other threads about having to find ways to filter out bad or fake market data, so why are you so ready to trust uploaders not to fake things with this field as well?

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Dragonaire
Here there be Dragons
#125 - 2011-12-07 16:53:25 UTC  |  Edited by: Dragonaire
Ok, so I guess we need to get everyone's input on a couple of things here and decide which way to go. I've done my best to point out the problems and failings I see in trying to add any kind of ID/hash to the JSON message, but if that's what most people want, I'll go along with it, because having a common format for everyone to use is what's really most important.


Question One
Edit: Should we include some kind of universal ID, like one of the UUID versions or a crc32, md5, or whatever, in the message, or leave it up to each site to decide what they want and put it in 'uploadKeys' if they want it?

For clients it will make little difference, as each site is likely to require them to add additional info to 'uploadKeys' anyway to give credit to users, etc.

Question Two
If the answer to the above is yes what are we going to use?

  • crc32 - Simple to make and use, and should have few problems with dups.
  • md5 - Also simple to make and use, and the chance of dups at the small data sizes of most messages is going to be nil.
  • sha1 - Takes a bit more to compute and is probably overkill for something this small.
  • UUID version 3/5 - By itself isn't enough, since it's basically derived from the URL/hostname, which is a problem for most clients because they get their IP from DHCP, and there's a real risk of more than one Eve player ending up with the same one and causing conflicts.
  • UUID version 4 - The numbers are totally random as long as the client has a source of truly random numbers, but most have at best a relatively poor pseudo-random source instead.
  • something else ?


Question three
If using one of the hash/digest methods above, which parts of the message should be used, or should the whole message be used?
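To ground Questions Two and Three, here is a sketch applying each candidate digest to an illustrative message, both over the whole message and over a chosen subset of fields. The message shape and field names are examples only, not a proposed format.

```python
import hashlib
import json
import zlib

# Illustrative message only; field names are not an agreed format.
message = {
    "resultType": "orders",
    "regionID": 10000002,
    "typeID": 34,
    "currentTime": "2011-12-07T16:53:25",
    "result": [[2096914556, 7.5, 1000]],
}

def canonical(obj) -> bytes:
    """Serialize deterministically so the same data hashes identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

whole = canonical(message)
subset = canonical({k: message[k] for k in ("regionID", "typeID", "currentTime")})

digests = {
    "crc32": format(zlib.crc32(whole) & 0xFFFFFFFF, "08x"),  # 32 bits
    "md5": hashlib.md5(whole).hexdigest(),                   # 128 bits
    "sha1": hashlib.sha1(whole).hexdigest(),                 # 160 bits
    "sha1-subset": hashlib.sha1(subset).hexdigest(),
}
```

Note how hashing the whole message and hashing a field subset yield unrelated digests: whichever choice the group makes, every party has to make the same one or dedup breaks.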

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Callean Drevus
Perkone
Caldari State
#126 - 2011-12-07 19:17:26 UTC  |  Edited by: Callean Drevus
As I see it, the difference in our opinions stems from the fact that we do not understand what the other means. I'll repeat again, that I'm not interested in using this for any kind of validation, just in determining whether I am going to do anything with the upload at all.

As for the answers to your questions:
1. Yes. Using uploadKeys is not an alternative, since what's needed isn't an ID per uploader but one per message; besides, if not everyone implements it, the whole thing becomes moot.

2. UUID 4. It's pretty much guaranteed to be random per client, and the chances of two clients generating the same UUID at the same time are about as small as tripping over one specific grain of sand during a two-week journey through the desert. I would point out that clients will (if implemented correctly) be using different seeds for their pseudo-random numbers, so they will never be generating the same sequences.

3. If people choose a digest method, I don't really care which it is, as I won't be using it :P

Regardless of this whole discussion, I've started working on a new uploader supporting limitless endpoints, multiple clients, and threading. It's starting to look better by the minute. Considering it's the first Python app I'm building, it's going fairly well :P
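The shape of such an uploader might look like the following. This is my own minimal sketch of the idea (one worker thread per endpoint, all fed from per-endpoint queues), not Callean's actual code; the endpoint URLs are placeholders.

```python
import json
import queue
import threading
import urllib.request

# Placeholder endpoints; a real uploader would read these from config.
ENDPOINTS = ["http://site-a.example/upload", "http://site-b.example/upload"]

def worker(endpoint: str, jobs: "queue.Queue") -> None:
    """POST each queued message to one endpoint until a None sentinel."""
    while True:
        message = jobs.get()
        if message is None:  # sentinel: shut this worker down
            break
        data = json.dumps(message).encode("utf-8")
        req = urllib.request.Request(
            endpoint, data=data,
            headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=10).read()
        except OSError:
            pass  # a real uploader would retry or log here
        jobs.task_done()

def start_uploaders():
    """Spawn one daemon worker per endpoint; return their input queues."""
    queues = []
    for url in ENDPOINTS:
        q = queue.Queue()
        threading.Thread(target=worker, args=(url, q), daemon=True).start()
        queues.append(q)
    return queues
```

Each parsed market snapshot would be put on every queue, so a slow or down endpoint never blocks the others.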

Developer/Creator of EVE Marketeer

Tonto Auri
Vhero' Multipurpose Corp
#127 - 2011-12-07 22:48:20 UTC
As I've not followed the discussion closely, my question is probably rather stupid, but still:
what is the intended purpose of the field? Like Dragonaire, I don't see any use for it at any point.

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Callean Drevus
Perkone
Caldari State
#128 - 2011-12-07 23:42:11 UTC  |  Edited by: Callean Drevus
Consider this:

A client is happily uploading his data to a certain website using their proprietary uploader, which only uploads to that specific website. Luckily, that website exposes an API that allows you to retrieve recent uploads. A retrieving website knows that all uploads that went to that website and are retrieved from it are unique and probably new.

In the following scenario:

A client is happily uploading using an uploader with multiple endpoints; the data is received by both site A and site B. However, not everyone is smart enough to use the universal uploader, and some people still upload to only site A or only site B. Both websites have an API to retrieve orders from the other, so this isn't much of a problem: when site B pulls all orders from site A, it checks whether the uploads it just retrieved from A haven't already been processed because they were also uploaded to B.
The owner of site B is happy there is a single identifier, as it saves him the trouble of having to invent his own messy algorithm to check whether two uploads are equal.
It also saves him from looking up all uploads and calculating their MD5 hashes because he was too lazy to compute them when the data was uploaded. Most of all, he just thinks it's convenient.

Now, I could simply hope that all EVE Central users will suddenly move to the EVE Marketeer uploader (or the universal uploader I'm now developing), but I just don't see that happening, and that means that IF we are going to implement this universal format, we'll have to consider sending data to and fro between websites. That is the single purpose for which I think this UUID would be useful: being able to see an upload as a single entity instead of as a blob of data. Most of all, it's just convenient.

I'm open for anyone who wishes to change my point of view, but either it hasn't happened yet, or we aren't talking about the same thing, so I hope the above illustration makes it a bit clearer.



And the Universal Uploader is working :D I just have to add the EVE Central upload format again, and it'll be a threaded uploading monster.

Developer/Creator of EVE Marketeer

Dragonaire
Here there be Dragons
#129 - 2011-12-08 00:12:45 UTC  |  Edited by: Dragonaire
EDIT: I didn't see your last post above answering Tonto Auri, since we apparently were both writing at the same time, but I think my post here covers the same ground I would have anyway, just a little differently than if I had seen it.
PS: I also forgot to say earlier that I'm glad to hear the good news on the new uploader; I was concerned it might take a while to catch up on the work that was lost.
Quote:
I'll repeat again, that I'm not interested in using this for any kind of validation, just in determining whether I am going to do anything with the upload at all.
I do understand that, but that's why I'm saying it won't really do what you think it will. You can have someone sending different data with the same ID, different IDs with the same data, or either of those with faked data. There's also the possibility of a third party doing a man-in-the-middle attack and doing the same things. So if the ID has no dependable connection to the actual data, you're just wasting your time having it. You end up either ignoring good data or accepting mostly bad data if someone decides to make it so.

The easiest way to make it worthless is to simply start sending the same fake data while counting up through the ID range; you fill everyone's database with so much useless data that it becomes impossible to use the good data anymore. That's where at least the hashes have an advantage: when the received and locally computed ones don't match, you know something is wrong.

You also stated you basically want something that can act like a primary key in a database, but the only way you end up with something that does that is to have a common list with a single point in charge of issuing them, which we just don't have.

This problem is one that's been around since before there were even computers, but the solution to it hasn't really changed: you can't do it without a single point of control.

Another way to look at it is as a synchronization problem: you have to ensure you don't have different issuers issuing the same timestamp for different groups of data.

There are other ways to look at this problem, as I've shown above, but they all come down to having to trust that the person giving you the data is being truthful, and we all know you can't trust anything coming from an unknown party on the Internet Big smile If you can't trust, you have to verify, and the thing that needs verifying is the data, which your method ends up ignoring.

EDIT: Just one more point about the ID: when pulling data from another site, there's nothing saying they will forward the same set of data they got from the uploaders. I myself would by default only forward verified data, not everything I'd received. I might provide a raw feed as well, but I would make it clear that it's a bad idea to use it and that the cooked version should be used. If sites don't do that, then in some of the examples above it'll lead to all of them being brought down, possibly needing to drop all their tables to recover, because there's no way to sort the bad from the good.

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Kaladr
Viziam
Amarr Empire
#130 - 2011-12-08 06:10:25 UTC
Dragonaire wrote:

Question One
Edit: Should we include some kind of universal ID like one of the UUID versions or a crc32, md5 or whatever in the message or leave it up to each site to decide what they want and put it in 'uploadKeys' if they decide they want it?


Some food for thought:

uploadKeys should be an API key entry for sites that require it (currently everyone but EVE-Central). This is something unique to the user (potentially).

As for the UUID? Nice idea in principle, but, as Dragonaire pointed out, it's just another piece of data. I know data going into EVE-Central is faked at times, and there really isn't anything to prevent that from happening. Even cooked data is barely cooked, since there is no "more trustworthy" vs "less trustworthy" source.

A UUID would simply be there to prevent routing loops if sites syndicate amongst themselves by pushing to an API.

If they don't, there is no need for a UUID. If a site wants to detect duplicate data, it can do so itself.

Creator of EVE-Central.com, the longest running EVE Market Aggregator

Tonto Auri
Vhero' Multipurpose Corp
#131 - 2011-12-08 12:43:48 UTC
Damn... I hate the EVE forum.

Ok, short version.
Never mind the key; you're looking in the wrong direction.
You're discussing a data interchange protocol in the context of a data interchange format. You're wrong.

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Tonto Auri
Vhero' Multipurpose Corp
#132 - 2011-12-08 13:06:26 UTC  |  Edited by: Tonto Auri
Protocol: you have 3 cases of data exchange.
1. Client - the originator of the data - submits data to an aggregator.
2. Aggregator pushing data to another aggregator.
3. Aggregator feeding clients (such as mining buddy or griefwatch).

Cases 1 and 3 are basically the same - the receiving party doesn't mind data being sent to it. While internally it makes a difference, at the protocol level it is just that: data sent without hesitation, with nothing checked beyond authentication. The client sends a hash table followed by the data upload blocks. If, after reading the data, the server was unable to find all the hashes mentioned in the header, it sends back the list of hashes it wants resent. And so on, until all data is sent, or the server gets tired of a stupid client.

Case 2 is where we have an issue. Your partner aggregator will want raw data so it can run its own sanity checks over it. Here comes an additional step before data submission: before pushing the bundle, the originating aggregator-client sends the list of hashes it wants to communicate, reusing the protocol in a state where none of the hashes has a data block attached yet. The receiving server sifts through its own pool and sends back the list it wants to see. The transmission then proceeds the same way as an originating upload.

We need a way to calculate a hash for uploads, yes. But it has nothing to do with the data submission block format.
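The offer/response negotiation described above can be sketched in a few lines. This is my own illustration of the idea, not an agreed protocol; SHA1 as the block digest is an assumption.

```python
import hashlib

def offer(blocks: list) -> list:
    """Sender: hash each data block (bytes) and offer the hash list first."""
    return [hashlib.sha1(b).hexdigest() for b in blocks]

def wanted(offered: list, already_have: set) -> list:
    """Receiver: reply with only the hashes it has not seen yet."""
    return [h for h in offered if h not in already_have]
```

The sender then transmits only the blocks whose hashes came back in the reply, so data the receiver already holds never crosses the wire.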

The JSON format you've devised so far allows for submission of multiple data blocks in a single push, which, to me, is a good thing. The only problem I see is that EVERY block would carry the sender's signature. Should we put it into a separate block prepended to the data blocks? Or put it into an HTTP header...

Add: gzip'ing the data would be desirable. Do you know if there'd be any major issues if the client set "Content-Transfer-Encoding: gzip" or the like and packed the data accordingly?

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Dragonaire
Here there be Dragons
#133 - 2011-12-08 17:29:03 UTC
Quote:
2. Aggregator pushing data to another aggregator.
One thing here: it won't be push, it'll be pull, so the receiver asks for what it wants to see.

The only place push is used is from the client-originator to the aggregator, and like you said, the aggregator always wants the data, though it does need a way to deal with a stupid client that keeps sending it the same old data. But it has to do that no matter what, so it still doesn't need anything in the message, as only the aggregator can really decide if the client is being stupid, by somehow comparing the data to what it has already received.

Quote:
3. Aggregator feeding clients (such as mining buddy or griefwatch).
Once again they would be pulling. It wasn't clear if you were thinking push or pull here, but since they asked for the data with a pull, just like the aggregator above they'll have to decide if it's new. There might be a possibility here to use the If-Modified-Since header, which would be nice and save transferring dups, and I think the aggregators should implement it even if not every receiver will use it.

Note that If-Modified-Since will also work with 2 above, since the plan was to use the same feeds for both. The aggregators may need to start tracking when stuff is updated, but I'm thinking most of them do in some way already, so it shouldn't take any real changes to respond correctly to If-Modified-Since headers.

Quote:
Add: gzip'ing the data would be desirable. Do you know if there'd be any major issues if client set "Content-Transfer-Encoding: gzip" or the like, and pack the data accordingly?
This is at the HTTP level and really outside the JSON message. As always, if the client and server both support it, there's no reason it can't be used.

Just to make it clear, the JSON message being sent is the data, carried over normal HTTP, though nothing says it has to use port 80 etc., just that that's by far the easiest. This is all planned as a RESTful service using normal POST, GET, PUT, etc.
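For the gzip question above, a compressed upload at the HTTP level might look like this sketch. The JSON message itself is untouched; only the transfer is compressed. Whether a given aggregator accepts Content-Encoding on requests is up to that server, and the URL is a placeholder.

```python
import gzip
import json
import urllib.request

def build_gzip_post(url, message):
    """Build a POST whose body is the gzip'd JSON message."""
    raw = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=gzip.compress(raw),
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",  # tells the server to decompress
        },
    )
```

A server that doesn't support request compression would simply reject or mis-parse the body, so a client should negotiate or fall back to plain JSON.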

Quote:
JSON format you've devised so far allowing for submission of multiple data blocks in single push, which, to me, is a good thing. The only problem I see is that EVERY block would have sender signature. Should probably put it into a separate block prepending the data blocks? Or, put it into HTTP header...
You're touching on something I've been thinking about, if in a slightly different way. As it is now, stuff like 'regionID' and 'generator' sit together in the metadata header. That's fine, and I basically just moved what had been GET parameters in the original system into the message, but IMHO it could be improved as well.

If instead there was a metadata header holding the 'generator' part - 'resultType', 'version', 'uploadKeys', 'currentTime', and of course 'generator' - and 'regionID' and 'typeID' were moved down into the 'result' somewhere, then inside the result you could have multiple blocks of data for, say, the same 'typeID' but different regions, or the other way around, or even more than one of each. Clients could then upload everything from the cache in one message, which would save a lot of bandwidth for the aggregators: no receiving each block as a separate connection, and no overhead of being told over and over again who the data was from.

IMHO this would greatly streamline the whole process. I don't know how much of a change working with multiple data sections would require in the aggregators, but I can see it really helping their network bandwidth usage with the additional uploaders they may end up with once we have the unified uploader we're working toward. I'll add a new section on the wiki with my ideas of how the new JSON would look, so we can compare them side by side and see what people think.
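A sketch of that restructuring might look like the message below. To be clear, this is my illustration of the idea, not the format from the wiki: all field names and values are assumptions, with one shared metadata header and per-block 'regionID'/'typeID' inside 'result'.

```python
import json

# Hypothetical restructured message: one header, many data blocks.
message = {
    "resultType": "orders",
    "version": "0.1",
    "uploadKeys": [{"name": "example-site", "key": "example-key"}],
    "generator": {"name": "Unified Uploader", "version": "1.0"},
    "currentTime": "2011-12-08T17:29:03+00:00",
    "result": [
        # Same typeID across different regions, and vice versa,
        # all in one upload instead of one connection per block.
        {"regionID": 10000002, "typeID": 34, "rows": []},
        {"regionID": 10000043, "typeID": 34, "rows": []},
        {"regionID": 10000002, "typeID": 35, "rows": []},
    ],
}

payload = json.dumps(message)
```

The sender identity and keys appear once per message rather than once per region/type block, which is where the bandwidth saving comes from.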

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Zaepho
Goosefleet
Gooseflock Featheration
#134 - 2011-12-08 18:13:09 UTC
Dragonaire wrote:
Quote:
2. Aggregator pushing data to another aggregator.
One thing here: it won't be push, it'll be pull, so the receiver asks for what it wants to see.

The only place push is used is from the client-originator to the aggregator, and like you said, the aggregator always wants the data, though it does need a way to deal with a stupid client that keeps sending it the same old data. But it has to do that no matter what, so it still doesn't need anything in the message, as only the aggregator can really decide if the client is being stupid, by somehow comparing the data to what it has already received.


I think he's referring to aggregator sites automatically pushing the raw reports they receive to another aggregator site, similar to the service EVE-Central offered via SMTP and EVE-Metrics did via AMQ (if I recall correctly). I suppose this could also be done via a pull method, but it may be desirable as a push.

On the data clients he mentioned, I agree with you: they should be requesting data, and would likely receive processed data (resultant prices, etc.) rather than raw order data. There may be exceptions to this, but I suspect those will be the minority.
Dragonaire
Here there be Dragons
#135 - 2011-12-08 18:24:34 UTC
Yes, I understand some of them use push now, but I think pull would be much better, as it's a lot less work for them: there's no need to track who needs what sent to them. Almost all the old services that used push on the Internet have moved to pull instead. RSS is a good example of a pull service that killed off several older push services because it simply works better. I see no problem with those that wish to continue their push stuff doing so, but IMHO the new stuff should be pull.

I've put the proposed new format up for people to look at: http://dev.eve-central.com/unifieduploader/start

Finds camping stations from the inside much easier. Designer of Yapeal for the Eve API. Check out the Yapeal PHP API Library thread.

Zaepho
Goosefleet
Gooseflock Featheration
#136 - 2011-12-08 20:07:08 UTC
Dragonaire wrote:
Yes, I understand some of them use push now, but I think pull would be much better, as it's a lot less work for them: there's no need to track who needs what sent to them. Almost all the old services that used push on the Internet have moved to pull instead. RSS is a good example of a pull service that killed off several older push services because it simply works better. I see no problem with those that wish to continue their push stuff doing so, but IMHO the new stuff should be pull.

I've put the proposed new format up for people to look at: http://dev.eve-central.com/unifieduploader/start


I don't disagree; I just wanted to be absolutely clear and ensure everyone was on the same page (so I posted the idiot line to either be sure or get flamed to death :) ).
Kaladr
Viziam
Amarr Empire
#137 - 2011-12-08 22:18:23 UTC
Dragonaire wrote:

I've put the proposed new format up for people to look at: http://dev.eve-central.com/unifieduploader/start


I made results a list (otherwise it wouldn't work at all). Otherwise I'm good with the multi-payload option.

As for gzip etc., that's something HTTP can solve (Accept-Encoding, et al.). EVE-Central already advertises and will happily gzip/deflate all content automatically (I just globally turned on mod_deflate, and it hasn't caused any CPU usage issues).

Creator of EVE-Central.com, the longest running EVE Market Aggregator

Callean Drevus
Perkone
Caldari State
#138 - 2011-12-08 22:48:03 UTC
Quote:
The easiest way to make it worthless is to simply start sending the same fake data while counting up through the ID range; you fill everyone's database with so much useless data that it becomes impossible to use the good data anymore. That's where at least the hashes have an advantage: when the received and locally computed ones don't match, you know something is wrong.

You also stated you basically want something that can act like a primary key in a database, but the only way you end up with something that does that is to have a common list with a single point in charge of issuing them, which we just don't have.

This problem is one that's been around since before there were even computers, but the solution to it hasn't really changed: you can't do it without a single point of control.


While I can see I'm outmatched in my wish for a global ID, I really have to refute what you write here.

1. You cannot usefully count up through the ID range with a UUID. You could try, but you'd waste such an immense amount of time setting up a connection every time you wanted to send something that your impact on the whole would be negligible. Anyone generating a UUID randomly gets a random pick from 5,316,911,983,139,663,491,615,228,241,121,400,000 possibilities. Good luck having any influence on that range; my server would give out long before anyone was able to insert that many records into my database.
2. UUIDs are designed to solve exactly the problem of multiple databases not sharing a unique global identifier. (http://debuggable.com/posts/why-uuids:48c906cc-7a6c-4f22-9e20-6ffd4834cda3)
3. Thus, the solution actually has changed, namely to UUIDs.
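The grain-of-sand claim can be checked with a quick birthday-bound estimate. This is a back-of-the-envelope sketch, not anyone's implementation: a version-4 UUID carries 122 random bits (2^122 is roughly the 5.3e36 figure quoted above), and the chance that ANY two of n independently generated IDs collide is about n(n-1)/2 divided by 2^122.

```python
import math

def collision_probability(n, bits=122):
    """Birthday bound: P(any two of n random `bits`-bit IDs collide)."""
    # expm1 keeps precision for the astronomically small exponents here.
    return -math.expm1(-n * (n - 1) / 2 / 2**bits)

# Even a billion uploads leave the odds of one accidental duplicate
# far below anything worth worrying about.
p = collision_probability(10**9)
```

So accidental duplicates really are negligible; Dragonaire's objection is about deliberate reuse, which no amount of randomness prevents.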

In my statements I was indeed assuming that every site would implement a service much like the mails EVE Central is sending now. I'd personally go for AMQ as soon as I get myself over the barrier of learning the intricacies of RabbitMQ. But I'm convinced that push is the nicest way to do this, especially for market aggregation sites.

Now that we're discussing this, I finally understand why there are 3 different market sites; we all have such different opinions that it would be impossible to work together on a single one. Besides that, I have difficulty understanding half of the posts here (or rather, my mind switches into TL;DR mode very quickly).

And yup, the uploader actually went pretty fast, though I did work on it for almost a full day. It will be this weekend before I'm able to continue and finish it.

Developer/Creator of EVE Marketeer

Tonto Auri
Vhero' Multipurpose Corp
#139 - 2011-12-08 23:34:04 UTC
Dragonaire wrote:
> 2. Aggregator pushing data to another aggregator.
One thing here: it won't be push, it'll be pull, so the receiver asks for what it wants to see.

Two points:
The receiver has no idea what is available to send, or whether there is ANY data available to send. Yes, it could poll blindly in the hope there's something new, but let's not stray from the original subject.
We're discussing aggregator-to-aggregator syndication, an interaction between two web services, supposedly running around the clock with nearly persistent availability. To me, it is more desirable to have data pushed toward the other party when it's available, and to push only what that party wants, to save time and traffic.
OTOH, pulling data at irregular intervals could be desirable for standalone clients like EVEMon or jEveAssets, but that kind of pull is limited to processed data.
As I said, you have two data sets, and each of them is tied to specific demands. I agree that polling processed data through RESTful URLs would simplify one end of the pole, but on the other end you propose to have both push and pull for the same (raw) data, depending on the originator. Is that desirable?

Quote:
The only place push is used is from the client-originator to the aggregator, and like you said, the aggregator always wants the data, though it does need a way to deal with a stupid client that keeps sending it the same old data. But it has to do that no matter what, so it still doesn't need anything in the message, as only the aggregator can really decide if the client is being stupid, by somehow comparing the data to what it has already received.

You seem to have misunderstood my message a little in regard to client-server stupidity. I'll touch on that later.

Quote:
> 3. Aggregator feeding clients (such as mining buddy or griefwatch).
Once again they would be pulling. It wasn't clear if you were thinking push or pull here, but once again they asked for it with a pull, so just like the aggregator above they'll have to decide if it's new. There might be a possibility here to use the If-Modified-Since header, which would be nice and would avoid transferring duplicates; I think the aggregators should implement it even if not every receiver will use it.

A possibility... Yes, I know what you're speaking about, but if I can avoid endless polling, I will. Clients could subscribe for data pushes and set a schedule at which they want the data fed to them, if that's meaningful, or just let the server send data when it's available.
If they want only certain categories, they could subscribe to just those.
Imagine this scenario: a killboard wants median prices for all items in the game - a fairly big list of entities, but actually very little data associated with them. The problem is preparing that data. It does not happen instantly, and creating the median upon request is hardly a sane idea, even less so recalculating it on every upload. To spread the load, you run a cron job at a certain point in time to slice your pie the way you want it.
Our respectable client (the killboard) has only a rough idea of when that script is run, or when the fresh data will be ready, but it does want it fresh.
Variant I: The client polls the data with If-Modified-Since every now and then. It will need something like a cron job running, and has to care about the return value of each request, etc.
Variant II: The client subscribes to the data feed and has the data pushed to it when it's ready. It'll need to implement the submission outlet of the aggregation API, and then it can just sit and happily wait for new data.
Which variant would you prefer, if you had the choice?
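For the record, here's roughly what Variant I looks like. The header names (If-Modified-Since, the 304 status) are standard HTTP/1.1; fetch_feed() is just a simulated server stand-in so the example is self-contained, not a real request:

```python
# Sketch of Variant I: conditional polling with If-Modified-Since.
import datetime
from email.utils import formatdate, parsedate_to_datetime

def fetch_feed(server_last_modified, if_modified_since):
    """Simulated server side: honor an If-Modified-Since request header."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if server_last_modified <= since:
            return 304, None              # Not Modified: no body transferred
    return 200, {"34": 2.45}              # fresh median prices

# Client side: remember the Last-Modified we last saw and send it back.
lm = datetime.datetime(2011, 12, 8, 12, 0, tzinfo=datetime.timezone.utc)
stamp = formatdate(lm.timestamp(), usegmt=True)

status, body = fetch_feed(lm, stamp)
print(status)  # 304 - nothing changed since our last poll

newer = lm + datetime.timedelta(hours=1)
status, body = fetch_feed(newer, stamp)
print(status)  # 200 - the cron job produced fresh medians
```

Notice the client still has to run its own scheduler and handle both outcomes on every poll - which is exactly the overhead Variant II avoids.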

Quote:
Note that If-Modified-Since will also work with 2 above, since the plan was to use the same feeds for both. The aggregators may need to start tracking when stuff is updated, but I'm thinking most of them already do in some way, so it shouldn't take any real changes to respond correctly to If-Modified-Since headers.

Again, the "If-Modified-Since" header's purpose is to notify about changes to a data source that changes irregularly, or is requested at irregular intervals.
With raw market data, it has always changed. Rather - it's always new. The issue is that it could be duplicated (say, pilot A uses a multi-feed uploader while pilot B only uploads to EVE-c; EM will want the data from EVE-c, but only the rows from pilot B, as it already has the data from pilot A), or absent (if something is broken, or you got it wrong and are forcing everyone to crawl through RESTful URLs in search of new uploads).
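One way the receiving aggregator can catch that duplication is to hash each row itself as it arrives - computed on its own side, never trusted from the sender, for exactly the gaming reason Dragonaire described earlier. A rough Python sketch (the field names are invented for the example):

```python
# Illustrative dedup: drop rows already received via another feed, e.g.
# pilot A's upload arriving once directly and once syndicated from EVE-c.
import hashlib
import json

seen = set()

def ingest(rows):
    """Return only the rows we have not stored yet."""
    fresh = []
    for row in rows:
        # Hash a normalized serialization so key order doesn't matter.
        key = hashlib.sha1(json.dumps(row, sort_keys=True).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh

direct = [{"orderID": 1, "price": 2.45}, {"orderID": 2, "price": 2.50}]
syndicated = [{"orderID": 2, "price": 2.50}, {"orderID": 3, "price": 2.60}]
print(len(ingest(direct)))      # 2
print(len(ingest(syndicated)))  # 1 - orderID 2 was already seen
```

In practice you'd persist the seen-set (and expire old entries), but the point stands: the dedup decision lives entirely on the receiver, so nothing extra needs to travel in the message.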

Quote:
> Add: gzip'ing the data would be desirable.
This is at the HTTP level and really outside the JSON message. As always, if the client and server both support it, there's no reason it can't be used.

Just to make it clear: the JSON message being sent is the data, carried over the normal HTTP protocol, though nothing says it has to use port 80 etc. - just that that's by far the easiest. This is all planned as a RESTful service, so it uses the normal POST, GET, PUT, etc.

My question was whether you know of any HTTP clients that could have issues dealing with packed JSON data. I'm only working with PHP, and I know that, at the very least, I could decode the stream manually before feeding it to the parser.

Two most common elements in the universe are hydrogen and stupidity. -- Harlan Ellison

Tonto Auri
Vhero' Multipurpose Corp
#140 - 2011-12-08 23:35:27 UTC
Callean Drevus wrote:
Anyone generating a UUID randomly would have a random pick of 5,316,911,983,139,663,491,615,228,241,121,400,000 possibilities.

Google S3ViRGE versus Delphi 4.
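(For context: the figure quoted above is roughly 2^122, the number of distinct random - version 4 - UUIDs, since 6 of the 128 bits are fixed by the version and variant fields. The catch the S3/Delphi story illustrates is that the math only holds if the random source is actually random.)

```python
# A version 4 UUID has 122 free bits; that's where the ~5.3e36 figure
# comes from. Collisions in practice come from bad randomness, not math.
import uuid

u = uuid.uuid4()
print(u.version)   # 4
print(2 ** 122)    # 5316911983139663491615228241121378304
```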
