EVE Technology Lab

XML API timing out

Ydnari
Estrale Frontiers
#1 - 2015-04-17 23:01:55 UTC
I'm seeing various endpoints time out today after some data has been received, but apparently not all of it. From my logs:

[Task\UpdateIndustryJobsHistoryTask] INFO Failed to connect to API: Operation timed out after 90000 milliseconds with 880910 bytes received

Previously I had a 30-second timeout configured; I bumped it to 90s but that hasn't helped.

Endpoints I can see in the logs are alliances, jobs history, active industry jobs, notifications, character sheets - so it's not all large ones.

--

salacious necrosis
Garoun Investment Bank
Gallente Federation
#2 - 2015-04-18 03:30:19 UTC
Can you report what time you saw this? I show exactly three timeouts in my logs today, around 12:47 EDT. Otherwise, my logs are clean.

Use EveKit ! - Tools for EVE Online 3rd party development

Ydnari
Estrale Frontiers
#3 - 2015-04-18 12:55:22 UTC
Logs here:

http://pastebin.com/raw.php?i=VYCmVKck

These logs go back to 2014; there's the odd one here and there, but from 2015-04-16T13:51:32+00:00 onwards there's a whole load of this new one.

They all have some data received but then end with a timeout - normally if there's a network problem then it's 0 bytes received.
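
The distinction above (a timeout after partial data versus a timeout with 0 bytes) can be pulled out of logs automatically. This is a minimal sketch that assumes the log lines carry the same libcurl-style message as the one quoted in the first post; the function name and the labels "stalled" and "no_data" are made up for illustration.

```python
import re

# Matches the libcurl-style timeout message quoted in the first post.
TIMEOUT_RE = re.compile(
    r"Operation timed out after (?P<ms>\d+) milliseconds "
    r"with (?P<bytes>\d+) bytes received"
)


def classify_timeout(line):
    """Return 'stalled' for a timeout after partial data, 'no_data' for a
    timeout with 0 bytes received, or None if the line is not a timeout."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return "stalled" if int(match.group("bytes")) > 0 else "no_data"


sample = (
    "[Task\\UpdateIndustryJobsHistoryTask] INFO Failed to connect to API: "
    "Operation timed out after 90000 milliseconds with 880910 bytes received"
)
print(classify_timeout(sample))  # stalled
```

Counting the two labels per day would show whether the mid-transfer stalls really did start on a particular date.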

There are lots of other requests that do work, but I am getting the same from the command line too - I requested the AllianceList API ( curl https://api.eveonline.com/eve/AllianceList.xml.aspx ) and it got some distance through but then hung.

All the partial responses are around 900 KB to 1 MB. Is there something going on with the API servers that's limiting things?

This is only happening on one server, which is running a site used by a few people, pulling from 42 API keys. I'm respecting the cache timers and even have a rate limiter on, not that I get near the published rate limits. From my desktop machine I can pull the large endpoints OK.
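
For readers unfamiliar with "respecting the cache timers": a rough sketch of the check, assuming the XML API response carries a `<cachedUntil>` element with a UTC timestamp in `YYYY-MM-DD HH:MM:SS` form. That layout, the sample document, and the helper names are assumptions for illustration and are not taken from this thread.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def cached_until(xml_text):
    """Return the <cachedUntil> time as an aware UTC datetime, or None."""
    root = ET.fromstring(xml_text)
    text = root.findtext("cachedUntil")
    if text is None:
        return None
    parsed = datetime.strptime(text.strip(), "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def should_refetch(xml_text, now):
    """True once the cached copy has expired (or has no expiry at all)."""
    expiry = cached_until(xml_text)
    return expiry is None or now >= expiry


# Hypothetical response body, trimmed to the fields used here.
SAMPLE = """<eveapi version="2">
  <currentTime>2015-04-18 12:00:00</currentTime>
  <result/>
  <cachedUntil>2015-04-18 13:00:00</cachedUntil>
</eveapi>"""

print(should_refetch(SAMPLE, datetime(2015, 4, 18, 12, 30, tzinfo=timezone.utc)))
```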

--

salacious necrosis
Garoun Investment Bank
Gallente Federation
#4 - 2015-04-18 13:59:15 UTC
Wow, you're seeing this constantly. I'm pulling about 100 keys all day long as fast as the cache timers allow and I'm not seeing anything, so I don't think it's anything systemic. If your IP were being blocked, I'd expect the connection would drop immediately without seeing any data, so it's probably not that.

The fact that you only see this on one server is highly suspect. If this is Linux and you have access to the logs, you might poke around syslog a bit. Some other things to try:

- run ping with several hundred packets (e.g. ping -c 300) to make sure you can consistently reach the API servers
- try curl with verbose to get some debug info against the API servers
- try nc (netcat) to hit the API servers, another way to get some lower level debug info
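
If ping or nc is not to hand, the same kind of low-level reachability check can be sketched in a few lines. This only tests whether a TCP connection opens; it says nothing about stalls later in a transfer. The function name and its defaults are illustrative choices.

```python
import socket
import time


def probe(host, port, attempts=3, timeout=5.0):
    """Try to open a TCP connection several times.

    Returns one entry per attempt: the connect time in seconds, or None
    if that attempt was refused, failed to resolve, or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:  # covers refusals, DNS failures and timeouts
            results.append(None)
    return results
```

Running something like `probe("api.eveonline.com", 443, attempts=20)` from the affected server and from a machine that works, then comparing how many entries come back as None, gives a quick picture of whether connections are being dropped outright.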



Ydnari
Estrale Frontiers
#5 - 2015-04-18 14:49:01 UTC
It's pulling the other smaller payload responses OK, connectivity appears fine, nothing suspect in syslog, no unusual firewall, I can't find any significant recent changes on my side.

zKill had a big red notice up earlier in the week saying they had to slow down API fetches as they'd been falsely hit by some CCP anti-DDOS measure - holding open connections like this is one common countermeasure, perhaps I've been added to a list somewhere?

Sounds like I am pulling less data than you though...

I wonder... well, I tried a request for AllianceList piped to another process to see exactly how far it got, and it got all the way through, for the first time.

It occurred to me that piping had significantly slowed down the network side of it, since the receiving process was formatting things out to the terminal and blocking on that. So I tried bandwidth limiting the requests (I'm already rate limited and cache limited), and what do you know, that lets them get to the end.

My server's a Linode in the London data centre so it's probably a few miles and a bundle of fibre optic cable away from the API server and can get up to some major network speed. Whereas my desktop's on a normal consumer Internet connection so can't get up to that sort of speed.

I don't know what the threshold is; I've throttled right back to 1 MB/second (CURLOPT_MAX_RECV_SPEED_LARGE = 1048576).
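
For anyone not on libcurl, the same idea (capping how fast the response is read) can be sketched by hand. This is an illustration of the throttling technique only, not what CURLOPT_MAX_RECV_SPEED_LARGE does internally; the function name, chunk size, and the injectable clock and sleep are assumptions made so it can be tried without a network.

```python
import io
import time


def throttled_read(stream, max_bytes_per_sec, chunk_size=16384,
                   clock=time.monotonic, sleep=time.sleep):
    """Read stream to the end, pausing between chunks so the average
    rate stays at or below max_bytes_per_sec. Returns the bytes read."""
    start = clock()
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received.extend(chunk)
        # Earliest moment at which this many bytes may have arrived.
        earliest = start + len(received) / max_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return bytes(received)


# Offline demonstration: 64 KiB capped at 1 MiB/s, the figure used above.
data = throttled_read(io.BytesIO(bytes(65536)), max_bytes_per_sec=1048576)
print(len(data))  # 65536
```

Any object with a `read(n)` method works as `stream`, including the response returned by `urllib.request.urlopen`. Pausing between reads lets the receive buffer fill, so TCP flow control slows the sender; whether that is what avoids the stall here is still the guess described above.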

This is a bit gruesome, and it's still a guess but it appears to be working, perhaps someone from CCP could comment on what's going on here?

--

Pete Butcher
The Scope
Gallente Federation
#6 - 2015-04-18 17:43:07 UTC
EVE API has been acting weird lately, even on smaller scales.

http://evernus.com - the ultimate multiplatform EVE trade tool + nullsec Alliance Market tool

Medusa The Gorgon
Temple of the Serpent
The Gorgon Empire
#7 - 2015-04-18 23:23:22 UTC  |  Edited by: Medusa The Gorgon
Our entire alliance infrastructure has been broken by this behavior. Actually, this is anti-DDoS protection on the loose. When I start fetching killmails, after ~5 API keys I'm banned by IP for 5 minutes. The ban can be triggered even in the middle of a session. I even tried putting a 10-second sleep between every API request, without any success!

telnet api.eveonline.com 443
Trying 87.237.39.199...
telnet: connect to address 87.237.39.199: Operation timed out
telnet: Unable to connect to remote host

I have ~100 API keys and I'm sure I can't produce any significant load. My petitions are ignored; this is a complete disaster for me.

Kaitai Noctus
Doomheim
#8 - 2015-04-19 01:24:38 UTC
For what it's worth, I've been seeing the exact same (timeout expiries) from mobile phone apps that use the EVE API -- apps like PI Timer, Aura, and Evanova.

Also for what it's worth -- may be related, but then again may not be -- I've been seeing intermittent oddities when hitting http://www.eveonline.com/ and http://community.eveonline.com/ (specifically browsers saying "Connection reset", where subsequent reloads appear to work). During these failures, www.eveonline.com resolved to 87.237.39.180 and community.eveonline.com resolved to 209.15.13.134. I haven't done packet captures to determine what's happening, but chances are TCP RST is being sent back to the client.

Overall issue smells like a load balancer problem, or a single misbehaving box behind a load balancer, but it's impossible for an end-user to determine that given the nature of the problems (timeouts).

Ydnari
Estrale Frontiers
#9 - 2015-04-19 02:25:46 UTC
Seems to be affecting CREST as well; getting "connection reset by peer", "failed to enable crypto" errors when fetching the industry/systems endpoint.

--

Kira Doshu
Imperial Shipment
Amarr Empire
#10 - 2015-04-20 02:42:21 UTC
Kaitai Noctus wrote:
For what it's worth, I've been seeing the exact same (timeout expiries) from mobile phone apps that use the EVE API -- apps like PI Timer, Aura, and Evanova.

*snip*

Overall issue smells like a load balancer problem, or a single misbehaving box behind a load balancer, but it's impossible for an end-user to determine that given the nature of the problems (timeouts).


Agreed. I have seen this using curl while debugging my own scripts, which have been receiving similar errors since Saturday, 18 April. If you poke a simple API query a few times, you'll get one of the 'dead' servers after enough tries (or the connection reset by peer).

If your code caches, it may be possible to blacklist the bad one. But that's not exactly optimal. I normally just let the errors disappear (or suppress them for a few hours), and only get worried if they persist after 24h.
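
The "only worry if it persists after 24h" policy could look something like the sketch below. The class name, method name, and the idea of tracking per endpoint are assumptions for illustration and not a description of the poster's scripts.

```python
from datetime import datetime, timedelta


class FailureTracker:
    """Remember when each endpoint started failing and flag only the
    failure streaks that have lasted at least alert_after."""

    def __init__(self, alert_after=timedelta(hours=24)):
        self.alert_after = alert_after
        self.first_failure = {}  # endpoint -> start of the current streak

    def record(self, endpoint, ok, now):
        """Record one request outcome; return True when it is time to worry."""
        if ok:
            # Any success ends the streak, so the clock starts over.
            self.first_failure.pop(endpoint, None)
            return False
        started = self.first_failure.setdefault(endpoint, now)
        return now - started >= self.alert_after


tracker = FailureTracker()
t0 = datetime(2015, 4, 18, 12, 0)
print(tracker.record("/eve/AllianceList.xml.aspx", False, t0))
print(tracker.record("/eve/AllianceList.xml.aspx", False, t0 + timedelta(hours=24)))
```

The two calls print False and then True: the first failure is suppressed, and the same endpoint still failing a day later is reported.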

At least all those chiming in on this thread aren't crazy. :)

salacious necrosis
Garoun Investment Bank
Gallente Federation
#11 - 2015-04-20 12:33:55 UTC
Interesting. So either I'm incredibly lucky and not seeing this as often as everyone else, or I'm also seeing this and just not detecting it in my logs. I checked again this morning and I'm still not seeing as many errors as others. I produce a report every day and keep track of errors which stop a sync from working. Here's what this looked like for 4/19:

I called the XML API about 7000 times yesterday across both corporation and capsuleer endpoints (about 100 accounts total). Not all of these are full calls since we're respecting the cache timers, so many of these calls may be just to account balance, etc.

A total of 284 capsuleer syncs failed due to some sort of IO error (usually connection drop, but not always). A total of 174 corporation syncs failed due to IO error.

So 458 total failures or around 7%. That's not too bad considering I don't stop syncing during downtime. But I need to do more work to see how many of these are actually failing during downtime.
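
The arithmetic behind that figure, written out. The call count is "about 7000" in the post, so the resulting rate is approximate.

```python
capsuleer_failures = 284
corporation_failures = 174
total_calls = 7000  # "about 7000" per the post, so treat the rate as approximate

total_failures = capsuleer_failures + corporation_failures
failure_rate = total_failures / total_calls

print(total_failures)         # 458
print(f"{failure_rate:.1%}")  # 6.5%
```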

Things I'm doing that may be different than others:

- I'm setting a User-Agent. Everyone is probably doing this as well, assuming you're using one of the standard XML API libraries.
- My stuff runs on Google, so my IPs would have a Google origin. Not sure that would help much if this is anti-DDoS.


