EVE General Discussion

UPDATED - Unscheduled Downtime - November 4th

Kenneth O'Hara
Sebiestor Tribe
#221 - 2012-11-04 19:53:22 UTC  |  Edited by: Kenneth O'Hara
CCP Explorer wrote:
Clystan wrote:
If it's of any interest - I noticed that searches for places started returning zero results about 1/2 hour before the outage.
That was indeed the first symptom.

Ah okay, now this is all starting to make a little more sense. I'm guessing the services used to pull information from the databases failed or decided to go on strike. Or the storage system decided to freak out for some reason that is still being investigated. I know most of the information, like the models and textures for ships, stations, planets, moons, etc., is pulled from our HDDs (Hard Disk Drives).
I'm guessing it is more along the lines of all the metadata handled by the servers, like the player, market, corp, alliance, etc. databases stored on the server HDDs and backup HDDs. It makes sense to me at least, and I thank everyone at CCP currently looking into this for us.

Good luck in figuring it all out.

Bring Saede Riordan back!! Never Forget! "Operation Godzilla Smacks Zeus" ~Graygor

Ranzabar
Doomheim
#222 - 2012-11-04 20:02:39 UTC
I had to talk to my wife for, like, hours. It was awful.

Abide

Sephira Galamore
Inner Beard Society
Kvitravn.
#223 - 2012-11-04 20:19:10 UTC  |  Edited by: Sephira Galamore
Kenneth O'Hara wrote:
CCP Explorer wrote:
Clystan wrote:
If it's of any interest - I noticed that searches for places started returning zero results about 1/2 hour before the outage.
That was indeed the first symptom.

Ah okay, now this is all starting to make a little more sense. I'm guessing the services used to pull information from the databases failed or decided to go on strike. Or the storage system decided to freak out for some reason that is still being investigated. I know most of the information, like the models and textures for ships, stations, planets, moons, etc., is pulled from our HDDs (Hard Disk Drives).
I'm guessing it is more along the lines of all the metadata handled by the servers, like the player, market, corp, alliance, etc. databases stored on the server HDDs and backup HDDs. It makes sense to me at least, and I thank everyone at CCP currently looking into this for us.

Good luck in figuring it all out.

IIRC 99.7% of all data queries can be answered from RAM, which explains why it actually ran _somehow_ for a while, until they shut it down.

And I would assume the RAM handles stuff on a demand basis: when something is needed for the first time, it gets retrieved from the HDD and then stored in RAM while there's still space. When the RAM is getting used up, there'll be some strategy to decide which data to release; likely the items with the oldest last query are dropped first. (I hope you don't mind, Explorer, but assuming is fun!)
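
The strategy guessed at above (load on first use, drop whatever was queried longest ago) is usually called a least-recently-used (LRU) cache. Here is a minimal sketch of the idea in Python. It is purely illustrative and says nothing about how CCP's servers actually work; the `LRUCache` class, the `load_from_disk` callback, and the sample data are all made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy demand-loaded cache: hypothetical, not CCP's implementation."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # max number of items kept "in RAM"
        self.load_from_disk = load_from_disk  # assumed slow lookup, e.g. an HDD read
        self._items = OrderedDict()           # ordered oldest query -> newest query
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Answered from memory: mark the item as most recently queried.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # First request for this item: fetch it from the slow store and keep it.
        self.misses += 1
        value = self.load_from_disk(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # Memory is full: drop the item with the oldest last query.
            self._items.popitem(last=False)
        return value


# Made-up "disk" contents standing in for the slow storage layer.
disk = {"ship": "Rifter", "station": "Jita 4-4", "moon": "Moon 4"}
cache = LRUCache(capacity=2, load_from_disk=disk.__getitem__)

for name in ["ship", "station", "ship", "moon"]:
    cache.get(name)

print(cache.hits, cache.misses)  # one hit (second "ship"), three misses
print(list(cache._items))        # "station" was evicted; "ship" and "moon" remain
```

A cache like this also fits the "it ran somehow for a while" observation: if the slow store stops responding, anything already in memory can still be served, and only the misses fail.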
Kenneth O'Hara
Sebiestor Tribe
#224 - 2012-11-04 20:46:02 UTC  |  Edited by: Kenneth O'Hara
Sephira Galamore wrote:
Kenneth O'Hara wrote:
CCP Explorer wrote:
Clystan wrote:
If it's of any interest - I noticed that searches for places started returning zero results about 1/2 hour before the outage.
That was indeed the first symptom.

Ah okay, now this is all starting to make a little more sense. I'm guessing the services used to pull information from the databases failed or decided to go on strike. Or the storage system decided to freak out for some reason that is still being investigated. I know most of the information, like the models and textures for ships, stations, planets, moons, etc., is pulled from our HDDs (Hard Disk Drives).
I'm guessing it is more along the lines of all the metadata handled by the servers, like the player, market, corp, alliance, etc. databases stored on the server HDDs and backup HDDs. It makes sense to me at least, and I thank everyone at CCP currently looking into this for us.

Good luck in figuring it all out.

IIRC 99.7% of all data queries can be answered from RAM, which explains why it actually ran _somehow_ for a while, until they shut it down.

And I would assume the RAM handles stuff on a demand basis: when something is needed for the first time, it gets retrieved from the HDD and then stored in RAM while there's still space. When the RAM is getting used up, there'll be some strategy to decide which data to release; likely the items with the oldest last query are dropped first. (I hope you don't mind, Explorer, but assuming is fun!)

Oh yeah, definitely. It just depends on what they find out upon investigation. HDDs have a way higher failure rate than RAM, so if it is a hardware failure, I'm leaning towards that scenario. It could be one of the rare occurrences where the RAM has indeed failed, but that's not likely. It could just be that the services got confused in how they handle the data in memory. Not likely, but it could be something along the lines of a stack overflow or a cache overflow. Hell, someone might have turned heuristics on in the antivirus settings, which threw everything into quarantine.

Bring Saede Riordan back!! Never Forget! "Operation Godzilla Smacks Zeus" ~Graygor

Miner Idiot
#225 - 2012-11-04 22:41:59 UTC
To put the whole data center uptime percentage thing into perspective (if you've not already done the calcs), there is a rating called nines ("four nines", "five nines") that data centers try to pimp themselves as being. Kind of a rating that attracts clients (like ours) to use their service.

Four nines is 99.99% and works out to 52.6 (rounded) minutes of downtime per year!
Five nines would be 99.999%, and that is only 5.26 minutes per year!
Yep, six would be ~31.5 seconds per year!
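
For anyone who wants to check the arithmetic, here is a small Python sketch. The function name is made up for the example, and it assumes an average year of 365.25 days; using 365 days moves the results only slightly.

```python
# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds_per_year(nines):
    """Allowed downtime per year for an availability of N nines (4 -> 99.99%)."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001 of the year
    return SECONDS_PER_YEAR * unavailability


for n in (4, 5, 6):
    seconds = downtime_seconds_per_year(n)
    print(f"{n} nines: {seconds / 60:.2f} minutes ({seconds:.1f} seconds) per year")
```

Each extra nine divides the allowed downtime by ten, which is why a single outage of a few hours wipes out a four-nines budget for the year.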

Now, let's say all your hardware is uber l33t and is touched by the hand of gaud and will never go down. You can't say the same for your OC3 data link to the internet backbone. You have two of them, btw, each from a different provider, coming into the building via a different route on a different side of the building. Then something happens to the primary and you switch over to the secondary, only to find out that your secondary has core switch routing issues... There goes your five nines out the window, and you're eating away at the four nines.

I'm telling you, It was so real ... I was there !

I drank WHAT ?

Sephira Galamore
Inner Beard Society
Kvitravn.
#226 - 2012-11-04 23:12:41 UTC
I do remember the nines... Even some joke about 6 nines... 0.999999% :p
KIller Wabbit
MEME Thoughts
#227 - 2012-11-04 23:18:32 UTC  |  Edited by: KIller Wabbit
Shawnm339 wrote:
Ok, it is 11:28 am here in Blighty. Having just finished a lovely pair of bacon cobs with the prerequisite tomato sauce and a nice cup of coffee, I boot up Eve in the hopes of maybe wasting a couple of hours, only to see my Eve is not accepting connections. I assumed the problem listed in this thread had been resolved; any confirm or deny?

Ninja edit: restarted the client and I'm in.


I love the web! So... I'm curious, can you describe " bacon cobs with the prerequisite tomato sauce" in more detail?

(mods forgive the off topic, but I just have to know!)

Tidbit for the incident: I noticed attempts to mail someone kept coming back with "this person doesn't exist". When I tried mailing myself, with the same result, I relogged, which of course didn't change things. This was about 2 hours before the universe came to a screeching halt.
Othran
Route One
#228 - 2012-11-04 23:28:22 UTC
KIller Wabbit wrote:


I love the web! So... I'm curious, can you describe " bacon cobs with the prerequisite tomato sauce" in more detail?

(mods forgive the off topic, but I just have to know!)




Bacon on a large crusty bread roll with Heinz tomato ketchup.
Shaishi Otichoda
Science and Trade Institute
Caldari State
#229 - 2012-11-05 06:33:52 UTC
Miner Idiot wrote:
To put the whole data center uptime percentage thing into perspective (if you've not already done the calcs), there is a rating called nines ("four nines", "five nines") that data centers try to pimp themselves as being. Kind of a rating that attracts clients (like ours) to use their service.

Four nines is 99.99% and works out to 52.6 (rounded) minutes of downtime per year!
Five nines would be 99.999%, and that is only 5.26 minutes per year!

What is this blasphemy?! I have always delivered what I promised: nine fives.
Blondie Amelana
The CodeX Alliance Executive Holdings Corporation
The CodeX Alliance
#230 - 2012-11-06 18:18:56 UTC
Ok, so I have already put in a petition about this, but after EVE came back online I proceeded with selling a bunch of stuff at Jita. I am guessing that because of the downtime my offerings were selling at a fairly fast clip, but I noticed that my wallet never changed to reflect these sales. Has this happened to anyone else?
ctx2007
Republic Military School
Minmatar Republic
#231 - 2012-11-06 19:01:27 UTC
Ranzabar wrote:
I had to talk to my wife for, like, hours. It was awful.


My sympathies to you... I was asleep, luckily.

You only realise your life has been a waste of time when you wake up dead.

Kenneth O'Hara
Sebiestor Tribe
#232 - 2012-11-06 19:12:29 UTC  |  Edited by: Kenneth O'Hara
Othran wrote:
KIller Wabbit wrote:


I love the web! So... I'm curious, can you describe " bacon cobs with the prerequisite tomato sauce" in more detail?

(mods forgive the off topic, but I just have to know!)




Bacon on a large crusty bread roll with Heinz tomato ketchup.

Bacon is freakin' awesome!!!

And to keep this on topic: you can fix all of the computer issues by laying strips of bacon across all of the components. The goal is to let the awesomeness of bacon seep in through osmosis. It will in turn increase bandwidth and electron flow rates through the circuits, thus improving performance. It will also repair any damage immediately. It will run a little hotter, but the bacon serves as a nice conductor and release of heat.

If this doesn't work, then you all obviously have no idea what you're doing.

Bring Saede Riordan back!! Never Forget! "Operation Godzilla Smacks Zeus" ~Graygor

Atum
Eclipse Industrials
Quantum Forge
#233 - 2012-11-08 04:52:04 UTC
Been a couple days now... anything learned from the charred wreckage?
Lors Dornick
Kallisti Industries
#234 - 2012-11-09 09:37:39 UTC
Atum wrote:
Been a couple days now... anything learned from the charred wreckage?

The community team are probably trying to get the virtual worlds team to say something more publishable than "our logs show nothing".

And based on my experience with server techs, they're in for a tough mission ;)

CCP Greyscale: As to starbases, we agree it's pretty terrible, but we don't want to delay the entire release just for this one factor.

Atum
Eclipse Industrials
Quantum Forge
#235 - 2012-11-09 15:11:19 UTC
Lors Dornick wrote:
The community team are probably trying to get the virtual worlds team to say something more publishable than "our logs show nothing".

And based on my experience with server techs, they're in for a tough mission ;)

+1
I'm always looking for root cause info on what causes outages... my job, literally, is to make sure yarrsters are kept cool and fed.