Dev Blog: Behind the scenes of a long EVE Online downtime!

Oxide Ammar
#21 - 2015-08-07 16:16:08 UTC
drunklies wrote:
Oxide Ammar wrote:
We are starting to make blogs about that? What a sad year for EVE development. Ugh


A large portion of the community was asking for such a blog the minute the problem was resolved.

Insert line here [This did not take development time away from Eve]


You missed the point... totally.

Lady Areola Fappington:  Solo PVP isn't dead!  You just need to make sure you have your booster, remote rep, cyno, and emergency Falcon alts logged in and ready before you do any solo PVPing.

Plato Idari
SoE Roughriders
Electus Matari
#22 - 2015-08-07 16:21:36 UTC
I look forward to more installments of this tale!
Ezekiel Marr
Sebiestor Tribe
Minmatar Republic
#23 - 2015-08-07 16:30:33 UTC
So... is castello.is a pizza place of choice for CCP?
CCP Goliath
C C P
C C P Alliance
#24 - 2015-08-07 16:41:22 UTC
Ezekiel Marr wrote:
So... is castello.is a pizza place of choice for CCP?


We usually get pizza from Castellos yeah. It doesn't actually deliver to our area unless it's for us :p

CCP Goliath | QA Director | EVE Illuminati | @CCP_Goliath

Bienator II
madmen of the skies
#25 - 2015-08-07 17:14:32 UTC
it's not the first time that logging causes problems in distributed software projects ;)

another classic pitfall would be when you have a highly concurrent program on a single node and as soon as you enable logging suddenly nothing runs concurrently anymore
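
The single-node pitfall described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from CCP's code: the class `SlowCountingHandler`, the function `run_logging_workers`, and the logger name are hypothetical. It relies on the fact that the standard library's `logging.Handler` takes a per-handler lock around `emit()`, so threads that all log through one slow handler end up queuing behind each other.

```python
import logging
import threading
import time


class SlowCountingHandler(logging.Handler):
    """Hypothetical handler: pretends each write is slow and tracks overlap."""

    def __init__(self):
        super().__init__()
        self._stats_lock = threading.Lock()
        self.inside = 0        # threads currently inside emit()
        self.max_inside = 0    # highest overlap ever observed
        self.handled = 0       # total records written

    def emit(self, record):
        with self._stats_lock:
            self.inside += 1
            self.max_inside = max(self.max_inside, self.inside)
        time.sleep(0.005)      # stand-in for slow disk or network I/O
        with self._stats_lock:
            self.inside -= 1
            self.handled += 1


def run_logging_workers(n_threads=8):
    """Start n_threads workers that each log one line; return the handler."""
    handler = SlowCountingHandler()
    logger = logging.getLogger("serialization_demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep output away from the root logger
    logger.addHandler(handler)

    barrier = threading.Barrier(n_threads)  # release all workers together

    def worker(i):
        barrier.wait()
        logger.info("worker %d did some work", i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handler
```

Calling `run_logging_workers(8)` and inspecting `max_inside` on the returned handler shows that only one thread is ever inside the slow write at a time, even though all eight workers were released together. Whether anything like this played a part in the TQ downtime is not something this thread establishes.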

how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value

Projak Dynamo
Pro Synergy
#26 - 2015-08-07 17:14:45 UTC
While I followed about one word in twelve of that, I just wanted to say thank you to all you mad, beer swilling, pizza eating Devs and Tech-heads for getting EVE back as soon as you possibly could.

Sometimes it really cannot be solved by just turning it off and turning it back on again!

The Pro Synergy Pilot is not just a fighting man, he is a salvage expert. If it is lost, in the blackness of space, he will find it. If it has been destroyed, he will loot and salvage it. If it is in his way, he will move it. If he is lucky he will be podded 20 jumps from home, for this is the closest he come to being hero.

Vincent Athena
Photosynth
#27 - 2015-08-07 17:38:14 UTC  |  Edited by: Vincent Athena
Legacy code?
I'll make a prediction. The channel you used for campaign logging was used in the past for doing something else. You thought that code was removed, but some part of it still remains. When you started campaign logging, some old code woke up, tried to do something related to that channel, and "bad things" resulted.

Edit: CCP, you did not really answer one question we all kept asking over and over.

Why not roll back while you worked the issue? In the blog you stated "Our test of the rollback was confirmed to work, but we still didn’t believe the code to be the issue". But, so what? Why did you let this belief stop you from doing a rollback and letting us get on the server?

I just do not see the link here. I see your thought: "we still didn’t believe the code to be the issue", and the result: No roll back, but I do not understand your reasoning for letting that thought get that result. What was your reasoning?

Know a Frozen fan? Check this out

Frozen fanfiction

Arla Sarain
#28 - 2015-08-07 17:44:51 UTC
So, TL;DR?

FozziSov Literally did kill EVE?
Dersen Lowery
The Scope
#29 - 2015-08-07 17:59:16 UTC
Vincent Athena wrote:
Legacy code?
I'll make a prediction. The channel you used for campaign logging was used in the past for doing something else. You thought that code was removed, but some part of it still remains. When you started campaign logging, some old code woke up, tried to do something related to that channel, and "bad things" resulted.


"I'm not a logger. I'm a pirate! Q'apla!"

Proud founder and member of the Belligerent Desirables.

I voted in CSM X!

Archetype 66
Perkone
Caldari State
#30 - 2015-08-07 18:00:35 UTC
Thank you for that blog. So TQ is still running without campaign logs?
CCP Masterplan
C C P
C C P Alliance
#31 - 2015-08-07 18:27:26 UTC  |  Edited by: CCP Masterplan
Vincent Athena wrote:
Legacy code?
I'll make a prediction. The channel you used for campaign logging was used in the past for doing something else. You thought that code was removed, but some part of it still remains. When you started campaign logging, some old code woke up, tried to do something related to that channel, and "bad things" resulted.

Edit: CCP, you did not really answer one question we all kept asking over and over.

Why not roll back while you worked the issue? In the blog you stated "Our test of the rollback was confirmed to work, but we still didn’t believe the code to be the issue". But, so what? Why did you let this belief stop you from doing a rollback and letting us get on the server?

I just do not see the link here. I see your thought: "we still didn’t believe the code to be the issue", and the result: No roll back, but I do not understand your reasoning for letting that thought get that result. What was your reasoning?


When we said "Our test of the rollback was confirmed to work..." that was more referring to the fact that the rollback process would work, not that the rollback would fix the problem. So we verified that we could re-deploy the previous day's build to TQ without corrupting the game state in the DB, not that the previous day's build would manage to get past startup.

Sometimes when we deploy some new changes/feature, we have to mutate the data in the DB in some one-way fashion. Therefore such code updates cannot be rolled back in isolation without either writing an explicit revert mutation, or doing a full DB restore from backup (which can be done but takes time).

All that that comment really means is that such a code rollback would require no special DB operations to go alongside it.
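
As a rough illustration of the "one-way" mutation idea above, here is a minimal sketch using SQLite from Python's standard library. Everything in it is an assumption made for the example: the `items` table, the sample rows, and the helper names are hypothetical and have nothing to do with CCP's actual database or schema. The forward step overwrites data in a lossy way, so the only route back is a copy taken beforehand, which plays the role of the restore from backup.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany(
        "INSERT INTO items (name) VALUES (?)",
        [("Rifter",), ("RIFTER",)],
    )
    return db


def snapshot(db):
    """Copy the table contents; stands in for a pre-deploy backup."""
    return db.execute("SELECT id, name FROM items ORDER BY id").fetchall()


def one_way_mutation(db):
    """Lossy update: the original casing cannot be derived from the result."""
    db.execute("UPDATE items SET name = lower(name)")


def restore(db, rows):
    """Put the saved rows back; the equivalent of restoring from backup."""
    db.execute("DELETE FROM items")
    db.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
```

After `one_way_mutation`, both rows hold the same lowercase name, so no revert query could tell which one used to be `Rifter` and which `RIFTER`; only `restore` with the earlier snapshot brings them back. A deploy that makes no such change to the data can be rolled back by swapping the code alone, which is the situation the post describes.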

"This one time, on patch day..."

@ccp_masterplan  |  Team Five-0: Rewriting the law

Vincent Athena
Photosynth
#32 - 2015-08-07 18:33:24 UTC  |  Edited by: Vincent Athena
CCP Masterplan, I get that. Thanks for the reply. But why not keep going and do the startup? Did you have some "one way" DB changes with this update that would have taken extra effort to revert or restore? So much effort that, at any given time, it looked better to just keep trying to fix the issue rather than get de-railed trying to roll back?

Also, it looked like you found the temporary fix by experimenting on TQ, something you would not have been able to do if you had done the roll back.

Know a Frozen fan? Check this out

Frozen fanfiction

Ransu Asanari
Perkone
Caldari State
#33 - 2015-08-07 18:34:48 UTC
Pretty fascinating, thanks for the detailed explanation.
Ishtanchuk Fazmarai
#34 - 2015-08-07 18:56:15 UTC
"See, I am a log. I MIGHT show you something, but then I MUST kill you... got it?"

Roses are red / Violets are blue / I am an Alpha / And so it's you

Kerodan Alduin
Sebiestor Tribe
Minmatar Republic
#35 - 2015-08-07 19:04:29 UTC
Thanks for the amusing writeup!

I totally know what it's like when you run code that in theory should work but in practice doesn't. Then again, the maximum number of users waiting for my bits of programming was around 3 Cool
Eli Stan
Center for Advanced Studies
Gallente Federation
#36 - 2015-08-07 20:06:04 UTC
Very interesting writeup, thank you!

How exactly is a log channel set up?
I assume the default log channel and campaign log channel both are directed to the same storage space?
Is it possible for a log entry to trigger a process elsewhere, rather than being explicitly called? A while ago I had to help some devs figure out some MSSQL performance issues that were caused by triggers. Hate those things...
elitatwo
Zansha Expansion
#37 - 2015-08-07 20:34:00 UTC
Windoze memory management... Could it be possible that the second call caused the server nodes to use swap space and the surge in memory requests made the harddrives go nuts?

Maybe the second call called the first one, creating 250*500*500 calls instead of 250*500 which would explain the behavior. Maybe rename a word in the second call so you can see which ones are displayed.

Or I am totally wrong and it's a MSSQL thing.

Eve Minions is recruiting.

This is the law of ship progression!

Aura sound-clips: Aura forever

Kasli Catal
Deep Core Mining Inc.
Caldari State
#38 - 2015-08-07 21:31:05 UTC
What the **** did I even just read? Shocked
Iam Widdershins
Project Nemesis
#39 - 2015-08-07 21:59:16 UTC
CLIFFHANGER BOYS

Lobbying for your right to delete your signature

Jasmine Cheryu
Bearers of Impurism
#40 - 2015-08-07 22:41:33 UTC
CCP Goliath wrote:
Ezekiel Marr wrote:
So... is castello.is a pizza place of choice for CCP?


We usually get pizza from Castellos yeah. It doesn't actually deliver to our area unless it's for us :p




Do you think they would deliver during fanfest next year??

If so I'm buying the entire Dev Team pizza on one of the fanfest days, they deserve it for putting in all this hard work for us players!! Cool


Sure, we pay your wages by playing the game and paying for PLEX/subscriptions, but we all (or well... most of us) really do appreciate all you do to keep the game we love and enjoy online for us

Thanks for the blog outlining that terrible day Smile

Jas