These forums have been archived and are now read-only.

The new forums are live and can be found at https://forums.eveonline.com/

EVE Information Portal

 
 

Dev blog: Tranquility Tech III

Steve Ronuken
Fuzzwork Enterprises
Vote Steve Ronuken for CSM
#41 - 2015-10-13 20:34:55 UTC
CCP Gun Show wrote:
Ix Method wrote:
Volcano-powered Singularity.

Yes.


we are thinking about renaming Singularity to Eyjafjallajökull Big smile

kidding



I'd just like to say: You are a large scary man Blink

Woo! CSM XI!

Fuzzwork Enterprises

Twitter: @fuzzysteve on Twitter

Haffsol
#42 - 2015-10-13 20:59:37 UTC
Quote:
[..... bla blah nerdy things....] what could possibly go wrong?!

Exactly Pirate
Bienator II
madmen of the skies
#43 - 2015-10-13 21:10:26 UTC
so you will have fewer solar system nodes but they will have more bandwidth and be better connected?

how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value

Nafensoriel
Brutor Tribe
Minmatar Republic
#44 - 2015-10-13 21:29:38 UTC
So... since your engineers have decided to stop purchasing our superior Minmatar duct tape...

Well actually that's it.. we're screwed. CCP engineers were 90% of our customer base. I guess we can start making server polish?

Seriously though, awesome. The old code's going out the door, and now the kludge hardware is too. This is an awesome day for EVE.

Though seriously.. the engineers convinced you to keep the old cluster so they could play Doom on it and have nerdgasms.. admit it.
virm pasuul
Imperial Academy
Amarr Empire
#45 - 2015-10-13 21:29:48 UTC
Bienator II wrote:
so you will have fewer solar system nodes but they will have more bandwidth and be better connected?


Hardware or software?
I think the nodes are probably virtualised, so divide the total hardware resources by the number of nodes.
Virtual infrastructure, when done properly, can be very efficient. For example, a big roaming gang hops from one node to another, but since these are virtual software nodes, if both live on the same host the net load on the underlying hardware stays unchanged even though the gang has moved node.

CCP will be able to provision new nodes and drop unused nodes automatically. Also see the load balancing presentation they did a few fanfests ago where they explained their node balancing algorithm in detail.
Moving nodes around to do hardware maintenance with virtualisation is a doddle. Nodes can be moved live from hardware host to hardware host whilst still doing active work for clients and not dropping a single packet mid move.

The hardware abstraction from virtualisation, the storage abstraction, along with all the hardware redundancy makes the setup described pretty bulletproof. The only point of failure left now is that little "feature" in the CCP automation system that no one thought could break. Amazon, Google, Microsoft, and pretty much every UK bank have all had unbreakable cloud setups break.

It is an amazing bit of kit that CCP is investing in. There's probably well over seven digits of new hardware there.

Now if only CCP could come up with multi threading server code.... :)
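The node-balancing idea mentioned above can be sketched roughly like this (a hypothetical greedy placement for illustration only; the system names and load figures are invented and this is not CCP's actual algorithm):

```python
# Hypothetical sketch: greedy placement of solar-system workloads across
# virtualised hosts. Heaviest systems are placed first so load spreads out.

def balance(systems, hosts):
    """Assign each (system, load) pair to the currently least-loaded host."""
    placement = {h: [] for h in hosts}
    load = {h: 0 for h in hosts}
    # Sort heaviest-first so big systems land on different hosts.
    for name, cost in sorted(systems, key=lambda s: -s[1]):
        target = min(hosts, key=lambda h: load[h])
        placement[target].append(name)
        load[target] += cost
    return placement, load

# Invented example loads:
systems = [("Jita", 90), ("Amarr", 40), ("Dodixie", 30), ("Rens", 25), ("Hek", 15)]
placement, load = balance(systems, ["host-a", "host-b"])
```

With live migration on top, a placement like this can be re-applied periodically and the nodes moved between hosts without dropping clients, as described above.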



CCP DeNormalized
C C P
C C P Alliance
#46 - 2015-10-13 21:40:59 UTC
Master Degree wrote:
As an IT pro, from experience I can tell you that a high-I/O SQL DB running in an MS failover cluster on VMware is not the best choice. Rather, go with SQL AlwaysOn: more storage is needed, agreed, but failovers are much easier (and much faster), and you can replicate more times, e.g. active, replica 1, replica 2, etc. You can use one of the replicas for reads and spare the active writing DB those operations. The only thing that can be a problem is switching the listener between nodes during a sudden HW crash or vMotion (MAC address conflict in VMware 5.0; hope they fix it in 6.0 when running vMotion on loaded hosts).

Eventually you could switch to Hyper-V (Core preferably, due to patching); the license is cheaper than ESX(i), but the downside is that Hyper-V is at least two releases behind VMware feature-wise (if you don't pay huge money for SCVMM).


Just my 5 cents; I assume you've done the math already :-)

PS: really nice HW, just the vendor is not one of my favorites :)


thx for the comment and info MD!

I hear you on VMware possibly not being the best choice, as there is definitely overhead involved (both in I/O resources and licensing costs!). We'll do some testing to see the impact it has, and if we don't get where we want with it, it's out! :)

In regards to AlwaysOn, we'll be using this on top of whichever route we go with for the cluster. This will be our primary replication method, both for keeping our DRS in sync and for offering live reporting services to internal users.

CCP DeNormalized - Database Administrator

CCP DeNormalized
C C P
C C P Alliance
#47 - 2015-10-13 21:49:41 UTC
Steve Ronuken wrote:
CCP Gun Show wrote:
Ix Method wrote:
Volcano-powered Singularity.

Yes.


we are thinking about renaming Singularity to Eyjafjallajökull Big smile

kidding



I'd just like to say: You are a large scary man Blink


This doesn't become really, really true until you've spent two days of heavy drinking in the middle of the Icelandic wilderness with the man...

"Don't wake the Balrog!" Is a slogan we force all new Operations team members to learn very early on :)

Ops Offsite best offsite!

CCP DeNormalized - Database Administrator

Gospadin
Bastard Children of Poinen
#48 - 2015-10-13 21:52:27 UTC
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold.
TigerXtrm
KarmaFleet
Goonswarm Federation
#49 - 2015-10-13 22:03:50 UTC
No worries people. EVE is still dying on schedule. That's why they are pumping I don't even know how many hundreds of thousands of dollars into new server hardware. Because if it's going to die, it's going to die in style Cool

My YouTube Channel - EVE Tutorials & other game related things!

My Website - Blogs, Livestreams & Forums

xrev
Brutor Tribe
Minmatar Republic
#50 - 2015-10-13 22:10:42 UTC
Gospadin wrote:
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold.

It's called auto-tiering. The hot storage blocks reside on the fast SSDs or the internal read cache. When blocks of data aren't touched, they move to slower disks, which are still more cost-effective if you look at capacity per buck. Compared to SSDs, hard disks suck at random I/O, but serial streams do just fine.
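The tiering decision described above boils down to something like this (a hypothetical sketch; the threshold, tier names, and block IDs are invented for illustration):

```python
# Hypothetical sketch of auto-tiering: blocks not read recently are demoted
# from the SSD tier to spinning disk; recently-touched HDD blocks are promoted.

COLD_AFTER = 3600  # seconds without access before a block counts as cold

def retier(blocks, now):
    """blocks: dict of block_id -> (tier, last_access_ts). Returns new dict."""
    out = {}
    for block_id, (tier, last_access) in blocks.items():
        if tier == "ssd" and now - last_access > COLD_AFTER:
            out[block_id] = ("hdd", last_access)   # demote cold block
        elif tier == "hdd" and now - last_access <= COLD_AFTER:
            out[block_id] = ("ssd", last_access)   # promote hot block
        else:
            out[block_id] = (tier, last_access)    # leave it where it is
    return out

now = 10_000
blocks = {"a": ("ssd", 9_500), "b": ("ssd", 1_000), "c": ("hdd", 9_900)}
blocks = retier(blocks, now)
```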
Bienator II
madmen of the skies
#51 - 2015-10-13 22:30:22 UTC  |  Edited by: Bienator II
virm pasuul wrote:
Bienator II wrote:
so you will have fewer solar system nodes but they will have more bandwidth and be better connected?


Hardware or software?

http://i.imgur.com/xCjjFc9.png



virm pasuul wrote:

Now if only CCP could come up with multi threading server code.... :)


EVE has 8k solar systems or so, which means there will be over 100 solar systems per physical server node. So parallelism is already possible without the actual server code being multithreaded. That's probably why CCP seems to see MT as low priority atm.

how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value

Cor'len
Doomheim
#52 - 2015-10-13 23:13:39 UTC
Bienator II wrote:
That's probably why CCP seems to see MT as low priority atm.


Actually, CCP would love to multithread the ~space code~ (can't remember the component name, haha). But it's practically impossible to get a consistent result; operations must be done in sequence, otherwise you get dead ships killing living ships, and other ~exciting~ edge cases.

This is the ultimate limiter on EVE performance. They might conceivably be able to MT the processing of different grids in a single system, but everything that happens on a single grid must execute in a deterministic fashion, and in the correct order.

Plus, even if that wasn't a problem, they run Stackless Python, with the beloved global interpreter lock which effectively prevents multithreading.


tl;dr CCP wants to multithread all the things, but it's so hard it's bordering on impossible. Hence the effort to not have big fights.
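The ordering constraint described above can be illustrated with a toy resolver (hypothetical; the event shape and ship names are invented). Events are applied in one deterministic sequence, so a ship that died earlier in the sequence can't keep dealing damage, which is exactly what breaks if you process them concurrently:

```python
# Hypothetical sketch of the ordering constraint: combat events must be
# applied in a single deterministic sequence, because the outcome of one
# event (a ship dying) changes the validity of later ones.

def resolve(events, hp):
    """events: list of (timestamp, attacker, target, damage). hp: ship -> hit points."""
    for ts, attacker, target, dmg in sorted(events):  # deterministic order
        if hp.get(attacker, 0) <= 0:
            continue  # a dead ship can't keep shooting
        hp[target] = hp.get(target, 0) - dmg
    return hp

hp = {"rifter": 100, "merlin": 30}
events = [
    (2, "merlin", "rifter", 50),  # fired after the merlin died: must be ignored
    (1, "rifter", "merlin", 40),  # kills the merlin first
]
hp = resolve(events, hp)
```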
Gospadin
Bastard Children of Poinen
#53 - 2015-10-13 23:16:18 UTC
xrev wrote:
Gospadin wrote:
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold.

It's called auto-tiering. The hot storage blocks reside on the fast SSDs or the internal read cache. When blocks of data aren't touched, they move to slower disks, which are still more cost-effective if you look at capacity per buck. Compared to SSDs, hard disks suck at random I/O, but serial streams do just fine.


I know how it works.

It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead)
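For what it's worth, the arithmetic behind that estimate, using the poster's own assumptions (~200 IOPS per 10K SAS drive, ~50% usable after multipath and redundancy/parity overhead; the drive count is a guess for illustration):

```python
# Back-of-the-envelope IOPS estimate using the assumptions stated above.

iops_per_disk = 200      # typical 10K RPM SAS figure
disks = 100              # assumed array size (illustrative)
utilization = 0.5        # overhead for redundancy/parity and multipath

usable_iops = iops_per_disk * disks * utilization
print(usable_iops)  # 10000.0, roughly the ~10K IOPS quoted above
```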
Bienator II
madmen of the skies
#54 - 2015-10-13 23:54:02 UTC
Cor'len wrote:
Bienator II wrote:
That's probably why CCP seems to see MT as low priority atm.


Actually, CCP would love to multithread the ~space code~ (can't remember the component name, haha). But it's practically impossible to get a consistent result; operations must be done in sequence, otherwise you get dead ships killing living ships, and other ~exciting~ edge cases.

Splitting tasks up is only one way of achieving parallelism. You can also distribute sequential tasks across different compute hardware via pipelining/layering etc.

But the thing is, CCP doesn't have to do that, since they can already get parallelism by simply running multiple processes on the same node. Again: they are running 100+ systems on a single node. All they have to do is run them in N processes instead of 1. (Would not surprise me if they run every system in its own process, tbh.)


Multithreading would only help in the worst-case scenario: the whole EVE population in the same system.
But according to CCP even that isn't certain, since the bottleneck seems to be memory bandwidth, not computing power.

how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value

Berahk
Lightweight Dynamics
#55 - 2015-10-14 00:13:40 UTC
So, a few questions

How much closer does this server setup bring us to never needing downtime?

Also

How much closer are we to being able to fail over a tremendously busy system onto one of the combat nodes without having to wait until the following downtime? (Or booking it in advance.)

Thanks


/b
Mara Rinn
Cosmic Goo Convertor
#56 - 2015-10-14 00:56:01 UTC
Berahk wrote:
How much closer does this server setup bring us to never needing downtime?


Most important question in the thread :D

http://community.eveonline.com/news/dev-blogs/death-to-downtimes/
Alundil
Rolled Out
#57 - 2015-10-14 01:45:01 UTC  |  Edited by: Alundil
Raphendyr Nardieu wrote:
OMG, amazing blog. Nice that you added so many specifics.

I hope you get the virtualization working. Would provide nice benefits :)

Came to say this. Excellent article. vMotion on terrific hardware is sweet, sweet, sweet. We use this in our 20,000-user environment to great effect.

Keep up the great work.

I'm right behind you

Shamwow Hookerbeater
Nine Inch Ninja Corp
#58 - 2015-10-14 03:26:38 UTC  |  Edited by: Shamwow Hookerbeater
Gospadin wrote:
xrev wrote:
Gospadin wrote:
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold.

It's called auto-tiering. The hot storage blocks reside on the fast SSDs or the internal read cache. When blocks of data aren't touched, they move to slower disks, which are still more cost-effective if you look at capacity per buck. Compared to SSDs, hard disks suck at random I/O, but serial streams do just fine.


I know how it works.

It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead)


Kinda funny in a way: at my last company we had some rather beefy 7420 ZFS appliances with RAM/SSD/15K disks, and we weren't happy when we were only getting approx 50-60K IOPS from pure disk operations across multiple pools. We could hit 200K+ on things that were cached... but we only needed that performance for some edge cases of ours. Then we tested an AFF on our extreme edge cases... and were like, crap, why didn't these things get cheaper faster?

The AFF was vastly faster than our 7420s in most cases, especially anything approaching high levels of random I/O (not surprising). It was so stark that a moderately powered VM (4 or 8 vCPUs and ~64 GB) was beating our 24-core 196 GB physical boxes in total transactions when running things like HammerOra.
Bienator II
madmen of the skies
#59 - 2015-10-14 04:56:12 UTC
Mara Rinn wrote:
Berahk wrote:
How much closer does this server setup bring us to never needing downtime?


Most important question in the thread :D

http://community.eveonline.com/news/dev-blogs/death-to-downtimes/


having DT only every second day would be a start :P

how to fix eve: 1) remove ECM 2) rename dampeners to ECM 3) add new anti-drone ewar for caldari 4) give offgrid boosters ongrid combat value

Raiz Nhell
State War Academy
Caldari State
#60 - 2015-10-14 05:21:32 UTC
Amazing stuff...

Wish I could convince the boss that we need a 10th of this stuff...

Keep up the good work...

P.S. Would like to see photos of Sisi's Volcano powered lair :)

There is no such thing as a fair fight...

If you're fighting fair, you have automatically put yourself at a disadvantage.