These forums have been archived and are now read-only.

The new forums are live and can be found at https://forums.eveonline.com/

Dev blog: Tranquility Tech III

Josia
Sunrise Services
#81 - 2015-10-14 18:31:45 UTC
Can you run Minecraft on it?
Luca Lure
Obertura
#82 - 2015-10-14 19:32:36 UTC
Clearly EVE is dying. Hamsters need new cages.

――――――――――――――――――――――――――――――――――――――――――

The essence of the independent mind lies not in what it thinks, but in how it thinks.

Indahmawar Fazmarai
#83 - 2015-10-14 20:05:15 UTC
Out of curiosity... where will all the additional players needed to use/justify such powerful hardware come from? Straight
Corraidhin Farsaidh
Federal Navy Academy
Gallente Federation
#84 - 2015-10-14 21:02:55 UTC
CCP DeNormalized wrote:
Freelancer117 wrote:
Gratz on moving to DDR4 Cool

Your new server CPUs have been on the market for over a year and are mid-range; hope you got a good price.

source: http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-Family#@All


Regards, a Freelancer



CPUs are E7-8893 v3, not E5

http://ark.intel.com/products/84688/Intel-Xeon-Processor-E7-8893-v3-45M-Cache-3_20-GHz

Launch Date Q2'15

Errr, at least the DB CPUs are :) I don't really care so much about the others :)


The others matter?
CCP Gun Show
C C P
C C P Alliance
#85 - 2015-10-14 21:07:34 UTC
Aryth wrote:
In the writeup I don't see why IBM. Did you bake these off against UCS and they were faster? Or just whitebox.

Maybe your performance needs are very niche but I have yet to see a bakeoff where IBM won against almost anyone.


We did an intensive comparison with a couple of vendors but came to this conclusion for a couple of reasons.

This is a vague answer and does not tell you much, apart from the fact that we did our due diligence Smile

Plus, our relationship with the Icelandic vendor is excellent after a decade of cooperation. I can literally call the lead IBM SAN expert anytime, 24/7, and they are quick to support us in a critical scenario, with a good escalation path into IBM.

Hope this answer helps, and please keep on asking about TQ Tech III
CCP Gun Show
C C P
C C P Alliance
#86 - 2015-10-14 21:09:34 UTC  |  Edited by: CCP Gun Show
Corraidhin Farsaidh wrote:
CCP DeNormalized wrote:
Freelancer117 wrote:
Gratz on moving to DDR4 Cool

Your new server CPUs have been on the market for over a year and are mid-range; hope you got a good price.

source: http://ark.intel.com/products/family/78583/Intel-Xeon-Processor-E5-v3-Family#@All


Regards, a Freelancer



CPUs are E7-8893 v3, not E5

http://ark.intel.com/products/84688/Intel-Xeon-Processor-E7-8893-v3-45M-Cache-3_20-GHz

Launch Date Q2'15

Errr, at least the DB CPUs are :) I don't really care so much about the others :)


The others matter?



Oh yes they do! The entire cluster matters.

CCP DeNormalized is just laser-focused on the DB machines apparently Big smile
virm pasuul
Imperial Academy
Amarr Empire
#87 - 2015-10-14 22:20:57 UTC
What's the warranty on the new kit? Default 3 year or extended?
CCP DeNormalized
C C P
C C P Alliance
#88 - 2015-10-14 22:27:23 UTC
CCP Gun Show wrote:



Oh yes they do! The entire cluster matters.

CCP DeNormalized is just laser-focused on the DB machines apparently Big smile


I suppose I care about the rest as well... If not for those others my shiny DB servers would just sit idle all day long :)

CCP DeNormalized - Database Administrator

Disco Dancer Dancing
Doomheim
#89 - 2015-10-14 22:51:29 UTC
Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture you are looking at. Why go with a traditional, silo-based solution, with storage and compute in separate silos and a "slow" FC link to traverse whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?

Seeing that everything not in the RAM cache has to traverse the FC switch, we can quickly give a few numbers on the actual latency and round-trip times for accessing data in different locations:

L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns (14x L1 cache)
Mutex lock/unlock: 25 ns
Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
Compress 1KB with Zippy: 3,000 ns
Send 1KB over 1 Gbps network: 10,000 ns (0.01 ms)
Read 4K randomly from SSD: 150,000 ns (0.15 ms)
Read 1MB sequentially from memory: 250,000 ns (0.25 ms)
Round trip within datacenter: 500,000 ns (0.5 ms)
Read 1MB sequentially from SSD: 1,000,000 ns (1 ms, 4x memory)
Disk seek: 10,000,000 ns (10 ms, 20x datacenter round trip)
Read 1MB sequentially from disk: 20,000,000 ns (20 ms, 80x memory, 20x SSD)
Send packet CA -> Netherlands -> CA: 150,000,000 ns (150 ms)

Looking at the figures, as soon as we start to traverse several layers we add latency to the whole request: if we need to cross the FC fabric to hit the storage nodes, then hit the disk and come back, latency adds up rather quickly. Keeping the data as local as possible is key - ideally in memory, or as close to the node as possible without traversing the network. (Sure, FC is stable, proven and has rather low latency, but you could also argue it will be dead within a few years as we move towards RDMA over converged fabrics and the like.)
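
To make that concrete, here is a quick back-of-the-envelope sketch in Python using the round numbers from the list above (the breakdown of the "remote" path is purely illustrative, not TQ's actual IO path):

# Rough latency budget in nanoseconds, taken from the list above.
# The "remote" path is an illustrative SAN read on a cache miss:
# cross the datacenter fabric, seek on disk, stream 1MB back.
LOCAL_MEM_READ_1MB = 250_000       # read 1MB sequentially from memory
DC_ROUND_TRIP = 500_000            # round trip within the datacenter
DISK_SEEK = 10_000_000             # disk seek
DISK_READ_1MB = 20_000_000         # read 1MB sequentially from disk

remote_miss = DC_ROUND_TRIP + DISK_SEEK + DISK_READ_1MB
local_hit = LOCAL_MEM_READ_1MB

print(f"remote cache miss: {remote_miss / 1e6:.1f} ms")  # ~30.5 ms
print(f"local memory hit: {local_hit / 1e6:.2f} ms")     # 0.25 ms
print(f"roughly {remote_miss // local_hit}x slower")     # ~122x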

On another note, if we look at a mainstream enterprise SSD we find figures like:
500MB/s read and 460MB/s write
If we put these into the following calculation to see when we saturate a traditional storage network:
numSSD = ROUNDUP((numConnections * connBW (in GB/s)) / ssdBW (R or W))

We get the following table (SSDs required to saturate the available network bandwidth):
Controller connectivity | Available network BW | Read I/O | Write I/O
Dual 4Gb FC             | 8Gb  == 1GB          | 2        | 3
Dual 8Gb FC             | 16Gb == 2GB          | 4        | 5
Dual 16Gb FC            | 32Gb == 4GB          | 8        | 9
Dual 1Gb ETH            | 2Gb  == 0.25GB       | 1        | 1
Dual 10Gb ETH           | 20Gb == 2.5GB        | 5        | 6

This does not take into account the round trip to access the data, and it assumes unlimited CPU power, which can also become saturated. We can see that we don't need that many SSDs to saturate a network. Key point here: try to keep the data as local as possible, once again avoiding the added latency and bandwidth limits of the network.
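
For what it's worth, a minimal Python sketch of that saturation formula, using the 500MB/s read / 460MB/s write figures above (illustrative only):

import math

# numSSD = ROUNDUP((numConnections * connBW) / ssdBW), bandwidths in MB/s.
SSD_READ_MBS, SSD_WRITE_MBS = 500, 460

configs = {                     # controller connectivity -> usable network MB/s
    "Dual 4Gb FC": 1000,        # 8Gb  == ~1GB/s
    "Dual 8Gb FC": 2000,        # 16Gb == ~2GB/s
    "Dual 16Gb FC": 4000,       # 32Gb == ~4GB/s
    "Dual 1Gb ETH": 250,        # 2Gb  == ~0.25GB/s
    "Dual 10Gb ETH": 2500,      # 20Gb == ~2.5GB/s
}

for name, net_mbs in configs.items():
    reads_needed = math.ceil(net_mbs / SSD_READ_MBS)
    writes_needed = math.ceil(net_mbs / SSD_WRITE_MBS)
    print(f"{name:13s}  read: {reads_needed} SSDs  write: {writes_needed} SSDs")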

We can also calculate the difference between hitting, say, a local storage cache in memory and hitting a remote storage cache (say, a SAN controller with caching). I know off the top of my head which one is fastest; key point, once again, keep the data as local as possible.

There are several technologies targeting exactly these issues seen in traditional silo datacenters. Have you looked at any of them, and if so, what is the reason that they do not fit your needs?
CCP Gun Show
C C P
C C P Alliance
#90 - 2015-10-15 00:05:47 UTC
Disco Dancer Dancing wrote:
Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture you are looking at. Why go with a traditional, silo-based solution, with storage and compute in separate silos and a "slow" FC link to traverse whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?

Seeing that everything not in the RAM cache has to traverse the FC switch, we can quickly give a few numbers on the actual latency and round-trip times for accessing data in different locations:

L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns (14x L1 cache)
Mutex lock/unlock: 25 ns
Main memory reference: 100 ns (20x L2 cache, 200x L1 cache)
Compress 1KB with Zippy: 3,000 ns
Send 1KB over 1 Gbps network: 10,000 ns (0.01 ms)
Read 4K randomly from SSD: 150,000 ns (0.15 ms)
Read 1MB sequentially from memory: 250,000 ns (0.25 ms)
Round trip within datacenter: 500,000 ns (0.5 ms)
Read 1MB sequentially from SSD: 1,000,000 ns (1 ms, 4x memory)
Disk seek: 10,000,000 ns (10 ms, 20x datacenter round trip)
Read 1MB sequentially from disk: 20,000,000 ns (20 ms, 80x memory, 20x SSD)
Send packet CA -> Netherlands -> CA: 150,000,000 ns (150 ms)

Looking at the figures, as soon as we start to traverse several layers we add latency to the whole request: if we need to cross the FC fabric to hit the storage nodes, then hit the disk and come back, latency adds up rather quickly. Keeping the data as local as possible is key - ideally in memory, or as close to the node as possible without traversing the network. (Sure, FC is stable, proven and has rather low latency, but you could also argue it will be dead within a few years as we move towards RDMA over converged fabrics and the like.)

On another note, if we look at a mainstream enterprise SSD we find figures like:
500MB/s read and 460MB/s write
If we put these into the following calculation to see when we saturate a traditional storage network:
numSSD = ROUNDUP((numConnections * connBW (in GB/s)) / ssdBW (R or W))

We get the following table (SSDs required to saturate the available network bandwidth):
Controller connectivity | Available network BW | Read I/O | Write I/O
Dual 4Gb FC             | 8Gb  == 1GB          | 2        | 3
Dual 8Gb FC             | 16Gb == 2GB          | 4        | 5
Dual 16Gb FC            | 32Gb == 4GB          | 8        | 9
Dual 1Gb ETH            | 2Gb  == 0.25GB       | 1        | 1
Dual 10Gb ETH           | 20Gb == 2.5GB        | 5        | 6

This does not take into account the round trip to access the data, and it assumes unlimited CPU power, which can also become saturated. We can see that we don't need that many SSDs to saturate a network. Key point here: try to keep the data as local as possible, once again avoiding the added latency and bandwidth limits of the network.

We can also calculate the difference between hitting, say, a local storage cache in memory and hitting a remote storage cache (say, a SAN controller with caching). I know off the top of my head which one is fastest; key point, once again, keep the data as local as possible.

There are several technologies targeting exactly these issues seen in traditional silo datacenters. Have you looked at any of them, and if so, what is the reason that they do not fit your needs?


Wow, that's one serious question right there. Hope you will come to Fanfest 2016 to talk about latency Big smile

I just saw this on my mobile, so allow me to get you a proper answer in a day or two.

Excellent stuff!
BogWopit
Star Frontiers
Brotherhood of Spacers
#91 - 2015-10-15 07:10:13 UTC
Nerdgasm.

It will be interesting to see if you get SQL to perform on top of Hyper-V; I've seen it done wrong so many times, to the detriment of performance.
Steve Ronuken
Fuzzwork Enterprises
Vote Steve Ronuken for CSM
#92 - 2015-10-15 11:06:28 UTC
CCP DeNormalized wrote:
CCP Gun Show wrote:



Oh yes they do! The entire cluster matters.

CCP DeNormalized is just laser-focused on the DB machines apparently Big smile


I suppose I care about the rest as well... If not for those others my shiny DB servers would just sit idle all day long :)



But then you've got plenty of time for maintenance tasks! Users just cause trouble for databases!

Woo! CSM XI!

Fuzzwork Enterprises

Twitter: @fuzzysteve on Twitter

Sithausy Naj
The Currents of Space.
#93 - 2015-10-15 11:24:36 UTC
CCP Phantom wrote:
The Tranquility server cluster is a powerful machine, enabling you to create the biggest living universe of science fiction with the most massive spaceship battles mankind has ever seen.

But you know what is even better than Tranquility? Tranquility Tech III!

Our engineers are working hard to fully revamp the server cluster with new hardware, with new storage, with new network connections, with a new location and new software. TQ Tech III will be much better than the already astonishing current TQ server.

Read more about this marvel of technology (including tech specs and pictures) in CCP Gun Show's latest blog Tranquility Tech III.


And all this is planned for very early 2016! EVE Forever!


Hey dears,

Is there anyone around who can hop on and chat about storage for a while? Are you using child pools on the Storwize V5k/SVC? Is SSD used, and in what configuration - Easy Tier enabled or just standalone allocation? Any FlashCopy usage? Which SVC generation? Is it the 32 or 64GB cache model, and are you considering using compression? What primary protocol is used (for the SVC - the V5k is definitely FC, but is it 8 or 16Gb FC on the frontend, or 10Gbps iSCSI for the Flex connection)?
Sithausy Naj
The Currents of Space.
#94 - 2015-10-15 11:42:02 UTC
Disco Dancer Dancing wrote:
Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture you are looking at. Why go with a traditional, silo-based solution, with storage and compute in separate silos and a "slow" FC link to traverse whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?




Mate, fair enough, but you have not calculated the TCO of the overall solution. An in-memory DB and in-memory processing require a somewhat different approach and would mean rebuilding the whole concept of the current infrastructure, while the more traditional route gives them the ability to extend what they have without many changes.

SSD and in-memory might be the solution when compared to traditional HDDs. As we can see, they are going to use IBM SVC, which is able to virtualize external storage - that means the possibility of adding either SSDs to the Storwize V5000 or a whole IBM FlashSystem at the FC backend level.

While traditional SSDs have a standard SAS 2.0/3.0 interface these days, IBM Flash uses direct PCI-E flash modules with an FPGA-based architecture, allowing faster and more direct access to the storage media itself. SVC allows caching to be disabled on the storage hypervisor side, meaning IO passes from the server HBA to the FlashSystem with (basically) 2 FC hops.
Switch latency is ~5-25us according to Brocade documentation (if I'm not mistaken).

So there are a lot of things to consider, and not ONLY technical ones.
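
As rough arithmetic (taking the ~5-25us per-hop figure above and the ~150us random flash read from the latency list earlier in the thread; purely illustrative numbers):

# Overhead of 2 FC switch hops relative to a single flash read, in microseconds.
HOP_LATENCY_US = (5, 25)   # Brocade switch hop, low and high estimates
FLASH_READ_US = 150        # ~4K random read from SSD/flash

for hop in HOP_LATENCY_US:
    fabric = 2 * hop       # the two FC hops mentioned above
    total = FLASH_READ_US + fabric
    print(f"{hop}us per hop: +{fabric}us of fabric on a {FLASH_READ_US}us read "
          f"(~{100 * fabric / total:.0f}% of the total)")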
Sithausy Naj
The Currents of Space.
#95 - 2015-10-15 12:55:10 UTC  |  Edited by: Sithausy Naj
Gospadin wrote:
xrev wrote:
Gospadin wrote:
I'm shocked that a system designed to deploy in 2016 is even using rotating drives. That data must be REALLY cold.

It's called auto-tiering. The hot storage blocks reside on the fast SSDs or in the internal read cache. When blocks of data aren't touched, they move to slower disks that are still more cost-effective if you look at volume for your buck. Compared to SSDs, hard disks suck at random I/O, but serial streams do just fine.


I know how it works.

It's just interesting to me that TQ's cold data store is satisfied with about 10K IOPS across those disk arrays. (Assuming 200/disk for 10K SAS and about 50% utilization given their expected multipath setup and/or redundancy/parity overhead)



Ouch, this is a strange consideration.

If you want to see closer numbers, let's assume they are going to have 8 SSD drives of 800GB each in RAID 5 (plus one global spare = 9 drives).
And 80 SAS drives of 1.2TB each at 10k rpm, all in RAID 5 arrays of 8 drives (7 + parity) plus 3 global spares - there we go for the basic Storwize V5000 configuration in the dev blog.

Let's assume all of them are in one pool, so the overall capacity available for mapping is 84,000 GB (SAS) + 5,600 GB (SSD).
Let's say we have one host connected through 4 x 8Gbps FC on the server side and 8 x 8Gbps FC on the storage side, with 70,000 GB of that capacity allocated (not going to push the limits).
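
Quick sanity check on those capacity numbers in Python (assuming ten 7+P SAS arrays and one 7+P SSD array; the spare-drive bookkeeping is approximate):

# Usable capacity sketch for the assumed config above, in GB.
# A RAID 5 array of 8 drives keeps 7 drives' worth of data (1 parity);
# global spares add no capacity.
SAS_DRIVE_GB, SSD_DRIVE_GB = 1200, 800
sas_arrays, ssd_arrays = 10, 1

sas_usable = sas_arrays * 7 * SAS_DRIVE_GB   # 84,000 GB
ssd_usable = ssd_arrays * 7 * SSD_DRIVE_GB   # 5,600 GB
print(f"SAS pool: {sas_usable:,} GB  SSD pool: {ssd_usable:,} GB  "
      f"total: {sas_usable + ssd_usable:,} GB")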

Based on what the Storwize is capable of, let's not be conservative: load it a little with a 16KiB transfer block size and start fairly from 5,000 IOPS.

Assuming the following cache statistics:
Read percentage - 70%
Sequential reads - 20%
Read hit - 60%
Random read hit - 40%
Sequential read hit - 20%
Write percentage - 30%
Sequential writes - 20% of all writes
Seek percentage - 33%
Random write efficiency - 35%


At 5000 IOPS we will be here:
Total Service Time: 1.0 ms
Read Service Time: 1.3 ms
Write Service Time: 0.2 ms
Channel Queue Time: 0.0 ms
Processor Utilization for I/O: 1.5 %
Channel Utilization: 1.2 %
Host Adapter Utilization: 1.1 %
SAS Interface Utilization: 3.3 %
Flash Drive Utilization: 2.2 %
SAS 10K Drive Utilization: 9.0 %

While increasing the load until the drives reach 60-70% utilization, it will be close to these metrics:

[Chart: Service time with IO rate growth]

[Chart: Highest SAS interface utilization with IO rate growth]

So you might be right about ~10K IOPS for pure SAS performance (though it still depends on a lot of factors), but you are definitely not right saying 10K IOPS when there are SSDs and tiering involved.

As for utilization:

[Chart: Utilization overview with IO rate]


Edited to add: and this is not even considering how powerful the SVC that sits on top of all this is :)

And I'm sure there is more than 70% read :)
ISK IRON BANK
Aliastra
Gallente Federation
#96 - 2015-10-15 14:15:37 UTC
Question is:

Does it play Crysis on Max settings Cool
Disco Dancer Dancing
Doomheim
#97 - 2015-10-15 15:01:15 UTC
Sithausy Naj wrote:
Disco Dancer Dancing wrote:
Being someone who builds complex, large datacenters for both private and public use on a fairly regular basis, I'm not that impressed by the physical architecture you are looking at. Why go with a traditional, silo-based solution, with storage and compute in separate silos and a "slow" FC link to traverse whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?




Mate, fair enough, but you have not calculated the TCO of the overall solution. An in-memory DB and in-memory processing require a somewhat different approach and would mean rebuilding the whole concept of the current infrastructure, while the more traditional route gives them the ability to extend what they have without many changes.

SSD and in-memory might be the solution when compared to traditional HDDs. As we can see, they are going to use IBM SVC, which is able to virtualize external storage - that means the possibility of adding either SSDs to the Storwize V5000 or a whole IBM FlashSystem at the FC backend level.

While traditional SSDs have a standard SAS 2.0/3.0 interface these days, IBM Flash uses direct PCI-E flash modules with an FPGA-based architecture, allowing faster and more direct access to the storage media itself. SVC allows caching to be disabled on the storage hypervisor side, meaning IO passes from the server HBA to the FlashSystem with (basically) 2 FC hops.
Switch latency is ~5-25us according to Brocade documentation (if I'm not mistaken).

So there are a lot of things to consider, and not ONLY technical ones.

I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).

There are several HCI solutions on the market giving the same or better performance as a high-end SAN, a smaller footprint in Us, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.

HCI ain't a one-size-fits-all, hence the questions.
Raithius
Creep Fleet Inc
#98 - 2015-10-15 16:00:52 UTC
/me Prays for an end to "The socket was closed" DCs.
CCP DeNormalized
C C P
C C P Alliance
#99 - 2015-10-15 16:13:42 UTC
Disco Dancer Dancing wrote:
I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).

There are several HCI solutions on the market giving the same or better performance as a high-end SAN, a smaller footprint in Us, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.

HCI ain't a one-size-fits-all, hence the questions.


As a DBA who's just recently started to get involved on the SAN storage side, everything you say is well beyond me :) But it's interesting!

Can you give some real examples of what you are talking about and not just buzzwords? :)

Edit: ok, so looking here: http://purestorageguy.com/2015/03/12/hyper-converged-infrastructures-are-not-storage-arrays/

This seems to be how Hadoop and other similar systems work? It's a bunch of servers with local disks that sit behind some shared filesystem which distributes the data across all the server nodes?

CCP DeNormalized - Database Administrator

Sithausy Naj
The Currents of Space.
#100 - 2015-10-15 18:30:41 UTC
CCP DeNormalized wrote:
Disco Dancer Dancing wrote:
I hear you when it comes to rewriting the concept to process the DB in memory, but in this case that was not my intention. I'm mainly talking about a storage solution that, for the most part, scales better without any limitation in the SAN nodes, SAN network and the like, while also giving increased performance with a storage cache in the actual compute node and access to data without traversing a SAN network (latency is latency, and no matter how high or low, it adds up on every transaction).

There are several HCI solutions on the market giving the same or better performance as a high-end SAN, a smaller footprint in Us, lower energy consumption, lower cooling needs, and linear scaling where you only scale when you need to.

HCI ain't a one-size-fits-all, hence the questions.


As a DBA who's just recently started to get involved on the SAN storage side, everything you say is well beyond me :) But it's interesting!

Can you give some real examples of what you are talking about and not just buzzwords? :)

Edit: ok, so looking here: http://purestorageguy.com/2015/03/12/hyper-converged-infrastructures-are-not-storage-arrays/

This seems to be how Hadoop and other similar systems work? It's a bunch of servers with local disks that sit behind some shared filesystem which distributes the data across all the server nodes?


You are right.

At its roots, yes, but it can be either a bunch of servers or a set of storage devices - a server with drives, JBOD, NAS, SAN or any other type of storage, local or shared, in the cloud, in the server or anywhere.

In terms of in-memory processing it's a somewhat different approach: you need RAM only, or flash with direct access, without any SAN/NAS/DAS storage attached.

If you decide to go with any type of fast storage, you can honestly use any of them behind SVC - SSD, Flash, or any other implementation - but all of them will need Fibre Channel connectivity.

You guys are actually in a sweet spot with the storage hypervisor.

HCI is one kind of implementation; in case you want to look for one while staying with IBM, ask those guys about Spectrum Scale, they should know it.
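
If it helps to picture the "distribute the data across all the server nodes" idea, here is a toy Python sketch of hash-based block placement (just an illustration of the concept - not how Spectrum Scale or any specific HCI product actually lays out data):

import hashlib

# Toy block placement: hash a block ID to pick which node's local disk
# holds it, plus the next node as a replica. Real distributed filesystems
# are far more sophisticated (striping, failure domains, rebalancing, ...).
NODES = ["node-a", "node-b", "node-c", "node-d"]

def place(block_id, replicas=2):
    h = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    start = h % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]

for blk in ["vol1/block-0001", "vol1/block-0002", "vol2/block-0001"]:
    print(blk, "->", place(blk))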