Dev blog: Tranquility Tech III

xrev
Brutor Tribe
Minmatar Republic
#101 - 2015-10-15 19:25:37 UTC  |  Edited by: xrev
The general answer to storage questions is "It depends..."

I see a lot of discussion about purely technical stuff like spindles, RAID levels, cache, hops, latency and so on. But it really depends on what station you're leaving from. Let me break it up for a moment (from my point of view).

If you want to avoid latency between the compute layer (servers), the storage array and the storage network (SAN), you could put a number of SSDs in every server, so that you have enough volume and IOPS to serve the application without (IMO) those small delays. The downside is that it's not very scalable or cost-effective, and that's before we even talk about distributing the data to each server that needs it.

If you want a central storage supply, with or without replication to another node, you get an easier-to-manage solution with higher volume, higher IOPS etc., from which you can serve appropriate chunks to the compute layer. The downside is the aforementioned latency and expensive networking. In my experience you choose Fibre Channel over iSCSI if you want the least latency and network overhead. Yes, iSCSI can offer higher throughput, but once you take into account the protocol overhead, possible retransmits, temporary buffering and packet recalculation, it's about the same speed as FC. FC goes that extra mile for you if you need it.

Then we have the topic of latency. Are those milliseconds/nanoseconds really that important? Yes, of course, but what matters more is that you manage the full stack from application to storage as a whole. F*ck up the disk alignment? The IOPS you need go up at least two-fold. Build yourself a fancy SQL query that turns out to re-read the complete table five times in a row to get a selection? Your storage won't help you much there. In fact, poorly written software can cripple your fancy storage array in no time. So review the complete stack from application down to storage before blindly focusing on the storage array and network alone.
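To make the alignment point concrete, here is a minimal sketch assuming 4 KiB physical blocks and the classic 63-sector partition offset; the numbers are illustrative and not tied to any particular array.

```python
# Minimal sketch of why misalignment roughly doubles backend I/O: a logical
# 4 KiB read that does not line up with the physical 4 KiB blocks straddles
# two of them, so the array has to do two reads instead of one.
# Block size and offsets are illustrative assumptions.

PHYSICAL_BLOCK = 4096  # bytes

def physical_reads(logical_offset: int, io_size: int, block: int = PHYSICAL_BLOCK) -> int:
    """Number of physical blocks touched by one logical read."""
    first = logical_offset // block
    last = (logical_offset + io_size - 1) // block
    return last - first + 1

# Aligned partition: every 4 KiB read maps onto exactly one physical block.
print(physical_reads(logical_offset=0, io_size=4096))          # -> 1

# Partition starting at sector 63 (63 * 512 bytes): the same read now spans two.
print(physical_reads(logical_offset=63 * 512, io_size=4096))   # -> 2
```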

I haven't seen many examples where customers were able to congest the complete FC network, other than with large datastreams (backups, replication etc). Most issues normally come from too small frames, disk misalignment and fancy sql queries that need to come from disk instead of memory.

Last topic for this post is the question about the chosen storage solution. The chosen solution maybe a classical one but sometimes, that's just what you need. It's proven technology, where the needed expertise is easier to get than the challenging solutions (purestorage (love it btw), Tintri or SolidFire for example). The demands of CCP are high, but not as high like a big financial. The current storage leaders (HP, EMC, IBM, hell even NetApp) are able to reach that demand and like I said, it's proven and easier to get support on than the challengers.

So CCP, way to go looking at the full stack rather than chunk up the several layers.
Disco Dancer Dancing
Doomheim
#102 - 2015-10-15 21:02:42 UTC  |  Edited by: Disco Dancer Dancing
A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.

To answer: HCI does have some alignment with your comparison to Hadoop, sort of.
While it is a bunch of servers with local disk (mainly SSD, with cold data on HDD to get the volume), there are a few key things to keep in mind when talking about HCI. First, hyper-converged means different things depending on what you are talking about (storage, compute, networking), but in essence it means combining two or more of those into a single, scalable unit. In datacenters this is mainly compute and storage.

In essence this means that when we scale, we scale both compute and storage linearly (both IOPS and capacity). We no longer depend on what a SAN controller can handle before it bottlenecks, and in theory we are no longer limited by the SAN network and the point at which it becomes a bottleneck either: every unit we add also scales the throughput of the "SAN network", since we still need some form of data protection with blocks kept on different nodes to tolerate failures.

Depending on the vendor, a few utilize data locality, meaning they migrate the bits as close to the actual compute as possible; others rely on low-latency connections (RDMA over Ethernet, InfiniBand or the like), since that investment has already been made in the ToR switches.

Some concern was also raised about scalability and cost-effectiveness for a solution using a local storage layer, which I would say is unfounded. HCI is built for scalability by design. You can argue that HCI is, by design (see the sketch just after this list):
- Predictable, as every unit adds compute, memory, storage capacity and storage IOPS.
- Repeatable, as HCI is built from the ground up to be clustered; by design you get "single pane of glass" management and monitoring for the whole cluster, including storage.
- Scalable, as combining the above means we can predict how we scale with each node, and since the solution is built to be clustered and repeatable we can easily scale up without much intervention.
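To make the linear-scaling point concrete, here is a minimal sketch; the per-node profile is illustrative (it matches the figures quoted a little further down), not any particular vendor's spec sheet.

```python
# Minimal sketch: every HCI node added contributes the same fixed bundle of
# resources, so cluster totals are a straight multiplication by node count.
# The per-node profile is illustrative, not a specific vendor's spec sheet.

PER_NODE = {
    "cores": 28,
    "ram_gb": 512,
    "read_iops": 30_000,
    "write_iops": 27_300,
}

def cluster_totals(node_count: int) -> dict:
    """Resources scale linearly: totals grow in lock-step with node count."""
    return {resource: amount * node_count for resource, amount in PER_NODE.items()}

for nodes in (3, 6, 12):
    print(nodes, "nodes ->", cluster_totals(nodes))
```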

A key point to remember is that the same person keeping track of the compute layer now also keeps track of the storage layer. In most cases this means a lower TCO.
Just to give some raw figures on what is achievable from a rather standard setup from one of the vendors:
6U, 2 nodes per 2U
Random read IOPS: 30,000/node
Random write IOPS: 27,300/node

6 nodes giving a total of:
Random read IOPS: 30,000 * 6 = 180,000 read IOPS
Random write IOPS: 27,300 * 6 = 163,800 write IOPS

What about usable storage? Roughly 70 TB, counting with a redundancy factor of 2.
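As a quick back-of-the-envelope check on those figures, here is a small sketch; the raw per-node capacity is an assumption chosen so that the ~70 TB usable number falls out, everything else is taken from the numbers above.

```python
# Back-of-the-envelope check of the 6-node figures above. Raw capacity per
# node is an assumed value chosen so that ~70 TB usable falls out; the IOPS
# figures and the redundancy factor of 2 are the ones quoted in the post.

NODES = 6
READ_IOPS_PER_NODE = 30_000
WRITE_IOPS_PER_NODE = 27_300
RAW_TB_PER_NODE = 23.3            # assumption, not a quoted figure
REDUNDANCY_FACTOR = 2             # every block is stored twice for fault tolerance

total_read_iops = NODES * READ_IOPS_PER_NODE             # 180,000
total_write_iops = NODES * WRITE_IOPS_PER_NODE           # 163,800
usable_tb = NODES * RAW_TB_PER_NODE / REDUNDANCY_FACTOR  # ~70 TB

print(f"cluster random read IOPS : {total_read_iops:,}")
print(f"cluster random write IOPS: {total_write_iops:,}")
print(f"usable capacity          : ~{usable_tb:.0f} TB")
```
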
Now, as with all solutions, as soon as we start hitting cold data those figures will go down, but in that case I would claim the calculations on the daily working set of data were not done correctly.

As with all solutions, HCI also has a few caveats. The main one is that workloads in most enterprises don't all look the same: a few need a lot of memory, a few need a lot of CPU, and a few need a lot of capacity and/or IOPS. Since we try to scale linearly, with both CPU, memory and storage in each unit, something might end up skewed. A few vendors therefore have more storage-heavy options, or the reverse on memory/CPU. But seeing as in this case the workloads are known, and should be the same across the whole solution, it should be doable to find a unit that scales well in all aspects.

As said before, HCI doesn't fit every case, and both approaches discussed have their pros and cons. But from the looks of it you could get a smaller solution from the start, with lower TCO and complexity, that scales when you need it to and does so with ease, while still giving you the redundancy and performance directly from the start, just in a smaller package, so to speak.

And just to clarify: Tintri, Pure Storage, SolidFire and the like are not HCI solutions; several of them should rather be seen as AFA solutions. GridStore, SimpliVity, Nutanix, EVO:RAIL (I'm sure I'm missing several others) should be seen as HCI solutions, as they combine several components into one scalable unit.

*Ninja Edit*
Most HCI solutions are appliances, meaning that they have very low overhead when it comes to handling the whole stack, including automation, self-healing and the like.
Numa Pompilious
Viziam
Amarr Empire
#103 - 2015-10-15 21:05:01 UTC
I currently work with a very similar configuration: HS22 BladeCenters, IBM FC storage, Cisco switching and routing, VMware ... currently working on integrating the hypervisor though.

It's sexy ... vMotion is a winner, though I'm unsure if I'm going to stay with IBM blade centers ... the next upgrade I have allocated is for EMC storage in place of IBM ... unless, of course, CCP depletes the world's supply of SAS drives with TQ Tech III.



CCP DeNormalized
C C P
C C P Alliance
#104 - 2015-10-16 10:07:15 UTC
Disco Dancer Dancing wrote:
A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.



Thanks for the crazy details, Dancer, I appreciate the time spent on these replies!

Can you throw out a ballpark $$ figure for that setup?

So can you run Windows servers and such on this stuff? Can I run my MS SQL cluster on top of it? Do I just carve out LUNs as with a typical SAN and present them to the cluster?

In which case, how would that stack look? There would be, say, 6U of appliances - plus now the hardware for the Windows cluster/DB (or does that run on the appliances as well?)

Cheers!

CCP DeNormalized - Database Administrator

Lucian Thorundan
House Of Serenity.
#105 - 2015-10-16 10:43:42 UTC
CCP FoxFour wrote:
Zand Vor wrote:
I'm a super network geek....I really want to know what router, load balancer, and switch platforms you switched to since it sounds like you ditched Cisco.

Oh well, this is a great article and it's awesome to see just a glimpse of how all this infrastructure is designed to work together.

Thank you!


Will ask if they mind sharing said information.


+1, as a career networking nerd I would also be very interested in the answer to this; a diagram with no models on it is like a stripper that doesn't take their clothes off.

I presume the answer is F5 LBs as well, but I've seen NetScalers around in a lot of big deployments and they generally cost out better, so it may well be that way too.

Also, can the DBA team please tell CCP Stephanie to change that name to CCP StephsQL (or something more creative than my quick thought)?
Disco Dancer Dancing
Doomheim
#106 - 2015-10-16 11:59:28 UTC
CCP DeNormalized wrote:
Disco Dancer Dancing wrote:
A few interesting discussions going on here, and by the looks of it we have a few people working with storage solutions.



Thanks for the crazy details, Dancer, I appreciate the time spent on these replies!

Can you throw out a ballpark $$ figure for that setup?

So can you run Windows servers and such on this stuff? Can I run my MS SQL cluster on top of it? Do I just carve out LUNs as with a typical SAN and present them to the cluster?

In which case, how would that stack look? There would be, say, 6U of appliances - plus now the hardware for the Windows cluster/DB (or does that run on the appliances as well?)

Cheers!

Most HCI solutions combine storage and compute into one unit and run virtualization on top of it, so you keep the flexibility of technologies like vMotion. Without going into detail, several vendors offer options with VMware, Hyper-V, KVM etc. So we are not talking about adding an extra layer of hardware on top of the HCI solution for your SQL and other workloads to live on, since that is already integrated. (It is possible, with a few vendors, to use for example IBM blades as the compute where SQL would live, but then it wouldn't really be HCI and a traditional SAN would be a better fit in my view; during a transition phase it is possible, though.)

So to answer the question: yes, your Windows servers, SQL and the like will reside inside this 6U, 6-node cluster, with each node having 512 GB of RAM along with 28 cores @ 2.6 GHz. They would, however, have to be virtualized under VMware, Hyper-V or the like to make use of the platform, as virtualization is a key factor in most HCI solutions.

As I'm not a sales rep or from any vendor I can't really give exact $$$ figures, and I'm not sure how much I can share of the figures I have from vendors, but if I salt them a bit and add a few $$$ we end up with a ballpark figure around $620-680k for the solution in this discussion. That is with redundancy inside one datacenter, tolerating the failure of nodes, disks and blocks, plus the option to synchronize to another 6U solution in case of disaster (in, for example, Iceland).
To summarize, we are talking about per node:
512 GB RAM
28 cores @ 2.6 GHz
Random read IOPS: 30,000
Random write IOPS: 27,300

Total solution:
3 TB RAM
168 cores @ 2.6 GHz
Random read IOPS: 30,000 * 6 = 180,000 read IOPS
Random write IOPS: 27,300 * 6 = 163,800 write IOPS
70 TB of capacity.
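
For a rough feel of the unit economics, here is a small sketch dividing the quoted ballpark over those totals; the $650k midpoint is my own interpolation of the 620-680k range, so treat the outputs as order-of-magnitude only.

```python
# Rough unit economics from the figures quoted above. The price is the
# midpoint of the quoted $620-680k ballpark (my interpolation), so the
# outputs are order-of-magnitude only.

PRICE_USD = 650_000     # assumed midpoint of the quoted ballpark
USABLE_TB = 70
READ_IOPS = 180_000
CORES = 168
RAM_TB = 3

print(f"$ per usable TB  : {PRICE_USD / USABLE_TB:,.0f}")
print(f"$ per read IOPS  : {PRICE_USD / READ_IOPS:,.2f}")
print(f"$ per core       : {PRICE_USD / CORES:,.0f}")
print(f"$ per TB of RAM  : {PRICE_USD / RAM_TB:,.0f}")
```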

HCI solutions can be a higher CapEx investment if you size up a traditional solution and compare performance and price. But since they include both compute and storage, they need fewer personnel to maintain, less rack space, less energy, less cooling and less complexity, because we piggy-back on the investment already made in the ToR switches instead of a dedicated network for the storage layer. Combined, that gives a lower TCO for the most part. Also, instead of having to babysit the hardware to make sure it's okay, those people can focus on other core business that helps the company.

Now, we haven't really discussed any specific vendor or the technology behind it (data locality, compression, deduplication, MapReduce etc.), nor have we touched on all the caveats, pros/cons and the like. There are several vendors out there that I'm sure would happily talk with you, and since you already seem to have in-house expertise on traditional setups you should be able to compare them and find what best fits your needs. But seeing that you know what your workloads are, how they perform and how you need to scale, you should be able to start small and only scale when you need to, for example when DUST hits PC, Valkyrie is released or the like; and if needed you can also scale back down, as no hardware is tied together between the blocks - that is all done in software instead.
Keep in mind that HCI is meant to be clustered by design, and even the smallest solutions are fault-tolerant and give you the redundancy needed.

CCP DeNormalized
C C P
C C P Alliance
#107 - 2015-10-16 12:49:40 UTC
Disco Dancer Dancing wrote:


So to answer the question: yes, your Windows servers, SQL and the like will reside inside this 6U, 6-node cluster, with each node having 512 GB of RAM along with 28 cores @ 2.6 GHz. They would, however, have to be virtualized under VMware, Hyper-V or the like to make use of the platform, as virtualization is a key factor in most HCI solutions.



Great info again, DiscoD! It really gives me a good idea of what this HCI is all about. Massive thanks for the time spent sharing knowledge!

CCP DeNormalized - Database Administrator

Tholuse
Imperium of Suns
#108 - 2015-10-16 16:18:19 UTC  |  Edited by: Tholuse
Good news ... for the new hardware environment Big smile

I was in Barcelona this week at VMware VMworld 2015 and talked with many friends, co-workers and customers who play EVE. Everyone was happy about the hardware change and the use of VMware technology for EVE Online.

I have many customers who use IBM SVC technology with VMware vSphere ... a very fast system.

I hope the transformation steps ... the virtualization of the existing systems ... work out fine.

I'm sure the change to VMware virtualization technology ... was the right step ... on to a new experience with EVE.

EVE FOREVER

Tholuse
*in real life VCP,VTSP and Storage Guy*


I have an idea ... let's make a VMware user group called VMEVE or VEVE,
and next year at VMworld Barcelona 2016 hold a special event for the EVE players attending.
Rillek Ratseye
Extropic Industries
The Initiative.
#109 - 2015-10-16 16:43:18 UTC
You need to be very careful with the EasyTier on the Storwizes and SVC.

If I were configuring that stuff I'd dedicate some of the SSD space to the DB.

If you don't do this, then when you fail over to the secondary SAN nothing will be tiered right, as the vDisk mirror does all reads from one V5000 while the other does writes only. So the second V5000 would detect hotspots based on write workload only, and you really want that based on read workload. If only the SVC could load-balance reads across both mirrored copies... o O (one can only dream of the future!)

Also, why did you opt for the x240 compute node? The x222 node seems a better fit for some of this, and you get double the density at 28 nodes per Flex chassis.
Internal disks or the lack of a fourth memory channel per CPU are the only reasons I can imagine - but those might of course be significant.

And lastly, you bought Lenovo stuff, not IBM! The V5000 is a Lenovo product these days, and the Flex chassis is Lenovo now (only the PureSystems are still IBM).

And as someone wrote earlier in the thread, it seems the players have all the knowledge you need to run TQ! I'm personally a VMware/IBM storage consultant and work with Storwize, SVC, Flex etc. daily.

/Ratseye

xrev
Brutor Tribe
Minmatar Republic
#110 - 2015-10-16 16:49:53 UTC
We should create a corp for all the storage/virtualization workers... IOPS fleets and some iSCSI congestion on the gate Pirate

@CCP, build some inter-station dark fiber connections ;)
Sithausy Naj
The Currents of Space.
#111 - 2015-10-16 21:42:18 UTC
Rillek Ratseye wrote:

And lastly, you bought Lenovo stuff, not IBM! The V5000 is a Lenovo product these days, and the Flex chassis is Lenovo now (only the PureSystems are still IBM).


They are not :)
ShyLion
The Scope
Gallente Federation
#112 - 2015-10-20 20:50:48 UTC  |  Edited by: ShyLion
Posting from my cell phone in Portugal (while on vacation from the US)...

Before finalizing anything, I would suggest a POC using the new Cisco blade chassis, F5 load balancing (LTM, and GTM for geographical DNS balancing) and ASM/firewall products, and to top off the storage, EMC's XtremIO. As a systems engineer of over 20 years, with experience and certificates in VMware/Red Hat virtualization, Dell, IBM and now Cisco products, as well as on-the-fly automation with no interruption, I believe there is no equal at this point in time. If you have any questions you can contact me, and if you have any issues reaching sales reps let me know; I can facilitate vendor contacts. From a player and engineering perspective this is a great but costly alternative, with a lot of flexibility and great redundancy depending on the implementation.
bbb2020
Carebears with Attitude
#113 - 2015-10-20 21:29:27 UTC
Don't know if anyone has asked before, CCP, but can I have your "old" stuff?
Merior
Ouroboros Solutions
Recursive Horizons
#114 - 2015-10-21 04:08:06 UTC
Will the server have a choice of skins? What?
Indahmawar Fazmarai
#115 - 2015-10-21 13:50:36 UTC
Q:
Indahmawar Fazmarai wrote:
Out of curiosity... where will all the additional players needed to use/justify such powerful hardware come from? Straight


A: Those players will hold free to play accounts, of course.

That's why they need such massive hardware for TQ-III even as TQ-II is running at 20% capacity...

(/tinfoil hat)

I accept alternative explanations. Tell me that TQ-III is simply the smallest it can be and that all the extra power comes from technological evolution, and I'll accept it... tell me that it's because of Valkyrie / Gunjack server needs and I'll accept it too...
Indahmawar Fazmarai
#116 - 2015-10-21 13:51:45 UTC
bbb2020 wrote:
Don't know if anyone has asked before, CCP, but can I have your "old" stuff?


IIRC, they raffled some used TQ blades last Fanfest... Blink
Insane TacoPaco
The Scope
Gallente Federation
#117 - 2015-10-21 16:49:28 UTC
That's a fair amount of SQL Server Enterprise licenses. I bet your MS EA rep loves you guys Blink
CCP Gun Show
C C P
C C P Alliance
#118 - 2015-10-24 20:51:19 UTC
CCP Gun Show wrote:
Disco Dancer Dancing wrote:
As someone who builds complex, large datacenters for both private and public use on a rather regular basis, I'm not that impressed by the physical architecture path you are looking at. Why are you looking at a traditional, silo-based solution with storage and compute in different silos, traversing a "slow" FC link whenever you don't hit the cache in RAM? Any particular reason why you are not looking at a more modern, flexible and scalable platform than the one described in the blog post?

Seeing that everything not in the RAM cache has to traverse the FC switch, we can quickly give a few numbers on the actual latency and round trips for several different ways of accessing data in different locations:

L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns   (14x L1 cache)
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns   (20x L2 cache, 200x L1 cache)
Compress 1 KB with Zippy                  3,000 ns
Send 1 KB over 1 Gbps network            10,000 ns   (0.01 ms)
Read 4 KB randomly from SSD             150,000 ns   (0.15 ms)
Read 1 MB sequentially from memory      250,000 ns   (0.25 ms)
Round trip within datacenter            500,000 ns   (0.5 ms)
Read 1 MB sequentially from SSD       1,000,000 ns   (1 ms, 4x memory)
Disk seek                            10,000,000 ns   (10 ms, 20x datacenter round trip)
Read 1 MB sequentially from disk     20,000,000 ns   (20 ms, 80x memory, 20x SSD)
Send packet CA -> Netherlands -> CA 150,000,000 ns   (150 ms)

Looking at the figures, as soon as we start to traverse several layers we add latency to the whole request: if we need to cross the FC fabric, hit the storage nodes, then hit the disk and come back, latency adds up rather quickly. Keeping the data as local as possible is the key - mainly in memory, or as close to the node as possible without traversing the network. (Sure, FC is stable, proven and gives rather low latency, but you could also argue that it is dead in the coming years as we move towards RDMA over converged fabrics or the like.)
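To put rough numbers on that, here is a small sketch that simply sums a few of the figures from the list above for different access paths; the path compositions are an illustration, not a measured workload.

```python
# Summing the well-known latency figures above for a few illustrative access
# paths. The path compositions are illustrative, not measured on any real setup.

NS = {
    "main_memory_reference": 100,
    "read_4k_from_ssd": 150_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
}

paths = {
    "local RAM hit":            NS["main_memory_reference"],
    "local SSD read":           NS["read_4k_from_ssd"],
    "cross fabric + SSD read":  NS["datacenter_round_trip"] + NS["read_4k_from_ssd"],
    "cross fabric + disk seek": NS["datacenter_round_trip"] + NS["disk_seek"],
}

for name, nanoseconds in paths.items():
    print(f"{name:<26} ~{nanoseconds / 1_000_000:.3f} ms")
```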

On another note, if we look at a mainstream enterprise SSD we find figures of roughly:
500 MB/s read and 460 MB/s write
If we put these into the following calculation to see when we saturate a traditional storage network:
numSSD = ROUNDUP((numConnections * connBW (in GB/s)) / ssdBW (R or W))

We get the following table (SSDs required to saturate the available network bandwidth):

Controller connectivity   Available network BW   Read I/O   Write I/O
Dual 4Gb FC               8 Gb  == 1 GB/s            2           3
Dual 8Gb FC               16 Gb == 2 GB/s            4           5
Dual 16Gb FC              32 Gb == 4 GB/s            8           9
Dual 1Gb ETH              2 Gb  == 0.25 GB/s         1           1
Dual 10Gb ETH             20 Gb == 2.5 GB/s          5           6

This is without taking the round trip to access the data into account, and it assumes unlimited CPU power, which can also become saturated. We can see that we don't need that many SSDs to saturate a network. Key point: try to keep the data as local as possible, once again avoiding the network with its added latency and bandwidth limits.
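Written out as code, the ROUNDUP expression above reproduces that table; the SSD throughput figures are the quoted 500/460 MB/s, and the per-link bandwidths use the same simplified Gb-to-GB conversion as the table.

```python
# The saturation formula above, written out. SSD figures are the quoted
# 500 MB/s read / 460 MB/s write; link bandwidths use the same simplified
# Gb -> GB conversion as the table (divide by 8).

from math import ceil

SSD_READ_GBPS = 0.500
SSD_WRITE_GBPS = 0.460

def ssds_to_saturate(num_connections: int, conn_gbps: float, ssd_gbps: float) -> int:
    """numSSD = ROUNDUP((numConnections * connBW) / ssdBW)"""
    return ceil(num_connections * conn_gbps / ssd_gbps)

links = [  # (label, connections, per-link bandwidth in GB/s)
    ("Dual 4Gb FC",   2, 0.5),
    ("Dual 8Gb FC",   2, 1.0),
    ("Dual 16Gb FC",  2, 2.0),
    ("Dual 1Gb ETH",  2, 0.125),
    ("Dual 10Gb ETH", 2, 1.25),
]

for label, n_conn, gbps in links:
    reads = ssds_to_saturate(n_conn, gbps, SSD_READ_GBPS)
    writes = ssds_to_saturate(n_conn, gbps, SSD_WRITE_GBPS)
    print(f"{label:<14} read: {reads:>2} SSDs   write: {writes:>2} SSDs")
```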

We could also calculate the difference between hitting, say, a local storage cache in memory and hitting a remote storage cache (say, a SAN controller with caching). I know off the top of my head which is faster; key point, once again: keep the data as local as possible.

There are several technologies pinpointing exactly these issues seen in traditional silo datacenters. Have you looked at any, and if so, what is the reason they do not fit your needs?


Wow, that's one serious question right there. I hope you will come to Fanfest 2016 to talk about latency Big smile

I just saw this on my mobile, so allow me to get you a proper answer in a day or two.

Excellent stuff!


I promised an answer in a day or two, which I failed to deliver on; I apologize for that.

The reason is that the thread took an interesting turn, but regardless, I have been working with our Lenovo partners (the IBM name is stuck in my head, sorry Lenovo marketing) and my storage team on the best response to this.

So give me a little more time, but I have not forgotten that I owe you all a response.
Freelancer117
So you want to be a Hero
#119 - 2015-11-01 22:42:09 UTC
In another 5 years' time, please create a "SOL" server and revitalize the New Eden wormhole so we can visit the Terran system Cool

Eve online is :

A) mining simulator B) glorified chatroom C) spreadsheets online

D) CCP Games Pay to Win at skill leveling, with instant gratification

http://bit.ly/1egr4mF