Dev Blog: Building a Balanced Universe

Suomi Khan
Doomheim
#101 - 2013-12-03 22:30:38 UTC
CCP Prism X wrote:
There's no sense of locality or proximity in WH space so they just get a very dumb but efficient method applied to them.

I read the devblog and must say, it looks like you guys put in a lot of work to make TiDi less frustrating. Thanks a lot for that :)

Is it possible for you to make an addition to the Devblog explaining how the load is distributed in w-space by chance? We now know and understand in detail what is happening in known space, but it could be very cool to know how you solve w-space, even though it might be "dumb but efficient" :)

(Also, please tell me if SSC is producing the TiDi so I can have a reason to evict them. :) )
Chainsaw Plankton
FaDoyToy
#102 - 2013-12-03 22:59:34 UTC
CCP Prism X wrote:
I hear this guy listens to really weird music (NSFW) which is probably indicative of his cognitive capacity. Probably not worth reading this!


I'm out

@ChainsawPlankto on twitter

Mara Rinn
Cosmic Goo Convertor
#103 - 2013-12-03 22:59:49 UTC
How is work progressing with the dynamic node reinforcement where you move sols between nodes while players are busy in those sols?

Is there any possibility of only loading systems when 'stuff' happens in them? Thus when nothing is due to happen in a system for half an hour you could "unload" the system until that POS reaction has to be serviced, or someone decides to jump into the system. Heck, if you move POS reactions to a separate service (and thus unlink it from the sol simulator) you don't even need to load the sol at all. When I approach the gate between Lanngisi and Tvink, the gate could have some kind of notice attached indicating that Tvink isn't loaded yet, is currently being loaded, or is ready to jump into. As a blockade runner pilot, I might want to stay cloaked, so there might be a mechanism to ask traffic control to insert me in the queue, at which point Tvink would be loaded, the gate would display the appropriate notification, and off we go. What if gates could be de-illuminated to express the state of being "off" due to the remote system not being loaded?

And as a result, load balancing would only be a case of, "someone wants to load this sol, which server has the lowest load? put this sol on that server!" And load balancing would thus be automatic, as would reinforcement (when one server gets heavily loaded, it won't inherit new sols, and existing ones will end up pruned). You could even preemptively "reinforce" nodes by adding a weighting to the "which server is least loaded" algorithm where an SBU adds 1000 virtual players worth of load to the system for balancing purposes, along with the historical load that your pre-balancer is using.
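
A minimal Python sketch of the "put this sol on the least-loaded server" idea described above. Everything here is hypothetical and for illustration only: the `Server` class, `weighted_load`, `load_system`, and `unload_system` are invented names, not anything from CCP's cluster code. The only figure taken from the post is the weighting of 1000 virtual players per SBU.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that hosts a set of loaded solar systems."""
    name: str
    systems: dict = field(default_factory=dict)  # system name -> weighted load

    @property
    def load(self) -> float:
        return sum(self.systems.values())


def weighted_load(players, sbus=0, historical=0.0, sbu_weight=1000):
    """Assumed cost model: live players + virtual players per SBU + history."""
    return players + sbus * sbu_weight + historical


def load_system(servers, system, load):
    """Place a not-yet-loaded system on whichever server is least loaded now."""
    for server in servers:
        if system in server.systems:
            return server  # already loaded, nothing to do
    target = min(servers, key=lambda s: s.load)
    target.systems[system] = load
    return target


def unload_system(servers, system):
    """Drop an idle system so that its server frees up capacity."""
    for server in servers:
        server.systems.pop(system, None)


servers = [Server("node-a"), Server("node-b")]
load_system(servers, "Lanngisi", weighted_load(players=100))
load_system(servers, "Tvink", weighted_load(players=3, sbus=1))
load_system(servers, "Hek", weighted_load(players=400))
```

In this toy run Tvink lands on the empty server because of its SBU weighting, and Hek then joins Lanngisi because that server carries the smaller weighted load. A heavily loaded server simply stops winning the `min` comparison, which is the "automatic reinforcement" effect the post describes.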

As a bonus, "downtime" could happen opportunistically, except for continuously populated sols for which the downtime counter would be applied as a timer for that sol, similar to the incursion bar: if I'm missioning in Lanngisi which is supposed to get downtime but has had 100 missioners in it for the last 36 hours, I'd see a non-modal warning that this node is going to experience downtime in 20 minutes. Thus I can choose to do something else (e.g.: fly to Hek to buy some stuff off the market). Then when Hek is due to shut down, I fly back to Lanngisi to continue my missioning. Downtime gets done, player gets a downtime-free playing experience. Of course there is the issue of patching the client.

If a node needs servicing, you could remove it from the opportunistic loading queue, shuffle the current sols to different nodes based on load, then you're free to shut the node down to replace it.

And then you could choose to not apply downtime to sols which don't need it because they don't have infrastructure that requires downtime.

Of course you've all been through this already in your various dev chats internally, and removing downtime is probably going to be easier than moving to a "just-in-time" sol loader with dynamic sol rehoming, and I should just shut up.

I'd just like you to consider that AU-TZ players would love downtime operating per-system on a round-robin basis almost as much as removing downtime altogether.
Max Kolonko
Caldari Provisions
Caldari State
#104 - 2013-12-03 23:03:29 UTC
CCP Explorer wrote:
One more thing to mention, as a part of this change then the cluster is starting up 2 minutes faster. See here https://forums.eveonline.com/default.aspx?g=posts&m=3897467#post3897467 and here https://forums.eveonline.com/default.aspx?g=posts&m=3899297#post3899297 for details.



So we have like, what? 6 - 8 minutes of sleep left daily?
Max Kolonko
Caldari Provisions
Caldari State
#105 - 2013-12-03 23:09:05 UTC  |  Edited by: Max Kolonko
Mara Rinn wrote:
How is work progressing with the dynamic node reinforcement where you move sols between nodes while players are busy in those sols?

.... snip.....



Would it be possible to do something like this?:

- freeze game (TIDI 100%) in old node
- spawn new node with 100% TiDi
- Show players info about load balancing or something to let them know they are changing servers
- move people to new node with all player/ship states untouched and make it take as long as it is needed
- all new player system entrances during the node movement are put onto the new node
- after all players are moved OR a certain timer has passed, despawn the old node and set TiDi to the appropriate level on the new node
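
The steps above can be modelled as a toy Python sketch. This is purely illustrative and assumes a drastically simplified world: `Node`, `migrate`, and the `max_moves` cut-off (standing in for the timer) are hypothetical names, and a real solar-system process holds far more state than a dictionary of players.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a frozen flag plus player -> ship state."""
    name: str
    frozen: bool = False
    players: dict = field(default_factory=dict)


def migrate(old, new, arrivals=(), max_moves=None):
    """Move players from old to new while both are frozen; return stragglers."""
    old.frozen = True                    # freeze the game on the old node
    new.frozen = True                    # spawn the new node frozen as well
    for player, state in arrivals:       # new entrants go straight to the new node
        new.players[player] = state
    moved = 0
    for player in list(old.players):
        if max_moves is not None and moved >= max_moves:
            break                        # stand-in for the timer running out
        new.players[player] = old.players.pop(player)  # state is moved untouched
        moved += 1
    stragglers = dict(old.players)       # anyone not moved before the cut-off
    old.players.clear()                  # despawn the old node
    new.frozen = False                   # resume on the new node
    return stragglers


old_node = Node("old", players={"pilot_a": {"hp": 1}, "pilot_b": {"hp": 2}})
new_node = Node("new")
left_behind = migrate(old_node, new_node, arrivals=[("pilot_c", {"hp": 3})])
```

The sketch sidesteps the hard part, which is serialising and transferring live simulation state without losing anything.
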
Trillian Stargazer
Perkone
Caldari State
#106 - 2013-12-03 23:09:47 UTC
I am still trying to wrap my head around this comment,

"Time Dilation is all well and good when you‘re blobbing with your space-friends in Nullsec"

If TiDi is good enough for blobbers in Null Sec it should be OK for empire people. When a node goes into TiDi when it's reinforced and there are fewer people than are in Jita, something is wrong, very wrong. When we have 60 people in a fleet and we undock, we get TiDi, when we swap ships, TiDi, when we jump through a gate, TiDi. None of these should cause TiDi, yet they do.

TiDi was introduced as a bandaid while CCP was working on a Fix. Now you are promoting TiDi as the fix.

So what is it?

Is TiDi the end all fix or is CCP working on fixing the core issues that cause TiDi to begin with?

If TiDi is the fix, you are doing it wrong.
Kossaw
Body Count Inc.
Mercenary Coalition
#107 - 2013-12-03 23:17:57 UTC
CCP Explorer wrote:
The final piece of this puzzle, the intra-node jumps vs. inter-node jumps we ultimately want to solve with Brain in a Box.


We're still waiting for that dev blog mate ;)

WTB : An image in my signature

Sable Blitzmann
24th Imperial Crusade
Amarr Empire
#108 - 2013-12-03 23:23:03 UTC
Max Kolonko wrote:
Mara Rinn wrote:
How is work progressing with the dynamic node reinforcement where you move sols between nodes while players are busy in those sols?

.... snip.....



Would it be possible to do something like this?:

- freeze game (TIDI 100%) in old node
- spawn new node with 100% TiDi
- Show players info about load balancing or something to let them know they are changing servers
- move people to new node with all player/ship states untouched and make it take as long as it is needed
- all new player system entrances during the node movement are put onto the new node
- after all players are moved OR a certain timer has passed, despawn the old node and set TiDi to the appropriate level on the new node


I'm sure if it was that easy, they would have already done it.

Sure, to us, it's as simple as moving players to another node. Like walking from one room to another. But programmatically, it might not be feasible or possible without a lot of reworking. Just like switching characters without logging out: easy concept, extremely difficult to accomplish with the current codebase.
Sable Blitzmann
24th Imperial Crusade
Amarr Empire
#109 - 2013-12-03 23:31:19 UTC
Trillian Stargazer wrote:

If TiDi is good enough for blobbers in Null Sec it should be OK for empire people. When a node goes into TiDi when it's reinforced and there are fewer people than are in Jita, something is wrong, very wrong. When we have 60 people in a fleet and we undock, we get TiDi, when we swap ships, TiDi, when we jump through a gate, TiDi. None of these should cause TiDi, yet they do.

TiDi was introduced as a bandaid while CCP was working on a Fix. Now you are promoting TiDi as the fix.

So what is it?

Is TiDi the end all fix or is CCP working on fixing the core issues that cause TiDi to begin with?

If TiDi is the fix, you are doing it wrong.


You're equating two completely different scenarios. 2000 people in Jita trading on the market does not equal 2000 people in space, each activating 10+ modules and 5 drones, with the server keeping track of which ship is where, its speed, its speed relative to other ships, effects being applied to various ships via points, RR, and ECM, and damage calculation, all while handling jumps in and out of the star system, POS timers, and station timers, and all while handling other systems on the same node.

It's no comparison.

I'm sure CCP is continuing to look into lag and bottlenecks. Drones are probably a good thing to look into. Possibly fleet jumps and maneuvers as well (so that it doesn't have to loop over each ship, but can treat all ships as a whole in a fleet object). But they said a while ago in a devblog that they're getting to the point where there aren't a whole lot of optimizations left to be done.

TiDi was never a fix, nor intended to be a fix. It was intended to mask the lag and give the server time to respond to the many different commands that were incoming.

Nor was a fix ever intended to be soon. I would assume dynamically allocating resources would be as close of a fix as anything. Who knows where that is, or if it's close. That is a massive undertaking and could be another few years maybe.
MeBiatch
GRR GOONS
#110 - 2013-12-03 23:36:03 UTC
CCP Prism X wrote:
I hear this guy listens to really weird music (NSFW) which is probably indicative of his cognitive capacity. Probably not worth reading this!


wtf

There are no stupid Questions... just stupid people... CCP Goliath wrote:

Ugh ti-di pooping makes me sad.

Fix Lag
The Scope
Gallente Federation
#111 - 2013-12-03 23:50:28 UTC
MeBiatch wrote:
wtf


not empty quoting

CCP mostly sucks at their job, but Veritas is a pretty cool dude.

Rn Bonnet
Perkone
Caldari State
#112 - 2013-12-04 01:22:19 UTC
Here is a precise algorithm that runs in O(|V| + |E|) and can deal with heterogeneous nodes:

Take the nodes n1, n2, ... with capacity c(n) and the systems s1, s2, ... with cost C(s), and calculate a loading factor per node: l(n1) = c(n1) * (ΣC(s) / Σc(n)). That is how much load a node of a given capacity should take. You then perform a simple breadth-first traversal, assigning each system to the current node until the sum of the costs exceeds that node's loading factor. After this you pull a new node off your stack and begin assigning systems to it. This achieves perfect load distribution across heterogeneous nodes and systems. You do need a sigma (a tolerance) to account for the discrete-to-continuous domain.
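
A minimal Python sketch of the traversal described above, under some assumptions: the function name `bfs_assign` and the input shapes are hypothetical, the system graph is assumed to be connected, and the sketch switches to the next node as soon as the current quota is reached (rather than strictly exceeded), keeping the last node as a catch-all for rounding leftovers.

```python
from collections import deque


def bfs_assign(adjacency, cost, capacity, start):
    """Assign systems to nodes in BFS order, filling each node up to its quota.

    adjacency: system -> list of neighbouring systems
    cost:      system -> estimated load C(s)
    capacity:  node   -> capacity c(n)
    """
    total_cost = sum(cost.values())
    total_capacity = sum(capacity.values())
    # Loading factor l(n) = c(n) * (sum of C(s) / sum of c(n)).
    quota = {n: c * total_cost / total_capacity for n, c in capacity.items()}

    nodes = deque(capacity)        # nodes still waiting to be filled
    current = nodes.popleft()
    used = 0.0
    assignment = {}

    seen = {start}
    queue = deque([start])
    while queue:
        system = queue.popleft()
        # Move on once the current quota is reached; the last node
        # absorbs whatever is left over.
        if used >= quota[current] and nodes:
            current = nodes.popleft()
            used = 0.0
        assignment[system] = current
        used += cost[system]
        for neighbour in adjacency[system]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return assignment


# Four systems in a line, one small node and one node three times as large.
line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
costs = {"A": 10, "B": 10, "C": 10, "D": 10}
result = bfs_assign(line, costs, {"n1": 1, "n2": 3}, start="A")
```

Each system and each edge is visited once, which matches the O(|V| + |E|) bound claimed in the post. How evenly the load ends up depends on how well individual system costs fit into the quotas.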


Jessica Danikov
Network Danikov
#113 - 2013-12-04 02:20:08 UTC  |  Edited by: Jessica Danikov
GeeShizzle MacCloud wrote:
Jessica Danikov wrote:

That's starting to sound like a Colour problem. Ultimately you want to maximise the number of local nodes without them touching themselves, while minimising the global distribution of any individual colour. This does the best to maximise the high-performance transitions while keeping TiDi relatively local to its cause.


from what i can understand i think he means the opposite... cut up the load balancing down to the final level then split the final level across 2 nodes ensuring maximum contact surface between the 2 servers.

instead of creating a line between the localised cluster of systems you create a chessboard style distribution to maximise the amount of server to server jumps rather than intra server jumps (that are considered inferior to discrete server to server jumps)

you have the localisation at a cluster wide scale, then when u get to the singular node scale it switches to a high diffusion model. the problem would be how to keep the remapping from morphing the diffused local distribution from becoming localised through time.


Yes, that's the naive first solution I was thinking of, but it's a colour problem as 'checkboarding' the graph isn't possible with two colours- you will end up having white adjacent to white nodes (take three systems all connected to each other for the simplest example- White, White, Black). The whole idea of colouring is to minimise the numbers of colours needed, so you're coming up with the minimum number of involved nodes to ensure every edge is a change in node (for most cases, you will not likely need more than 3 colours/nodes).

If you colour the universe map then load balance each colour individually, you should end up with a similar load distribution but with every jump an inter-node jump. The best aspect of this is the colouring only has to be done once.
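
To illustrate the colouring idea, here is a small Python sketch using a greedy heuristic. The function name `greedy_colouring` is hypothetical, and greedy colouring is a swapped-in simplification: it guarantees that neighbouring systems get different colours, but it does not guarantee the minimum number of colours.

```python
def greedy_colouring(adjacency):
    """Give every system the lowest colour not used by any neighbour."""
    colour = {}
    # Visit high-degree systems first; a common heuristic, not an optimum.
    for system in sorted(adjacency, key=lambda s: -len(adjacency[s])):
        taken = {colour[n] for n in adjacency[system] if n in colour}
        c = 0
        while c in taken:
            c += 1
        colour[system] = c
    return colour


# Three mutually connected systems: two colours cannot be enough.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
colours = greedy_colouring(triangle)
```

On the triangle the sketch uses three distinct colours, matching the White, White, Black example. Each colour class could then be load balanced separately, so that every jump crosses from one group of nodes to another.
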
Lelira Cirim
Doomheim
#114 - 2013-12-04 02:37:55 UTC  |  Edited by: Lelira Cirim
Max Kolonko wrote:
CCP Explorer wrote:
One more thing to mention, as a part of this change then the cluster is starting up 2 minutes faster.

So we have like, what? 6 - 8 minutes of sleep left daily?

If you need sleep go play STO. Their maintenance takes the game down for 3 hours at a time usually. ;) It's supposed to be weekly but emergency maintenance is pretty common in the weeks following an update.

I'd really like Massively to do an infographic of MMO downtime. Maybe give the engineers something to compete over. :D

Do not actively tank my patience.

Geofferybg
Canadian Forces Corp
United 4 Nations
#115 - 2013-12-04 02:49:22 UTC
so not sure if anyone has asked this yet but...
If you are load balancing based on the "nodes" or CPU cores could you not have the script run dynamically? like below:

  1. Systems not in use, i.e. with no players logged in, have no node assigned (well, they have a theoretical node assigned but are not loaded on that node)
  2. As players log in to systems where there is an unusual level of "logging in" (jumping into the system or logging into the game) you move non-active systems of that node to a dedicated "reinforced" node. This could be measured using the number of active players (I'm sure there is a database query that could be run every 15 minutes)
  3. As players jump from their staging system to another system nearby (probably where they're going to be attacked/attacking) they would move off a non-reinforced node and onto a reinforced node...


This makes a few assumptions which I cannot say, from a technical standpoint, are always true, but it may aid in handling load.

  1. that the systems near the staging system are empty (ok so not usually but you never know)
  2. that the reinforced nodes are just kinda sitting there as unused hardware and you are not running a virtual machine cluster that hands off load across multiple virtual servers (i assume this as load balancing is needed at a game level not a "VM" level)
  3. that the solar-system level start-up could be delayed using a command along the lines of " wait till client call "login" at solar-system "id" parse system start-up to "node as defined under normal circumstances"" (yes, that's pseudocode, not actual code, and sorry for the number of quotes)
  4. that if the "node as defined under normal circumstances" for start-up of a system is at TIDI activation capacity shift "node as defined under normal circumstances" to reinforced node is a viable option.



And I'm sure if this was a simple exercise it would be done, but I have to ask. Any dev care to comment as to why the above is not a deployable solution? Or why there is not a VM layer to the cluster (does it add so much overhead that load balancing becomes worse)?

Abdiel Kavash
Deep Core Mining Inc.
Caldari State
#116 - 2013-12-04 04:20:48 UTC  |  Edited by: Abdiel Kavash
It is (currently) impossible to move a running system from one node to another without first kicking everyone logged in to that system (or possibly even all players on that same node, I'm not sure). But your idea is close to the overall long-term goal.
Trinkets friend
Sudden Buggery
Sending Thots And Players
#117 - 2013-12-04 05:04:28 UTC
So...where does w-space fit in, and how many nodes does it have, and how is load assigned to w-space? it is interesting to me, not so much because I've experienced TiDi in a wormhole, but because it is a bit pot luck sometimes when you jump 30+ people through a wormhole whether you will load grid in 5s or 10s.

Given wormholes are very dynamic connections, especially the S199, A641 and other k-k connections, you can essentially bridge two nodes together which are not usually connected. Does this even matter to the server balancing, given that two disparate systems are now directly connected to one another and this ought to influence the algorithm for nearest-neighbour weighting?

Just curious!
Jessica Danikov
Network Danikov
#118 - 2013-12-04 07:10:13 UTC
Being able to migrate solar systems live between nodes would be close to the holy grail of load balancing as you can harden a node incrementally by moving other nodes off it or move a system to a reinforced node when something like Asakai happens.

The other big leap would be the ability to exploit multiple cores/threads per solar system (doesn't help with load balancing so much as make the CPU ceiling and thus the load potential of an individual system much higher == more players in the same system together).

Both of these would be significant technical challenges, potentially a significant re-architecture in the application, so I wouldn't hold my breath for them happening any time in the foreseeable future.
Laserak
The Scope
Gallente Federation
#119 - 2013-12-04 07:11:47 UTC  |  Edited by: Laserak
Login tidi and undock tidi in a staging system is the devil. B-D never forget
LakeEnd
Brutor Tribe
Minmatar Republic
#120 - 2013-12-04 07:48:04 UTC
Nice to see you guys are working on this. However, I have a few concerns about it.

Disclaimer: I am not completely sure I understood the dev blog, so please correct me if I am way off, but my understanding is that you are still trying to balance the load geographically instead of doing it statistically?

First of all, should you actually stop treating the problem for nullsec and highsec (+lowsec I guess) as similar? Because the way I see it, highsec system load should be rather predictable and mostly consistent, since tradehubs and missioning centers remain largely the same. Nullsec, however, is far more unpredictable: system load is generated at the whim of player coalitions and where they choose to clash that day (timers and random acts of violence).

Related to the randomness of the nullsec load, my second concern is how fast you will iterate on this node splitting. Or will it be set in stone and forgotten about in a few months? I mean, while galactic south and east are now in turmoil in nullsec, it will be a completely different story in six months. What now seems like a very heavily loaded area (current nullsec alliance staging systems or strategic chokepoint systems we are fighting over) will not be like that in the near future.

Third, nullsec system load is usually pretty centralized around one region, and if you insist on splitting the nodes on hosts geographically, you will always have areas where there are thousands of people moving and fighting while other remote areas are virtually empty. Won't this mean that hosts running the nullsec regions of the north are almost idling, supporting the occasional ratter, while whichever host gets systems from current conflict regions (Immensea etc.) is always going to be heavily overloaded?

How far are you guys from being able to run a system in multiple threads? Is the live remapping of a system to a different node even a distinct possibility? Wouldn't it be possible using virtualization and technology like DRS in vSphere?