These forums have been archived and are now read-only.

The new forums are live and can be found at https://forums.eveonline.com/

EVE General Discussion

 
  • Topic is locked indefinitely.
 

Normalizing spreadsheet data

Author
Miles Winter
The Scope
Gallente Federation
#1 - 2014-05-30 01:36:24 UTC
Wondering if anyone could give me some assistance on this, since I'm rather unfamiliar with excel.

I've got the following data tables:
https://i.imgur.com/oGjGUwx.png

The first three number columns represent scores based on killmail data, the light purple column is an average of those scores.

In the next column over I want to create a normalized distribution of the averaged scores - the averaged data ranges mostly from 70-95; I want to widen that out to a 0-100 range. The numbers in red were added in paint to give you an idea of what I mean.

But I also want to do this while excluding data from items marked with an asterisk. (Mostly they appear to be outliers/non-pvp oriented ships)

eve general, you're my only hope

Once I've got this put together, I'd like to make a post with the full data tables for people to peruse. It's nothing particularly special, just an estimation of spaceship pvp capability based on some collected data from zkillboard.
Noriko Mai
#2 - 2014-05-30 01:38:57 UTC
FURRRRYYY BOOOOOTTT, III SUMMON YOUUUU!!!

"Meh.." - Albert Einstein

Sibyyl
Garoun Investment Bank
Gallente Federation
#3 - 2014-05-30 02:27:59 UTC
Normalization is done through =STANDARDIZE(value,average,standard_deviation)
You'll want a cell (H2) where you calculate mean of all values: =AVERAGE(E:E)
You'll want a cell (H3) where you calculate standard deviation of all values: =STDEV.S(E:E)

Then for each value in column F, ie: F109: =STANDARDIZE(E109,H2,H3)

Joffy Aulx-Gao for CSM. Fix links and OGB. Ban stabs from plexes. Fulfill karmic justice.

Miles Winter
The Scope
Gallente Federation
#4 - 2014-05-30 02:36:20 UTC  |  Edited by: Miles Winter
Sibyyl wrote:
Normalization is done through =STANDARDIZE(value,average,standard_deviation)
You'll want a cell (H2) where you calculate mean of all values: =AVERAGE(E:E)
You'll want a cell (H3) where you calculate standard deviation of all values: =STDEV.S(E:E)

Then for each value in column F, ie: F109: =STANDARDIZE(E109,H2,H3)


This is fantastic! But, it does not appear to skip the asterisk marked fields. Is there syntax for skipping a field, or a way I can block the asterisk field from being read by the standardize function, aside from just deleting it?

Also, this is giving me (mostly) a range of 0-1; with some negative numbers below 0 and positive numbers above 1. Is this what I should be seeing?
Miles Winter
The Scope
Gallente Federation
#5 - 2014-05-30 03:24:02 UTC
I've removed what I considered to be outliers - Ships that are designed for exploration, limited edition ships, interdictors, shuttles, etc.

This is the data that I'm seeing now: https://i.imgur.com/WIarD4L.png
However, I'm having some difficulty actually parsing it, due to a limited understanding of this standardization.

My present assumption is that 0 would correspond to the mean; the hump on the bellcurve. But I can't really figure out just what it means when a ship has a value of say -0.23, or 0.23
Sibyyl
Garoun Investment Bank
Gallente Federation
#6 - 2014-05-30 03:49:04 UTC
Miles Winter wrote:
This is fantastic! But, it does not appear to skip the asterisk marked fields. Is there syntax for skipping a field, or a way I can block the asterisk field from being read by the standardize function, aside from just deleting it?

Replace all instances of E:E with SUMPRODUCT(E:F).. but you will not have a bunch of zeroes in your set of values, so replace your STDEV and AVERAGE functions with something like:

=STDEV.S(IF(SUMPRODUCT(E:F)<>0,SUMPRODUCT(E:F)))

Please test this as I am not sure.. this might return an array function and not what you need.

Quote:
Also, this is giving me (mostly) a range of 0-1; with some negative numbers below 0 and positive numbers above 1. Is this what I should be seeing?

Just process your data beforehand with: {set of original values} * 100 / (MAX{set of original values}). I'm actually not sure this is correct. My brain is not working. Let me think about this..

Joffy Aulx-Gao for CSM. Fix links and OGB. Ban stabs from plexes. Fulfill karmic justice.

Sibyyl
Garoun Investment Bank
Gallente Federation
#7 - 2014-05-30 04:00:51 UTC  |  Edited by: Sibyyl
I don't think you need STANDARDIZE at all. Just try:

F110 = ((E110-MAX(E:E))*100)/(MAX(E:E)-MIN(E:E))

I think all you want is changing those values to 0-100 scale..

Joffy Aulx-Gao for CSM. Fix links and OGB. Ban stabs from plexes. Fulfill karmic justice.

CompleteFailure
DAWGS Corp.
#8 - 2014-05-30 05:26:51 UTC
My brain hurts after reading all that. Internet spreadsheets is srs bsns yo.
Aoife Fraoch
The Scope
Gallente Federation
#9 - 2014-05-30 05:45:21 UTC  |  Edited by: Aoife Fraoch
Having a look at the data you are working with, I am wondering why you feel you need to exclude the asteriked stuff? Especially in the case of the covert ops figures. Outliers are data too, and it seems that you're starting with an assumption of normal distribution?

If you want to force a normal distribution curve with the data you have in the purple column, and maybe without losing some values marked as outliers in what look like very limited sets, perhaps you can consider transformations?

This is a fairly simple tool to use, it doesn't cover all transformations but it should give you enough if you want to take skewed data and force it into a normal distribution: http://vassarstats.net/trans1.html.

Also obligatory wikipedia link: http://en.wikipedia.org/wiki/Data_transformation_(statistics)

EDIT: or you could always go non-parametric and assign ranks, it really depends on what you want to accomplish with the data in the first place.