Project Discovery: My Thoughts So Far

Krevnos
Back Door Burglars
#1 - 2016-03-13 17:04:15 UTC  |  Edited by: Krevnos
With Project Discovery now well under way, I must say I am surprised by the community's ability to accurately report on a significant portion of samples (for a community of trolls, scammers and schemers, it isn't bad at all!). I am very much enjoying the work so far, particularly those challenging, niggly slides! I do have some suggestions based on my experience:

To Project Discovery coordinators:

1. The hardware the programme is running on needs to be upgraded, or a queueing system set up for requests. Spending a significant amount of time on a slide only to find that your interpretation is rejected by the server is remarkably irksome.

2. Providing the cell line for each slide would be very helpful. While most players are not familiar with cell types, as they gain experience they will become familiar with distribution patterns in certain cells. I think the benefits here will outweigh those of 'blinding' in the medium to long term. I don't know many cytologists who look at samples without knowing the cell type. Examples of issues here include cell types with small nucleoli, differences in nucleolar number, and cytoplasmic form.

3. While the selection of (particularly cytoplasmic) targets is nice, it doesn't quite encompass all options, leading to community confusion. With protein localisation being such a complex area, this was bound to happen from the outset. I have, for example, found cytoplasmic proteins localising toward the nuclear envelope with no obvious binding to Golgi or ER. Further options could also be opened up to experienced screeners.

4. A grading system would be immensely helpful for experienced cytologists. At present there is no way to indicate how a protein is distributed between the nucleus and the cytoplasm. A protein, for example, may be more prominent in the nucleoplasm than in the cytoplasm. To help ensure correct application, this feature could be introduced at 'Level 20' (see the sketch after this list). Even if players aren't perfectly accurate due to subjectivity, the ratios of FITC binding they report are likely to be similar.

5. Players are having a hard time using the channels. I often find that obvious nuclear binding has been missed by multiple players (due to FITC being hidden under the Hoechst stain). Helping players more with this in the tutorial will likely resolve the issue.

6. Example images would benefit immensely from the ability to switch channels on and off.

7. A way to flag samples for attention would be good. Many times I have come across images where the community have missed something (like nucleolar rim or fibrillar binding). You could set yourselves a threshold for when you choose to review them.

8. The community frequently fails to apply 'intercellular variation' where appropriate. Perhaps a couple of examples in the tutorial would help. Actually, if I remember correctly, one was missed in the tutorial (maybe image 9, but I don't recall).

9. More information on the project would be wonderful! At present, players are largely forced to Google. A webpage with basic information describing what the project is, something about protein localisation and function for the layperson, what the projected outcomes are, and why the research is important would be lovely. It would also be nice to know more about how you are controlling the experiment (e.g. analysis by your own screeners and multiple images from the same HCA) and how the data will be filtered and analysed.

10. Feedback. This is sorely missing at present. Being able to review slides which were subsequently scored by an expert would help greatly. These should be categorised by our personal correct and incorrect entries (this part is CCP's area). I understand this will be a significant task to implement, but it is essential if you are serious about building an adroit community of screeners. The effort now will pay off later: experienced members of the community will stop making the same errors constantly, reach accurate consensus more often, and fewer slides will require review.

11. Stick with it! There will always be teething issues with an ambitious project like this one. The community will improve with experience. Re-testing some of the early slides in a few weeks may return better results. Encouraging and rewarding player accuracy is paramount!
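
To illustrate point 4 above, here is a rough Python sketch of what a graded submission from an experienced screener might look like. This is purely my own mock-up - the class, field names and example values are invented for illustration and are not anything CCP or the researchers have described:

```python
from dataclasses import dataclass, field


@dataclass
class GradedSubmission:
    """Hypothetical graded submission from an experienced screener."""
    sample_id: str
    locations: list                      # the usual categorical picks
    # Rough share of FITC signal per compartment, summing to ~1.0.
    # Individual estimates will be subjective, but the ratios reported
    # by different screeners for the same slide should be broadly similar.
    intensity_ratio: dict = field(default_factory=dict)


# Example: protein present in both compartments, but clearly stronger
# in the nucleoplasm than in the cytoplasm.
entry = GradedSubmission(
    sample_id="SAMPLE-000123",           # made-up identifier
    locations=["Nucleoplasm", "Cytoplasm"],
    intensity_ratio={"Nucleoplasm": 0.7, "Cytoplasm": 0.3},
)
print(entry)
```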


To CCP:

1. The rewards system needs to be non-linear to encourage correct identification. Consider a logarithmic scale (see the sketch after this list). A good cytologist is worth more than 20 button mashers.

2. Accuracy would be better based on more than simply how much you agree with the community. The community makes mistakes. Not only that, but some players are gaming the system by cancelling out the difficult slides to gain better 'accuracy'. The determination system would benefit from being more robust and from rewarding those who accurately identify the less common markers. Consider working together with Prof. Lundberg's team to improve the system.

3. Really nice work with the interface - I'm impressed!
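
To make points 1 and 2 more concrete, here is a rough Python sketch of the kind of payout curve I have in mind: the reward grows steeply (non-linearly) with tracked accuracy, and correctly identifying rarer markers is weighted more heavily than ticking the common ones. All of the numbers, weights and names below are my own invention, purely for illustration:

```python
import math

BASE_REWARD = 10_000              # ISK per analysed sample (made-up figure)

# Made-up rarity weights: the less common the marker, the more a correct call is worth.
RARITY_WEIGHT = {
    "Cytoplasm": 1.0,             # very common
    "Nucleoplasm": 1.2,
    "Nucleoli": 1.5,
    "Aggresome": 3.0,             # rare
}


def payout(accuracy: float, marker: str) -> float:
    """Reward for one classification, scaling steeply with running accuracy.

    accuracy is the player's tracked accuracy in [0, 1]. The exponential
    term makes a 95%-accurate screener worth far more per sample than a
    50%-accurate button masher, instead of only roughly twice as much as
    on a linear scale.
    """
    accuracy_bonus = math.exp(4.0 * accuracy) / math.exp(4.0)   # steeply non-linear, tops out at 1.0
    return BASE_REWARD * accuracy_bonus * RARITY_WEIGHT.get(marker, 1.0)


# A careful screener calling a rare marker vs. a button masher on a common one:
print(round(payout(0.95, "Aggresome")))   # ~24561 ISK
print(round(payout(0.50, "Cytoplasm")))   # ~1353 ISK
```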


Well that's my 20 cents worth for today!
Marranar Amatin
Center for Advanced Studies
Gallente Federation
#2 - 2016-03-13 20:42:22 UTC  |  Edited by: Marranar Amatin
I see a problem with the training of the users.

After a submission, you only see the distribution of other users, not what is correct. This will train users to rate similarly, but not necessarily correctly.

We need to see known samples more often, with clear feedback.

As an example, I have the impression that Cytoplasm is selected too often. One of the last images I got had a mostly uniform distribution of green dots over the whole cell structure, including the nucleus.
It looked like a good example of "unspecified" to me, definitely not like cytoplasm, since cytoplasm is not supposed to be in the nucleus.

But Cytoplasm got 75%, unspecified only 25%.

So the lesson that most people will learn from that image is "this is what cytoplasm looks like", since everyone who selected it got a majority. At least I think they are wrong... but maybe it's me who is wrong. But neither of us will ever know.

I can later see whether my accuracy increased or not, but not what the picture looked like; only the time information.

I think we need a better evaluation of our work: at least some pictures have to be reviewed by someone who really knows what they are doing, and then we need to see those pictures again, together with what we picked and what was actually correct.

Also, as Krevnos already said, there needs to be a higher incentive to achieve really good precision. Right now the optimal strategy for a high ISK/time ratio is probably to spend very little time on each image and just select the most obvious features. Finding small details is probably not efficient, so the system will train users not to do so.

edit:
Just got another one of those... several artifacts clearly visible (blue and green in positions where no cells are even close), uniform staining throughout the whole cells including the nucleus... and the user distribution is 57% Cytoplasm and 42% Vesicles, even though neither is supposed to appear in the nucleus. And unspecified got 0%.
Ibutho Inkosi
Doomheim
#3 - 2016-03-13 22:06:20 UTC
My thoughts: The thing is a Pavlovian Device. It's there to study your behavior. It doesn't matter if you're matching up protoplasm, or the variety of stripes on zebras. That's not the object. The object is you...and your pet mouse.

As long as the tale of the hunt is told by the hunter, and not the lion, it will favor the hunter.

Selphentine
Pastafaris
#4 - 2016-03-13 22:24:02 UTC
Marranar Amatin wrote:
I see a problem with the training of the users.

After a submission, you only see the distribution of other users, not what is correct. This will train users to rate similarly, but not necessarily correctly.


Sometimes there are right/wrong answers, marked with a big red X or the green thingy (I don't know the English name for it).

You can, however, check whether you're right from the gains you get - there is a consistent gain/loss amount you can spot if you study the gains/losses.

More of a problem is that people seem to think the consensus is what makes an answer right or wrong.
Beta Maoye
#5 - 2016-03-14 01:55:25 UTC
I think more than one third of the slides are borderline cases. Cytoplasm is difficult to distinguish from background noise. Nucleus can co-exist with cytoplasm, even though the description says cytoplasm is seen throughout the whole cell except in the nucleus. Nucleoplasm and nucleoli can appear next to each other. I can't tell the difference between cytoskeleton (microtubule ends) and focal adhesions. Sometimes I cannot decide whether something is an aggresome or a microtubule organizing center. I have a hard time choosing between cytoskeleton (actin filaments) and cell junctions. Intermediate filaments and endoplasmic reticulum look like twins to me. These are not true-or-false questions. The classification process is more of an art than a science. No wonder computers perform badly at this job.

I believe many players want to do it right but have no idea how. It would be great if more information or training could be provided. I believe our community can be a great contributor to this wonderful, never-seen-before citizen science project.
Linus Gorp
Ministry of Propaganda and Morale
#6 - 2016-03-14 02:01:29 UTC
http://puu.sh/nEQUA/3ecc27db52.jpg

When you don't know the difference between there, their, and they're, you come across as being so uneducated that your viewpoint can be safely dismissed. The literate is unlikely to learn much from the illiterate.

Purr Purr
Nubzors
#7 - 2016-03-14 14:29:39 UTC
Marranar Amatin wrote:
I see a problem with the training of the users.

After a submission, you only see the distribution of other users, not what is correct. This will train users to rate similarly, but not necessarily correctly.


To show you the correct answer would mean for them to know the correct answer. If they knew it, then there would be no project discovery. That's the point of the project. To discover, to find out. And, depending on the majority of checks, they will then analyse those proteins.

It doesn't matter what others choose. If you choose right, most of the time you will select what others selected as well.
As long as you don't lose accuracy, you are on the right track.
Memphis Baas
#8 - 2016-03-14 15:27:47 UTC
I'd like to have the ability to adjust color saturation, or, if that's not possible, some sort of checkbox for "low levels of staining."

Also, I don't think people realize that if you have super-bright dots or nucleoli or whatever, you're supposed to ignore the faded background staining. If you get a strong signal, ignore the background noise, so to speak.
Krevnos
Back Door Burglars
#9 - 2016-03-14 15:49:55 UTC
Purr Purr wrote:
Marranar Amatin wrote:
I see a problem with the training of the users.

After a submission, you only see the distribution of other users, not what is correct. This will train users to rate similarly, but not necessarily correctly.


To show you the correct answer would mean for them to know the correct answer. If they knew it, then there would be no project discovery. That's the point of the project. To discover, to find out. And, depending on the majority of checks, they will then analyse those proteins.

It doesn't matter what others choose. If you choose right, most of the time you will select what others selected as well.
As long as you don't lose accuracy, you are on the right track.


I believe the issue for many of the slides is not that the experts cannot reach consensus, but that the volume of slides is simply too much for a scientific team to handle. In this regard, guidance for us newbies is essential to ensure that we are generally in consensus. Major issues persist in false-positive flagging of background noise and unwashed stain artefact (so many times I have seen artefact reported as aggresome). This is not the fault of players - they simply aren't accustomed to looking at fluorescence microscopy.
Marranar Amatin
Center for Advanced Studies
Gallente Federation
#10 - 2016-03-14 18:31:10 UTC
Selphentine wrote:
Sometimes there are right/wrong answers, marked with a big red X or the green thingy (I don't know the English name for it).

You can, however, check whether you're right from the gains you get - there is a consistent gain/loss amount you can spot if you study the gains/losses.

More of a problem is that people seem to think the consensus is what makes an answer right or wrong.


I didn't see any big red X or green thingy since the first (five?) training images.

And the gain/loss report comes too late. I can't relate it to the image; all I can see is a timestamp. I would have to take a screenshot and compare it later... and even then I would only know whether I was correct, not what the correct solution would have been if I was wrong.

And this is probably the reason for your third point. The gain/loss comes too late; it's not really feedback that you can relate to the image. The only direct feedback you get is the consensus, so it's only natural that most people think it's what makes an answer right or wrong.
Marranar Amatin
Center for Advanced Studies
Gallente Federation
#11 - 2016-03-14 18:38:02 UTC
Purr Purr wrote:
To show you the correct answer would mean for them to know the correct answer. If they knew it, then there would be no project discovery. That's the point of the project. To discover, to find out. And, depending on the majority of checks, they will then analyse those proteins.

It doesn't matter what others choose. If you choose right, most of the time you will select what others selected as well.
As long as you don't lose accuracy, you are on the right track.


They do not need us because they can't figure out the correct answer, but because they need a way to process many images. But this only works if we are trained correctly, and for that we need more training samples and better feedback.
If the only feedback comes from other users who also have no proper training, then there is no reason to assume that our training will converge to correct solutions. It will probably converge somewhere, but more likely to some kind of pattern that allows fast consensus, not a correct one.
Bumblefck
Kerensky Initiatives
#12 - 2016-03-14 19:32:36 UTC
A million monkeys bashing away at a million typewriters for a million years will, by the law of averages, produce the complete works of Bumble Shakespeare.

Perfection is a dish best served like wasabi.

Bumble's Space Log

Kaede Hita
K.H. Holding
#13 - 2016-03-14 21:49:29 UTC  |  Edited by: Kaede Hita
I would love to have a more comprehensive tutorial, even something like a test with pointers on the errors I made, so I can learn where to look and improve. Right now I'm not sure where to look to analyse my mistakes and grow from them.

So far, at analyst rank 8, I usually only see red filaments. Maybe it's just random but, again, I'd like to have some pointers.

Anyway, I applaud the initiative as a whole and I want to support it.
I would even watch a one-hour video explaining the classification of three dozen samples, or some reviews of the community statistics by a professional, so I can get a better understanding of things...

Keep doing things like this, please.
Mortlake
Republic Military School
#14 - 2016-03-14 21:55:54 UTC
It's all cytoplasm to me.

Sometimes you hit the bar and sometimes the bar hits you...