Done, Done, and We’re on to the Next One!

The first round of Sunspotter data classification has been completed!

It took 1,639 registered volunteers (plus an unknown number of anonymous volunteers) 324,465 clicks to compare each image in the set of 12,966 at least 50 times. That's an average of around 198 clicks per registered volunteer!
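For the curious, here is where those numbers come from, sketched in Python. The figures are taken straight from the totals above; the minimum-click estimate assumes each click is one pairwise comparison that counts toward both images shown:

```python
total_clicks = 324_465   # total comparisons made by volunteers
registered = 1_639       # registered volunteers
images = 12_966          # images in the first dataset
min_comparisons = 50     # each image compared at least this many times

# Average workload per registered volunteer
avg_clicks = total_clicks / registered
print(f"average clicks per registered volunteer: {avg_clicks:.0f}")  # ~198

# Each click compares a pair of images, so it counts toward two images
# at once; the minimum number of clicks to give every image 50
# comparisons is therefore:
min_clicks = images * min_comparisons / 2
print(f"minimum clicks required: {min_clicks:.0f}")  # 324,150
```

That minimum (324,150) is satisfyingly close to the 324,465 clicks actually made, which is why the round could be declared done.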

Of course, in reality a few Sunspotter-aholics probably did the lion's share of the work. To everyone who wants to get going on comparing the next set of images: we are working as fast as possible to get the next set of data up!

Example detections using the SMART sunspot group detection algorithm.

The forthcoming dataset relies on sunspot groups detected by a completely automated algorithm called the SolarMonitor Active Region Tracker (SMART; Higgins et al. 2011). Unlike the last dataset, it involves no human intervention, so it will not be prone to human bias. It will include over two hundred thousand images of sunspot groups. Some of the sunspot groups will overlap with the last dataset, but they will be detected and processed in a different way.

As mentioned in a previous post, we are using 'stereographic projection' techniques to 'de-smoosh' the sunspot groups near the solar limb (edge), so that they appear as if they were at the center of the Sun. Stereographic projections are rarely used for images of the Sun (if anyone knows of an example, please tell me!), but they are common elsewhere in astrophysics. They are also commonly used to make maps of the Earth's North and South Poles: although features at the edge of such a map are enlarged by a factor of ~2, their shapes are preserved (the more usual Mercator maps completely distort shapes near the poles). Keeping shapes the same, or being conformal, is important for Sunspotter, because the shape of a sunspot group is likely to have a big effect on its apparent complexity!
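To illustrate the projection itself (this is a generic textbook sketch, not the actual Sunspotter image-processing code): projecting each point on the sphere from the point antipodal to disk center onto the plane tangent at disk center gives the standard stereographic formulas. The local scale factor 2/(1 + cos c) grows from 1 at the center to ~2 at the limb (angular distance c = 90°), which is the factor-of-~2 enlargement mentioned above, while local shapes are preserved:

```python
import math

def stereographic(lat, lon, R=1.0):
    """Map heliographic (lat, lon) in radians -- with (0, 0) at disk
    center -- to (x, y) on the plane tangent at disk center, projecting
    from the antipodal point. This projection is conformal."""
    cos_c = math.cos(lat) * math.cos(lon)  # cos of angular distance from center
    k = 2.0 * R / (1.0 + cos_c)            # local scale: 1 at center, ~2 at limb
    x = k * math.cos(lat) * math.sin(lon)
    y = k * math.sin(lat)
    return x, y

print(stereographic(0.0, 0.0))          # disk center stays at the origin
print(stereographic(0.0, math.pi / 2))  # a limb point lands at x = 2R
```

A small feature sitting right at the limb ends up drawn about twice its true angular size, but because the map is conformal its shape survives the trip.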

In addition to being the first time that anyone has measured the 'true' complexity of a large sample of sunspot groups, we will now be able to measure the evolution of sunspot group complexity and determine how that relates to eruptions.

In the meantime, we will be analysing the previous dataset to determine how complexity relates to other properties of sunspot groups and to solar flare occurrence. Exciting times!

Stay tuned…

A big thank you to all of our volunteers who helped us to complete this awesome dataset!


About Dr. Paul A. Higgins

I am a postdoctoral research fellow in the Astrophysics Research Group at Trinity College Dublin in Ireland. Currently I am a visiting researcher at LMSAL in Palo Alto, CA. I am investigating the causes of solar eruptions. To do this I use image processing and data mining techniques to study the evolution of sunspot groups as they are born, cross the solar disk (producing flares and coronal mass ejections), and then quietly decay and fade away.

7 responses to “Done, Done, and We’re on to the Next One!”

  1. mjtbarrett says :

    Congratulations! Looking forward to the next round 🙂

  2. Art says :

Was a sunspot compared to 50 other sunspots once, or was a sunspot compared to another sunspot 50 times?

    • Dr. Paul A. Higgins says :

Good question. Each sunspot was compared to at least 50 other sunspots once. I'm not sure if it could happen that two sunspots could get compared to each other twice. The comparison pairings are (mostly) random, so I suppose it is possible.

      • Art says :

I wonder if it would be a better idea, in that case, to leave the objects up for others to classify even after everything has been classified. Then if two or three people compared the same pair of pictures, it would eliminate accidental misclicks and give more accurate data.

        Snapshot Serengeti project works this way. This also means that users have something to do (other than other Zooniverse projects) while the next set of data is under preparation.

  3. Dr. Paul A. Higgins says :

    Hi Art, Thanks for your dedication to the project! It is much appreciated. I think the main reason that we aren’t leaving the data up to be further classified, is because we have already classified and reclassified regions more than necessary to obtain a reliable complexity ranking. Basically we had to stop somewhere so that we did not break one of the Zooniverse rules: make sure people are doing useful work. Also, we have to stop classifying during the analysis phase, as the data set would keep changing while we are plotting the data, etc.
