Each participant in the Global Data Competition will receive a personal score and a title associated with that score.  The score is a measure of the participant's competence and experience with data science.  The goal of the Global Data Competition is to increase the data science skills of the participants through training, experience and collaboration with others. Rather than having winners and losers, the score each participant receives is meant to help them assess their current skill level and highlight areas for improvement.

The two phases of the Global Data Competition are scored separately.  Each participant will receive the score that their team receives.

Phase 1:

(Phase 1 ended on Sunday, September 13 at midnight UTC-11, but you can still learn from the materials described below and you are encouraged to participate in Phase 2.)

The task for Phase 1 of the competition is to take a set of images of Mars and predict whether each image depicts a volcano.  While this may seem intimidating, don't panic.  Some instructional material is provided below to help you get started.


The Data:

The following data set contains all the data that you will need for Phase 1 of the competition.


Download and extract the data to find three folders:

  • volno_train = training images without volcanoes
  • volyes_train = training images with volcanoes
  • unknown = unlabeled images; for each one, you predict whether it contains a volcano
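
As a starting point, the three folders can be turned into labeled file lists. The following is a minimal sketch, not part of the official sample code: the function name `list_labeled_files` and its `data_dir` argument are hypothetical, and it makes no assumption about the image file format (it only collects paths).

```python
from pathlib import Path


def list_labeled_files(data_dir):
    """Collect file paths from the three extracted folders.

    `data_dir` is a hypothetical name for wherever the archive was extracted.
    Returns (train_paths, train_labels, unknown_paths).
    """
    root = Path(data_dir)
    train_paths, train_labels = [], []
    # Label 0 = no volcano, label 1 = volcano, following the folder names.
    for folder, label in (("volno_train", 0), ("volyes_train", 1)):
        for path in sorted((root / folder).iterdir()):
            if path.is_file():
                train_paths.append(path)
                train_labels.append(label)
    # The unknown folder has no labels; these are the images to predict.
    unknown_paths = sorted(p for p in (root / "unknown").iterdir() if p.is_file())
    return train_paths, train_labels, unknown_paths
```

The training paths and labels can then be passed to whatever image-loading and modeling code you choose, while the unknown paths are kept aside for the final predictions.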


Instructional Material:

Not sure where to start?  Try watching this presentation by Ben Taylor.


Ben has also provided some sample Python code to help you get started.



The scoring was provided by Gridisc.com.  Participants uploaded their prediction file for scoring.  For each unknown image, participants estimated the probability that it contained a volcano.

Scoring of submissions was based on Area Under the Curve ("AUC") scores. A perfect score would result if the submission contained a probability of one for each volcano image and a probability of zero for each non-volcano image.  While participants could submit a file that contained only probabilities of one or zero, they were penalized for wrong answers.  To maximize the score, it was recommended that participants submit the probability (a number between zero and one) that each image is a volcano.
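
To see why graded probabilities tend to score better than hard ones and zeros, here is a minimal sketch of an AUC calculation. It assumes that "AUC" means the area under the ROC curve, computed as the fraction of (volcano, non-volcano) pairs in which the volcano image receives the higher score, with ties counting as half. The exact scoring code used by Gridisc.com is not described here, and the labels and probabilities below are made up for illustration.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = volcano, 0 = no volcano.
labels = [1, 1, 1, 0, 0, 0]
probabilities = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
# The same predictions rounded to hard ones and zeros at a 0.5 cutoff.
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

print(auc(labels, labels))         # perfect submission
print(auc(labels, probabilities))  # graded probabilities
print(auc(labels, hard_labels))    # hard ones and zeros
```

In this example the graded probabilities rank 8 of the 9 pairs correctly, while the rounded version is only credited with 6 of 9, because rounding creates ties and turns the two uncertain predictions into outright errors.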

This sample submission file was generated from the sample Python code above.  It scored 84.34%.

Participants had 15 chances to upload a submission and were able to see their score for each submission.  The last submission that each individual or team uploaded determined their final score for the competition. The leaderboard was populated as submissions were made.

Final points were determined on a bell curve with 900 points possible. The results of Phase 1 can be found under the Results tab above.


Phase 2:

See the link under Phase 2 > Information for details.