Using HPC and AI to Classify Handwritten Digits

This case study explores MNIST, a standard benchmark dataset in machine learning and computer vision. MNIST consists of 70,000 grayscale images of handwritten digits (0-9), split into 60,000 training examples and 10,000 test examples, each stored as a 28×28 pixel image. Originally introduced by Yann LeCun, Corinna Cortes, and Christopher Burges in the 1990s, MNIST has played a foundational role in the development of neural networks and pattern recognition.

In this case study, we train a convolutional neural network (CNN) on MNIST to demonstrate how deep learning models can automatically extract meaningful features from images. CNNs are particularly effective for image classification tasks because they leverage spatial hierarchies of features, reducing the need for manual feature engineering. By working through this example, participants will gain hands-on experience with deep learning techniques and understand the importance of structured model training for real-world applications.

💡 Tip: Download the case study files here: RAISE-DRI MNIST Case Study

Instructions

  1. Transfer the mnist files to one of the ARC clusters (cedar, narval, beluga, graham).

    On your computer, navigate to the mnist folder and open a terminal.

    Run the following command to transfer the files:

     scp -r mnist USER@whichCLUSTER.alliancecan.ca:~/scratch
    
    • Replace USER with your username.
    • Replace whichCLUSTER with the name of one of the ARC clusters.
  2. Log in to the ARC cluster you selected before.
     ssh USER@whichCLUSTER.alliancecan.ca
    
    • Replace USER with your username.
    • Replace whichCLUSTER with the name of the ARC cluster you used.
  3. Navigate to the scratch directory.
     cd scratch
     cd mnist
     ls
    
  4. Edit the submit_job.sh script using nano (or any other editor).
    • Fill in your email address.
    • Fill in your user group (if you are part of multiple groups).
  5. Submit the job script by running the command:
     sbatch submit_job.sh
    

    You may get an error about incorrect line endings. In this case, run the command:

     dos2unix submit_job.sh
    

    This converts the file from Windows (CRLF) line endings to Unix (LF) line endings.

  6. Check the status of your job by running:
     sq
    
    • PD = pending
    • R = running

    Take note of your job ID. You will need it later to check the output.

  7. Check the output of the job by running:
     cat slurm-XXXX.out
    

    where XXXX is your job ID.

  8. Congratulations! You trained a neural network on a supercomputer!
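
If you are unsure whether the line-ending error from step 5 applies to your copy of submit_job.sh, you can check for carriage returns before submitting. The following is a minimal sketch in portable shell. The function names has_crlf and strip_crlf are hypothetical helpers introduced here for illustration and are not part of the case study files. dos2unix remains the simpler fix where it is installed.

```shell
# Hypothetical helpers (not part of the case study files).

# Succeed (exit status 0) if the file contains at least one carriage
# return (CR), the sign of Windows-style CRLF line endings.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Write a copy of the input file with every CR character removed.
# Usage: strip_crlf INPUT OUTPUT
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, running has_crlf submit_job.sh && echo "needs conversion" reports whether the script would trigger the error. strip_crlf writes to a separate file, so the original is left untouched.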

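Steps 5 to 7 depend on copying the job ID by hand. As a rough sketch, the ID can instead be captured from the confirmation line that sbatch prints ("Submitted batch job" followed by the ID). The function names job_id_from_sbatch and slurm_out_file are hypothetical and introduced only for illustration. The sketch also assumes submit_job.sh keeps the slurm-XXXX.out naming shown in step 7 and does not set its own output file.

```shell
# Hypothetical helpers (not part of the case study files).

# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345". Prints nothing if the line
# does not match (for example, when sbatch reported an error).
job_id_from_sbatch() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Build the output file name used in step 7 for a given job ID.
slurm_out_file() {
    printf 'slurm-%s.out\n' "$1"
}

# Example use on a cluster login node (left commented out, since it
# submits a real job):
#   job_id=$(job_id_from_sbatch "$(sbatch submit_job.sh)")
#   cat "$(slurm_out_file "$job_id")"
```

The output file only appears once the job has started running, so check sq first if cat reports that the file does not exist.
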
Next Steps: