Deep Learning Using Python

A Two-Day Workshop

Author

Mark Andrews

This workshop is a hands-on introduction to deep learning in Python using PyTorch. The first day covers the foundations: what a neural network computes, how it is trained, and how to build and train multilayer perceptrons in PyTorch. The second day covers the two dominant modern architectures: convolutional neural networks for image data and transformers for text.

Day One

Session One: Introduction to Artificial Neural Networks

We implement artificial neurons from scratch using NumPy. We define and plot the common activation functions, then build a simple forward pass through a small network by hand. This gives a concrete computational picture of what a neural network is before we move to PyTorch.
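A minimal sketch of what this session builds, with illustrative weights and layer sizes (not the workshop's exact code): a single neuron, two common activation functions, and a hand-written forward pass through a tiny two-layer network.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing inputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z), elementwise."""
    return np.maximum(0.0, z)

# A single artificial neuron: a weighted sum of inputs plus a bias,
# passed through an activation function.
x = np.array([0.5, -1.2, 3.0])      # one input vector
w = np.array([0.1, 0.4, -0.2])      # weights (arbitrary illustrative values)
b = 0.05                            # bias
a = sigmoid(w @ x + b)              # the neuron's output, in (0, 1)

# A forward pass through a tiny two-layer network, done by hand:
# layer 1 has 4 neurons, layer 2 has 1.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
h = relu(W1 @ x + b1)               # hidden-layer activations
y = sigmoid(W2 @ h + b2)            # network output, in (0, 1)
```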

Session Two: Training Neural Networks

We introduce loss functions and work through the mechanics of training: forward pass, loss computation, backward pass, and parameter update. We cover gradient descent and its variants, and introduce PyTorch’s autograd for automatic differentiation.
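The four-stage loop described above can be sketched on a toy linear-regression problem; the data, learning rate, and step count here are illustrative, but the structure is the one used throughout the workshop.

```python
import torch

torch.manual_seed(0)
X = torch.randn(32, 3)
y_true = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1

w = torch.zeros(3, requires_grad=True)      # parameters tracked by autograd
b = torch.zeros(1, requires_grad=True)

for step in range(100):
    y_pred = X @ w + b                      # 1. forward pass
    loss = ((y_pred - y_true) ** 2).mean()  # 2. loss computation (MSE)
    loss.backward()                         # 3. backward pass: autograd fills .grad
    with torch.no_grad():                   # 4. parameter update (vanilla gradient descent)
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad
        w.grad.zero_()                      # gradients accumulate, so reset each step
        b.grad.zero_()
```

In practice steps 3 and 4 are handled by an optimizer such as `torch.optim.SGD`; writing them out once makes clear what the optimizer is doing.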

Session Three: Multilayer Perceptrons with PyTorch

We build and train multilayer perceptrons using PyTorch’s nn.Module. The running example is MNIST digit classification. We cover the full training loop, monitoring loss and validation accuracy, and practical use of skorch as a scikit-learn wrapper for PyTorch.
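A sketch of the kind of model this session builds: an `nn.Module` multilayer perceptron for 28x28 MNIST digits. The hidden-layer size is an illustrative choice, not a prescription, and a random batch stands in for real data here.

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, n_hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                        # (batch, 1, 28, 28) -> (batch, 784)
            nn.Linear(28 * 28, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, n_classes),      # raw logits, one per digit class
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
fake_images = torch.randn(64, 1, 28, 28)         # a fake batch of 64 images
fake_labels = torch.randint(0, 10, (64,))
logits = model(fake_images)
loss = nn.CrossEntropyLoss()(logits, fake_labels)
```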

Day Two

Session One: Convolutional Neural Networks

We introduce convolutional layers and build CNNs for image classification in PyTorch. We cover Conv2d, MaxPool2d, and BatchNorm2d, and train a CNN on a real image dataset.
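A minimal sketch of a CNN using the three layer types named above; the channel counts and input size (32x32 RGB) are illustrative assumptions.

```python
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3-channel RGB in, 16 feature maps out
    nn.BatchNorm2d(16),                          # normalize activations per channel
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve spatial size: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # 10-way classifier head
)

out = cnn(torch.randn(8, 3, 32, 32))             # fake batch of eight 32x32 RGB images
```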

Session Two: Language Models and Transformers

We cover tokenisation, embeddings, and the self-attention mechanism. We work through the transformer architecture piece by piece, building toward the GPT-style decoder used in the following session.
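The first two steps of that pipeline can be sketched with a toy character-level tokeniser and an embedding table. Real language models use subword tokenisers, and the embedding dimension here is an arbitrary illustrative choice.

```python
import torch
from torch import nn

text = "hello world"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

# Tokenisation: the text becomes a sequence of integer ids.
token_ids = torch.tensor([stoi[ch] for ch in text])

# Embedding: each id indexes a row of a learned table of dense vectors.
emb = nn.Embedding(num_embeddings=len(vocab), embedding_dim=16)
x = emb(token_ids)                             # shape: (sequence_length, 16)
```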

Session Three: Implementing GPT Models

We implement a minimal GPT from scratch in PyTorch and train it on a small text corpus. We then turn to the Hugging Face Transformers library and use pre-trained models for text classification and generation.
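The defining ingredient of the GPT-style decoder is causal self-attention: each position may attend only to itself and earlier positions. A single-head sketch of that idea, with illustrative dimensions (not the session's exact implementation):

```python
import torch
from torch import nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model=32):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)  # project to queries, keys, values
        self.out = nn.Linear(d_model, d_model)
        self.d_model = d_model

    def forward(self, x):                           # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / self.d_model ** 0.5
        # Causal mask: set scores for future positions to -inf so that,
        # after the softmax, they receive zero attention weight.
        seq = x.size(1)
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        return self.out(attn @ v)

block = CausalSelfAttention()
y = block(torch.randn(2, 10, 32))                   # (batch=2, seq=10, d_model=32)
```

A full GPT stacks such blocks (with multiple heads, feed-forward layers, and residual connections) on top of token and positional embeddings.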

Extra Topics

Attention Mechanism Fundamentals

A focused introduction to the attention mechanism and the transformer architecture. We motivate attention through the problems of long-range context in language and images, explain the query/key/value framework, implement scaled dot-product attention, and assemble a transformer block.
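Scaled dot-product attention, the building block this session implements, computes softmax(QK^T / sqrt(d_k))V. A self-contained sketch, with illustrative tensor sizes:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)            # each row is a distribution over keys
    return weights @ v, weights                    # weighted average of the values

q = torch.randn(1, 5, 8)    # (batch, n_queries, d_k)
k = torch.randn(1, 7, 8)    # (batch, n_keys, d_k)
v = torch.randn(1, 7, 8)    # (batch, n_keys, d_v)
out, w = scaled_dot_product_attention(q, k, v)
```

The scaling by sqrt(d_k) keeps the dot products from growing with dimension, which would otherwise push the softmax into a regime with vanishingly small gradients.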