Deep Learning Using Python

Author

Mark Andrews

This is a hands-on introduction to deep learning in Python using PyTorch. The topics are modular and can be combined to suit different course lengths and emphases.

Foundations

Introduction to Artificial Neural Networks

We implement artificial neurons from scratch using NumPy. We define and plot the common activation functions, then build a simple forward pass through a small network by hand. This gives a concrete computational picture of what a neural network is before we move to PyTorch.
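
As a rough illustration of the forward pass described above, the following sketch builds a two-layer network by hand in NumPy. The layer sizes, the random weights, and the names sigmoid, relu, and forward are assumptions chosen for this example rather than the course's own code.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear activation: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """Forward pass through one hidden layer (ReLU) and one output unit (sigmoid)."""
    h = relu(W1 @ x + b1)      # hidden activations
    y = sigmoid(W2 @ h + b2)   # output in (0, 1)
    return h, y

# Illustrative sizes (assumed): 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

h, y = forward(x, W1, b1, W2, b2)
print("hidden:", h)
print("output:", y)
```

Each artificial neuron here is one row of a weight matrix: a weighted sum of its inputs plus a bias, passed through an activation function.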

Training Neural Networks

We introduce loss functions and work through the mechanics of training: forward pass, loss computation, backward pass, and parameter update. We cover gradient descent and its variants, and introduce PyTorch’s autograd for automatic differentiation.
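
The four training steps listed above can be sketched with PyTorch's autograd on a toy problem. This minimal example assumes a noise-free linear dataset (y = 3x + 0.5), a mean squared error loss, and plain gradient descent with a hand-written parameter update. The data, learning rate, and step count are illustrative choices.

```python
import torch

# Toy data (assumed for illustration): y = 3x + 0.5 with no noise.
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 3.0 * x + 0.5

# Parameters tracked by autograd.
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

lr = 0.1
losses = []
for step in range(200):
    y_hat = x * w + b                  # 1. forward pass
    loss = ((y_hat - y) ** 2).mean()   # 2. loss computation (mean squared error)
    loss.backward()                    # 3. backward pass: fills w.grad and b.grad
    with torch.no_grad():              # 4. parameter update (plain gradient descent)
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()                 # reset accumulated gradients for the next step
        b.grad.zero_()
    losses.append(loss.item())

print(f"w = {w.item():.3f}, b = {b.item():.3f}, final loss = {losses[-1]:.2e}")
```

In practice the manual update in step 4 is usually replaced by an optimizer from torch.optim, which implements gradient descent and its variants behind a common interface.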

Multilayer Perceptrons with PyTorch

We build and train multilayer perceptrons using PyTorch’s nn.Module. The running example is MNIST digit classification. We cover the full training loop, how to monitor loss and validation accuracy, and the practical use of skorch, a scikit-learn-compatible wrapper for PyTorch.
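
A multilayer perceptron defined with nn.Module, together with a basic training loop, might look like the sketch below. To keep the example self-contained it uses random tensors with MNIST's shape (1x28x28 inputs, 10 classes) in place of the real dataset, so the loss values say nothing about digit classification. The class name MLP, the hidden size, and the optimizer settings are assumptions.

```python
import torch
from torch import nn

class MLP(nn.Module):
    """One-hidden-layer perceptron for 28x28 inputs and 10 classes."""
    def __init__(self, in_features=784, hidden=128, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                    # (N, 1, 28, 28) -> (N, 784)
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),    # raw logits, no softmax
        )

    def forward(self, x):
        return self.net(x)

# Stand-in data (assumption): random images and labels with MNIST's shapes.
torch.manual_seed(0)
X = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

model = MLP()
loss_fn = nn.CrossEntropyLoss()              # expects logits and integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(20):
    optimizer.zero_grad()                    # clear gradients from the previous step
    logits = model(X)                        # forward pass
    loss = loss_fn(logits, y)                # loss computation
    loss.backward()                          # backward pass
    optimizer.step()                         # parameter update
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

With real data, the same loop would iterate over mini-batches from a DataLoader and evaluate accuracy on a held-out validation set after each epoch.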

Architectures

Convolutional Neural Networks

We introduce convolutional layers and build CNNs for image classification in PyTorch. We cover Conv2d, MaxPool2d, and BatchNorm2d, and train a CNN on MNIST.
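
The three layer types named above can be combined as in the following sketch of a small CNN for 28x28 grayscale inputs. The channel counts, kernel sizes, and the class name SmallCNN are illustrative assumptions, and the input is a random batch rather than MNIST, so only the tensor shapes are meaningful.

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """Two conv blocks (Conv2d -> BatchNorm2d -> ReLU -> MaxPool2d) and a linear classifier."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28  -> 16x28x28
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x28x28 -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 16x14x14 -> 32x14x14
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x14x14 -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))

model = SmallCNN()
model.eval()                                  # use BatchNorm running statistics
with torch.no_grad():
    batch = torch.randn(8, 1, 28, 28)         # stand-in for a batch of MNIST images
    feats = model.features(batch)
    logits = model(batch)

print(feats.shape, logits.shape)
```

Padding of 1 with a 3x3 kernel keeps the spatial size unchanged, so only the pooling layers halve the height and width.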

Language Models

Language Models and Transformers

We cover tokenisation, embeddings, and the self-attention mechanism. We work through the transformer architecture piece by piece, building toward the GPT-style decoder used in the following topic.
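
Tokenisation and embedding lookup can be illustrated with a character-level sketch. It assumes a toy corpus ("hello world"), a vocabulary built from its distinct characters, and an embedding dimension of 8. The helper names encode and decode are hypothetical.

```python
import torch
from torch import nn

# Toy corpus (assumption): the vocabulary is the set of distinct characters.
text = "hello world"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character

def encode(s):
    """Map a string to a list of token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)

ids = torch.tensor(encode("hello"))

# Embedding table: one learnable 8-dimensional vector per vocabulary entry.
torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(ids)                        # shape: (sequence length, 8)

print(ids.tolist(), vectors.shape)
```

Repeated tokens, such as the two occurrences of "l", map to the same embedding vector. Distinguishing their positions is the job of positional information added later in the architecture.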

Implementing a GPT Language Model

We implement a minimal GPT from scratch in PyTorch and train it on a small text corpus. We then turn to the Hugging Face Transformers library and use pre-trained models for text classification and generation.
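
To give a sense of scale, a GPT-style decoder can be sketched in a few dozen lines. The version below is a simplification under stated assumptions: it relies on PyTorch's built-in nn.MultiheadAttention instead of a from-scratch attention layer, uses learned positional embeddings and a pre-LayerNorm block, and runs on random token ids rather than a text corpus. The class names Block and TinyGPT and all hyperparameters are hypothetical.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Block(nn.Module):
    """Pre-LayerNorm transformer block with causal self-attention and an MLP."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True marks future positions a token may NOT attend to.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out                     # residual connection around attention
        return x + self.mlp(self.ln2(x))     # residual connection around the MLP

class TinyGPT(nn.Module):
    """Token and position embeddings, a stack of blocks, and a language-model head."""
    def __init__(self, vocab_size, block_size=32, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        self.blocks = nn.Sequential(*[Block(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        return self.head(self.ln_f(self.blocks(x)))   # (batch, T, vocab_size) logits

torch.manual_seed(0)
vocab_size = 50
model = TinyGPT(vocab_size)
idx = torch.randint(0, vocab_size, (2, 16))           # stand-in for tokenised text

logits = model(idx)
# Next-token objective: position t predicts the token at position t + 1.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size), idx[:, 1:].reshape(-1))
print(logits.shape, loss.item())
```

The causal mask is what makes this a decoder: the logits at each position depend only on the tokens at or before that position, which is the property that allows left-to-right text generation.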

Extra Topics

The Attention Mechanism

This topic is a focused introduction to the attention mechanism and the transformer architecture. We motivate attention through the problems of long-range context in language and images, explain the query/key/value framework, implement scaled dot-product attention, and assemble a transformer block.
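
Scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, can be sketched as follows. The function name, the optional causal flag, the tensor sizes, and the bias-free linear projections used to produce queries, keys, and values are assumptions made for this example.

```python
import math
import torch
from torch import nn

def scaled_dot_product_attention(q, k, v, causal=False):
    """Return (output, weights) for softmax(q k^T / sqrt(d_k)) v."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (..., T_q, T_k)
    if causal:
        t_q, t_k = scores.shape[-2:]
        allowed = torch.tril(torch.ones(t_q, t_k, dtype=torch.bool))
        scores = scores.masked_fill(~allowed, float("-inf"))  # hide future positions
    weights = torch.softmax(scores, dim=-1)              # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
d_model = 16
x = torch.randn(2, 5, d_model)                           # (batch, sequence, features)

# Learned projections map the same input to queries, keys, and values.
W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

out, weights = scaled_dot_product_attention(W_q(x), W_k(x), W_v(x), causal=True)
print(out.shape, weights.shape)
```

Dividing by sqrt(d_k) keeps the scores at a scale where the softmax does not saturate as the feature dimension grows. A transformer block wraps this operation with multiple heads, residual connections, layer normalisation, and a feed-forward layer.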