Day One, Session Three

Multilayer Perceptrons with PyTorch

Mark Andrews

PyTorch

  • PyTorch is a scientific computing library built around automatic differentiation and GPU support.
  • It is the dominant research framework for deep learning.
  • The torch.nn module provides building blocks: layers, loss functions, and activations.
  • Training is explicit — you write the loop, which keeps the mechanics visible.

nn.Module

  • Every model in PyTorch is a subclass of nn.Module.
  • __init__ declares the learnable components.
  • forward defines the computation.
  • nn.Module handles parameter registration, GPU transfer, saving and loading weights.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input -> hidden
        self.fc2 = nn.Linear(128, 10)   # hidden -> output logits

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

nn.Linear

  • nn.Linear(in_features, out_features) is one fully-connected layer.
  • Internally it holds a weight matrix \(W\) of shape \((\text{out}, \text{in})\) and a bias vector \(b\).
  • On input \(x\) of shape \((\text{batch}, \text{in})\), it computes:

\[y = xW^T + b\]

  • Both \(W\) and \(b\) have requires_grad=True and are updated during training.
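A minimal sketch of the shapes involved (the layer sizes here are illustrative):

```python
import torch
import torch.nn as nn

layer = torch.nn.Linear(784, 128)    # W has shape (128, 784), b has shape (128,)
x = torch.randn(32, 784)             # a batch of 32 inputs

y = layer(x)                         # computes x @ W.T + b

print(layer.weight.shape)            # torch.Size([128, 784])
print(layer.bias.shape)              # torch.Size([128])
print(y.shape)                       # torch.Size([32, 128])
print(layer.weight.requires_grad)    # True
```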

Assembling a network

  • In __init__, declare each layer as an attribute.
  • In forward, chain them together with activations in between.
  • Dimensions must be consistent: the output size of each layer must match the input size of the next.
  • nn.Sequential handles straight feedforward chains without writing forward explicitly.
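The same two-layer network as the MLP class above, sketched with nn.Sequential (sizes illustrative):

```python
import torch
import torch.nn as nn

# Sequential applies the layers in order, so no forward method is needed.
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)
logits = model(x)
print(logits.shape)   # torch.Size([32, 10])
```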

Data and DataLoader

  • Dataset wraps a collection of (input, target) pairs.
  • DataLoader wraps a dataset and serves it in shuffled mini-batches.
  • torchvision provides standard image datasets including MNIST.
  • transforms.ToTensor() converts PIL images to float tensors in \([0,1]\) and reorders axes from HWC to CHW.
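A minimal sketch of the Dataset/DataLoader pattern. A synthetic in-memory TensorDataset stands in for MNIST here to keep the example self-contained:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for MNIST: 100 flattened 28x28 "images", 10 classes.
X = torch.randn(100, 784)
y = torch.randint(0, 10, (100,))

dataset = TensorDataset(X, y)   # wraps (input, target) pairs
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for xb, yb in loader:
    print(xb.shape, yb.shape)   # up to 32 samples per mini-batch
    break
```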

The training loop

Four steps, always in this order:

  1. optimizer.zero_grad() — clear accumulated gradients.
  2. loss = criterion(model(X), y) — forward pass and compute loss.
  3. loss.backward() — compute gradients.
  4. optimizer.step() — update parameters.

These four lines are the same regardless of model architecture, dataset, or task.
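Putting the four steps together in a minimal loop. The model, data, and hyperparameters here are illustrative, just enough to exercise the mechanics:

```python
import torch
import torch.nn as nn

# Toy linear classifier on synthetic data.
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

losses = []
for epoch in range(20):
    optimizer.zero_grad()          # 1. clear accumulated gradients
    loss = criterion(model(X), y)  # 2. forward pass and compute loss
    loss.backward()                # 3. compute gradients
    optimizer.step()               # 4. update parameters
    losses.append(loss.item())
```

In a real script, the loop body runs once per mini-batch served by the DataLoader, but the four steps are identical.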

Evaluation

  • Call model.eval() before measuring accuracy — this disables dropout and switches batch normalisation to its stored running statistics.
  • Wrap inference with torch.no_grad() to suppress gradient tracking, saving memory and time.
  • preds = model(X).argmax(dim=1) picks the class with the highest logit for each sample.
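The evaluation pattern in full, with an untrained stand-in model for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)          # stand-in for a trained model
X = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

model.eval()                        # switch to evaluation behaviour
with torch.no_grad():               # no gradient tracking during inference
    preds = model(X).argmax(dim=1)  # most likely class per sample
    accuracy = (preds == y).float().mean().item()

print(accuracy)
```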

skorch

  • skorch wraps PyTorch models in a scikit-learn compatible interface.
  • Pass the model class (not an instance) to NeuralNetClassifier.
  • Call .fit(X_train, y_train) — no loop required.
  • .score(X_test, y_test) evaluates accuracy.
  • The result slots directly into GridSearchCV, Pipeline, and other scikit-learn tools.