Day One, Session Three

Multilayer Perceptrons with PyTorch

Mark Andrews

PyTorch

  • PyTorch is a scientific computing library built around automatic differentiation and GPU support.
  • It is the dominant research framework for deep learning.
  • The torch.nn module provides building blocks: layers, loss functions, and activations.
  • Training is explicit — you write the loop, which keeps the mechanics visible.

nn.Module

  • Every model in PyTorch is a subclass of nn.Module.
  • __init__ declares the learnable components.
  • forward defines the computation.
  • nn.Module handles parameter registration, GPU transfer, saving and loading weights.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input -> hidden
        self.fc2 = nn.Linear(128, 10)   # hidden -> output logits

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

nn.Linear

  • nn.Linear(in_features, out_features) is one fully-connected layer.
  • Internally it holds a weight matrix \(W\) of shape \((\text{out}, \text{in})\) and a bias vector \(b\).
  • On input \(x\) of shape \((\text{batch}, \text{in})\), it computes:

\[y = xW^T + b\]

  • Both \(W\) and \(b\) have requires_grad=True and are updated during training.
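A minimal sketch of the shapes involved (the layer sizes here are illustrative):

```python
import torch
import torch.nn as nn

layer = torch.nn.Linear(784, 128)    # W has shape (128, 784), b has shape (128,)
x = torch.randn(32, 784)             # a batch of 32 inputs

y = layer(x)                         # computes x @ W.T + b

print(layer.weight.shape)            # torch.Size([128, 784])
print(layer.bias.shape)              # torch.Size([128])
print(y.shape)                       # torch.Size([32, 128])
print(layer.weight.requires_grad)    # True
```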

Assembling a network

  • In __init__, declare each layer as an attribute.
  • In forward, chain them together with activations in between.
  • Dimensions must be consistent: the output size of each layer must match the input size of the next.
  • nn.Sequential handles straight feedforward chains without writing forward explicitly.
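The same two-layer network as the MLP class above, sketched with nn.Sequential (sizes illustrative):

```python
import torch
import torch.nn as nn

# Sequential applies the layers in order, so no forward method is needed.
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 784)
logits = model(x)
print(logits.shape)   # torch.Size([32, 10])
```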

Data and DataLoader

  • Dataset wraps a collection of (input, target) pairs.
  • DataLoader wraps a dataset and serves it in shuffled mini-batches.
  • torchvision provides standard image datasets including MNIST.
  • transforms.ToTensor() converts PIL images to float tensors in \([0,1]\) and reorders axes from HWC to CHW.
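A minimal sketch of the Dataset/DataLoader pattern. A synthetic in-memory TensorDataset stands in for MNIST here to keep the example self-contained:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for MNIST: 100 flattened 28x28 "images", 10 classes.
X = torch.randn(100, 784)
y = torch.randint(0, 10, (100,))

dataset = TensorDataset(X, y)   # wraps (input, target) pairs
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for xb, yb in loader:
    print(xb.shape, yb.shape)   # up to 32 samples per mini-batch
    break
```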

The training loop

Four steps, always in this order:

  1. optimizer.zero_grad() — clear accumulated gradients.
  2. loss = criterion(model(X), y) — forward pass and compute loss.
  3. loss.backward() — compute gradients.
  4. optimizer.step() — update parameters.

These four lines are the same regardless of model architecture, dataset, or task.
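Putting the four steps together in a minimal loop. The model, data, and hyperparameters here are illustrative, just enough to exercise the mechanics:

```python
import torch
import torch.nn as nn

# Toy linear classifier on synthetic data.
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

losses = []
for epoch in range(20):
    optimizer.zero_grad()          # 1. clear accumulated gradients
    loss = criterion(model(X), y)  # 2. forward pass and compute loss
    loss.backward()                # 3. compute gradients
    optimizer.step()               # 4. update parameters
    losses.append(loss.item())
```

In a real script, the loop body runs once per mini-batch served by the DataLoader, but the four steps are identical.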

Evaluation

  • Call model.eval() before measuring accuracy — this disables dropout and switches batch normalisation to its stored running statistics.
  • Wrap inference with torch.no_grad() to suppress gradient tracking, saving memory and time.
  • preds = model(X).argmax(dim=1) picks the class with the highest logit for each sample.
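The evaluation pattern in full, with an untrained stand-in model for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)          # stand-in for a trained model
X = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

model.eval()                        # switch to evaluation behaviour
with torch.no_grad():               # no gradient tracking during inference
    preds = model(X).argmax(dim=1)  # most likely class per sample
    accuracy = (preds == y).float().mean().item()

print(accuracy)
```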

skorch

  • skorch wraps PyTorch models in a scikit-learn compatible interface.
  • Pass the model class (not an instance) to NeuralNetClassifier.
  • Call .fit(X_train, y_train) — no loop required.
  • .score(X_test, y_test) evaluates accuracy.
  • The result slots directly into GridSearchCV, Pipeline, and other scikit-learn tools.