Deep Learning Using R

A Workshop

Author

Mark Andrews

This workshop is a hands-on introduction to deep learning in R using the torch package. The topics below are modular: a given delivery will cover a selection depending on length, audience, and emphasis. The foundations topics are core; the architecture and language model topics can be combined in various ways.

Foundations

Introduction to Neural Networks

We implement artificial neurons from scratch using base R. We define and plot the common activation functions, then build a simple forward pass through a small network by hand. This gives a concrete computational picture of what a neural network is before we move to torch.
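As a rough illustration of the kind of code this topic builds up to, the sketch below defines two activation functions and runs a forward pass through a tiny network in base R. The 2-3-1 layer sizes, the weight values, and the helper names dense and forward are assumptions made for this example, not the workshop's own material.

```r
# Common activation functions as plain vectorised R functions.
sigmoid <- function(z) 1 / (1 + exp(-z))
relu    <- function(z) pmax(z, 0)
# tanh() is already provided by base R.

# One fully connected layer: weights W (outputs x inputs), bias b, activation f.
dense <- function(x, W, b, f) f(as.vector(W %*% x + b))

# Hypothetical 2-3-1 network with hand-picked weights (illustrative only).
W1 <- matrix(c( 0.5, -1.0,
                1.0,  0.5,
               -0.5,  0.25), nrow = 3, byrow = TRUE)
b1 <- c(0, 0.1, -0.1)
W2 <- matrix(c(1, -1, 0.5), nrow = 1)
b2 <- 0

forward <- function(x) {
  h <- dense(x, W1, b1, relu)   # hidden layer: 2 inputs -> 3 units
  dense(h, W2, b2, sigmoid)     # output layer: 3 units -> 1 output
}

y_hat <- forward(c(1, 2))
```

Calling curve(sigmoid, -5, 5) or curve(relu, -5, 5) plots either activation over a range of inputs.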

Training Neural Networks

We introduce torch tensors and autograd, torch’s automatic differentiation engine. We then cover loss functions, the mechanics of gradient descent, and how torch optimizers implement and extend it. The topic closes with train/validation/test splitting and two widely used regularisation techniques: dropout and weight decay.
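A minimal sketch of autograd and an optimizer loop, assuming the torch package is installed. The functions being differentiated and minimised, the learning rate, and the number of steps are arbitrary choices for illustration.

```r
library(torch)

# Autograd: torch records operations on tensors that require gradients.
x <- torch_tensor(2, requires_grad = TRUE)
y <- (x^2 + 3 * x)$sum()
y$backward()                  # dy/dx = 2x + 3, evaluated at x = 2
grad_x <- x$grad$item()

# Gradient descent on f(w) = (w - 3)^2 using a torch optimizer.
# A non-zero weight_decay would add an L2 penalty on w to each update.
w   <- torch_tensor(0, requires_grad = TRUE)
opt <- optim_sgd(list(w), lr = 0.1, weight_decay = 0)

for (step in 1:100) {
  opt$zero_grad()             # clear gradients from the previous step
  loss <- ((w - 3)^2)$sum()
  loss$backward()             # compute d(loss)/dw
  opt$step()                  # w <- w - lr * gradient
}

w_hat <- w$item()
```

After the loop, w_hat should sit close to the minimiser at 3; grad_x holds the derivative 2x + 3 evaluated at x = 2.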

Multilayer Perceptrons

We build and train multilayer perceptrons using torch’s nn_module. The running example is MNIST digit classification. We cover the full training loop, monitoring loss and validation accuracy, and practical use of luz as a high-level training interface.
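The sketch below shows what a small multilayer perceptron and a single manual training step might look like with nn_module. It uses a random batch shaped like MNIST images rather than the real dataset, and the hidden size, dropout rate, and learning rate are assumptions for illustration only.

```r
library(torch)

mlp <- nn_module(
  "MLP",
  initialize = function(n_in = 784, n_hidden = 128, n_out = 10, p_drop = 0.2) {
    self$fc1  <- nn_linear(n_in, n_hidden)
    self$drop <- nn_dropout(p_drop)
    self$fc2  <- nn_linear(n_hidden, n_out)
  },
  forward = function(x) {
    x <- torch_flatten(x, start_dim = 2)   # (batch, 1, 28, 28) -> (batch, 784)
    x <- nnf_relu(self$fc1(x))
    x <- self$drop(x)
    self$fc2(x)                            # unnormalised class scores (logits)
  }
)

# Random stand-in for one batch of 32 MNIST-shaped images and labels.
# Class labels are 1-based in torch for R.
x <- torch_randn(32, 1, 28, 28)
y <- torch_tensor(sample(1:10, 32, replace = TRUE), dtype = torch_long())

model   <- mlp()
opt     <- optim_adam(model$parameters, lr = 1e-3)
loss_fn <- nn_cross_entropy_loss()

# One step of the training loop.
model$train()
opt$zero_grad()
logits <- model(x)
loss   <- loss_fn(logits, y)
loss$backward()
opt$step()
```

A full loop repeats this step over batches from a dataloader and evaluates on held-out data after each epoch; luz wraps the same pattern in its setup() and fit() functions.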

Architectures

Convolutional Neural Networks

We introduce convolutional layers and build CNNs for image classification in torch. We cover nn_conv2d, nn_max_pool2d, and nn_batch_norm2d, train a CNN on MNIST, and visualise the learned filters.
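As a sketch of how the three layer types named above fit together, the following defines a small CNN for 28 x 28 single-channel inputs. The channel counts, kernel sizes, and the module name SmallCNN are assumptions chosen for this example, and the input is random data rather than MNIST.

```r
library(torch)

cnn <- nn_module(
  "SmallCNN",
  initialize = function() {
    self$conv1 <- nn_conv2d(1, 16, kernel_size = 3, padding = 1)
    self$bn1   <- nn_batch_norm2d(16)
    self$conv2 <- nn_conv2d(16, 32, kernel_size = 3, padding = 1)
    self$bn2   <- nn_batch_norm2d(32)
    self$pool  <- nn_max_pool2d(kernel_size = 2)
    self$fc    <- nn_linear(32 * 7 * 7, 10)
  },
  forward = function(x) {
    x <- self$pool(nnf_relu(self$bn1(self$conv1(x))))  # 28x28 -> 14x14
    x <- self$pool(nnf_relu(self$bn2(self$conv2(x))))  # 14x14 -> 7x7
    x <- torch_flatten(x, start_dim = 2)               # (batch, 32 * 7 * 7)
    self$fc(x)
  }
)

model  <- cnn()
logits <- model(torch_randn(8, 1, 28, 28))

# First-layer filters as an R array (16 filters, 1 input channel, 3 x 3 each).
filters <- as_array(model$conv1$weight$detach())
```

Each slice filters[i, 1, , ] is a 3 x 3 matrix that can be passed to image() to inspect one filter.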

Language Models

Language Models and Transformers

We cover tokenisation, embeddings, and the self-attention mechanism. We work through the transformer architecture piece by piece, building toward the GPT-style decoder used in the following topic.
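To make the first two ideas concrete, here is a minimal base R sketch of character-level tokenisation followed by an embedding lookup. The example string, the embedding dimension, and the random embedding matrix are assumptions for illustration; a trained model learns the embedding values, and practical tokenisers usually work on subwords rather than single characters.

```r
text  <- "hello world"

# Character-level tokenisation: map each distinct character to an integer id.
chars <- strsplit(text, "")[[1]]
vocab <- sort(unique(chars))
stoi  <- setNames(seq_along(vocab), vocab)   # character -> id
ids   <- unname(stoi[chars])                 # encode
decoded <- paste(vocab[ids], collapse = "")  # decode back to text

# Embedding lookup: each token id selects one row of the embedding matrix.
set.seed(1)
d_model <- 4
E <- matrix(rnorm(length(vocab) * d_model), nrow = length(vocab))
X <- E[ids, , drop = FALSE]                  # one d_model-vector per token
```

The matrix X, with one row per token, is the kind of input that a self-attention layer operates on.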

Implementing a GPT Language Model

We implement a minimal GPT from scratch in torch and train it on a small text corpus. We cover text generation with temperature scaling and top-k sampling.
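The two sampling ideas can be sketched in base R without any model. The function name sample_next and the example logits are hypothetical; in practice the logits would come from the model's final layer.

```r
# Numerically stable softmax.
softmax <- function(z) {
  z <- z - max(z)
  exp(z) / sum(exp(z))
}

# Draw one token index from logits, with temperature scaling and
# optional top-k filtering.
sample_next <- function(logits, temperature = 1, top_k = NULL) {
  stopifnot(temperature > 0)
  z <- logits / temperature          # < 1 sharpens, > 1 flattens the distribution
  if (!is.null(top_k) && top_k < length(z)) {
    keep <- order(z, decreasing = TRUE)[seq_len(top_k)]
    z[-keep] <- -Inf                 # tokens outside the top k get probability 0
  }
  p <- softmax(z)
  sample.int(length(p), size = 1, prob = p)
}

set.seed(42)
logits <- c(2, 1, 0.5, -1)
draws_topk <- replicate(200, sample_next(logits, top_k = 2))
draw_cold  <- sample_next(logits, temperature = 1e-3)
```

With top_k = 2 only the two highest-scoring tokens can ever be drawn, and as the temperature approaches zero sampling approaches always picking the highest-scoring token.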

Extra Topics

The Attention Mechanism

This topic is a focused introduction to the attention mechanism and the transformer architecture. We motivate attention through the problems of long-range context in language and images, explain the query/key/value framework, implement scaled dot-product attention from scratch, and assemble a transformer block.
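A minimal base R sketch of single-head scaled dot-product attention follows. The sequence length, the dimensions, and the random projection matrices Wq, Wk, and Wv are assumptions for illustration; in a trained model the projections are learned.

```r
# Row-wise softmax, stabilised by subtracting each row's maximum.
softmax_rows <- function(S) {
  S <- S - apply(S, 1, max)
  E <- exp(S)
  E / rowSums(E)
}

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# With causal = TRUE, each position attends only to itself and earlier positions.
attention <- function(Q, K, V, causal = FALSE) {
  d_k <- ncol(K)
  S <- Q %*% t(K) / sqrt(d_k)
  if (causal) S[upper.tri(S)] <- -Inf
  A <- softmax_rows(S)
  list(output = A %*% V, weights = A)
}

set.seed(1)
n <- 5; d_model <- 8; d_k <- 4
X  <- matrix(rnorm(n * d_model), nrow = n)          # one row per token
Wq <- matrix(rnorm(d_model * d_k), nrow = d_model)  # hypothetical projections
Wk <- matrix(rnorm(d_model * d_k), nrow = d_model)
Wv <- matrix(rnorm(d_model * d_k), nrow = d_model)

full   <- attention(X %*% Wq, X %*% Wk, X %*% Wv)
masked <- attention(X %*% Wq, X %*% Wk, X %*% Wv, causal = TRUE)
```

Each row of the weights matrix sums to one and says how strongly that position attends to every other position; the causal variant is the form used in GPT-style decoders.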

Using Pre-trained Transformer Models

We cover options for accessing large pre-trained models from R. Native R tools (hfhub, tok, safetensors) handle tokenisation and weight loading without Python. For the full Hugging Face API, we show how to call Python’s transformers library via reticulate.