Deep Learning Using R
A Workshop
This workshop is a hands-on introduction to deep learning in R using the torch package. The topics below are modular: each delivery covers a selection of them, chosen according to its length, audience, and emphasis. The Foundations topics are core; the Architectures and Language Models topics can be combined in various ways.
Foundations
Introduction to Neural Networks
We implement artificial neurons from scratch using base R. We define and plot the common activation functions, then build a simple forward pass through a small network by hand. This gives a concrete computational picture of what a neural network is before we move to torch.
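A minimal base R sketch of what this topic builds. It defines one neuron, three common activation functions, and a forward pass through a tiny 2-3-1 network. The choice of sigmoid, tanh and ReLU, the network shape, and all weight values are illustrative assumptions, not trained or prescribed values.

```r
# Common activation functions (tanh() is already built into base R)
sigmoid <- function(z) 1 / (1 + exp(-z))
relu    <- function(z) pmax(z, 0)   # z first so matrix dims are preserved

# A single neuron: weighted sum of inputs plus bias, passed through an activation
neuron <- function(x, w, b, activation = sigmoid) {
  activation(sum(w * x) + b)
}

# Forward pass: 2 inputs -> 3 ReLU hidden units -> 1 sigmoid output.
# W1 is 3 x 2 (one row per hidden unit); W2 is 1 x 3.
forward <- function(x, W1, b1, W2, b2) {
  h <- relu(W1 %*% x + b1)   # hidden activations (3 x 1)
  sigmoid(W2 %*% h + b2)     # output in (0, 1)
}

# Arbitrary illustrative weights (not trained)
W1 <- matrix(c( 0.5, -0.2,
                0.1,  0.4,
               -0.3,  0.8), nrow = 3, byrow = TRUE)
b1 <- c(0, 0.1, -0.1)
W2 <- matrix(c(1.0, -0.5, 0.7), nrow = 1)
b2 <- 0.2

x   <- c(1, 2)
out <- forward(x, W1, b1, W2, b2)
print(out)

# Plot the three activation functions on one set of axes
z <- seq(-4, 4, length.out = 200)
plot(z, sigmoid(z), type = "l", ylim = c(-1, 4), ylab = "activation")
lines(z, tanh(z), lty = 2)
lines(z, relu(z), lty = 3)
legend("topleft", legend = c("sigmoid", "tanh", "ReLU"), lty = 1:3)
```

Working a single input through the matrices by hand like this is a useful check on the shapes that torch will later manage automatically.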
Training Neural Networks
We introduce torch tensors and autograd, torch’s automatic differentiation engine. We then cover loss functions, the mechanics of gradient descent, and how torch optimizers implement and extend it. The topic closes with train/validation/test splitting and two widely used regularisation techniques: dropout and weight decay.
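A minimal base R sketch of the gradient descent mechanics, under simplifying assumptions. It fits a straight line y = w * x + b to synthetic data by minimising mean squared error. The gradients are derived by hand here; in torch, autograd would compute them and an optimizer would apply the update. The data, learning rate and step count are illustrative choices.

```r
set.seed(1)
x <- runif(100, -1, 1)
y <- 2 * x + 1 + rnorm(100, sd = 0.1)   # synthetic data: true w = 2, b = 1

w  <- 0      # initial parameters
b  <- 0
lr <- 0.1    # learning rate (step size)

for (step in 1:500) {
  y_hat  <- w * x + b
  err    <- y_hat - y
  loss   <- mean(err^2)          # mean squared error
  grad_w <- 2 * mean(err * x)    # d loss / d w
  grad_b <- 2 * mean(err)        # d loss / d b
  w <- w - lr * grad_w           # gradient descent update
  b <- b - lr * grad_b
}
print(c(w = w, b = b, loss = loss))
```

Because this loss is convex, the loop should converge to essentially the same coefficients as `lm(y ~ x)`. That makes a convenient sanity check before replacing the hand-derived gradients with autograd.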
Multilayer Perceptrons
We build and train multilayer perceptrons using torch’s nn_module. The running example is MNIST digit classification. We cover the full training loop, monitoring loss and validation accuracy, and practical use of luz as a high-level training interface.
Architectures
Convolutional Neural Networks
We introduce convolutional layers and build CNNs for image classification in torch. We cover nn_conv2d, nn_max_pool2d, and nn_batch_norm2d, train a CNN on MNIST, and visualise the learned filters.
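Before using `nn_conv2d` and `nn_max_pool2d`, it can help to see the underlying operations on a single-channel image. This is a base R sketch under simplifying assumptions: one channel, stride 1, no padding, and a 2 x 2 max pool. It is not how torch implements these layers. As in most deep learning libraries, the "convolution" shown is a cross-correlation, with no kernel flip. The image and the edge-detecting kernel are made-up examples.

```r
# Valid (no padding, stride 1) 2D cross-correlation of one image with one kernel
conv2d_valid <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  oh <- nrow(img) - kh + 1; ow <- ncol(img) - kw + 1
  out <- matrix(0, oh, ow)
  for (i in seq_len(oh)) for (j in seq_len(ow)) {
    patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
    out[i, j] <- sum(patch * kernel)   # elementwise product, then sum
  }
  out
}

# Non-overlapping max pooling with a size x size window
max_pool2d <- function(m, size = 2) {
  oh <- nrow(m) %/% size; ow <- ncol(m) %/% size
  out <- matrix(0, oh, ow)
  for (i in seq_len(oh)) for (j in seq_len(ow)) {
    rows <- ((i - 1) * size + 1):(i * size)
    cols <- ((j - 1) * size + 1):(j * size)
    out[i, j] <- max(m[rows, cols])
  }
  out
}

img <- matrix(0, 6, 6)
img[, 4:6] <- 1                                     # dark left half, bright right half
kernel <- matrix(c(-1, 0, 1), 3, 3, byrow = TRUE)   # responds to vertical edges

edges  <- conv2d_valid(img, kernel)   # 4 x 4 feature map, large where the edge is
pooled <- max_pool2d(edges)           # 2 x 2 after pooling
print(edges)
print(pooled)
```

In a CNN the kernel values are learned rather than hand-chosen. Visualising the learned filters, as this topic does, shows what patterns they have come to detect.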
Language Models
Language Models and Transformers
We cover tokenisation, embeddings, and the self-attention mechanism. We work through the transformer architecture piece by piece, building toward the GPT-style decoder used in the following topic.
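Here is a minimal base R sketch of the first two steps, using a character-level tokeniser (an assumption chosen for simplicity; the tokenisers covered in the workshop may differ) and an embedding lookup. The embedding matrix is random here; in a model it is a learned parameter. In torch, `nn_embedding` performs this row lookup.

```r
text  <- "hello world"
chars <- strsplit(text, "")[[1]]
vocab <- sort(unique(chars))   # vocabulary: the unique characters

encode <- function(s) match(strsplit(s, "")[[1]], vocab)   # characters -> integer ids
decode <- function(ids) paste(vocab[ids], collapse = "")   # integer ids -> string

ids <- encode(text)
print(ids)

set.seed(1)
d_model   <- 4   # embedding dimension (illustrative)
embedding <- matrix(rnorm(length(vocab) * d_model), nrow = length(vocab))
x <- embedding[ids, ]          # one embedding vector (row) per token
print(dim(x))
```

Note that R indices start at 1, whereas torch's `nn_embedding` in R also expects 1-based indices. Mixing the two conventions is a common source of off-by-one errors.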
Implementing a GPT Language Model
We implement a minimal GPT from scratch in torch and train it on a small text corpus. We cover text generation with temperature scaling and top-k sampling.
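The sampling step at the end of generation is independent of the model. Here it is sketched in base R, assuming `logits` holds the model's unnormalised scores over a hypothetical 5-token vocabulary at one generation step. Temperature divides the logits before the softmax. Top-k keeps only the k highest-scoring tokens; ties at the cutoff can keep slightly more.

```r
softmax <- function(z) {
  z <- z - max(z)          # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

sample_next_token <- function(logits, temperature = 1, top_k = NULL) {
  logits <- logits / temperature   # < 1 sharpens, > 1 flattens the distribution
  if (!is.null(top_k)) {
    cutoff <- sort(logits, decreasing = TRUE)[top_k]
    logits[logits < cutoff] <- -Inf   # excluded tokens get probability 0
  }
  probs <- softmax(logits)
  sample(seq_along(probs), size = 1, prob = probs)   # returns a token id
}

logits <- c(2.0, 1.0, 0.5, -1.0, -3.0)
set.seed(42)
print(sample_next_token(logits, temperature = 0.8, top_k = 3))
```

Calling this in a loop, appending each sampled id to the context and re-running the model, gives autoregressive text generation.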
Extra Topics
The Attention Mechanism
A focused introduction to the attention mechanism and the transformer architecture. We motivate attention through the problems of long-range context in language and images, explain the query/key/value framework, implement scaled dot-product attention from scratch, and assemble a transformer block.
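Scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V, with the softmax applied row by row. The sketch below is in base R. Random matrices stand in for the projected queries, keys and values; no learned projections are included. The optional causal mask, which stops each position attending to later ones as in a GPT-style decoder, is an addition for illustration.

```r
# Row-wise softmax of a matrix
softmax_rows <- function(m) {
  m <- m - apply(m, 1, max)   # subtract each row's max for stability
  e <- exp(m)
  e / rowSums(e)
}

scaled_dot_product_attention <- function(Q, K, V, causal = FALSE) {
  d_k    <- ncol(K)
  scores <- Q %*% t(K) / sqrt(d_k)   # (n_q x n_k) similarity scores
  if (causal) {
    scores[upper.tri(scores)] <- -Inf   # position i attends only to positions <= i
  }
  weights <- softmax_rows(scores)       # each row sums to 1
  list(output = weights %*% V, weights = weights)
}

set.seed(1)
n <- 4; d <- 8                          # sequence length, head dimension
Q <- matrix(rnorm(n * d), n, d)
K <- matrix(rnorm(n * d), n, d)
V <- matrix(rnorm(n * d), n, d)

res <- scaled_dot_product_attention(Q, K, V, causal = TRUE)
print(round(res$weights, 3))
```

Dividing by sqrt(d_k) keeps the scores from growing with the head dimension, which would otherwise push the softmax towards near one-hot weights.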
Using Pre-trained Transformer Models
We cover options for accessing large pre-trained models from R. Native R tools (hfhub, tok, safetensors) handle downloading from the Hugging Face Hub, tokenisation, and weight loading without Python. For the full Hugging Face API, we show how to call Python’s transformers library via reticulate.