Using Pre-trained Transformer Models

Author

Mark Andrews

Abstract

We cover options for accessing large pre-trained transformer models from R. The torch for R package provides the core infrastructure: models loaded via hfhub and safetensors run entirely in R with no Python dependency. For the full Hugging Face API — pipelines, tokenizers, and the Trainer — we use reticulate to call Python’s transformers library directly from R.

Pre-trained models and why they matter

Training from scratch on a small corpus produces a model that knows only what it saw during training. Pre-trained models such as BERT and GPT-2 were trained on corpora containing billions of words and encode far richer representations of language. Rather than training from scratch, we can start from these representations and either use the model directly for inference or fine-tune it on a domain-specific dataset with comparatively little additional compute.

Native R options

The hfhub package

hfhub provides utilities for downloading model files from the Hugging Face Hub into a local cache. No Python is involved: it downloads JSON configs, weight files, and vocabulary files as plain files.

library(hfhub)

# Download a specific file from the Hub
config_path <- hub_download("distilbert-base-uncased", "config.json")
config_path

The tok package

tok provides R bindings to the Rust tokenizers library, the same fast implementation that Hugging Face uses internally. It supports Byte Pair Encoding, WordPiece, and other schemes, and requires no Python.

library(tok)

# Load a pre-trained tokenizer from the Hub
tok_bert <- tokenizer$from_pretrained("bert-base-uncased")

encoded <- tok_bert$encode("Deep learning is powerful.")
encoded$ids                    # integer token ids
tok_bert$decode(encoded$ids)   # map the ids back to text

Loading model weights natively

With hfhub to fetch files and safetensors to load weights, it is possible to load a pre-trained model entirely in R. The approach is: define the architecture as an nn_module, download the weights, and copy them into the module’s parameters with load_state_dict(), renaming tensors wherever the checkpoint’s names differ from the module’s. This is the fully native path, but it requires writing the architecture yourself.

library(torch)
library(hfhub)
library(safetensors)

# Download model weights in safetensors format
weights_path <- hub_download("distilbert-base-uncased", "model.safetensors")

# Load as a named list of torch tensors
weights <- safe_load_file(weights_path, framework = "torch")
names(weights)[1:5]   # inspect the first five weight names
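
The mapping step can be sketched on a toy example. This is a minimal illustration under stated assumptions, not the DistilBERT architecture: tiny_net is a hypothetical one-layer module, and the weights list is a hand-built stand-in for what safe_load_file() returns. The point is only that load_state_dict() matches tensors to parameters by name, so the names in the list have to agree with names(model$state_dict()).

```r
library(torch)

# Hypothetical toy architecture standing in for a real transformer
tiny_net <- nn_module(
  initialize = function(d_in = 4, d_out = 2) {
    self$proj <- nn_linear(d_in, d_out)
  },
  forward = function(x) self$proj(x)
)

model <- tiny_net()

# Stand-in for the named list returned by safe_load_file().
# nn_linear stores its weight as [out_features, in_features].
weights <- list(
  proj.weight = torch_ones(2, 4),
  proj.bias   = torch_zeros(2)
)

# Names must match the module's own parameter names before loading
stopifnot(setequal(names(weights), names(model$state_dict())))

model$load_state_dict(weights)

# Run a forward pass with the loaded weights (no gradients needed)
with_no_grad({
  out <- model(torch_ones(1, 4))
})
out
```

With a real checkpoint, the names in the file will generally not match a hand-written module, so a renaming step on names(weights) would come before load_state_dict().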

The reticulate bridge

For the full Hugging Face API — pipelines, AutoTokenizer, AutoModel, and Trainer — the most complete option is to call Python’s transformers library from R via reticulate. This requires a Python environment with transformers and torch installed. For course participants who prefer to stay entirely in R, the native options above cover tokenization and weight loading.

Pipelines

pipeline bundles a tokenizer, a pre-trained model, and post-processing into a single callable. Models are downloaded automatically on first use.

library(reticulate)

transformers <- import("transformers")

classifier <- transformers$pipeline("sentiment-analysis")
classifier(list("This course is excellent.", "The explanation was very confusing."))

AutoTokenizer and AutoModel

For more control, load the tokenizer and model separately.

tokenizer <- transformers$AutoTokenizer$from_pretrained("distilbert-base-uncased")
model_hf  <- transformers$AutoModel$from_pretrained("distilbert-base-uncased")

tokens  <- tokenizer("Deep learning is powerful.", return_tensors = "pt")
torch_py <- import("torch")

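# R has no ** unpacking, so pass the encoded fields as named arguments
with(torch_py$no_grad(), {
  output <- model_hf(
    input_ids      = tokens$input_ids,
    attention_mask = tokens$attention_mask
  )
})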

output$last_hidden_state$shape    # [1, seq_len, 768]: one vector per token

last_hidden_state contains the contextualised representation of each token. These representations can be used directly as features for downstream tasks.
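
One common way to turn per-token vectors into a single feature vector per sentence is masked mean pooling: average the token vectors while ignoring padding positions. The sketch below is an illustration in base R, not part of the original workflow. The function name mean_pool is hypothetical, and the small arrays are made-up stand-ins; with the real model one would presumably obtain them via output$last_hidden_state$numpy() and tokens$attention_mask$numpy(), which is an assumption about the reticulate conversion rather than something shown above.

```r
# Masked mean pooling: average token vectors, ignoring padding positions.
# hidden: numeric array [batch, seq_len, dim]
# mask:   0/1 matrix    [batch, seq_len], 1 = real token, 0 = padding
mean_pool <- function(hidden, mask) {
  stopifnot(length(dim(hidden)) == 3, all(dim(hidden)[1:2] == dim(mask)))
  n_dim  <- dim(hidden)[3]
  pooled <- matrix(0, nrow = dim(hidden)[1], ncol = n_dim)
  for (i in seq_len(nrow(pooled))) {
    keep <- mask[i, ] == 1
    # matrix() restores the [tokens, dim] shape if R dropped a dimension
    token_vecs  <- matrix(hidden[i, keep, ], ncol = n_dim)
    pooled[i, ] <- colMeans(token_vecs)
  }
  pooled
}

# Toy stand-in: 2 sentences, 3 token positions, 2-dimensional vectors
hidden <- array(0, dim = c(2, 3, 2))
hidden[1, , ] <- rbind(c(1, 2), c(3, 4), c(100, 100))  # third position is padding
hidden[2, , ] <- rbind(c(2, 0), c(4, 0), c(6, 0))
mask <- rbind(c(1, 1, 0),
              c(1, 1, 1))

sentence_vecs <- mean_pool(hidden, mask)
sentence_vecs   # row 1 is (2, 3), row 2 is (4, 0)
```

The padded position in the first sentence does not affect its average, which is the reason for using the mask rather than a plain mean over the token axis.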

Text classification

For classification we load a model with a classification head.

clf <- transformers$AutoModelForSequenceClassification$from_pretrained(
  "distilbert-base-uncased-finetuned-sst-2-english"
)

sentences <- list("The food was excellent.", "The service was slow and disappointing.")
inputs    <- tokenizer(sentences, padding = TRUE, return_tensors = "pt")

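# R has no ** unpacking, so pass the encoded fields as named arguments
with(torch_py$no_grad(), {
  logits <- clf(
    input_ids      = inputs$input_ids,
    attention_mask = inputs$attention_mask
  )$logits
})

# argmax returns 0-based class indices; add 1 for R's 1-based indexing
predictions <- unlist(logits$argmax(dim = 1L)$tolist()) + 1L
labels      <- c("NEGATIVE", "POSITIVE")
labels[predictions]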
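
The argmax gives a label but not a confidence. Applying a softmax to each row of logits turns them into class probabilities. The following is a minimal base R sketch: softmax_rows is a hypothetical helper, and toy_logits holds made-up values used purely for illustration. With the real model, the logits would first have to be brought into R as a matrix, for example with logits$numpy(), which is an assumption here.

```r
# Row-wise softmax: convert each row of logits into probabilities
softmax_rows <- function(z) {
  z <- z - apply(z, 1, max)   # subtract each row's max for numerical stability
  e <- exp(z)
  e / rowSums(e)
}

# Made-up logits for two sentences and two classes (illustrative only)
toy_logits <- rbind(c(-2.0,  2.0),
                    c( 1.5, -1.5))

probs  <- softmax_rows(toy_logits)
labels <- c("NEGATIVE", "POSITIVE")

pred       <- labels[apply(probs, 1, which.max)]  # most probable class per row
confidence <- apply(probs, 1, max)                # its probability

data.frame(label = pred, confidence = confidence)
```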

Fine-tuning

Pre-trained models can be fine-tuned on a new dataset by continuing training with a small learning rate. The Hugging Face Trainer API handles the training loop, evaluation, and checkpointing.

TrainingArguments <- transformers$TrainingArguments
Trainer           <- transformers$Trainer

args <- TrainingArguments(
  output_dir                  = "./results",
  num_train_epochs            = 3L,
  per_device_train_batch_size = 16L,
  learning_rate               = 2e-5
)

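# train_dataset and eval_dataset are not defined in this section; they are
# assumed to be tokenized datasets (input_ids, attention_mask, labels)
# prepared beforehand
trainer <- Trainer(
  model         = clf,
  args          = args,
  train_dataset = train_dataset,
  eval_dataset  = eval_dataset
)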

trainer$train()
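
To get a feel for how much training these settings imply, the number of optimizer steps can be worked out by hand. The sketch below assumes a hypothetical training set of 1000 examples, a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. The batch size and epoch count are taken from the arguments above.

```r
n_train    <- 1000L  # hypothetical number of training examples
batch_size <- 16L    # per_device_train_batch_size
n_epochs   <- 3L     # num_train_epochs

# One optimizer step per batch; the last batch of an epoch may be smaller
steps_per_epoch <- ceiling(n_train / batch_size)
total_steps     <- steps_per_epoch * n_epochs

steps_per_epoch  # 63
total_steps      # 189
```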

Fine-tuning a pre-trained model on a domain-specific dataset typically requires far less data and compute than training from scratch, and it usually performs substantially better than a model trained on the small dataset alone.