Day One, Session One

Introduction to Artificial Neural Networks

Mark Andrews

What is an artificial neural network?

  • An ANN is a trainable mathematical function that maps inputs to outputs.
  • It is built from simple computational units called artificial neurons arranged in layers.
  • Each neuron combines its inputs using learned weights and applies a non-linear activation.
  • Virtually all modern AI systems, from image classifiers to large language models, are built from ANNs.

The artificial neuron

  • A single neuron takes a vector of inputs \(x\) and produces one scalar output.
  • The weights \(w\) control how strongly each input influences the output.
  • The bias \(b\) shifts the activation threshold independently of the input.
  • The activation function \(\phi\) introduces non-linearity.

\[ \text{output} = \phi\!\left(b + \sum_{k=1}^{K} w_k x_k\right) \]
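The formula above translates almost directly into NumPy. A minimal sketch (the function name `neuron` and the example values are illustrative, not part of the slides):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Single artificial neuron: phi(b + sum_k w_k * x_k)."""
    return phi(b + np.dot(w, x))

x = np.array([1.0, 2.0, 3.0])    # input vector
w = np.array([0.5, -0.25, 0.1])  # learned weights
b = 0.2                          # bias
y = neuron(x, w, b)              # scalar output; here tanh(0.5)
```

With `tanh` as the activation, the output is squashed into \((-1, 1)\); swapping in another \(\phi\) only changes the final squashing step.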

Diagram of a single neuron

Activation functions

  • Without non-linearity, stacking layers collapses to a single linear transformation.
  • Sigmoid: squashes to \((0,1)\); historically common but prone to vanishing gradients.
  • Tanh: squashes to \((-1,1)\); zero-centred but still saturates.
  • ReLU: \(\phi(z) = \max(0, z)\); fast to compute and avoids saturation for positive inputs.
  • ReLU and its variants are the default choice in modern networks.
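The three activations above are each a one-liner in NumPy. A sketch for reference (vectorised, so each works element-wise on arrays):

```python
import numpy as np

def sigmoid(z):
    # squashes to (0, 1); saturates (tiny gradients) for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes to (-1, 1); zero-centred but still saturates
    return np.tanh(z)

def relu(z):
    # identity for positive z, zero otherwise; no saturation for z > 0
    return np.maximum(0.0, z)
```

Note that all three agree on simple anchor points: `sigmoid(0) == 0.5`, `tanh(0) == 0.0`, and `relu(z) == z` for any positive `z`.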

From neuron to network

  • A layer contains many neurons all looking at the same inputs in parallel.
  • Each neuron learns a different weighting of those inputs.
  • Layers are stacked so that later layers build on representations formed by earlier ones.

Network mathematics

Two-layer network with \(J\) inputs, \(H\) hidden units, \(K\) outputs:

\[ h_j = \phi\!\left(b_j^{(1)} + \sum_{i=1}^{J} W_{ji}^{(1)}\, x_i\right), \quad j = 1, \ldots, H \]

\[ y_k = \phi\!\left(b_k^{(2)} + \sum_{j=1}^{H} W_{kj}^{(2)}\, h_j\right), \quad k = 1, \ldots, K \]

In matrix form: \(h = \phi(W^{(1)}x + b^{(1)})\), then \(y = \phi(W^{(2)}h + b^{(2)})\).
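The matrix form maps line by line onto NumPy. A sketch of the forward pass for a single input (the dimensions and random initialisation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, H, K = 4, 5, 3  # inputs, hidden units, outputs

# Layer parameters: W1 is (H, J), W2 is (K, H), matching the equations
W1 = rng.normal(size=(H, J)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)); b2 = np.zeros(K)

phi = np.tanh

x = rng.normal(size=J)   # one input vector
h = phi(W1 @ x + b1)     # hidden activations, shape (H,)
y = phi(W2 @ h + b2)     # outputs, shape (K,)
```

Each `@` computes all the sums \(\sum_i W_{ji} x_i\) (or \(\sum_j W_{kj} h_j\)) at once, one row of the weight matrix per neuron.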

This session

  • We implement a neuron and the common activation functions from scratch in NumPy.
  • We run a forward pass through a two-layer network by hand, for a single input and then a batch.
  • The goal is a clear computational picture of what a network does before we move to PyTorch.
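As a preview of the batched forward pass mentioned above: stacking N inputs as the rows of a matrix lets one matrix multiply process the whole batch, with the bias broadcast across rows. A sketch under the same illustrative dimensions as before:

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, H, K = 8, 4, 5, 3  # batch size, inputs, hidden units, outputs

W1 = rng.normal(size=(H, J)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)); b2 = np.zeros(K)

X = rng.normal(size=(N, J))        # one input vector per row
Hid = np.tanh(X @ W1.T + b1)       # shape (N, H); b1 broadcasts over rows
Y = np.tanh(Hid @ W2.T + b2)       # shape (N, K); one output row per input
```

Row `n` of `Y` equals the single-input forward pass applied to row `n` of `X`, so the batched version is just the per-example computation done in parallel.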