Day One, Session One

Introduction to Artificial Neural Networks

Mark Andrews

What is an artificial neural network?

  • An ANN is a trainable mathematical function that maps inputs to outputs.
  • It is built from simple computational units called artificial neurons arranged in layers.
  • Each neuron combines its inputs using learned weights and applies a non-linear activation.
  • Virtually all modern AI systems, from image classifiers to large language models, are built from ANNs.

The artificial neuron

  • A single neuron takes a vector of inputs \(x\) and produces one scalar output.
  • The weights \(w\) control how strongly each input influences the output.
  • The bias \(b\) shifts the activation threshold independently of the input.
  • The activation function \(\phi\) introduces non-linearity.

\[ \text{output} = \phi\!\left(b + \sum_{k=1}^{K} w_k x_k\right) \]
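The formula above translates almost directly into NumPy. A minimal sketch (the function name `neuron` and the example values are illustrative, not part of the slides):

```python
import numpy as np

def neuron(x, w, b, phi=np.tanh):
    """Single artificial neuron: phi(b + sum_k w_k * x_k)."""
    return phi(b + np.dot(w, x))

x = np.array([1.0, 2.0, 3.0])    # input vector
w = np.array([0.5, -0.25, 0.1])  # learned weights
b = 0.2                          # bias
y = neuron(x, w, b)              # scalar output; here tanh(0.5)
```

With `tanh` as the activation, the output is squashed into \((-1, 1)\); swapping in another \(\phi\) only changes the final squashing step.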

Diagram of a single neuron

Activation functions

  • Without non-linearity, stacking layers collapses to a single linear transformation.
  • Sigmoid: squashes to \((0,1)\); historically common but prone to vanishing gradients.
  • Tanh: squashes to \((-1,1)\); zero-centred but still saturates.
  • ReLU: \(\phi(z) = \max(0, z)\); fast to compute and avoids saturation for positive inputs.
  • ReLU and its variants are the default choice in modern networks.
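The three activations above are each a one-liner in NumPy. A sketch for reference (vectorised, so each works element-wise on arrays):

```python
import numpy as np

def sigmoid(z):
    # squashes to (0, 1); saturates (tiny gradients) for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes to (-1, 1); zero-centred but still saturates
    return np.tanh(z)

def relu(z):
    # identity for positive z, zero otherwise; no saturation for z > 0
    return np.maximum(0.0, z)
```

Note that all three agree on simple anchor points: `sigmoid(0) == 0.5`, `tanh(0) == 0.0`, and `relu(z) == z` for any positive `z`.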

From neuron to network

  • A layer contains many neurons all looking at the same inputs in parallel.
  • Each neuron learns a different weighting of those inputs.
  • Layers are stacked so that later layers build on representations formed by earlier ones.

Network mathematics

Two-layer network with \(J\) inputs, \(H\) hidden units, \(K\) outputs:

\[ h_j = \phi\!\left(b_j^{(1)} + \sum_{i=1}^{J} W_{ji}^{(1)}\, x_i\right), \quad j = 1, \ldots, H \]

\[ y_k = \phi\!\left(b_k^{(2)} + \sum_{j=1}^{H} W_{kj}^{(2)}\, h_j\right), \quad k = 1, \ldots, K \]

In matrix form: \(h = \phi(W^{(1)}x + b^{(1)})\), then \(y = \phi(W^{(2)}h + b^{(2)})\).
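The matrix form maps line by line onto NumPy. A sketch of the forward pass for a single input (the dimensions and random initialisation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, H, K = 4, 5, 3  # inputs, hidden units, outputs

# Layer parameters: W1 is (H, J), W2 is (K, H), matching the equations
W1 = rng.normal(size=(H, J)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)); b2 = np.zeros(K)

phi = np.tanh

x = rng.normal(size=J)   # one input vector
h = phi(W1 @ x + b1)     # hidden activations, shape (H,)
y = phi(W2 @ h + b2)     # outputs, shape (K,)
```

Each `@` computes all the sums \(\sum_i W_{ji} x_i\) (or \(\sum_j W_{kj} h_j\)) at once, one row of the weight matrix per neuron.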

This session

  • We implement a neuron and the common activation functions from scratch in NumPy.
  • We run a forward pass through a two-layer network by hand, for a single input and then a batch.
  • The goal is a clear computational picture of what a network does before we move to PyTorch.
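As a preview of the batched forward pass mentioned above: stacking N inputs as the rows of a matrix lets one matrix multiply process the whole batch, with the bias broadcast across rows. A sketch under the same illustrative dimensions as before:

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, H, K = 8, 4, 5, 3  # batch size, inputs, hidden units, outputs

W1 = rng.normal(size=(H, J)); b1 = np.zeros(H)
W2 = rng.normal(size=(K, H)); b2 = np.zeros(K)

X = rng.normal(size=(N, J))        # one input vector per row
Hid = np.tanh(X @ W1.T + b1)       # shape (N, H); b1 broadcasts over rows
Y = np.tanh(Hid @ W2.T + b2)       # shape (N, K); one output row per input
```

Row `n` of `Y` equals the single-input forward pass applied to row `n` of `X`, so the batched version is just the per-example computation done in parallel.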