Activation Functions and MLP

Activation functions add non-linearity, which lets neural networks learn patterns that a purely linear model cannot. An MLP (multi-layer perceptron) stacks fully connected layers of neurons.

Common activation functions

ReLU: max(0, x); cheap to compute and widely used in hidden layers.
Sigmoid: squashes any real input to a value between 0 and 1.
Tanh: squashes any real input to a value between -1 and 1, centered at zero.
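
A minimal sketch of the three functions on scalar inputs, using only the standard library. The function names and the sample inputs are illustrative choices, not part of the notes.

```python
import math

def relu(x: float) -> float:
    """Pass positive values through unchanged; clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic sigmoid, branched so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1)
    return z / (1.0 + z)

def tanh(x: float) -> float:
    """Hyperbolic tangent; thin wrapper around the standard library."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"sigmoid={sigmoid(x):.3f}  tanh={tanh(x):.3f}")
```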

MLP architecture

Input → Hidden layer(s) → Output. Hidden layers must use non-linear activations; without them, the stack collapses to a single linear map and cannot represent XOR-like decision boundaries.
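
The collapse can be seen on XOR itself. The sketch below uses a two-neuron hidden layer with hand-picked weights (an assumption for illustration, not learned values): with ReLU the network reproduces XOR, and with the identity in its place the same weights reduce to the linear function 2 - (x1 + x2).

```python
def relu(x: float) -> float:
    return max(0.0, x)

def identity(x: float) -> float:
    return x

def tiny_mlp(x1: float, x2: float, activation) -> float:
    """2 inputs -> 2 hidden neurons -> 1 linear output, with fixed weights."""
    # Hidden layer: both neurons use weights (1, 1); biases are 0 and -1.
    h1 = activation(x1 + x2)
    h2 = activation(x1 + x2 - 1.0)
    # Output layer: plain linear combination of the hidden units.
    return h1 - 2.0 * h2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
with_relu = [tiny_mlp(a, b, relu) for a, b in inputs]
linear_only = [tiny_mlp(a, b, identity) for a, b in inputs]

print("ReLU hidden layer:    ", with_relu)    # matches XOR: 0, 1, 1, 0
print("identity hidden layer:", linear_only)  # 2 - (x1 + x2): 2, 1, 1, 0
```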

Why tanh over tan

The standard tangent is periodic and discontinuous: it diverges to ±∞ at 90° and at every 180° step from there, so its gradient blows up near those points. The hyperbolic tangent (tanh) is smooth, monotonic, and bounded in [−1, +1].
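
A quick numeric check of that contrast, assuming the usual derivatives d/dx tan(x) = 1 / cos²(x) and d/dx tanh(x) = 1 − tanh²(x). The helper names and sample points are illustrative.

```python
import math

def tan_grad(x: float) -> float:
    """Derivative of tan: 1 / cos(x)^2, unbounded near pi/2."""
    return 1.0 / math.cos(x) ** 2

def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2, always between 0 and 1."""
    return 1.0 - math.tanh(x) ** 2

# Approach 90 degrees (pi/2 radians) from below: tan and its slope explode.
for offset in (1e-1, 1e-3, 1e-6):
    x = math.pi / 2 - offset
    print(f"tan:  x=pi/2-{offset:g}  value={math.tan(x):.3e}  grad={tan_grad(x):.3e}")

# tanh stays bounded everywhere, and so does its slope.
for x in (-10.0, 0.0, 1.5, 10.0):
    print(f"tanh: x={x:+.1f}  value={math.tanh(x):+.6f}  grad={tanh_grad(x):.6f}")
```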

Output activations and loss pairing (Session 2)

Task	Output activation	Typical loss
Regression	Linear / identity	Mean Squared Error (MSE)
Binary classification	Sigmoid → [0, 1]	Binary cross-entropy (BCE)
Multi-class	Softmax → distribution	Categorical cross-entropy (CCE)
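
The three losses in the table can be sketched in plain Python as follows. The function names, the clamp constant EPS, and the sample values are assumptions for illustration; real frameworks ship their own, more careful versions.

```python
import math

EPS = 1e-12  # clamp to avoid log(0); the exact value is an arbitrary choice

def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bce(y_true, p_pred):
    """Binary cross-entropy; p_pred holds sigmoid outputs for class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def cce(one_hot, probs):
    """Categorical cross-entropy for one sample; probs is a softmax output."""
    return -sum(t * math.log(max(p, EPS)) for t, p in zip(one_hot, probs))

print("MSE:", mse([1.0, 2.0], [1.0, 4.0]))
print("BCE:", bce([1, 0], [0.9, 0.2]))
print("CCE:", cce([0, 1, 0], [0.2, 0.5, 0.3]))
```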

Softmax (the percentage maker)

Raw logits are exponentiated (which removes negatives and amplifies the largest value), then normalized so that all class probabilities sum to 1.0. CCE then heavily penalizes predictions that put high confidence on the wrong class.
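
A minimal softmax sketch following those two steps. Subtracting the largest logit before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result. The logits are made-up values.

```python
import math

def softmax(logits):
    """Exponentiate, then normalize so the outputs sum to 1."""
    m = max(logits)                            # shift for stability
    exps = [math.exp(z - m) for z in logits]   # all positive, largest is 1.0
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]
probs = softmax(logits)
print("probabilities:", [round(p, 4) for p in probs])
print("sum:", sum(probs))

# Cross-entropy for one sample is -log(probability of the true class),
# so the loss grows when the model is confident in some other class.
for true_class in range(len(probs)):
    print(f"true class {true_class}: loss = {-math.log(probs[true_class]):.4f}")
```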

flowchart LR
    I[Input features] --> H1[Hidden + ReLU/tanh]
    H1 --> H2[Hidden + ReLU/tanh]
    H2 --> O[Output + softmax/sigmoid]