Activation Functions and MLP

Activation functions add non-linearity, which lets a neural network learn patterns that a purely linear model cannot. An MLP (Multi-Layer Perceptron) stacks fully connected layers of neurons.

Common activation functions

MLP architecture

Input → Hidden layer(s) → Output. Hidden layers must use non-linear activations; without them, the stack collapses to a single linear map and cannot solve XOR-like boundaries.
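The collapse can be checked numerically. The following is a minimal NumPy sketch, not a training recipe: the layer sizes, the random seed, and all names (`stacked_linear`, `single_linear`, `relu_mlp`) are illustrative assumptions, and the ReLU weights are hand-picked rather than learned.

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
xor_targets = np.array([0.0, 1.0, 1.0, 0.0])

# Two layers with NO activation in between (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def stacked_linear(X):
    return (X @ W1 + b1) @ W2 + b2

# The same function written as ONE linear layer.
W_single = W1 @ W2
b_single = b1 @ W2 + b2

def single_linear(X):
    return X @ W_single + b_single

collapses = np.allclose(stacked_linear(X), single_linear(X))

# Any affine map f satisfies f(0,0) + f(1,1) == f(0,1) + f(1,0),
# but XOR needs 0 + 0 on the left and 1 + 1 on the right.
out = stacked_linear(X).ravel()
affine_constraint_holds = np.isclose(out[0] + out[3], out[1] + out[2])

# With a non-linearity, a tiny hand-built network represents XOR exactly:
# y = relu(x1 + x2) - 2 * relu(x1 + x2 - 1)
def relu(z):
    return np.maximum(z, 0.0)

W1_relu = np.array([[1.0, 1.0], [1.0, 1.0]])  # both hidden units see x1 + x2
b1_relu = np.array([0.0, -1.0])
W2_relu = np.array([1.0, -2.0])

def relu_mlp(X):
    return relu(X @ W1_relu + b1_relu) @ W2_relu

relu_out = relu_mlp(X)
print(collapses, affine_constraint_holds, relu_out)
```

The affine constraint is why no choice of weights rescues the activation-free stack: the sum of its outputs on (0,0) and (1,1) always equals the sum on (0,1) and (1,0), which XOR contradicts.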

Why tanh over tan

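The standard tangent is periodic and has a discontinuity at 90° (π/2) and at every other odd multiple of it, where it diverges to ±∞. Its derivative, sec² x, is unbounded near those points, which causes exploding gradients. The hyperbolic tangent, tanh, is smooth, monotonic, and bounded, with outputs strictly between −1 and +1.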
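A quick numerical comparison makes the difference concrete. This is a small sketch using NumPy; the sample points and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Inputs approaching pi/2 (about 1.5708) from below.
x_near_pole = np.array([1.5, 1.57, 1.5707])

tan_vals = np.tan(x_near_pole)
tan_grad = 1.0 / np.cos(x_near_pole) ** 2   # d/dx tan(x) = sec^2(x)

# tanh over a wide input range.
z = np.linspace(-10.0, 10.0, 2001)
tanh_vals = np.tanh(z)
tanh_grad = 1.0 - tanh_vals ** 2            # d/dz tanh(z) = 1 - tanh^2(z)

print("tan values near pi/2:   ", tan_vals)
print("tan gradients near pi/2:", tan_grad)
print("max |tanh|:", np.max(np.abs(tanh_vals)))
print("max tanh gradient:", np.max(tanh_grad))
```

The tan gradient grows without limit as the input nears π/2, while the tanh gradient never exceeds 1 (its value at z = 0).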

Output activations and loss pairing (Session 2)

| Task | Output activation | Typical loss |
| --- | --- | --- |
| Regression | Linear / identity | Mean Squared Error (MSE) |
| Binary classification | Sigmoid → [0, 1] | Binary cross-entropy (BCE) |
| Multi-class | Softmax → distribution | Categorical cross-entropy (CCE) |
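As a rough illustration of the regression and binary rows, here is a NumPy sketch of each activation with its paired loss. The function names, sample values, and the `eps` clipping guard are assumptions made for this example, not part of the session material.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error, paired with a linear / identity output.
    return np.mean((y_true - y_pred) ** 2)

def sigmoid(z):
    # Squashes a raw score into a probability.
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_true, p, eps=1e-12):
    # Binary cross-entropy; eps clipping (an added safeguard) avoids log(0).
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

# Regression: the network output is used as-is.
reg_loss = mse(np.array([2.0, -1.0]), np.array([2.5, -1.0]))

# Binary classification: sigmoid on the logits, then BCE.
logits = np.array([3.0, -3.0])
labels = np.array([1.0, 0.0])
good = bce(labels, sigmoid(logits))    # confident and correct
bad = bce(labels, sigmoid(-logits))    # confident and wrong

print(reg_loss, good, bad)
```

Flipping the sign of the logits turns confident correct predictions into confident wrong ones, and the BCE value rises sharply as a result.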

Softmax (the percentage maker)

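Each raw logit is exponentiated, which makes every value positive and amplifies the gap between the largest logit and the rest. The results are then divided by their sum, so the class probabilities add up to 1.0. CCE then heavily penalizes a confident wrong prediction, that is, a low probability assigned to the true class.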
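The two steps, followed by the loss, can be sketched in NumPy as below. Subtracting the maximum logit before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result. The function names and example logits are likewise illustrative.

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability (does not change the output).
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)                                # all values positive
    return exps / np.sum(exps, axis=-1, keepdims=True)    # normalize to sum 1

def cce(true_index, probs, eps=1e-12):
    # Categorical cross-entropy for one sample with an integer class label:
    # the negative log of the probability given to the true class.
    return -np.log(np.clip(probs[true_index], eps, 1.0))

logits = np.array([2.0, -1.0, 0.5])
probs = softmax(logits)

loss_if_class0 = cce(0, probs)   # model's top choice is the true class
loss_if_class1 = cce(1, probs)   # model's least likely class is the true class

print(probs, probs.sum())
print(loss_if_class0, loss_if_class1)
```

Negative logits still receive a positive probability, and the loss is much larger when the true class is the one the model considered least likely.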

flowchart LR
    I[Input features] --> H1[Hidden + ReLU/tanh]
    H1 --> H2[Hidden + ReLU/tanh]
    H2 --> O[Output + softmax/sigmoid]