Satin's blog

Activation Function:

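Without activation functions, the whole network is linear: each layer only multiplies its input by weights and adds a bias, and stacking such layers still gives one big linear (strictly speaking, affine) function of the input. Imagine f(x) = w1x1 + w2x2 + w3x3. Feeding this into another layer of the same form still yields a linear function, which cannot approximate complex, nonlinear functions. Activation functions fix this by bending the model at the end of every layer: they are applied after the previous layer's outputs have been multiplied by the weights and the biases have been added.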
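
To make the collapse concrete, here is a minimal sketch in plain Python. The one-input, two-layer network, the weight and bias values, and the function names (linear_net, collapsed, relu_net, slopes) are all made up for illustration, and ReLU is just one common choice of activation. It compares a network without activations against its single-layer equivalent, then shows that adding ReLU changes the slope along the input, which a linear function can never do.

```python
# Hypothetical two-layer network on a single scalar input.
# All weights and biases below are made-up illustration values.
W1, B1 = 2.0, 1.0    # layer 1 weight and bias
W2, B2 = -3.0, 0.5   # layer 2 weight and bias

XS = [-2.0, -1.0, 0.0, 1.0, 2.0]  # sample inputs


def relu(z):
    """ReLU activation: passes positive values, clamps negatives to 0."""
    return max(0.0, z)


def linear_net(x):
    """Two layers with no activation in between."""
    h = W1 * x + B1
    return W2 * h + B2


def collapsed(x):
    """One layer that is algebraically equal to linear_net:
    W2*(W1*x + B1) + B2 = (W2*W1)*x + (W2*B1 + B2)."""
    return (W2 * W1) * x + (W2 * B1 + B2)


def relu_net(x):
    """Same two layers, but with ReLU applied after the first one."""
    h = relu(W1 * x + B1)
    return W2 * h + B2


def slopes(f, xs):
    """Slope of f between each pair of neighbouring sample points."""
    return [(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:])]


# Without activations, two layers behave exactly like one.
print(all(linear_net(x) == collapsed(x) for x in XS))  # True

# A linear model has the same slope everywhere.
print(slopes(linear_net, XS))  # [-6.0, -6.0, -6.0, -6.0]

# With ReLU the slope changes: the model "bends" near x = -0.5.
print(slopes(relu_net, XS))    # [0.0, -3.0, -6.0, -6.0]
```

The constant slope in the first case is what "still a linear function" means in practice: no matter how many such layers are stacked, the output is a straight line in x. With ReLU in between, the output is flat where the first layer's result is negative and sloped where it is positive, so the network can start to piece together curved shapes.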