Neuron: a thing that holds a number
My own intuitive definition:
$$
neurons[i + 1] = sigmoid(neurons[i] \cdot weight[i] + bias[i])
$$
3Blue1Brown’s informal definition:
$$
neuron = sigmoid(a_1 \cdot w_1 + a_2 \cdot w_2 + \dots + a_n \cdot w_n + bias)
$$
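To make the weighted-sum definition concrete, here is a minimal sketch of a single neuron in plain Python. The function names `sigmoid` and `neuron` and the input values are my own assumptions for illustration, not anything from the video.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(activations, weights, bias):
    """Weighted sum of the incoming activations plus the bias, passed through sigmoid."""
    if len(activations) != len(weights):
        raise ValueError("need exactly one weight per incoming activation")
    z = sum(a * w for a, w in zip(activations, weights)) + bias
    return sigmoid(z)

# Hypothetical inputs chosen so the weighted sum cancels to zero:
# 1.0 * 2.0 + 0.5 * (-4.0) + 0.0 = 0.0
print(neuron([1.0, 0.5], [2.0, -4.0], 0.0))  # 0.5, since sigmoid(0) = 0.5
```

A large positive weighted sum pushes the output toward 1 and a large negative one pushes it toward 0, which is why the bias acts like a threshold for when the neuron "lights up".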
[!Formalizing]
A more compact, mathematical way to write the formula:
- Organize the activations of one layer of neurons into a column vector: $a^{(0)}$
- Organize the connection weights into a matrix $W$ with shape [length of the next layer × length of the current layer], so that the product $Wa^{(0)}$ is defined
- Multiply them with a matrix-vector product, and add the bias vector $b$
- Wrap everything in an activation function, e.g. $\sigma$ (sigmoid), $ReLU$ (Rectified Linear Unit), or $Softmax$.
$$
a^{(1)} = \sigma(Wa^{(0)} + b)
$$
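The matrix form can be sketched for a whole layer at once. This is only an illustration under my own assumptions: `layer_forward` is a hypothetical helper, $W$ is stored as a list of rows, and $a^{(0)}$ and $b$ are flat lists. In practice a library such as NumPy would usually do the multiplication.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(W, a, b):
    """Compute sigma(W a + b): one output activation per row of W."""
    if any(len(row) != len(a) for row in W) or len(W) != len(b):
        raise ValueError("shape mismatch between W, a and b")
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + bias)
            for row, bias in zip(W, b)]

# Hypothetical 3-neuron layer feeding a 2-neuron layer, so W has 2 rows and 3 columns.
W = [[1.0, -1.0, 0.0],
     [0.0,  0.0, 0.0]]
a0 = [2.0, 2.0, 5.0]
b = [0.0, 0.0]

a1 = layer_forward(W, a0, b)
print(a1)  # [0.5, 0.5] because both weighted sums are 0
```

Each row of $W$ holds the weights of one neuron in the next layer, so the whole layer is just the single-neuron formula repeated once per row.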
Code for MNIST handwritten digit recognition:
`class Network:`
[!Shortcut]
To get the shape of a weight matrix:
$$[\#\ \text{of 2nd layer neurons},\ \#\ \text{of 1st layer neurons}]$$
The input layer is always a column vector of shape $$[\#\ \text{of input layer neurons},\ 1]$$
Their product $$[\#\ \text{2nd layer neurons},\ \#\ \text{1st layer neurons}]\ \times\ [\#\ \text{1st layer neurons},\ 1]$$ gives the second layer the shape $$[\#\ \text{2nd layer neurons},\ 1]$$
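The shape rule can be checked mechanically. Below is a small sketch: `matmul_shape` is a hypothetical helper of mine, and the layer sizes 784 and 16 are illustrative assumptions (a 28×28 pixel image feeding a small hidden layer).

```python
def matmul_shape(left, right):
    """Return the shape of (left @ right), or raise if the inner dimensions differ."""
    rows_l, cols_l = left
    rows_r, cols_r = right
    if cols_l != rows_r:
        raise ValueError(f"inner dimensions differ: {cols_l} vs {rows_r}")
    return (rows_l, cols_r)

n_first, n_second = 784, 16            # illustrative layer sizes (assumption)
weight_shape = (n_second, n_first)     # [# of 2nd layer neurons, # of 1st layer neurons]
input_shape = (n_first, 1)             # column vector

print(matmul_shape(weight_shape, input_shape))  # (16, 1)
```

Swapping the two dimensions of the weight matrix makes the inner dimensions disagree (784 vs 784 becomes 16 vs 784), which is a quick way to catch a transposed weight matrix.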
[!Idea]
Define one function per layer, each with its own weights and bias: $layer_i(x) = ReLU(W_i x + b_i)$.
Then a two-layer network is the composition $layer_2(layer_1(x)) = ReLU(W_2 \, ReLU(W_1 x + b_1) + b_2)$
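The composition idea can be sketched with closures, where each layer carries its own weight matrix and bias. The helper `make_layer`, the separate bias per layer, and all the numbers are assumptions made for this example.

```python
def relu(z):
    """Rectified Linear Unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)

def make_layer(W, b):
    """Return a function x -> ReLU(W x + b) for one fixed weight matrix and bias."""
    def layer(x):
        return [relu(sum(w * xi for w, xi in zip(row, x)) + bias)
                for row, bias in zip(W, b)]
    return layer

# Hypothetical weights: layer1 maps 2 values to 2 values, layer2 maps 2 values to 1.
layer1 = make_layer([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])
layer2 = make_layer([[1.0, 1.0]], [0.5])

x = [3.0, 2.0]
hidden = layer1(x)     # [3.0, 0.0]: the second neuron's sum is -2.0, clamped by ReLU
print(layer2(hidden))  # [3.5]
```

Adding depth is then just more function composition, e.g. `layer3(layer2(layer1(x)))`, as long as each layer's input size matches the previous layer's output size.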