Neuron: a thing that holds a number

My own intuitive definition:
$$
neurons[i + 1] = sigmoid(neurons[i] \cdot weight[i] + bias[i])
$$
3Blue1Brown’s informal definition:

$$
neuron = sigmoid(a_1 \cdot w_1 + a_2 \cdot w_2 + \dots + a_n \cdot w_n + bias)
$$
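As a rough illustration of the weighted-sum definition above, here is a minimal sketch of a single neuron in Python. The function names `sigmoid` and `neuron` and the input values are hypothetical, chosen only so the weighted sum cancels the bias.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(activations, weights, bias):
    """Weighted sum of the previous layer's activations plus a bias, passed through sigmoid."""
    return sigmoid(np.dot(activations, weights) + bias)

# Illustrative values, not taken from a trained network
a = np.array([0.5, 0.1, 0.9])
w = np.array([0.4, -0.2, 0.3])
bias = -0.45

value = neuron(a, w, bias)
print(value)  # approximately 0.5, because the weighted sum (0.45) cancels the bias
```

The neuron still just "holds a number": the output is a single value between 0 and 1.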

[!Formalizing]
A more compact and mathematical way to write down the formula is:

  1. Organize a layer's neuron activations into a column vector: $a^{(0)}$
  2. Organize the connection weights into a matrix $W$ with shape [length of layer 2 × length of layer 1], so that each row holds the weights feeding one neuron of the next layer
  3. Multiply them using matrix-vector multiplication, then add the bias vector $b$
  4. Wrap everything in an activation function, e.g. $\sigma$ (sigmoid), $ReLU$ (Rectified Linear Unit), or $Softmax$:
    $$
    a^{(1)} = \sigma(Wa^{(0)} + b)
    $$
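
To check that the matrix form is only a compact way of writing the per-neuron formula, here is a small sketch that computes one layer both ways and compares the results. The layer sizes and random values are assumptions for illustration; `W` is laid out with one row per neuron of the next layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n_in, n_out = 4, 3                      # hypothetical layer sizes
a0 = rng.random((n_in, 1))              # previous layer as a column vector
W = rng.standard_normal((n_out, n_in))  # one row of weights per neuron in the next layer
b = rng.standard_normal((n_out, 1))     # one bias per neuron in the next layer

# Matrix form: a1 = sigma(W a0 + b)
a1 = sigmoid(W @ a0 + b)

# Same computation, one neuron at a time
a1_loop = np.array([[sigmoid(W[j] @ a0[:, 0] + b[j, 0])] for j in range(n_out)])

print(a1.shape)
print(np.allclose(a1, a1_loop))
```

Both versions produce a column of `n_out` activations; the matrix form simply does all the weighted sums in one operation.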

Code for MNIST handwritten digit recognition:

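import numpy as np


def relu(z):
    # Assumed helper (not defined in the original snippet): element-wise max(0, z)
    return np.maximum(0, z)


def softmax(z):
    # Assumed helper (not defined in the original snippet).
    # Shift by the max for numerical stability, then normalize so the outputs sum to 1.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)


class Network:
    def __init__(self):
        # Weight shapes are (neurons in next layer, neurons in previous layer)
        self.W1 = np.random.randn(16, 784) * 0.01
        self.b1 = np.zeros((16, 1))

        self.W2 = np.random.randn(16, 16) * 0.01
        self.b2 = np.zeros((16, 1))

        self.W3 = np.random.randn(10, 16) * 0.01
        self.b3 = np.zeros((10, 1))

    def feedforward(self, x):
        a1 = relu(np.dot(self.W1, x) + self.b1)  # (16, 784) @ (784, 1) -> (16, 1)
        a2 = relu(np.dot(self.W2, a1) + self.b2)  # (16, 16) @ (16, 1) -> (16, 1)
        a3 = softmax(np.dot(self.W3, a2) + self.b3)  # (10, 16) @ (16, 1) -> (10, 1)
        return a3


network = Network()

output = network.feedforward(train_images[0].reshape(784, 1))  # train_images: MNIST images, assumed loaded elsewhere

print("Output probabilities:", output.flatten())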
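
The output is a column of 10 probabilities, one per digit. A minimal sketch of turning such a column into a predicted digit follows; the probabilities here are made up for illustration and do not come from the network above.

```python
import numpy as np

# Illustrative output column, shaped like the (10, 1) result of feedforward; values are made up
output = np.array([[0.02], [0.01], [0.05], [0.70], [0.03],
                   [0.04], [0.05], [0.03], [0.04], [0.03]])

predicted_digit = int(np.argmax(output))        # index of the largest probability
confidence = float(output[predicted_digit, 0])  # probability assigned to that digit
print(predicted_digit, confidence)
```

With untrained random weights the real output would be close to uniform, so the prediction only becomes meaningful after training.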

[!Shortcut]
To get the shape of a weight matrix:
$$[\#\ \text{of 2nd layer neurons},\ \#\ \text{of 1st layer neurons}]$$
The input layer is always a column vector of shape
$$[\#\ \text{of input layer neurons},\ 1]$$
Their product
$$[\#\ \text{2nd layer neurons},\ \#\ \text{1st layer neurons}] \times [\#\ \text{1st layer neurons},\ 1]$$
gives the second layer the shape
$$[\#\ \text{2nd layer neurons},\ 1]$$
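
The shape rule can be checked mechanically. The sketch below walks through the layer sizes of the MNIST network above (784, 16, 16, 10) with zero-filled placeholder arrays, since only the shapes matter here.

```python
import numpy as np

layer_sizes = [784, 16, 16, 10]  # layer sizes used in the MNIST network above

a = np.zeros((layer_sizes[0], 1))  # input layer as a column vector
shapes = [a.shape]
for n_prev, n_next in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = np.zeros((n_next, n_prev))  # [# next-layer neurons, # previous-layer neurons]
    a = W @ a                       # (n_next, n_prev) @ (n_prev, 1) -> (n_next, 1)
    shapes.append(a.shape)
    print(W.shape, "->", a.shape)
```

The inner dimensions always match and cancel, which is why each weight matrix lists the next layer's size first.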

[!Idea]
Define a function $layer_i(x) = ReLU(W_i x + b_i)$, where each layer has its own weights and bias.
Then a two-layer network is the composition $layer_2(layer_1(x)) = ReLU(W_2 \cdot ReLU(W_1 x + b_1) + b_2)$.
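
The composition idea can be sketched directly in code. `make_layer` is a hypothetical helper introduced only for this illustration, and the layer sizes and random weights are assumptions; each layer gets its own weights and bias.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def make_layer(W, b):
    """Return a function x -> ReLU(W x + b) for fixed weights and bias."""
    def layer(x):
        return relu(W @ x + b)
    return layer

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
layer1 = make_layer(rng.standard_normal((4, 3)), np.zeros((4, 1)))  # 3 inputs -> 4 neurons
layer2 = make_layer(rng.standard_normal((2, 4)), np.zeros((2, 1)))  # 4 neurons -> 2 neurons

x = rng.random((3, 1))
y = layer2(layer1(x))  # a two-layer network is one layer applied to the output of another
print(y.shape)
```

Stacking more layers is then just more nesting of the same function.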