Implementing a Neural Network in NumPy for MNIST Classification
Overview
We will implement a simple neural network made of two fully connected layers; the output layer uses a softmax activation, and the loss is cross entropy.
The model weights are updated with plain SGD.
Full code: https://github.com/songquanpeng/CV-algorithms/blob/master/ml_models/MLP.py
Model Initialization
Besides initializing the model's weights and biases, we also keep a gradient buffer for every learnable parameter.
Normally the optimizer would store these, but for simplicity we let the model class hold them directly.
For the later backward pass we also need to record each layer's pre-activation output and activated output.
import numpy as np
from copy import deepcopy


def __init__(self, in_dim, middle_dim, out_dim):
    # weights and biases of the two fully connected layers
    self.weights = [
        np.random.randn(in_dim, middle_dim),
        np.random.randn(middle_dim, out_dim)
    ]
    self.biases = [
        np.zeros(middle_dim),
        np.zeros(out_dim)
    ]
    # gradient buffers, one per learnable parameter (same shapes as the parameters)
    self.weights_grad = deepcopy(self.weights)
    self.biases_grad = deepcopy(self.biases)
    self.zero_grad()
    # caches of per-layer outputs, needed by the backward pass
    self.unactivated_outputs = []
    self.activated_outputs = []
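The zero_grad method called above is not shown in this post; presumably it just resets the gradient buffers to zero, roughly like this (a sketch, the code in the linked repo may differ):

def zero_grad(self):
    # reset every gradient buffer before accumulating new gradients
    for grad in self.weights_grad:
        grad.fill(0)
    for grad in self.biases_grad:
        grad.fill(0)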
Forward Pass
This part is straightforward; just two things to note:
- We record each layer's pre-activation output and activated output; in particular, remember to also store the network input x itself.
- The last layer's activation is softmax.
def forward(self, x):
    feature = x
    self.unactivated_outputs = []
    self.activated_outputs = [x]  # the input itself is stored as the first "activated output"
    for i, (weight, bias) in enumerate(zip(self.weights, self.biases)):
        output = feature @ weight + bias
        self.unactivated_outputs.append(output)
        if i != len(self.weights) - 1:  # middle layer's activation
            feature = sigmoid(output)
        else:  # last layer's activation
            feature = softmax(output)
        self.activated_outputs.append(feature)
    return feature
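As a quick sanity check (assuming the methods above live in a class called MLP, as the training code further down suggests), a forward pass should return a valid probability distribution over the 10 digit classes:

mlp = MLP(784, 100, 10)              # 28*28 inputs, a hypothetical hidden size, 10 classes
y_hat = mlp.forward(np.random.randn(784))
assert y_hat.shape == (10,)
assert np.isclose(y_hat.sum(), 1.0)  # softmax output sums to 1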
Backward Pass
The basic idea is to apply the chain rule and propagate gradients backwards layer by layer.
We first compute the gradient of the loss with respect to each layer's pre-activation output, and from that derive the gradients of that layer's weights and biases.
The trickiest part is deriving the combined partial derivative of softmax and cross entropy, although the resulting formula turns out to be very simple.
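For reference, here is a condensed derivation (standard notation, not from the original post). Write s = softmax(z) for the last layer's pre-activation output z, and let y be a one-hot label, so the loss is L = -sum_i y_i log s_i and the softmax Jacobian is ds_j/dz_i = s_j(δ_ij - s_i). By the chain rule:

$$\frac{\partial L}{\partial z_i} = \sum_j \frac{\partial L}{\partial s_j}\frac{\partial s_j}{\partial z_i} = \sum_j \left(-\frac{y_j}{s_j}\right) s_j(\delta_{ij} - s_i) = s_i\sum_j y_j - y_i = \hat y_i - y_i$$

i.e. the gradient with respect to the pre-activation output is simply y_hat - y. Two implementations are given below: one uses this simplified formula directly, the other multiplies the cross-entropy gradient by the full softmax Jacobian.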
def softmax(x):
    ex = np.exp(x - np.max(x))  # subtracting the max improves numerical stability without changing the result
    sum_ex = np.sum(ex, keepdims=True)
    return ex / sum_ex

def d_softmax(x):
    # https://towardsdatascience.com/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1
    softmax_x = softmax(x)
    diag_softmax_x = np.diag(softmax_x)
    matrix = np.outer(softmax_x, softmax_x.T)  # the outer product
    jacobi_matrix = - matrix + diag_softmax_x  # Jacobian: diag(s) - s s^T
    return jacobi_matrix

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def d_sigmoid(x):
    sigmoid_x = sigmoid(x)
    return sigmoid_x * (1 - sigmoid_x)

def cross_entropy(y, y_hat):
    res = - y * np.log(y_hat)
    res = res.sum()
    return res

def d_cross_entropy(y, y_hat):
    return - y / y_hat

def d_softmax_cross_entropy(y, y_hat):
    # combined derivative of softmax + cross entropy w.r.t. the unactivated output
    # https://blog.csdn.net/jasonleesjtu/article/details/89426465
    return y_hat - y

def d_softmax_cross_entropy2(y, y_hat, z):
    """
    z is the last layer's unactivated output
    """
    # chain rule: multiply dL/ds by the softmax Jacobian ds/dz
    dce_ds = d_cross_entropy(y, y_hat)
    ds_dz = d_softmax(z)
    res = []
    for i in range(len(z)):
        tmp = dce_ds * ds_dz.T[i]
        res.append(tmp.sum())
    res = np.array(res)
    return res
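As a quick, purely illustrative consistency check, the two implementations can be compared on a random input; they should produce the same vector, namely y_hat - y:

z = np.random.randn(10)   # pretend this is the last layer's unactivated output
y_hat = softmax(z)
y = np.zeros(10)
y[3] = 1                  # one-hot label
g1 = d_softmax_cross_entropy(y, y_hat)
g2 = d_softmax_cross_entropy2(y, y_hat, z)
assert np.allclose(g1, g2)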
The final backward pass:
def backward(self, y, y_hat):
    # z_last is the last layer's unactivated output
    z_last = self.unactivated_outputs[-1]
    # delta_last is the gradient of the loss w.r.t. the last layer's unactivated output
    delta_last = d_softmax_cross_entropy(y, y_hat)
    # delta_last2 = d_softmax_cross_entropy2(y, y_hat, z_last)
    # assert np.allclose(delta_last, delta_last2), "the two ways should give the same result"
    weight_last = self.weights[-1]
    # z_middle is the middle layer's unactivated output
    z_middle = self.unactivated_outputs[-2]
    delta_middle = weight_last @ delta_last * d_sigmoid(z_middle)
    # each layer's weight gradient is the outer product of that layer's input with its delta
    self.weights_grad[-1] += np.outer(self.activated_outputs[-2], delta_last)
    self.biases_grad[-1] += delta_last
    self.weights_grad[-2] += np.outer(self.activated_outputs[-3], delta_middle)
    self.biases_grad[-2] += delta_middle
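One way to gain confidence in the backward pass (not part of the original code, just a sketch assuming the class is called MLP) is a central finite-difference check on a single weight entry:

mlp = MLP(784, 100, 10)
x = np.random.randn(784)
y = np.zeros(10)
y[3] = 1
mlp.zero_grad()
mlp.backward(y, mlp.forward(x))

# numerically estimate the gradient of one entry of the last weight matrix
eps = 1e-5
w = mlp.weights[-1]
old = w[0, 0]
w[0, 0] = old + eps
loss_plus = cross_entropy(y, mlp.forward(x))
w[0, 0] = old - eps
loss_minus = cross_entropy(y, mlp.forward(x))
w[0, 0] = old
numeric = (loss_plus - loss_minus) / (2 * eps)
assert np.isclose(numeric, mlp.weights_grad[-1][0, 0], atol=1e-4)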
The SGD Update
def update(self, learning_rate=0.0001):
    # in-place updates, so the arrays held in self.weights / self.biases are modified directly
    for weight, grad in zip(self.weights, self.weights_grad):
        weight -= learning_rate * grad
    for bias, grad in zip(self.biases, self.biases_grad):
        bias -= learning_rate * grad
The Training Loop
from tqdm import tqdm

def train(args):
    mlp = MLP(args.in_dim, args.middle_dim, args.out_dim)
    dataloader = get_MNIST_loader()
    all_losses = []
    for epoch in range(args.epoch_num):
        losses = []
        for i, (x, y) in tqdm(enumerate(dataloader), total=len(dataloader)):
            x = x.flatten().numpy()
            # convert the integer label into a one-hot vector
            tmp = np.zeros(10)
            tmp[y] = 1
            y = tmp
            y_hat = mlp(x)  # assumes the class aliases __call__ to forward
            loss = cross_entropy(y, y_hat)
            losses.append(loss)
            # print(f"epoch: {epoch:06} iter: {i:06} loss: {loss:.6f}")
            mlp.zero_grad()
            mlp.backward(y, y_hat)
            mlp.update()
        losses = np.array(losses).mean()
        print(f"Epoch mean loss: {losses:.6f}")
        all_losses.append(losses)
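get_MNIST_loader is not shown here; since the loop calls .numpy() on x, it is presumably a PyTorch DataLoader. A minimal sketch assuming torchvision (the loader in the linked repo may differ) could look like this:

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

def get_MNIST_loader():
    dataset = torchvision.datasets.MNIST(
        root="./data", train=True, download=True,
        transform=transforms.ToTensor()
    )
    # batch_size=None disables automatic batching, so the loader yields one (image, label)
    # pair per step, matching the per-sample SGD loop above
    return DataLoader(dataset, batch_size=None, shuffle=True)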
Running it looks like this:
100%|██████████| 60000/60000 [00:51<00:00, 1174.17it/s]
Epoch mean loss: 4.694023
100%|██████████| 60000/60000 [00:46<00:00, 1279.99it/s]
Epoch mean loss: 2.559349
100%|██████████| 60000/60000 [00:47<00:00, 1271.22it/s]
Epoch mean loss: 2.002460
100%|██████████| 60000/60000 [00:54<00:00, 1103.59it/s]
Epoch mean loss: 1.714876
100%|██████████| 60000/60000 [00:52<00:00, 1138.53it/s]
Epoch mean loss: 1.531857
100%|██████████| 60000/60000 [00:51<00:00, 1162.67it/s]
Epoch mean loss: 1.400958
100%|██████████| 60000/60000 [00:50<00:00, 1193.08it/s]
Epoch mean loss: 1.300468
100%|██████████| 60000/60000 [00:50<00:00, 1191.12it/s]
Epoch mean loss: 1.219341
100%|██████████| 60000/60000 [00:51<00:00, 1163.73it/s]
Epoch mean loss: 1.151839
100%|██████████| 60000/60000 [00:49<00:00, 1204.92it/s]
Epoch mean loss: 1.095015
We can see that the model is converging.