Variable definitions

Recurrent Neural Network (RNN)

![RNN cell backward pass](rnn-cell-back.png)

Backpropagation

![RNN backpropagation through time](rnn-backprop.png)

Partial derivative formulas

$$
\begin{align*}
a^{\langle t \rangle} &= \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t-1 \rangle} + b_{a}) \\[8pt]
\frac{\partial \tanh(x)}{\partial x} &= 1 - \tanh^2(x) \\[8pt]
\partial \tanh &= \partial a_{next} * \left( 1 - \tanh^2(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t-1 \rangle} + b_{a}) \right) \tag{0} \\[8pt]
\partial W_{ax} &= \partial \tanh \cdot x^{\langle t \rangle T} \tag{1} \\[8pt]
\partial W_{aa} &= \partial \tanh \cdot a^{\langle t-1 \rangle T} \tag{2} \\[8pt]
\partial b_a &= \sum_{batch} \partial \tanh \tag{3} \\[8pt]
\partial x^{\langle t \rangle} &= W_{ax}^T \cdot \partial \tanh \tag{4} \\[8pt]
\partial a_{prev} &= W_{aa}^T \cdot \partial \tanh \tag{5}
\end{align*}
$$
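The function below implements equations (0)–(5) directly; it recomputes the pre-activation $z$ from the values saved in the forward cache.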

```python
import numpy as np

def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN cell (single time step).

    Arguments:
    da_next -- Gradient of the loss with respect to the next hidden state, of shape (n_a, m)
    cache -- tuple (a_next, a_prev, xt, parameters) stored by the forward pass

    Returns:
    gradients -- python dictionary containing:
        dxt -- Gradient of the input data, of shape (n_x, m)
        da_prev -- Gradient of the previous hidden state, of shape (n_a, m)
        dWax -- Gradient of the input-to-hidden weights, of shape (n_a, n_x)
        dWaa -- Gradient of the hidden-to-hidden weights, of shape (n_a, n_a)
        dba -- Gradient of the bias vector, of shape (n_a, 1)
    """
    (a_next, a_prev, xt, parameters) = cache
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    ba = parameters["ba"]  # Wya and by are not needed for this cell's backward pass

    # Recompute the pre-activation and apply equation (0): dtanh = da_next * (1 - tanh(z)^2)
    z = np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba
    dtanh = da_next * (1 - np.tanh(z) ** 2)

    dxt = np.dot(Wax.T, dtanh)                   # equation (4)
    dWax = np.dot(dtanh, xt.T)                   # equation (1)
    da_prev = np.dot(Waa.T, dtanh)               # equation (5)
    dWaa = np.dot(dtanh, a_prev.T)               # equation (2)
    dba = np.sum(dtanh, axis=-1, keepdims=True)  # equation (3): sum over the batch

    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}
    return gradients
```
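
A quick finite-difference check can confirm the analytic gradients. The sketch below is a usage example, not part of the original code: the toy forward helper, the sizes, and the choice of loss (the sum of `a_next`, so that `da_next` is all ones) are assumptions local to this snippet.

```python
import numpy as np

np.random.seed(1)
n_x, n_a, m = 3, 5, 10                      # toy sizes (assumed)
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
parameters = {"Wax": np.random.randn(n_a, n_x),
              "Waa": np.random.randn(n_a, n_a),
              "ba": np.random.randn(n_a, 1)}

def forward(xt, a_prev, parameters):
    # One RNN cell forward step: a_next = tanh(Wax xt + Waa a_prev + ba)
    a_next = np.tanh(np.dot(parameters["Wax"], xt)
                     + np.dot(parameters["Waa"], a_prev)
                     + parameters["ba"])
    return a_next, (a_next, a_prev, xt, parameters)

a_next, cache = forward(xt, a_prev, parameters)
da_next = np.ones_like(a_next)              # loss = sum(a_next), so dL/da_next = 1
grads = rnn_cell_backward(da_next, cache)

# Perturb one input entry and compare the numeric slope with the analytic gradient.
eps = 1e-6
xt_pert = xt.copy()
xt_pert[0, 0] += eps
numeric = (forward(xt_pert, a_prev, parameters)[0].sum() - a_next.sum()) / eps
print(numeric, grads["dxt"][0, 0])          # the two values should agree closely
```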

Long Short-Term Memory (LSTM)


Forget gate $\mathbf{f}_{t}$

<aside> ℹ️ Forget gate: a tensor with entries between $0$ and $1$; entries near $0$ erase the corresponding cell-state information, while entries near $1$ keep it. </aside>

Formula

$$
\mathbf{f}_t = \sigma(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}_f) \quad
\begin{align*}
\mathbf{W}_h &\in \mathbb{R}^{n_h \times n_h} \\
\mathbf{W}_x &\in \mathbb{R}^{n_h \times n_x}
\end{align*}
$$

For computational convenience, $\mathbf{W}_h$ and $\mathbf{W}_x$ are combined into a single matrix $\mathbf{W}_f$:

$$
\mathbf{f}_t = \sigma(\mathbf{W}_f \cdot [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f) \quad
\begin{align*}
[\mathbf{h}_{t-1}, \mathbf{x}_t] &\in \mathbb{R}^{n_h + n_x} \\
\mathbf{W}_f &\in \mathbb{R}^{n_h \times (n_h + n_x)}
\end{align*}
$$
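
As a shape check, here is a minimal numpy sketch of the concatenated forget-gate computation; the sizes `n_h`, `n_x`, `m` and the `sigmoid` helper are assumptions for illustration, not part of the notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_h, n_x, m = 5, 3, 10                        # hidden size, input size, batch size (assumed)
h_prev = np.random.randn(n_h, m)              # h_{t-1}
xt = np.random.randn(n_x, m)                  # x_t
Wf = np.random.randn(n_h, n_h + n_x)          # W_f has shape (n_h, n_h + n_x)
bf = np.random.randn(n_h, 1)                  # b_f

concat = np.concatenate((h_prev, xt), axis=0) # [h_{t-1}, x_t], shape (n_h + n_x, m)
ft = sigmoid(np.dot(Wf, concat) + bf)         # f_t, entries in (0, 1), shape (n_h, m)
```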