Protip: do it by hand. Once you get tired of typing LaTeX, you can switch to an IDE and let GitHub Copilot complete it (it will mostly be incorrect), then go in and fix its mistakes; it still saves a bunch of time. For example, here is the backward pass through logits = h@W+b worked out element by element:
```
The important thing is that each derivative needs to be computed from a number of elements:
every entry of h and W feeds into more than one logit, so its gradient is a sum over all of
those logits. Written out element by element, everything collapses into two matrix multiplications:
DL/Dh = DL/DLogit @ W^T   (the logit gradient projected back through the weight layer)
DL/DW = h^T @ DL/DLogit   (the logit gradient projected onto the hidden layer)

logits = h@W+b

h = [[h11 h12 h13],          (2x3)
     [h21 h22 h23]]

W = [[w11 w12],              (3x2)
     [w21 w22],
     [w31 w32]]

b = [b1 b2], and so the four logits (2x2) are:
Logit11 = h11w11+ h12w21 + h13w31 + b1 - Eq.1
Logit12 = h11w12+ h12w22 + h13w32 + b2 - Eq.2
Logit21 = h21w11+ h22w21 + h23w31 + b1 - Eq.3
Logit22 = h21w12+ h22w22 + h23w32 + b2 - Eq.4

Each h1k appears only in Logit11 and Logit12 (Eq.1, Eq.2), and each h2k only in Logit21 and Logit22 (Eq.3, Eq.4), so by the chain rule:
DL/Dh11 = DL/DLogit11 * DLogit11/Dh11 + DL/DLogit12 * DLogit12/Dh11 = DL/DLogit11 * w11 + DL/DLogit12 * w12
DL/Dh12 = DL/DLogit11 * DLogit11/Dh12 + DL/DLogit12 * DLogit12/Dh12 = DL/DLogit11 * w21 + DL/DLogit12 * w22
DL/Dh13 = DL/DLogit11 * DLogit11/Dh13 + DL/DLogit12 * DLogit12/Dh13 = DL/DLogit11 * w31 + DL/DLogit12 * w32
DL/Dh21 = DL/DLogit21 * DLogit21/Dh21 + DL/DLogit22 * DLogit22/Dh21 = DL/DLogit21 * w11 + DL/DLogit22 * w12
DL/Dh22 = DL/DLogit21 * DLogit21/Dh22 + DL/DLogit22 * DLogit22/Dh22 = DL/DLogit21 * w21 + DL/DLogit22 * w22
DL/Dh23 = DL/DLogit21 * DLogit21/Dh23 + DL/DLogit22 * DLogit22/Dh23 = DL/DLogit21 * w31 + DL/DLogit22 * w32

DL/Dh = [[DL/Dh11 DL/Dh12 DL/Dh13],
         [DL/Dh21 DL/Dh22 DL/Dh23]]

      = [[DL/DLogit11 DL/DLogit12],   @   [[w11 w21 w31],
         [DL/DLogit21 DL/DLogit22]]        [w12 w22 w32]]

      = DL/DLogit @ W^T -------------> This is the final gradient and note that it is a matrix multiplication, i.e. a projection of the logit gradient back through the weight layer.

Now let's compute DL/DW:
DL/DW = DL/DLogit * DLogit/DW
Each wk1 appears only in Logit11 and Logit21 (Eq.1, Eq.3), and each wk2 only in Logit12 and Logit22 (Eq.2, Eq.4), so:
DL/DW11 = DL/DLogit11 * DLogit11/DW11 + DL/DLogit21 * DLogit21/DW11 = DL/DLogit11 * h11 + DL/DLogit21 * h21
DL/DW12 = DL/DLogit12 * DLogit12/DW12 + DL/DLogit22 * DLogit22/DW12 = DL/DLogit12 * h11 + DL/DLogit22 * h21
DL/DW21 = DL/DLogit11 * DLogit11/DW21 + DL/DLogit21 * DLogit21/DW21 = DL/DLogit11 * h12 + DL/DLogit21 * h22
DL/DW22 = DL/DLogit12 * DLogit12/DW22 + DL/DLogit22 * DLogit22/DW22 = DL/DLogit12 * h12 + DL/DLogit22 * h22
DL/DW31 = DL/DLogit11 * DLogit11/DW31 + DL/DLogit21 * DLogit21/DW31 = DL/DLogit11 * h13 + DL/DLogit21 * h23
DL/DW32 = DL/DLogit12 * DLogit12/DW32 + DL/DLogit22 * DLogit22/DW32 = DL/DLogit12 * h13 + DL/DLogit22 * h23
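
DL/DW = [[h11 h21],      @   [[DL/DLogit11, DL/DLogit12],
         [h12 h22],           [DL/DLogit21, DL/DLogit22]]
         [h13 h23]]

      = h^T @ DL/DLogit -------------> This is the final gradient and note that it is a matrix multiplication, i.e. a projection of the logit gradient onto the hidden layer.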
```
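If you want to double-check the hand derivation without trusting Copilot, here is a minimal plain-Python sketch (no tensor library assumed). It writes DL/Dh and DL/DW exactly as the element-wise chain-rule sums from the derivation and compares them with DL/DLogit @ W^T and h^T @ DL/DLogit. The helper names `matmul`, `transpose` and `close`, and all the numbers, are made up for illustration; `dlogits` stands in for whatever upstream gradient DL/DLogit you have.

```python
# Sketch: the element-wise chain-rule sums should equal the matrix forms
#   DL/Dh = DL/DLogit @ W^T   and   DL/DW = h^T @ DL/DLogit.
# All values are arbitrary illustrative numbers.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def close(a, b, tol=1e-12):
    """True if two equally shaped matrices agree entry by entry."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

h = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # 2x3: rows are (h11 h12 h13), (h21 h22 h23)
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]               # 3x2
dlogits = [[0.7, -0.2],
           [0.1, 0.9]]         # hypothetical upstream gradient DL/DLogit, 2x2

# Element-wise, as written by hand:
# DL/Dh[i][k] = sum_j DL/DLogit[i][j] * W[k][j]   (e.g. DL/Dh11 = DL/DLogit11*w11 + DL/DLogit12*w12)
dh_loops = [[sum(dlogits[i][j] * W[k][j] for j in range(2)) for k in range(3)]
            for i in range(2)]
# DL/DW[k][j] = sum_i DL/DLogit[i][j] * h[i][k]   (e.g. DL/DW11 = DL/DLogit11*h11 + DL/DLogit21*h21)
dW_loops = [[sum(dlogits[i][j] * h[i][k] for i in range(2)) for j in range(2)]
            for k in range(3)]

# The same gradients as matrix products
dh_matrix = matmul(dlogits, transpose(W))   # 2x3, same shape as h
dW_matrix = matmul(transpose(h), dlogits)   # 3x2, same shape as W

print(close(dh_loops, dh_matrix), close(dW_loops, dW_matrix))  # True True
```

Note how each gradient comes out with the same shape as the thing it is the gradient of; checking shapes this way is a cheap way to catch a misplaced transpose.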
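Another quick sanity check, which does not rely on the derivation at all, is a numerical gradient check with central differences: nudge each entry of h and W by a small eps and see how much the loss moves. This sketch assumes a hypothetical loss L = sum_ij G_ij * logits_ij, picked only because its DL/DLogit is exactly G; the names `forward`, `loss`, `numerical_grad` and the numbers are illustrative, not from any library.

```python
# Sketch of a central-difference gradient check for logits = h @ W + b.
# Assumption: hypothetical loss L = sum_ij G[i][j] * logits[i][j], so DL/DLogit == G.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward(h, W, b):
    """logits = h @ W + b, with b added to every row."""
    return [[v + b[j] for j, v in enumerate(row)] for row in matmul(h, W)]

def loss(h, W, b, G):
    logits = forward(h, W, b)
    return sum(G[i][j] * logits[i][j]
               for i in range(len(logits)) for j in range(len(logits[0])))

def numerical_grad(f, m, eps=1e-6):
    """Central-difference estimate of df/dm[i][j]; m is nudged in place and restored."""
    grad = [[0.0] * len(m[0]) for _ in m]
    for i in range(len(m)):
        for j in range(len(m[0])):
            orig = m[i][j]
            m[i][j] = orig + eps
            plus = f()
            m[i][j] = orig - eps
            minus = f()
            m[i][j] = orig  # restore the original value
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

h = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
b = [0.01, -0.02]
G = [[0.7, -0.2], [0.1, 0.9]]   # plays the role of DL/DLogit

def f():
    return loss(h, W, b, G)

dh_num = numerical_grad(f, h)
dW_num = numerical_grad(f, W)

dh_formula = matmul(G, transpose(W))   # DL/DLogit @ W^T
dW_formula = matmul(transpose(h), G)   # h^T @ DL/DLogit

max_err = max(abs(x - y)
              for num, ana in ((dh_num, dh_formula), (dW_num, dW_formula))
              for rn, ra in zip(num, ana)
              for x, y in zip(rn, ra))
print(max_err < 1e-6)  # True
```

Because this particular loss is linear in each single entry of h and W, the finite differences should match the formulas up to floating-point rounding. With a non-linear loss the estimate is only accurate to roughly O(eps^2) plus rounding error, so you would loosen the tolerance accordingly.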