At the heart of backpropagation is an expression for the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network. The expression tells us how quickly the cost changes when we change the weights and biases.
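As a concrete illustration (not from the source), here is a minimal sketch of ∂C/∂w for a single sigmoid neuron with quadratic cost, checked against a numerical finite-difference estimate. The neuron, cost, and values of `w`, `b`, `x`, `y` are all hypothetical choices for the demo:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    # quadratic cost C = (a - y)^2 / 2 for one sigmoid neuron a = sigmoid(w*x + b)
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def dC_dw(w, b, x, y):
    # chain rule: dC/dw = (a - y) * sigmoid'(z) * x, with sigmoid'(z) = a*(1 - a)
    a = sigmoid(w * x + b)
    return (a - y) * a * (1 - a) * x

# sanity check against the central difference (C(w+h) - C(w-h)) / (2h)
w, b, x, y = 0.6, 0.9, 1.0, 0.0
h = 1e-6
numeric = (cost(w + h, b, x, y) - cost(w - h, b, x, y)) / (2 * h)
print(dC_dw(w, b, x, y), numeric)  # the two estimates should agree closely
```

The same chain-rule bookkeeping, applied layer by layer, is what backpropagation automates for a full network.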
[Mini Project: How to program a GPU? | CUDA C/C++](https://youtu.be/GetaI7KhbzM) : includes links to blog posts that explain how to optimize matrix multiplication on a GPU.