
Neural Networks and the derivatives

Yesterday I took my first lesson on the math behind LLMs (Large Language Models). The course is free and it's on YouTube, taught by Andrej Karpathy, an AI expert formerly at OpenAI, Tesla, and Stanford.

The course is both very practical and very computer-science-oriented. In fact, the first lesson (which is 2.5 hours long; I only got through the first hour in one afternoon, after pausing, taking notes, checking things, etc.) was a huge recap of derivatives with Python.
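To give a flavor of that recap, here's a minimal sketch in the spirit of the lesson (the specific function and step size are my own toy example): you can estimate a derivative numerically by nudging the input by a tiny amount h and measuring how much the output moves.

```python
# Numerically estimate the derivative of f at x = 3.0.
# For f(x) = 3x^2 - 4x + 5, calculus says f'(x) = 6x - 4, so f'(3) = 14.
def f(x):
    return 3 * x**2 - 4 * x + 5

h = 0.0001          # a tiny nudge to the input
x = 3.0
slope = (f(x + h) - f(x)) / h  # rise over run
print(slope)        # ~14.0003, close to the analytical answer 6*3 - 4 = 14
```

Seeing the numerical estimate agree with the formula from high-school calculus is exactly the kind of "code confirms the math" moment I mean below.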

I am understanding everything I'm studying, except the why. Luckily, this is math I was very familiar with in high school, so my buried knowledge is slowly resurfacing, and there's the pleasant addition of code to demonstrate that things actually work.

Let's go back to the why. I have the feeling that derivatives, the chain rule, and backpropagation are the tools we'll use throughout the course to build up our understanding of neural networks. However, I haven't yet seen how we're going to apply this knowledge; I'm sure a future post will capture the "a-ha!" moment.
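My rough intuition so far (this is my own guess about where the course is heading, not something the lesson has shown yet) is that backpropagation is just the chain rule applied backwards through a chain of small operations. Here is a tiny hand-worked example:

```python
# Manual backpropagation through d = a*b + c, using the chain rule.
a, b, c = 2.0, -3.0, 10.0
e = a * b          # intermediate node
d = e + c          # output

# Propagate derivatives backwards from d:
dd_de = 1.0        # d = e + c  =>  dd/de = 1
dd_dc = 1.0        # d = e + c  =>  dd/dc = 1
dd_da = dd_de * b  # chain rule: dd/da = dd/de * de/da = 1 * b
dd_db = dd_de * a  # chain rule: dd/db = dd/de * de/db = 1 * a

print(dd_da, dd_db, dd_dc)  # -3.0 2.0 1.0
```

Each gradient only needs the local derivative of one operation multiplied by the gradient flowing in from above, which is presumably what makes the whole thing scale to real networks.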

So, I have the feeling that I should have finished the first lesson instead of writing this post, but I was so excited to be part of the AI movement that I couldn't resist shouting it to the world. I don't want to be only an AI user; I want to understand how it works and why.

If you want to join a study group on the subject, let me know.