A project to build and train a Large Language Model (LLM) from scratch, implementing core components and training procedures to understand how modern language models work.
The final goal is to train a complete LLM from scratch, scaling to whatever size your hardware allows. This project focuses on understanding the fundamentals of transformer architectures, tokenization, training loops, and model optimization.
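To give a feel for the kind of component the project will build up to, here is a minimal sketch of a next-token training loop in PyTorch. Everything in it (the stand-in model, random data, and hyperparameters) is a placeholder for illustration, not part of this project's actual implementation.

```python
# Illustrative sketch only: a bare-bones next-token training loop in PyTorch.
# The model, data, and hyperparameters are placeholders, not this project's
# actual components.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, context_len, batch_size = 256, 64, 16  # assumed toy settings

# Stand-in model: an embedding followed by a linear head.
# A real LLM would use stacked transformer blocks here.
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),
    nn.Linear(128, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    # Random token ids stand in for a tokenized text corpus.
    batch = torch.randint(0, vocab_size, (batch_size, context_len + 1))
    inputs, targets = batch[:, :-1], batch[:, 1:]

    logits = model(inputs)  # (batch, seq, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1)
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Swapping the stand-in model for a real transformer and the random batches for a tokenized dataset is exactly the work the rest of the project covers.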
This project is a learning exercise to understand LLMs at a fundamental level. The implementation prioritizes clarity and educational value over raw performance.