A project to build and train a Large Language Model (LLM) from scratch, implementing core components and training procedures to understand how modern language models work.
The final goal is to train a complete LLM from scratch, scaling to whatever size your hardware allows. This project focuses on understanding the fundamentals of transformer architectures, tokenization, training loops, and model optimization.
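To give a feel for the kind of component the project will build up to, here is a minimal sketch of a next-token training loop in PyTorch. Everything in it (the stand-in model, random data, and hyperparameters) is a placeholder for illustration, not part of this project's actual implementation.

```python
# Illustrative sketch only: a bare-bones next-token training loop in PyTorch.
# The model, data, and hyperparameters are placeholders, not this project's
# actual components.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, context_len, batch_size = 256, 64, 16  # assumed toy settings

# Stand-in model: an embedding followed by a linear head.
# A real LLM would use stacked transformer blocks here.
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),
    nn.Linear(128, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(100):
    # Random token ids stand in for a tokenized text corpus.
    batch = torch.randint(0, vocab_size, (batch_size, context_len + 1))
    inputs, targets = batch[:, :-1], batch[:, 1:]

    logits = model(inputs)  # (batch, seq, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1)
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Swapping the stand-in model for a real transformer and the random batches for a tokenized dataset is exactly the work the rest of the project covers.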
This project is a learning exercise to understand LLMs at a fundamental level. The implementation prioritizes clarity and educational value over raw performance.