Coming Soon · Project Aryabhatta: A new era of AI education.
30 Papers Live • Open Source

30 Foundational AI Papers in 30 Days

From RNNs to Transformers — complete implementations you can run, learn from, and build upon. Every paper, every line of code, explained.

"

If you really learn all of these, you'll know 90% of what matters today

— Ilya Sutskever
30 PAPERS LIVE
350+ ACTIVE USERS
17+ COUNTRIES
42K+ LINES OF CODE

30 Papers, Fully Implemented

Each paper comes with deep explanations, clean code, visualizations, and exercises. Click any card to explore.

Day 01

The Unreasonable Effectiveness of RNNs

Character-level language models that generate Shakespeare, code, and music

NumPy · 5 exercises
Day 02

Understanding LSTM Networks

Gates, memory cells, and learning long-term dependencies

PyTorch · 5 exercises
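
A taste of what Day 02 builds, as a minimal sketch (simplified, not the repo's exact code): one LSTM step, with the input, forget, and output gates wrapped around the memory cell.

import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    """One LSTM step: input, forget, and output gates around a memory cell."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map produces all four gate pre-activations at once
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # update the cell
        h = torch.sigmoid(o) * torch.tanh(c)                         # gated output
        return h, c
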
Day 03

Recurrent Neural Network Regularization

Dropout, layer norm, and preventing overfitting in sequence models

PyTorch · 5 exercises
Day 04

Minimizing Description Length

Compression as the key to intelligence and model selection

Python · 5 exercises
Day 05

The MDL Principle Tutorial

Two-part codes, prequential MDL, and normalized maximum likelihood

Python · 5 exercises
Day 06

The First Law of Complexodynamics

Information equilibration and evolutionary dynamics in complex systems

Python · 5 exercises
Day 07

The Coffee Automaton

Cellular automata, chaos theory, and emergent behavior

Python · 5 exercises
Day 08

ImageNet Classification with CNNs

AlexNet — the paper that sparked the deep learning revolution

PyTorch · 5 exercises
Day 09

Deep Residual Learning (ResNet)

Skip connections enabling 1000+ layer networks

PyTorch · 5 exercises
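
The core idea of Day 09 fits in a few lines. A minimal residual block sketch (not the repo's exact code): the block outputs F(x) + x, so gradients can flow straight through the identity shortcut no matter how deep the stack gets.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the skip connection gives gradients a direct path."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut
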
Day 10

Identity Mappings in ResNets

Pre-activation and improved gradient flow

PyTorch · 5 exercises
Day 11

Multi-Scale Context with Dilated Convolutions

Exponentially expanding receptive fields without resolution loss

PyTorch · 5 exercises
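
Day 11 in miniature: stacking 3x3 convolutions with dilations 1, 2, 4 grows the receptive field exponentially while the feature map keeps its full resolution. A toy sketch (channel counts are arbitrary):

import torch
import torch.nn as nn

# For a 3x3 kernel, padding == dilation keeps the spatial size unchanged
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=4, dilation=4), nn.ReLU(),
)
print(net(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 16, 64, 64])
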
Day 12

Dropout

A simple way to prevent neural networks from overfitting

PyTorch · 5 exercises
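
Day 12's trick is small enough to sketch whole. This is the "inverted" formulation most modern code uses (a sketch, not the repo's exact implementation): zero each unit with probability p during training and rescale the survivors, so nothing changes at test time.

import torch

def dropout(x, p=0.5, training=True):
    """Inverted dropout: drop units with probability p, scale the rest
    by 1/(1-p) so the expected activation stays the same."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)
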
Day 13

Attention Is All You Need

The Transformer architecture that revolutionized AI

PyTorch · 5 exercises
Day 14

The Annotated Transformer

Line-by-line PyTorch implementation with explanations

PyTorch · 5 exercises
Day 15

Bahdanau Attention (NMT)

The original attention mechanism before Transformers

PyTorch · 5 exercises
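
Day 15's scoring function is additive rather than dot-product. A minimal sketch (dimensions simplified to one shared size): score each encoder state h_j against the decoder state s, softmax the scores, and mix.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style scores: e_j = v^T tanh(W_s s + W_h h_j)."""

    def __init__(self, dim):
        super().__init__()
        self.W_s = nn.Linear(dim, dim)
        self.W_h = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1)

    def forward(self, s, H):  # s: (B, d) decoder state, H: (B, T, d) encoder states
        e = self.v(torch.tanh(self.W_s(s).unsqueeze(1) + self.W_h(H)))  # (B, T, 1)
        w = torch.softmax(e, dim=1)
        return (w * H).sum(dim=1)  # context vector, (B, d)
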
Day 16

Order Matters (Seq2Seq for Sets)

Teaching networks when order doesn't matter in inputs

PyTorch · 5 exercises
Day 17

Neural Turing Machines

Differentiable external memory with content-based addressing

PyTorch · 5 exercises
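
Content-based addressing, the heart of Day 17, is small enough to show here. A sketch (the paper layers location-based addressing on top of this): score every memory slot by cosine similarity to a key, sharpen with beta, and read softly.

import torch
import torch.nn.functional as F

def content_addressing(memory, key, beta):
    """memory: (N, M) slots, key: (M,) query, beta: sharpness scalar.
    Returns soft read weights over the N slots."""
    sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (N,)
    return F.softmax(beta * sim, dim=-1)

weights = content_addressing(torch.randn(8, 16), torch.randn(16), beta=5.0)
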
Day 18

Pointer Networks

Attention as output — pointing at input elements for variable-size combinatorial problems

PyTorch · 5 exercises
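
"Attention as output" in one function. This sketch uses dot-product scores for brevity (the paper itself uses additive attention): instead of a fixed output vocabulary, the network emits a distribution over positions in its own input.

import torch
import torch.nn.functional as F

def pointer_log_probs(query, encoder_states):
    """query: (B, d) decoder state, encoder_states: (B, T, d).
    Returns log-probabilities over the T input positions."""
    scores = torch.einsum("bd,btd->bt", query, encoder_states)
    return F.log_softmax(scores, dim=-1)  # "points" at one input element
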
Day 19

Relational Reasoning

Learning relationships between objects (Sort-of-CLEVR, Relation Networks)

PyTorch · 7 exercises
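
Day 19's Relation Network is one equation, RN(O) = f(sum over pairs of g(o_i, o_j)), and it sketches in a dozen lines (simplified; the paper also conditions g on a question embedding):

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Score every ordered pair of objects with g, sum, then apply f."""

    def __init__(self, dim):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.f = nn.Linear(dim, 1)

    def forward(self, objs):  # objs: (B, N, dim)
        B, N, d = objs.shape
        a = objs.unsqueeze(2).expand(B, N, N, d)
        b = objs.unsqueeze(1).expand(B, N, N, d)
        pairs = torch.cat([a, b], dim=-1)  # all ordered pairs (o_i, o_j)
        return self.f(self.g(pairs).sum(dim=(1, 2)))
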
Day 20

Relational RNNs

Memory as a set of interacting slots — solving problems LSTMs can't

PyTorch · 5 exercises
Day 21

Neural Message Passing

Unifying graph neural networks — messages, updates, and readouts for molecular prediction

PyTorch · 5 exercises
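
One message-passing round, sketched. The message and update functions here are arbitrary choices, and the readout phase is reduced to a comment:

import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    """Each node sums messages from its neighbors, then updates its state."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message from a (sender, receiver) pair
        self.upd = nn.GRUCell(dim, dim)     # node-state update

    def forward(self, h, edge_index):       # h: (N, dim), edge_index: (2, E)
        src, dst = edge_index
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add(0, dst, m)  # sum messages per receiver
        h = self.upd(agg, h)
        return h  # a graph-level readout would then be e.g. h.sum(dim=0)
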
Day 22

Deep Speech 2

End-to-end speech recognition — replacing the entire ASR pipeline with a single neural network

PyTorch · 5 exercises
Day 23

Variational Lossy Autoencoder

Solving posterior collapse by limiting the decoder's receptive field

PyTorch · 5 exercises
Day 24

GPipe: Efficient Training of Giant Neural Networks

Scaling models beyond memory limits with pipeline parallelism and micro-batching

PyTorch · 5 exercises
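
Half of Day 24's idea, micro-batching, sketched as plain gradient accumulation (GPipe additionally partitions the layers across devices and pipelines the chunks through them):

import torch
import torch.nn as nn

def train_step(model, loss_fn, optimizer, x, y, n_micro=4):
    """Split one batch into micro-batches so peak activation memory
    scales with a chunk, not the full batch."""
    optimizer.zero_grad()
    for xm, ym in zip(x.chunk(n_micro), y.chunk(n_micro)):
        loss = loss_fn(model(xm), ym) / n_micro  # average across chunks
        loss.backward()                          # gradients accumulate in .grad
    optimizer.step()

model = nn.Linear(10, 1)
train_step(model, nn.MSELoss(), torch.optim.SGD(model.parameters(), lr=0.1),
           torch.randn(32, 10), torch.randn(32, 1))
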
Day 25

Scaling Laws for Neural Language Models

The "Physics of AI" — predicting the performance of massive models using power laws

PyTorch · 5 exercises
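
Day 25's core move: on log-log axes a power law L(N) = (Nc/N)^alpha is a straight line, so two fitted constants let you extrapolate the loss of models you haven't trained. A sketch with made-up numbers (these are not the paper's data):

import numpy as np

N = np.array([1e6, 1e7, 1e8, 1e9])  # hypothetical parameter counts
L = np.array([5.0, 3.9, 3.0, 2.3])  # hypothetical eval losses
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope                      # log L = alpha*log(Nc) - alpha*log(N)
Nc = np.exp(intercept / alpha)
print(f"L(N) ~ ({Nc:.3g}/N)^{alpha:.3f}")
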
Day 26

Kolmogorov Complexity

The math of compression and randomness — defining intelligence as the ability to find patterns

Python · 5 exercises
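
Kolmogorov complexity itself is uncomputable, but any real compressor gives a usable upper bound, which is the practical hook of Day 26. A quick sketch: structure compresses, randomness doesn't.

import os
import zlib

def k_upper_bound(data: bytes) -> int:
    """Length of a zlib-compressed description: a crude ceiling on K(data)."""
    return len(zlib.compress(data, 9))

print(k_upper_bound(b"ab" * 500))       # patterned: shrinks to a few dozen bytes
print(k_upper_bound(os.urandom(1000)))  # random: stays near 1,000 bytes
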
Day 27

Machine Super Intelligence

Universal Intelligence (Upsilon) — formalizing the IQ of any agent across all computable environments

Python · 5 exercises
Day 28

CS231n: CNNs for Visual Recognition

Conv layers, pooling, and FC networks from scratch — the complete CNN architecture guide

NumPy · 5 exercises
Day 29

Proximal Policy Optimization

The algorithm behind RLHF and ChatGPT — simple, stable policy gradients via clipping

PyTorch · 5 exercises
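
Day 29's headline equation in code. A minimal sketch of the clipped surrogate objective (full training loops add value-function and entropy terms on top):

import torch

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated A(s, a)."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
    # Take the pessimistic bound; negate because optimizers minimize
    return -torch.min(unclipped, clipped).mean()
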
Day 30

Deep RL from Human Feedback

Teaching AI via preferences — the breakthrough that brought us ChatGPT

PyTorch · 5 exercises

Coming Next

Bonus Papers · The Language Model Revolution (BERT, GPT-2, GPT-3...)

Your 30-Day Journey

From sequence models to modern architectures — a complete curriculum.

01

The Foundations

Complete

Days 1-7 · RNNs, LSTMs, regularization, compression, and complexity theory

RNNs · LSTMs · Dropout · MDL · Complexity · Automata
02

The Deep Learning Explosion

Complete

Days 8-12 · CNNs, residual learning, and the vision revolution

AlexNet · ResNet · ResNet V2 · Dilated Conv · Dropout
03

The Transformer Era

Complete

Days 13-16 · Attention mechanisms and sequence-to-sequence learning

Attention · Annotated Transformer · Bahdanau · Order Matters
04

Specialized Architectures

Complete

Days 17-22 · Memory networks, graphs, reasoning, and speech

Neural Turing Machines · Pointer Networks · Relational Reasoning · Relational RNNs · MPNNs · Deep Speech 2
05

Generative Models & Scale

Complete

Days 23-28 · VAEs, scaling laws, Kolmogorov complexity, Machine Super Intelligence, and CNNs

VLAE · GPipe · Scaling Laws · Kolmogorov Complexity · Machine Super Intelligence · CS231n CNNs
06

Modern Foundations

Complete

Days 29-30 · PPO and the path to ChatGPT (RLHF)

Proximal Policy Optimization (PPO) · RLHF

The Language Model Revolution

Coming

Bonus Papers · BERT, GPT-2, GPT-3, Chinchilla

Learn by Building

Every paper comes with complete, runnable implementations. No "left as an exercise" — we build everything from scratch so you truly understand how these systems work.

Production Code

Clean, documented, runs everywhere

5+ Exercises Per Paper

With complete solutions

Interactive Notebooks

Run and experiment live

Deep Explanations

Theory meets practice

transformer.py
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Multi-head attention from scratch."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split into heads: (B, n_heads, T, d_k)
        Q = self.W_q(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention
        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.d_k)
        attn = F.softmax(scores, dim=-1)
        # Merge heads back and project
        out = (attn @ V).transpose(1, 2).reshape(B, T, -1)
        return self.W_o(out)

Ready to Master the Foundations?

Join the journey. Learn AI the right way — by building from scratch.