Coming Soon · Project Aryabhatta: A new era of AI education.
30 Papers Live • Open Source

30 Foundational AI Papers in 30 Days

From RNNs to Transformers — complete implementations you can run, learn from, and build upon. Every paper, every line of code, explained.

"

If you really learn all of these, you'll know 90% of what matters today

— Ilya Sutskever
30 PAPERS LIVE
350+ ACTIVE USERS
17+ COUNTRIES
42K+ LINES OF CODE

30 Papers, Fully Implemented

Each paper comes with deep explanations, clean code, visualizations, and exercises. Click any card to explore.

Day 01

The Unreasonable Effectiveness of RNNs

Character-level language models that generate Shakespeare, code, and music

NumPy · 5 exercises
Day 02

Understanding LSTM Networks

Gates, memory cells, and learning long-term dependencies

PyTorch · 5 exercises
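
A taste of what Day 02 builds, as a minimal sketch (simplified, not the repo's exact code): one LSTM step, with the input, forget, and output gates wrapped around the memory cell.

import torch
import torch.nn as nn

class LSTMCell(nn.Module):
    """One LSTM step: input, forget, and output gates around a memory cell."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # One linear map produces all four gate pre-activations at once
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)

    def forward(self, x, h, c):
        i, f, g, o = self.gates(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # update the cell
        h = torch.sigmoid(o) * torch.tanh(c)                         # gated output
        return h, c
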
Day 03

Recurrent Neural Network Regularization

Dropout, layer norm, and preventing overfitting in sequence models

PyTorch · 5 exercises
Day 04

Minimizing Description Length

Compression as the key to intelligence and model selection

Python · 5 exercises
Day 05

The MDL Principle Tutorial

Two-part codes, prequential MDL, and normalized maximum likelihood

Python · 5 exercises
Day 06

The First Law of Complexodynamics

Information equilibration and evolutionary dynamics in complex systems

Python · 5 exercises
Day 07

The Coffee Automaton

Cellular automata, chaos theory, and emergent behavior

Python · 5 exercises
Day 08

ImageNet Classification with CNNs

AlexNet — the paper that sparked the deep learning revolution

PyTorch · 5 exercises
Day 09

Deep Residual Learning (ResNet)

Skip connections enabling 1000+ layer networks

PyTorch · 5 exercises
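
The core idea of Day 09 fits in a few lines. A minimal residual block sketch (not the repo's exact code): the block outputs F(x) + x, so gradients can flow straight through the identity shortcut no matter how deep the stack gets.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = F(x) + x: the skip connection gives gradients a direct path."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut
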
Day 10

Identity Mappings in ResNets

Pre-activation and improved gradient flow

PyTorch · 5 exercises
Day 11

Multi-Scale Context with Dilated Convolutions

Exponentially expanding receptive fields without resolution loss

PyTorch · 5 exercises
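
Day 11 in miniature: stacking 3x3 convolutions with dilations 1, 2, 4 grows the receptive field exponentially while the feature map keeps its full resolution. A toy sketch (channel counts are arbitrary):

import torch
import torch.nn as nn

# For a 3x3 kernel, padding == dilation keeps the spatial size unchanged
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=4, dilation=4), nn.ReLU(),
)
print(net(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 16, 64, 64])
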
Day 12

Dropout

A simple way to prevent neural networks from overfitting

PyTorch · 5 exercises
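
Day 12's trick is small enough to sketch whole. This is the "inverted" formulation most modern code uses (a sketch, not the repo's exact implementation): zero each unit with probability p during training and rescale the survivors, so nothing changes at test time.

import torch

def dropout(x, p=0.5, training=True):
    """Inverted dropout: drop units with probability p, scale the rest
    by 1/(1-p) so the expected activation stays the same."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)
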
Day 13

Attention Is All You Need

The Transformer architecture that revolutionized AI

PyTorch · 5 exercises
Day 14

The Annotated Transformer

Line-by-line PyTorch implementation with explanations

PyTorch · 5 exercises
Day 15

Bahdanau Attention (NMT)

The original attention mechanism before Transformers

PyTorch · 5 exercises
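
Day 15's scoring function is additive rather than dot-product. A minimal sketch (dimensions simplified to one shared size): score each encoder state h_j against the decoder state s, softmax the scores, and mix.

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style scores: e_j = v^T tanh(W_s s + W_h h_j)."""

    def __init__(self, dim):
        super().__init__()
        self.W_s = nn.Linear(dim, dim)
        self.W_h = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1)

    def forward(self, s, H):  # s: (B, d) decoder state, H: (B, T, d) encoder states
        e = self.v(torch.tanh(self.W_s(s).unsqueeze(1) + self.W_h(H)))  # (B, T, 1)
        w = torch.softmax(e, dim=1)
        return (w * H).sum(dim=1)  # context vector, (B, d)
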
Day 16

Order Matters (Seq2Seq for Sets)

Teaching networks when order doesn't matter in inputs

PyTorch · 5 exercises
Day 17

Neural Turing Machines

Differentiable external memory with content-based addressing

PyTorch · 5 exercises
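
Content-based addressing, the heart of Day 17, is small enough to show here. A sketch (the paper layers location-based addressing on top of this): score every memory slot by cosine similarity to a key, sharpen with beta, and read softly.

import torch
import torch.nn.functional as F

def content_addressing(memory, key, beta):
    """memory: (N, M) slots, key: (M,) query, beta: sharpness scalar.
    Returns soft read weights over the N slots."""
    sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (N,)
    return F.softmax(beta * sim, dim=-1)

weights = content_addressing(torch.randn(8, 16), torch.randn(16), beta=5.0)
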
Day 18

Pointer Networks

Attention as output — pointing at input elements for variable-size combinatorial problems

PyTorch · 5 exercises
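
"Attention as output" in one function. This sketch uses dot-product scores for brevity (the paper itself uses additive attention): instead of a fixed output vocabulary, the network emits a distribution over positions in its own input.

import torch
import torch.nn.functional as F

def pointer_log_probs(query, encoder_states):
    """query: (B, d) decoder state, encoder_states: (B, T, d).
    Returns log-probabilities over the T input positions."""
    scores = torch.einsum("bd,btd->bt", query, encoder_states)
    return F.log_softmax(scores, dim=-1)  # "points" at one input element
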
Day 19

Relational Reasoning

Learning relationships between objects (Sort-of-CLEVR, Relation Networks)

PyTorch · 7 exercises
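
Day 19's Relation Network is one equation, RN(O) = f(sum over pairs of g(o_i, o_j)), and it sketches in a dozen lines (simplified; the paper also conditions g on a question embedding):

import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Score every ordered pair of objects with g, sum, then apply f."""

    def __init__(self, dim):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.f = nn.Linear(dim, 1)

    def forward(self, objs):  # objs: (B, N, dim)
        B, N, d = objs.shape
        a = objs.unsqueeze(2).expand(B, N, N, d)
        b = objs.unsqueeze(1).expand(B, N, N, d)
        pairs = torch.cat([a, b], dim=-1)  # all ordered pairs (o_i, o_j)
        return self.f(self.g(pairs).sum(dim=(1, 2)))
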
Day 20

Relational RNNs

Memory as a set of interacting slots — solving problems LSTMs can't

PyTorch · 5 exercises
Day 21

Neural Message Passing

Unifying graph neural networks — messages, updates, and readouts for molecular prediction

PyTorch · 5 exercises
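
One message-passing round, sketched. The message and update functions here are arbitrary choices, and the readout phase is reduced to a comment:

import torch
import torch.nn as nn

class MessagePassing(nn.Module):
    """Each node sums messages from its neighbors, then updates its state."""

    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)  # message from a (sender, receiver) pair
        self.upd = nn.GRUCell(dim, dim)     # node-state update

    def forward(self, h, edge_index):       # h: (N, dim), edge_index: (2, E)
        src, dst = edge_index
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add(0, dst, m)  # sum messages per receiver
        h = self.upd(agg, h)
        return h  # a graph-level readout would then be e.g. h.sum(dim=0)
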
Day 22

Deep Speech 2

End-to-end speech recognition — replacing the entire ASR pipeline with a single neural network

PyTorch · 5 exercises
Day 23

Variational Lossy Autoencoder

Solving posterior collapse by limiting the decoder's receptive field

PyTorch · 5 exercises
Day 24

GPipe: Efficient Training of Giant Neural Networks

Scaling models beyond memory limits with pipeline parallelism and micro-batching

PyTorch · 5 exercises
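
Half of Day 24's idea, micro-batching, sketched as plain gradient accumulation (GPipe additionally partitions the layers across devices and pipelines the chunks through them):

import torch
import torch.nn as nn

def train_step(model, loss_fn, optimizer, x, y, n_micro=4):
    """Split one batch into micro-batches so peak activation memory
    scales with a chunk, not the full batch."""
    optimizer.zero_grad()
    for xm, ym in zip(x.chunk(n_micro), y.chunk(n_micro)):
        loss = loss_fn(model(xm), ym) / n_micro  # average across chunks
        loss.backward()                          # gradients accumulate in .grad
    optimizer.step()

model = nn.Linear(10, 1)
train_step(model, nn.MSELoss(), torch.optim.SGD(model.parameters(), lr=0.1),
           torch.randn(32, 10), torch.randn(32, 1))
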
Day 25

Scaling Laws for Neural Language Models

The "Physics of AI" — predicting the performance of massive models using power laws

PyTorch · 5 exercises
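
Day 25's core move: on log-log axes a power law L(N) = (Nc/N)^alpha is a straight line, so two fitted constants let you extrapolate the loss of models you haven't trained. A sketch with made-up numbers (these are not the paper's data):

import numpy as np

N = np.array([1e6, 1e7, 1e8, 1e9])  # hypothetical parameter counts
L = np.array([5.0, 3.9, 3.0, 2.3])  # hypothetical eval losses
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha = -slope                      # log L = alpha*log(Nc) - alpha*log(N)
Nc = np.exp(intercept / alpha)
print(f"L(N) ~ ({Nc:.3g}/N)^{alpha:.3f}")
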
Day 26

Kolmogorov Complexity

The math of compression and randomness — defining intelligence as the ability to find patterns

Python · 5 exercises
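
Kolmogorov complexity itself is uncomputable, but any real compressor gives a usable upper bound, which is the practical hook of Day 26. A quick sketch: structure compresses, randomness doesn't.

import os
import zlib

def k_upper_bound(data: bytes) -> int:
    """Length of a zlib-compressed description: a crude ceiling on K(data)."""
    return len(zlib.compress(data, 9))

print(k_upper_bound(b"ab" * 500))       # patterned: shrinks to a few dozen bytes
print(k_upper_bound(os.urandom(1000)))  # random: stays near 1,000 bytes
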
Day 27

Machine Super Intelligence

Universal Intelligence (Upsilon) — formalizing the IQ of any agent across all computable environments

Python · 5 exercises
Day 28

CS231n: CNNs for Visual Recognition

Conv layers, pooling, and FC networks from scratch — the complete CNN architecture guide

NumPy · 5 exercises
Day 29

Proximal Policy Optimization

The algorithm behind RLHF and ChatGPT — simple, stable policy gradients via clipping

PyTorch · 5 exercises
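
Day 29's headline equation in code. A minimal sketch of the clipped surrogate objective (full training loops add value-function and entropy terms on top):

import torch

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated A(s, a)."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantage
    # Take the pessimistic bound; negate because optimizers minimize
    return -torch.min(unclipped, clipped).mean()
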
Day 30

Deep RL from Human Feedback

Teaching AI via preferences — the breakthrough that brought us ChatGPT

PyTorch · 5 exercises

Coming Next

Bonus Papers · The Language Model Revolution (BERT, GPT-2, GPT-3...)

Your 30-Day Journey

From sequence models to modern architectures — a complete curriculum.

01

The Foundations

Complete

Days 1-7 · RNNs, LSTMs, regularization, compression, and complexity theory

RNNs · LSTMs · Dropout · MDL · Complexity · Automata
02

The Deep Learning Explosion

Complete

Days 8-12 · CNNs, residual learning, and the vision revolution

AlexNet · ResNet · ResNet V2 · Dilated Conv · Dropout
03

The Transformer Era

Complete

Days 13-16 · Attention mechanisms and sequence-to-sequence learning

Attention · Annotated Transformer · Bahdanau · Order Matters
04

Specialized Architectures

Complete

Days 17-22 · Memory networks, graphs, reasoning, and speech

Neural Turing Machines · Pointer Networks · Relational Reasoning · Relational RNNs · MPNNs · Deep Speech 2
05

Generative Models & Scale

Complete

Days 23-28 · VAEs, scaling laws, Kolmogorov complexity, Machine Super Intelligence, and CNNs

VLAE · GPipe · Scaling Laws · Kolmogorov Complexity · Machine Super Intelligence · CS231n CNNs
06

Modern Foundations

Complete

Days 29-30 · PPO and the path to ChatGPT (RLHF)

Proximal Policy Optimization (PPO) · RLHF

The Language Model Revolution

Coming

Bonus Papers · BERT, GPT-2, GPT-3, Chinchilla

Learn by Building

Every paper comes with complete, runnable implementations. No "left as an exercise" — we build everything from scratch so you truly understand how these systems work.

Production Code

Clean, documented, runs everywhere

5+ Exercises Per Paper

With complete solutions

Interactive Notebooks

Run and experiment live

Deep Explanations

Theory meets practice

transformer.py
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Multi-head attention from scratch."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split into heads: (B, n_heads, T, d_k)
        Q = self.W_q(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(B, T, self.n_heads, self.d_k).transpose(1, 2)
        # Scaled dot-product attention
        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.d_k)
        attn = F.softmax(scores, dim=-1)
        # Merge heads back and project
        out = (attn @ V).transpose(1, 2).reshape(B, T, -1)
        return self.W_o(out)

Ready to Master the Foundations?

Join the journey. Learn AI the right way — by building from scratch.