Machine Learning from Scratch
ML from Scratch is a student-led tutorial/seminar series initiated by Johannes Bill and others from the Jan Drugowitsch Lab at Harvard Medical School. Its objective is to help neuroscience students learn cutting-edge machine learning models by implementing them.
I started participating in 2022, and I have prepared tutorials and led a few seminars in the series!
Class Materials:
From Transformer to LLM: Architecture, Training and Usage
Transformer Tutorial Series
In this session, we walked through the architecture, training and applications of transformers (slides). The lecture slides covered:
- The basic principles of NLP
- Basics of the attention mechanism and the transformer
- Training language models (language modelling objective)
- Usage of pretrained models (fine-tuning vs. prompting)
- Applications of transformers beyond language (vision, audio, music, image generation, games & control)
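To make the attention bullet concrete, here is a minimal NumPy sketch of single-head causal (GPT-style) self-attention. It is an illustration under our own assumptions (one head, no batching, random projection matrices), not the code from the slides or notebooks. The names `causal_self_attention`, `Wq`, `Wk` and `Wv` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; -inf entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a causal mask.

    X: (T, d_model) token embeddings; Wq, Wk, Wv: (d_model, d_head) projections.
    Returns the (T, d_head) outputs and the (T, T) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # (T, T) query-key similarities
    T = X.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where key index > query index
    scores = np.where(future, -np.inf, scores)          # hide future tokens
    A = softmax(scores, axis=-1)                        # each row is a distribution over keys
    return A @ V, A


rng = np.random.default_rng(0)
T, d_model, d_head = 5, 8, 4
X = rng.standard_normal((T, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out, A = causal_self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

The causal mask is what connects attention to the language-modelling objective. The output at position t only sees tokens up to t, so it can be trained to predict token t+1. A GPT stacks many such layers, together with MLPs and residual connections.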
Jupyter Notebook Tutorial Series
We prepared this series of Jupyter notebooks to give you hands-on experience with transformers, from their architecture to their training and usage.
- Fundamentals of Transformer and Language modelling
  - Understanding Attention & Transformer from Scratch
    In this tutorial, you will manually implement the attention mechanism and a GPT model from scratch to gain a deeper understanding of their structure.
  - Language modelling and pretrained transformers
    In this notebook, you will look into the architectures of pretrained transformers (GPT / BERT). You will then train a GPT-2 model to "speak" a simplified English constructed with a context-free generative grammar, and observe how it learns syntactic rules and word meanings.
- Beyond Language
  In the following notebooks, we demonstrate the flexibility of the transformer model on tasks beyond language:
  - Learn to do arithmetic by sequence modelling
    In this notebook, you will train a GPT-2 on an arithmetic dataset and let it learn to do arithmetic (partially) by next-token prediction.
  - Image generation by sequence modelling
    In this notebook, you will train a GPT-2-like transformer for generative modelling of MNIST images by predicting the sequence of patches in an image.
  - Audio signal classification (~20 min)
    In this notebook, you will train a transformer on the Spoken MNIST dataset and classify the audio sequences.
  - Image classification (~30 min)
    In this notebook, you will train a transformer on images formatted as sequences of patches, and predict the identity of each image.
  - Music generation by sequence modelling (difficult; training takes hours)
    In this notebook, you will train a transformer to predict the next note in a music dataset consisting of piano rolls. The trained model can then be used to generate classical piano music.
- Using Large Language Models
  Finally, we take a glimpse at LLMs by using the OpenAI API to accomplish some useful tasks:
  - OpenAI API and Chat with PDF
    In this notebook, you will use the OpenAI API and LangChain to build a bot that can chat with a given document, e.g. a scientific paper. This replicates the functionality of Chat with PDF.
- Official GitHub repo
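Several of the notebooks above (image generation and image classification) treat an image as a sequence of patches. The following is a minimal NumPy sketch of that conversion. It assumes 28×28 MNIST-sized inputs, non-overlapping square patches and raster (row-by-row) order; the notebooks' actual patch size and ordering may differ. The helpers `image_to_patches` and `patches_to_image` are hypothetical.

```python
import numpy as np


def image_to_patches(img, p):
    """Split an (H, W) image into a raster-ordered sequence of flattened p x p patches.

    Returns an array of shape (H//p * W//p, p*p).
    """
    H, W = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by patch size"
    return (img.reshape(H // p, p, W // p, p)   # [row block, row in patch, col block, col in patch]
               .transpose(0, 2, 1, 3)           # -> [row block, col block, row, col]
               .reshape(-1, p * p))             # one flattened patch per sequence step


def patches_to_image(patches, H, W, p):
    """Inverse of image_to_patches."""
    return (patches.reshape(H // p, W // p, p, p)
                   .transpose(0, 2, 1, 3)
                   .reshape(H, W))


img = np.arange(28 * 28, dtype=float).reshape(28, 28)  # stand-in for an MNIST digit
seq = image_to_patches(img, p=4)       # (49, 16): a 7x7 grid of 4x4 patches
inputs, targets = seq[:-1], seq[1:]    # next-patch prediction pairs for a GPT-style model
```

For generation, a GPT-style model is trained to predict each patch from the ones before it (the `inputs`/`targets` shift above). For classification, the whole sequence is fed in and a label is read out. How the patches are turned into a loss (for example, regression on pixel values versus discretized tokens) is not specified here.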
Related material
- Attention & Transformers
- Usage of LLM
Understanding Stable Diffusion from "Scratch"
In this session, we walked through all the building blocks of Stable Diffusion (slides / PPTX attached), including:
- Principles of diffusion models
- Modelling the score function of images with a UNet model
- Understanding the prompt through contextualized word embeddings
- Letting text influence the image through cross-attention
- Improving efficiency by adding an autoencoder
- Large-scale training
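The cross-attention step ("letting text influence the image") can be sketched as follows. Queries come from the UNet's spatial features, while keys and values come from the text-encoder embeddings, so every spatial position takes a weighted average over the prompt tokens. This is a minimal single-head NumPy sketch with random weights and toy sizes, chosen for illustration. `cross_attention` is a hypothetical name, and real Stable Diffusion blocks also use multiple heads, normalization, an output projection and a residual connection.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(feat, context, Wq, Wk, Wv):
    """Single-head cross-attention from an image feature map to text tokens.

    feat: (C, H, W) feature map; context: (L, d_ctx) text token embeddings;
    Wq: (C, d); Wk, Wv: (d_ctx, d). Returns a (d, H, W) map and (H*W, L) weights.
    """
    C, H, W = feat.shape
    tokens = feat.reshape(C, H * W).T            # (H*W, C): one token per spatial position
    Q = tokens @ Wq                              # queries come from the image
    K, V = context @ Wk, context @ Wv            # keys and values come from the text
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (H*W, L) weights over prompt tokens
    out = A @ V                                  # (H*W, d) text information per position
    return out.T.reshape(-1, H, W), A            # fold back into a spatial map


rng = np.random.default_rng(0)
C, H, W = 8, 6, 6            # toy UNet feature map
L, d_ctx, d = 5, 12, 8       # toy "prompt" of 5 token embeddings
feat = rng.standard_normal((C, H, W))
context = rng.standard_normal((L, d_ctx))
Wq = rng.standard_normal((C, d))
Wk = rng.standard_normal((d_ctx, d))
Wv = rng.standard_normal((d_ctx, d))
out, A = cross_attention(feat, context, Wq, Wk, Wv)
```

Because the attention weights for each position sum to one, a prompt with a single token would inject the same value vector everywhere. With several tokens, different image regions can attend to different words.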
We prepared Colab notebooks for you to:
- Play with Stable Diffusion and inspect the internal architecture of its models. (Open in Colab)
- Build your own Stable Diffusion UNet model from scratch in a notebook (with < 300 lines of code!). (Open in Colab)
- Build a diffusion model (UNet + cross-attention) and train it to generate MNIST images from a "text prompt". (Open in Colab)
- GitHub repo / Official GitHub page
In the end, we trained a tiny-tiny diffusion model to generate MNIST digits from numbers
... and a tiny diffusion model to generate faces from facial attributes on the CelebA dataset.
Related material
- Diffusion model in general
- Stable diffusion
- Annotated & simplified code: U-Net for Stable Diffusion (labml.ai)
- Illustrations: The Illustrated Stable Diffusion – Jay Alammar
- Attention & Transformers
Mathematical Foundation of Diffusion Generative Models
In this tutorial, we covered the mathematical foundation of diffusion generative models. We aim to give you a solid understanding of:
- The score function as the gradient of the log data density
- How the score function enables reversal of the forward diffusion process
- Learning the score function by denoising score matching (and its equivalence to explicit score matching)
- Approximating the score function with a neural network
- Sampling from diffusion models
We illustrate each with concrete examples on low-dimensional (2D) data, then apply them to high-dimensional data (point clouds or images).
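For reference, the objects in this list can be written as follows. This uses the standard score-based SDE notation for a noise-only forward process; that choice is our assumption, and the tutorial's own notation and noise schedule may differ. The weighting λ(t) is left generic.

```latex
\begin{aligned}
s(\mathbf{x}) &= \nabla_{\mathbf{x}} \log p(\mathbf{x}) && \text{(score)} \\
d\mathbf{x} &= g(t)\, d\mathbf{w} && \text{(forward diffusion)} \\
d\mathbf{x} &= -g(t)^2\, \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\, dt + g(t)\, d\bar{\mathbf{w}} && \text{(reverse diffusion, } t: T \to 0\text{)} \\
\mathcal{L}_{\mathrm{DSM}}(\theta) &= \mathbb{E}_{t,\, \mathbf{x}_0 \sim p_0,\, \mathbf{z} \sim \mathcal{N}(0, I)} \Big[ \lambda(t)\, \big\| s_\theta(\mathbf{x}_0 + \sigma_t \mathbf{z}, t) + \mathbf{z}/\sigma_t \big\|^2 \Big],
\quad \sigma_t^2 = \int_0^t g(\tau)^2\, d\tau
\end{aligned}
```

The last line is denoising score matching. Its regression target is the score of the Gaussian perturbation kernel, -(x_t - x_0)/σ_t² = -z/σ_t. At each noise level, this objective differs from explicit score matching against ∇ log p_t only by a constant that does not depend on θ.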
Jupyter / Colab Notebook tutorial series
- Theory tutorial: Mathematical Foundation (Open in Colab Notebook)
Day 1 Coding tutorial: Diffusion, Reverse Diffusion and Score function Open in Colab notebook
In this tutorial, you will gain more intuition about score functions by examining the analytical score function of a general class of distributions: the Gaussian mixture model.
You will empirically validate that the exact score function enables reversal of the diffusion process and recovers the original data distribution.
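Here is a minimal NumPy sketch of this experiment, built on our own assumptions. It uses a 2D mixture of two isotropic Gaussians and a forward process dx = g dw with constant g, so the noise variance is σ_t² = g²t. The reverse SDE is integrated with the Euler-Maruyama method. Adding Gaussian noise to a Gaussian mixture yields another Gaussian mixture (each component variance grows by σ_t²), which is why the score stays analytical at every t. `gmm_score` and `reverse_diffusion` are hypothetical names, not the notebook's functions.

```python
import numpy as np


def gmm_score(x, mus, stds, weights, sigma_t):
    """Exact score grad_x log p_t(x) of an isotropic Gaussian mixture
    after convolving it with N(0, sigma_t^2 I) noise.

    x: (n, d) points; mus: (k, d) means; stds: (k,) component stds;
    weights: (k,) mixture weights summing to 1.
    """
    var = stds**2 + sigma_t**2                   # (k,) diffused component variances
    d = x.shape[1]
    diff = mus[None, :, :] - x[:, None, :]       # (n, k, d): mu_k - x
    sq = np.sum(diff**2, axis=-1)                # (n, k) squared distances
    # log[w_k N(x; mu_k, var_k I)] up to a constant shared by all k
    log_comp = np.log(weights) - 0.5 * d * np.log(var) - 0.5 * sq / var
    log_comp -= log_comp.max(axis=1, keepdims=True)
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=1, keepdims=True)      # posterior responsibilities r_k(x)
    # score = sum_k r_k(x) (mu_k - x) / var_k
    return np.einsum("nk,nkd->nd", resp / var, diff)


def reverse_diffusion(x_T, score_fn, g, T, n_steps, rng):
    """Euler-Maruyama integration of the reverse SDE for dx = g dw (constant g),
    from t = T back to t = 0; the noise level at time t is sigma_t = g * sqrt(t)."""
    dt = T / n_steps
    x = x_T.copy()
    for i in range(n_steps, 0, -1):
        sigma_t = g * np.sqrt(i * dt)
        x = x + g**2 * score_fn(x, sigma_t) * dt + g * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


rng = np.random.default_rng(0)
mus = np.array([[-3.0, 0.0], [3.0, 0.0]])   # assumed toy 2D mixture
stds = np.array([0.5, 0.5])
weights = np.array([0.5, 0.5])
g, T = 2.0, 4.0                              # sigma_T = g * sqrt(T) = 4

# Draw x_T exactly from p_T: clean mixture samples plus N(0, sigma_T^2 I) noise.
n = 2000
comp = rng.choice(2, size=n, p=weights)
x0 = mus[comp] + stds[comp, None] * rng.standard_normal((n, 2))
x_T = x0 + g * np.sqrt(T) * rng.standard_normal((n, 2))


def score_fn(x, sigma_t):
    return gmm_score(x, mus, stds, weights, sigma_t)


x_rec = reverse_diffusion(x_T, score_fn, g, T, n_steps=500, rng=rng)
```

The run starts from samples of the heavily noised p_T. The reverse integration should then return points clustered around the two means, with roughly the original spread and a roughly even split between them. With a learned score (Day 2), the same sampler applies; only `score_fn` changes.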
Day 2 Coding tutorial: Denoising Score Matching, and Train Neural Network to Approximate Score Open in Colab Notebook
In this notebook, you will build a toy neural network model to learn the score function in two different ways: by supervised learning on the exact score, and by denoising score matching from the data samples. You will empirically validate that both methods can approximate the score of the data and can be used to recover the original data distribution through reverse diffusion.
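The equivalence between these two training routes can be seen in a setting small enough to solve in closed form. This sketch replaces the neural network with a linear score model s(x) = a·x + b at a single noise level σ, applied to 1D Gaussian data (our simplification). Minimizing the squared loss then reduces to least squares. Fitting against the noisy denoising target -z/σ and against the exact score of the noised data should give nearly the same coefficients, a = -1/(std0² + σ²) and b = m/(std0² + σ²). The values of `m`, `std0` and `sigma` are assumed toy settings.

```python
import numpy as np

rng = np.random.default_rng(0)
m, std0, sigma = 1.5, 0.8, 0.6   # assumed toy data mean/std and a single noise level
n = 200_000

x0 = m + std0 * rng.standard_normal(n)   # clean data samples
z = rng.standard_normal(n)
x_noisy = x0 + sigma * z                 # perturbed samples x0 + sigma * z

# Linear stand-in for the network: s(x) = a * x + b, i.e. features [x, 1].
X = np.stack([x_noisy, np.ones(n)], axis=1)

# Route 1, denoising score matching: regress onto the score of the perturbation
# kernel, grad log N(x_noisy; x0, sigma^2) = -(x_noisy - x0) / sigma^2 = -z / sigma.
target_dsm = -z / sigma
(a_dsm, b_dsm), *_ = np.linalg.lstsq(X, target_dsm, rcond=None)

# Route 2, supervised on the exact score: the noised data is N(m, std0^2 + sigma^2).
var_t = std0**2 + sigma**2
target_exact = -(x_noisy - m) / var_t
(a_exact, b_exact), *_ = np.linalg.lstsq(X, target_exact, rcond=None)

print(f"DSM:   a={a_dsm:.3f}, b={b_dsm:.3f}")
print(f"exact: a={a_exact:.3f}, b={b_exact:.3f}")
```

For these settings, both routes should land near a = -1 and b = 1.5. The DSM target is much noisier per sample, but its conditional mean given the noisy input equals the exact score, which is why the fits agree. A neural network replaces the linear model once the score is no longer linear, as it is for the Gaussian mixture.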
- Solutions to the coding exercises: Colab notebook