Machine Learning from Scratch

Year offered: 2021

Link: Course Website

ML from Scratch is a student-led tutorial / seminar series initiated by Johannes Bill and others from the Jan Drugowitsch Lab at Harvard Medical School. The objective is to teach neuroscience students cutting-edge machine learning models by having them implement the models themselves.

I started participating in 2022, and have since prepared tutorials and led a few of the seminars!

Class Materials:

From Transformer to LLM: Architecture, Training and Usage

Transformer Tutorial Series

In this session, we walked through the architecture, training and applications of transformers (slides). The lecture slides covered:

The basic principles of NLP
Basics of the attention mechanism and the transformer
Training language models (language modelling objective)
Usage of pretrained models (finetuning vs prompting)
Applications of transformers beyond language (vision, audio, music, image generation, games & control)

Jupyter Notebook Tutorial Series

We prepared this series of Jupyter notebooks to give you hands-on experience with transformers, from their architecture to their training and usage.

Fundamentals of Transformer and Language modelling
- Understanding Attention & Transformer from Scratch
  In this tutorial, you will manually implement the attention mechanism and a GPT model from scratch to gain a deeper understanding of their structure.
- Language modelling and pretrained transformers
  In this notebook, you will look into the architectures of pretrained transformers (GPT / BERT), then train a GPT2 model to "speak" a simplified English constructed with a context-free generative grammar, and observe how it learns syntactic rules and word meanings.
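
As a rough illustration of what "attention from scratch" involves, here is a minimal NumPy sketch of single-head scaled dot-product self-attention with a GPT-style causal mask. It is not taken from the notebooks; the function names, the random weights and the sizes (`T`, `d_model`, `d_head`) are assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, d_model) token embeddings; Wq, Wk, Wv: (d_model, d_head).
    Returns the attended values (T, d_head) and the attention weights (T, T).
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Position t may only attend to positions <= t.
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, d_model, d_head = 5, 8, 4
x = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = causal_self_attention(x, Wq, Wk, Wv)
```

Each row of `attn` sums to one and is zero above the diagonal, so the first token can only attend to itself.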
Beyond Language:
In the following notebooks, we demonstrate the flexibility of the transformer model on tasks beyond language:
- Learning arithmetic by sequence modelling.
  In this notebook, you will train a GPT2 on an arithmetic dataset and let it learn (partially) to do arithmetic by next-token prediction.
- Image generation by sequence modelling.
  In this notebook, you will train a GPT2-like transformer for generative modelling of MNIST images, by predicting the sequence of patches in an image.
- Audio signal classification (~ 20 min)
  In this notebook, you will train a transformer on the Spoken MNIST dataset to classify audio sequences.
- Image classification (~ 30 min)
  In this notebook, you will train a transformer on images, formatted as sequences of patches, and predict the identity of each image.
- Music generation by sequence modelling. (Difficult, training takes hours)
  In this notebook, you will train a transformer to predict the next note in a music dataset consisting of piano rolls. The trained model can then be used to generate classical piano music.
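
Both image notebooks rely on turning an image into a sequence of patches. The sketch below shows one way to do that in NumPy for a single grayscale image; the 4x4 patch size, the row-major patch order and the function names are assumptions for illustration, not necessarily what the notebooks use.

```python
import numpy as np

def image_to_patches(img, patch):
    """Split an (H, W) image into a row-major sequence of flattened patches."""
    H, W = img.shape
    if H % patch or W % patch:
        raise ValueError("image size must be divisible by the patch size")
    # (H/p, p, W/p, p) -> (H/p, W/p, p, p) -> (num_patches, p*p)
    blocks = img.reshape(H // patch, patch, W // patch, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(-1, patch * patch)

def patches_to_image(seq, H, W, patch):
    """Inverse of image_to_patches: reassemble the (H, W) image."""
    blocks = seq.reshape(H // patch, W // patch, patch, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(H, W)

# A stand-in for one 28x28 MNIST digit (values are just pixel indices).
img = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
seq = image_to_patches(img, patch=4)          # 49 tokens of dimension 16
restored = patches_to_image(seq, 28, 28, patch=4)
```

A transformer then treats each row of `seq` as one token: a classifier reads the whole sequence, while a generative model predicts patch `t + 1` from patches `0..t`.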
Using Large Language Models
Finally, we get a glimpse of LLMs by using the OpenAI API to build something useful.
- OpenAI API and Chat with PDF:
  In this notebook, you will use the OpenAI API and langchain to build a bot that can chat with a given document, e.g. a scientific paper (replicating the functionality of Chat with PDF).
Official Github repo

Related material

Understanding Stable Diffusion from "Scratch"

In this session, we walked through all the building blocks of Stable Diffusion (slides / PPTX attached), including

Principles of diffusion models
Modelling the score function of images with a UNet
Understanding the prompt through contextualized word embeddings
Letting text influence the image through cross-attention
Improving efficiency by adding an autoencoder
Large-scale training
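
To make the cross-attention step concrete, here is a minimal NumPy sketch in which queries come from image features while keys and values come from text embeddings. It is an illustration under assumed shapes and random weights, not the Stable Diffusion implementation; all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens, txt_tokens, Wq, Wk, Wv):
    """Each image position queries the text tokens and mixes their values."""
    q = img_tokens @ Wq                        # (n_img, d_head)
    k = txt_tokens @ Wk                        # (n_txt, d_head)
    v = txt_tokens @ Wv                        # (n_txt, d_head)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (n_img, n_txt)
    return weights @ v, weights

# Hypothetical sizes: 16 image positions, 6 prompt tokens.
rng = np.random.default_rng(0)
n_img, n_txt, d_img, d_txt, d_head = 16, 6, 32, 24, 8
img_tokens = rng.normal(size=(n_img, d_img))
txt_tokens = rng.normal(size=(n_txt, d_txt))
Wq = rng.normal(size=(d_img, d_head))
Wk = rng.normal(size=(d_txt, d_head))
Wv = rng.normal(size=(d_txt, d_head))
out, weights = cross_attention(img_tokens, txt_tokens, Wq, Wk, Wv)
```

The output has one row per image position, so it can be added back to the image features; the text only enters through the keys and values.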

We prepared Colab notebooks for you to:

Play with Stable Diffusion and inspect the internal architecture of the models. (Open in Colab)
Build your own Stable Diffusion UNet model from scratch in a notebook, in fewer than 300 lines of code! (Open in Colab)
Build a diffusion model (with UNet + cross-attention) and train it to generate MNIST images based on the "text prompt". (Open in Colab)
Github Repo. Official Github Page

In the end, we trained a tiny-tiny diffusion model to generate MNIST digits from numbers

... and a tiny diffusion model to generate faces from facial attributes on the CelebA dataset

Related material

Diffusion model in general
- What are Diffusion Models? | Lil'Log
- Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song
Stable diffusion
- Annotated & simplified code: U-Net for Stable Diffusion (labml.ai)
- Illustrations: The Illustrated Stable Diffusion – Jay Alammar
Attention & Transformers
- The Illustrated Transformer – Jay Alammar

Mathematical Foundation of Diffusion Generative Models

In this tutorial, we covered the mathematical foundation of diffusion generative models. We aim to give you a solid understanding of

The score function as the gradient of the log density of the data distribution
How the score function enables the reversal of the forward diffusion process
Learning the score function by denoising score matching (and its equivalence to explicit score matching)
Approximating the score function with a neural network
Sampling from diffusion models

with concrete examples on low-dimensional (2D) data, and to apply these ideas to high-dimensional data (point clouds or images).
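
As a concrete 2D example of a score function, the sketch below computes the exact score of an isotropic Gaussian mixture in NumPy. The mixture parameters, the shared standard deviation `sigma` and the function names are assumptions made for this illustration.

```python
import numpy as np

def gmm_logpdf(x, weights, means, sigma):
    """Log density of an isotropic Gaussian mixture at points x of shape (N, D)."""
    D = x.shape[1]
    diff = x[:, None, :] - means[None, :, :]                 # (N, K, D)
    log_comp = (-0.5 * (diff ** 2).sum(-1) / sigma ** 2
                - 0.5 * D * np.log(2 * np.pi * sigma ** 2)
                + np.log(weights)[None, :])                   # (N, K)
    m = log_comp.max(axis=1, keepdims=True)                   # log-sum-exp trick
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def gmm_score(x, weights, means, sigma):
    """Exact score: grad_x log p(x) = sum_k r_k(x) * (mu_k - x) / sigma^2."""
    diff = x[:, None, :] - means[None, :, :]                  # (N, K, D)
    log_comp = -0.5 * (diff ** 2).sum(-1) / sigma ** 2 + np.log(weights)[None, :]
    log_comp = log_comp - log_comp.max(axis=1, keepdims=True)
    resp = np.exp(log_comp)
    resp = resp / resp.sum(axis=1, keepdims=True)             # responsibilities r_k(x)
    return -(resp[:, :, None] * diff).sum(axis=1) / sigma ** 2

# Hypothetical two-component mixture in 2D.
weights = np.array([0.3, 0.7])
means = np.array([[-2.0, 0.0], [2.0, 1.0]])
x = np.array([[0.5, -0.3], [-1.0, 2.0]])
score = gmm_score(x, weights, means, sigma=1.0)
```

If noise is added as `x_t = x_0 + s * eps` with standard normal `eps`, the noised distribution is again a mixture with the same weights and means and variance `sigma**2 + s**2`, so the same function gives the exact score at any noise level.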

Jupyter / Colab Notebook tutorial series

Theory tutorial: Mathematical Foundation Open in Colab Notebook
Day 1 Coding tutorial: Diffusion, Reverse Diffusion and Score function Open in Colab notebook
In this tutorial, you will gain more intuition about score functions by examining the analytical score function of a general class of distributions: Gaussian mixture models.
You will empirically validate that the exact score function enables the reversal of the diffusion process and recovers the original data distribution.
Day 2 Coding tutorial: Denoising Score Matching, and Train Neural Network to Approximate Score Open in Colab Notebook
In this notebook, you will build a toy neural network model to learn the score function in two different ways: by supervised learning on the exact score, and by denoising score matching from data samples. You will empirically validate that both methods can approximate the score of the data and can be used to recover the original data distribution in reverse diffusion.
Solutions to the coding exercises: Colab notebook
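
The equivalence between denoising and explicit score matching can be checked numerically in a toy setting. The sketch below swaps the neural network for a one-parameter linear score model `s(x) = a * x` on 1D Gaussian data, where both objectives have closed-form least-squares solutions; the data, noise level and variable names are assumptions for illustration and are not taken from the notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, data_std, noise_std = 200_000, 2.0, 0.5

x0 = rng.normal(scale=data_std, size=n)       # clean data samples
eps = rng.normal(size=n)
x = x0 + noise_std * eps                       # noised samples

# Denoising score matching target: grad_x log q(x | x0) = -(x - x0) / noise_std^2.
dsm_target = -(x - x0) / noise_std ** 2

# Least-squares fit of the linear score model s(x) = a * x to the DSM target.
a_dsm = (x * dsm_target).sum() / (x ** 2).sum()

# Explicit score matching: regress on the exact score of the noised
# marginal, which here is N(0, data_std^2 + noise_std^2).
exact_score = -x / (data_std ** 2 + noise_std ** 2)
a_esm = (x * exact_score).sum() / (x ** 2).sum()

print(f"DSM slope: {a_dsm:.4f}, exact-score slope: {a_esm:.4f}")
```

Although each individual DSM target is a noisy quantity that depends on `x0`, its regression solution agrees with the slope of the true score up to sampling error, which is the property that lets a network be trained from data samples alone.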

Attachments