#  Machine Learning from Scratch 

 





 Year offered:  2021 


 Link: [Course Website](https://github.com/DrugowitschLab/ML-from-scratch-seminar) 

 

 

 

ML from scratch is a student-led tutorial / seminar series initiated by [Johannes Bill](https://gershmanlab.com/people/johannes.html) and others from the [Jan Drugowitsch Lab](https://drugowitschlab.hms.harvard.edu/) at Harvard Medical School. The objective is to help neuroscience students learn cutting-edge machine learning models by implementing them.

I started participating in 2022, and I prepared the tutorials below and led a few of the seminars!

---

## Class Materials:

### From Transformer to LLM: Architecture, Training and Usage

**Transformer Tutorial Series**

 ![Attention](/sites/g/files/omnuum11221/files/AttentionSchematics_white-01.png)

 

In this session, we walked through the architecture, training, and applications of transformers ([slides](/file_url/168)). The lecture slides covered:

- The basic principles of NLP
- Basics of the attention mechanism and the transformer
- Training language models (the language modelling objective)
- Usage of pretrained models (finetuning vs. prompting)
- Applications of transformers beyond language (vision, audio, music, image generation, games & control)
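To make the attention mechanism from the list above concrete, here is a minimal sketch of causal scaled dot-product self-attention in plain Python. It is not the code from the slides or notebooks: the function names and the toy inputs are assumptions for illustration, and the learned query/key/value projections of a real transformer are left out so that every step stays visible.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask.

    queries, keys: lists of T vectors of length d_k; values: T vectors of length d_v.
    Position t attends only to positions 0..t, as in GPT-style models.
    Returns (outputs, weights).
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for t, q in enumerate(queries):
        # Similarity of query t with every key it is allowed to see.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys[: t + 1]]
        w = softmax(scores)
        # Weighted average of the visible value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values[: t + 1]))
               for i in range(d_v)]
        # Masked (future) positions get zero weight.
        weights.append(w + [0.0] * (len(keys) - t - 1))
        outputs.append(out)
    return outputs, weights

# Toy example: 3 tokens with 2-dimensional queries, keys and values.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs, weights = causal_self_attention(Q, K, V)
```

Each row of `weights` sums to one, and the first token can only attend to itself, so its output is simply its own value vector.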

**Jupyter Notebook Tutorial Series**

We prepared this series of Jupyter notebooks to give you hands-on experience with transformers, from their architecture to their training and usage.

- **Fundamentals of Transformer and Language modelling**
    - [Understanding Attention & Transformer from Scratch](https://colab.research.google.com/drive/1ZuhA6khlWm57WGZ8i38JH-gc5aJrvpvs?usp=sharing)  
        In this tutorial, you will manually implement the attention mechanism and a GPT model from scratch to gain a deeper understanding of their structure.
    - [Language modelling and pretrained transformers](https://colab.research.google.com/drive/1zZYzAopL__LW4glruSF9lnZYlEmSVI8j?usp=sharing)  
        In this notebook, you will look into the architectures of pretrained transformers (GPT / BERT), then train a GPT2 model to "speak" a simplified English constructed with a [context-free generative grammar](https://en.wikipedia.org/wiki/Context-free_grammar), and observe how it learns syntactic rules and word meanings.
- **Beyond Language:**   
    In the following notebooks, we demonstrate the flexibility of the transformer model by applying it to tasks beyond natural language:
    - [Learning to do arithmetic by sequence modelling](https://colab.research.google.com/drive/1vO71-o-8-3IrOe44Ha0nsHmUsEGVSC37?usp=sharing)  
        In this notebook, you will train a GPT2 model on an arithmetic dataset and let it learn (partially) to do arithmetic by next-token prediction.
    - [Image generation by sequence modelling](https://colab.research.google.com/drive/1UHlEbepqdvk68cYV1fvkmWl2TBKXfm8E?usp=sharing)  
        In this notebook, you will train a GPT2-like transformer for generative modelling of MNIST images by predicting the sequence of patches in an image.
    - [Audio signal classification](https://colab.research.google.com/drive/1O4XHOJyOu3_lyaPHAKJM_XTztrAb7VFP?usp=sharing) (~20 min)  
        In this notebook, you will train a transformer on the Spoken MNIST dataset to classify audio sequences.
    - [Image classification](https://colab.research.google.com/drive/1JDQQlLMGzo675AfrtkFn1kbuADtVemJz?usp=sharing) (~30 min)  
        In this notebook, you will train a transformer on images, formatted as sequences of patches, to predict the identity of each image.
    - [Music generation by sequence modelling](https://colab.research.google.com/drive/14zpzLpR4UBIzEQmeaXlMv_mDFYIv3Vht?usp=sharing) (difficult; training takes hours)  
        In this notebook, you will train a transformer to predict the next note in a music dataset consisting of piano rolls. The trained model can then be used to generate classical piano music.
- **Using Large Language Model**  
    Finally, we get a glimpse of LLMs by using the OpenAI API to build something useful.
    - [OpenAI API and Chat with PDF](https://colab.research.google.com/drive/19mYEyavBhOnAbEQJQuztXAxWxyYbsQzi?usp=sharing)  
        In this notebook, you will use the OpenAI API and langchain to build a bot that can chat with a given document, e.g. a scientific paper (replicating the functionality of [Chat with PDF](https://www.chatpdf.com/)).
- Official [Github repo](https://github.com/Animadversio/TransformerFromScratch)
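Several of the notebooks above rely on the same trick: write the task as a sequence and train on next-token prediction. For the arithmetic case, a minimal sketch might look like the following. The `a+b=c;` string format, the character-level vocabulary, and all helper names are assumptions for illustration; the notebook may format and tokenize its data differently.

```python
import random

def make_example(rng, max_operand=99):
    """Return one addition problem as a plain string, e.g. '12+7=19;'."""
    a, b = rng.randint(0, max_operand), rng.randint(0, max_operand)
    return f"{a}+{b}={a + b};"

# Character-level vocabulary: digits plus the three symbols used above.
VOCAB = list("0123456789+=;")
STOI = {ch: i for i, ch in enumerate(VOCAB)}

def encode(text):
    """Map a string to a list of integer token ids."""
    return [STOI[ch] for ch in text]

def decode(ids):
    """Map a list of token ids back to a string."""
    return "".join(VOCAB[i] for i in ids)

def next_token_pairs(ids):
    """Inputs are all tokens but the last; targets are the sequence shifted by one."""
    return ids[:-1], ids[1:]

rng = random.Random(0)
text = make_example(rng)
ids = encode(text)
inputs, targets = next_token_pairs(ids)
```

At every position the model sees `inputs[:i + 1]` and is trained to predict `targets[i]`, so getting the digits after `=` right amounts to doing the addition.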

 ![ChatPDF](/sites/g/files/omnuum11221/files/ChatPDF_Schematics.png)
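A common design for a document chat bot, which we assume the notebook follows in spirit, is to split the document into chunks, retrieve the chunks most similar to the question, and paste them into the prompt sent to the LLM. The sketch below shows only that retrieval step in plain Python. The bag-of-words `embed` function is a stand-in for a learned embedding model, the actual LLM call is omitted, and all function names are hypothetical rather than taken from the OpenAI API or langchain.

```python
import math
import re
from collections import Counter

def split_into_chunks(text, chunk_size=12):
    """Split a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Stand-in embedding: a bag-of-words count vector (not a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(context_chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"

document = ("Attention lets every token look at every other token in the sequence. "
            "Diffusion models generate images by gradually removing noise from a random sample.")
chunks = split_into_chunks(document)
question = "How do diffusion models generate images?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In a real pipeline the chunk embeddings would be computed once and stored, so that each question only requires embedding the question and a similarity search.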

 

**Related material**

- Attention & Transformers
    - [The Illustrated Transformer – Jay Alammar](https://jalammar.github.io/illustrated-transformer/)
    - [The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar](http://jalammar.github.io/illustrated-gpt2/)
    - [The Annotated Transformer (harvard.edu)](https://nlp.seas.harvard.edu/2018/04/03/attention.html)
    - [Annotated code of Transformers (labml.ai)](https://nn.labml.ai/transformers/index.html)
- Usage of LLM
    - [Finetuning vs Prompting | Hung-yi Lee's YouTube Channel](https://youtu.be/F58vJcGgjt0)

### Understanding Stable Diffusion from "Scratch"

   ![diffusion_proc1.gif](/sites/g/files/omnuum11221/files/styles/hwp_1_1__720x720_scale/public/binxuw/files/diffusion_proc1.jpg?itok=WhaRdum8) 

 

In this session, we walked through all the building blocks of Stable Diffusion ([slides](/file_url/136) / [PPTX](/file_url/140) attached), including:

- Principles of diffusion models
- Modelling the score function of images with a **UNet model**
- Understanding the prompt through **contextualized word embeddings**
- Letting text influence the image through **cross attention**
- Improving efficiency by adding an **autoencoder**
- Large-scale training
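As a small illustration of the first building block, the sketch below implements a forward (noising) process with a closed-form jump to any step, in the style of DDPM. The linear beta schedule, its endpoint values, and the function names are assumptions for illustration; the slides and notebooks may use a different parameterization, for example a continuous-time one.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (illustrative values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): fraction of signal variance left at step t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot.

    Returns the noised sample and the noise that was added.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
x0 = [1.0, -1.0, 0.5, 0.0]  # a toy "image" of four pixels
x_early, _ = diffuse(x0, 10, alpha_bars, rng)    # still close to x0
x_late, _ = diffuse(x0, 999, alpha_bars, rng)    # essentially pure noise
```

Because `alpha_bars` shrinks towards zero, late samples carry almost no information about `x0`; the generative model is trained to undo this corruption step by step.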

   ![Stable Diffusion model overview](/sites/g/files/omnuum11221/files/styles/hwp_1_1__960x960_scale/public/binxuw/files/stablediffusion_overview.jpg?itok=fUHE_Ws8) 

 

We prepared Colab notebooks for you to:

- Play with Stable Diffusion and inspect the internal architecture of the models. ([Open in Colab](https://colab.research.google.com/drive/1TvOlX2_l4pCBOKjDI672JcMm4q68sKrA?usp=sharing))
- Build your own Stable Diffusion UNet model from scratch in a notebook, with < 300 lines of code! ([Open in Colab](https://colab.research.google.com/drive/1mm67_irYu3qU3hnfzqK5yQC38Fd5UFam?usp=sharing))
- Build a diffusion model (with UNet + cross attention) and train it to generate MNIST images based on a "text prompt". ([Open in Colab](https://colab.research.google.com/drive/1Y5wr91g5jmpCDiX-RLfWL1eSBWoSuLqO?usp=sharing))
- Code: [GitHub repo](https://github.com/Animadversio/DiffusionFromScratch) and [official GitHub page](https://github.com/DrugowitschLab/ML-from-scratch-seminar/tree/master/StableDiffusion)
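The cross-attention step mentioned above can be sketched in a few lines: queries come from image features, while keys and values come from the text embedding, so each image location pulls in a mixture of word vectors. This is a simplified sketch, not the notebook's code: the learned projection matrices are omitted (keys and values are the raw text vectors), and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(image_tokens, text_tokens):
    """Each image token (query) takes a weighted average of the text tokens.

    Learned query/key/value projections are omitted, so both inputs must
    share the same feature dimension and the text tokens act as both keys
    and values.
    """
    d = len(text_tokens[0])
    outputs, weights = [], []
    for q in image_tokens:
        # Similarity between this image location and every word.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in text_tokens]
        w = softmax(scores)
        # Mix the word vectors according to the attention weights.
        out = [sum(wj * v[i] for wj, v in zip(w, text_tokens)) for i in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Two hypothetical word embeddings and three hypothetical image-patch features.
text_tokens = [[4.0, 0.0], [0.0, 4.0]]
image_tokens = [[4.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
outputs, weights = cross_attention(image_tokens, text_tokens)
```

The output has one vector per image token regardless of the prompt length, which is what lets a UNet layer accept prompts of varying length.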

In the end, we trained a very tiny diffusion model to generate MNIST digits from numbers

   ![Conditional Diffusion digit 4](/sites/g/files/omnuum11221/files/styles/hwp_1_1__360x360_scale/public/binxuw/files/conddiffusion_digits4.png?itok=BBnp0UsR) 

 

... and a tiny diffusion model to generate faces from facial attributes on the CelebA dataset

   ![UNet Sample of Faces](/sites/g/files/omnuum11221/files/styles/hwp_1_1__720x720_scale/public/binxuw/files/samples_unet_sd_face_99.png?itok=lI9ONOrM) 

 

**Related material**

- Diffusion model in general
    - [What are Diffusion Models? | Lil'Log ](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
    - [Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song ](https://yang-song.net/blog/2021/score/)
- Stable diffusion
    - Annotated & simplified code: [U-Net for Stable Diffusion (labml.ai)](https://nn.labml.ai/diffusion/stable_diffusion/model/unet.html)
    - Illustrations: [The Illustrated Stable Diffusion – Jay Alammar](https://jalammar.github.io/illustrated-stable-diffusion/)
- Attention & Transformers
    - [The Illustrated Transformer – Jay Alammar](https://jalammar.github.io/illustrated-transformer/)

### Mathematical Foundation of Diffusion Generative Models

 ![Diffusion schematics](/sites/g/files/omnuum11221/files/binxuw/files/diffusion_schematics.png)

 

In this tutorial, we covered the mathematical foundation of diffusion generative models. We aim to give you a solid understanding of

- The score function as the gradient of the log probability density of the data
- How the score function enables the reversal of the forward diffusion process
- Learning the score function by denoising score matching (and its equivalence to explicit score matching)
- Approximating the score function with a neural network
- Sampling from diffusion models

with concrete examples on low-dimensional (2D) data, before applying these ideas to high-dimensional data (point clouds or images).
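For reference, the quantities in the list above can be written compactly as follows. This is a sketch under one common convention, a drift-free forward process with diffusion coefficient $g(t)$; the tutorial's own notation and choice of forward process may differ, and the usual time-dependent weighting of the loss is omitted.

```latex
\begin{align}
  s(x, t) &= \nabla_x \log p_t(x)
    && \text{score of the noised data density } p_t \\
  dx &= g(t)\, dw
    && \text{forward diffusion (assumed drift-free)} \\
  dx &= -g(t)^2\, \nabla_x \log p_t(x)\, dt + g(t)\, d\bar{w}
    && \text{reverse diffusion, run from } t = T \text{ to } 0 \\
  p(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\, x_0,\, \sigma_t^2 I\right),
    \quad \sigma_t^2 = \int_0^t g(\tau)^2\, d\tau
    && \text{noising kernel} \\
  J_{\mathrm{DSM}}(\theta) &= \mathbb{E}_{x_0 \sim p_0}\, \mathbb{E}_{x_t \sim p(x_t \mid x_0)}
    \left\| s_\theta(x_t, t) + \frac{x_t - x_0}{\sigma_t^2} \right\|^2
    && \text{denoising score matching}
\end{align}
```

The last line uses $\nabla_{x_t} \log p(x_t \mid x_0) = -(x_t - x_0)/\sigma_t^2$. The denoising objective differs from the explicit score matching objective $\mathbb{E}_{x_t \sim p_t} \| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t) \|^2$ only by a constant that does not depend on $\theta$, so both have the same minimizer.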

**Jupyter / Colab Notebook Tutorial Series**

- **Theory tutorial**: Mathematical Foundation [Open in Colab Notebook](https://colab.research.google.com/drive/1aSQTgoqmyqGpLI9q7IRDlXXeMdAG-E4X?usp=sharing)
- **Day 1 Coding tutorial**: Diffusion, Reverse Diffusion and Score function [Open in Colab notebook](https://colab.research.google.com/drive/1dol5AXz_oNkFZMrwpDyK6MYnOB4ayEQU?usp=sharing)  
    In this tutorial, you will gain more intuition about score functions by examining the *analytical score function* of a general class of distributions: the **Gaussian mixture model**.   
    You will empirically validate that the exact score function enables the reversal of the diffusion process and recovers the original data distribution.
    
       ![gmm_score_decompose](/sites/g/files/omnuum11221/files/styles/hwp_1_1__720x720_scale/public/binxuw/files/score_function_gmm_decomposition_square.png?itok=VpXArnMi)
- **Day 2 Coding tutorial**: Denoising Score Matching, and Train Neural Network to Approximate Score [Open in Colab Notebook](https://colab.research.google.com/drive/16ZNcxNo7DJh1yZfFa2ombVd_uZqFTcQe?usp=sharing)  
    In this notebook, you will build a toy neural network model to learn the score function in two different ways: by supervised learning on the exact score, and by **denoising score matching** from data samples. You will empirically validate that both methods can approximate the score of the data and be used to recover the original data distribution in reverse diffusion.
    
       ![recovery_distribution](/sites/g/files/omnuum11221/files/styles/hwp_1_1__720x720_scale/public/binxuw/files/diffusion_recover_distribution.png?itok=maius2_2)
- **Solutions** to the coding exercises: [Colab notebook](https://colab.research.google.com/drive/1e2LXHvvufA3thNvdmsEZdLAI5xkjaLVl?usp=sharing)
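The Day 1 idea can be sketched in one dimension with plain Python: compute the exact score of a noised Gaussian mixture, then integrate the reverse diffusion from noise back to data. This is not the notebook's code. The mixture parameters, the noise schedule (noise variance equal to `t`, i.e. a drift-free forward process with unit diffusion coefficient), the Euler-Maruyama step count, and the function names are all assumptions for illustration.

```python
import math
import random

# A 1D Gaussian mixture: (weight, mean, std) per component. Illustrative values.
COMPONENTS = [(0.3, -2.0, 0.5), (0.7, 2.0, 0.5)]

def gmm_score(x, t):
    """Exact score d/dx log p_t(x) of the mixture after adding noise of variance t.

    Adding N(0, t) noise to a Gaussian mixture only widens each component,
    so p_t is again a mixture, with variances std**2 + t.
    """
    log_terms, slopes = [], []
    for weight, mean, std in COMPONENTS:
        var = std ** 2 + t
        log_terms.append(math.log(weight) - 0.5 * math.log(2 * math.pi * var)
                         - (x - mean) ** 2 / (2 * var))
        slopes.append(-(x - mean) / var)  # score of this component alone
    # Responsibilities (posterior component probabilities), computed stably.
    top = max(log_terms)
    resp = [math.exp(l - top) for l in log_terms]
    total = sum(resp)
    return sum(r / total * s for r, s in zip(resp, slopes))

def reverse_diffusion(num_samples=1000, t_max=25.0, num_steps=250, seed=0):
    """Euler-Maruyama integration of the reverse SDE from t_max down to 0."""
    rng = random.Random(seed)
    dt = t_max / num_steps
    samples = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, math.sqrt(t_max))  # start from (almost) pure noise
        for step in range(num_steps):
            t = t_max - step * dt
            # Drift along the score, plus fresh noise of variance dt.
            x += gmm_score(x, t) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

samples = reverse_diffusion()
frac_right = sum(x > 0 for x in samples) / len(samples)
```

The samples should cluster around the two modes at -2 and 2, with `frac_right` roughly matching the mixture weight of 0.7. A small bias is expected because sampling starts from N(0, `t_max`) rather than from the exact noised distribution.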



 

 

---

Attachments:

- [mlfs\_tutorial\_nlp\_transformer\_ssl\_updated.pdf](/sites/g/files/omnuum11221/files/binxuw/files/mlfs_tutorial_nlp_transformer_ssl_updated.pdf)
- [stable\_diffusion\_a\_tutorial.pdf](/sites/g/files/omnuum11221/files/binxuw/files/stable_diffusion_a_tutorial.pdf)
- [stable\_diffusion\_a\_tutorial.pptx](/sites/g/files/omnuum11221/files/binxuw/files/stable_diffusion_a_tutorial.pptx)
 
---