COMP 6211I: Trustworthy Machine Learning [Spring 2023]

Monday, 13:30-14:50 @ Room 6591

Friday, 9:00-10:20 @ Room 6591




This is an intensive graduate seminar on trustworthy machine learning. The course covers emerging research topics in the broader study of security and privacy in machine learning. Students will learn about attacks against computer systems that leverage machine learning, as well as defense techniques to mitigate such attacks.


The course assumes students already have a basic understanding of machine learning. Students will familiarize themselves with the emerging body of literature from the different research communities investigating these questions. The class is designed to help students explore new research directions and applications. Most of the course readings will come from seminal and recent papers in the field.

Grading Policy


A one-page summary of the assigned reading is due each class, starting from week 2. A physical copy should be turned in before the beginning of class. The summary should cover the following: (a) What did the papers do well? (b) Where did the papers fall short? (c) What did you learn from these papers? (d) What questions do you have about the papers?

Research Projects

Students are required to complete a project in this class. The goal of the course project is to give students an opportunity to explore research directions in trustworthy machine learning. The project should be related to the course content. An expected project consists of

Tentative Schedule and Material

Fri 3/2 | Overview of Trustworthy Machine Learning | lecture_0
Mon 6/2 | Machine learning basics part 1 | lecture_1
Fri 10/2 | Machine learning basics part 2 | lecture_2
Mon 13/2 | Machine learning basics part 3 | lecture_3
Fri 17/2 | Machine learning basics part 4 | lecture_4
Mon 20/2 | Exam
Fri 24/2 | Test-time integrity (attack) | slides
    White-box attack:
    Goodfellow et al., Explaining and Harnessing Adversarial Examples
    Carlini and Wagner, Towards Evaluating the Robustness of Neural Networks
    Moosavi-Dezfooli et al., Universal adversarial perturbations
    Hard-label black-box attack:
    Brendel et al., Decision-based adversarial attacks: reliable attacks against black-box machine learning models
    Cheng et al., Query-efficient hard-label black-box attack: an optimization-based approach
    Chen et al., HopSkipJumpAttack: A Query-Efficient Decision-Based Attack
Mon 27/2 | Test-time integrity (defense) | slides
    Madry et al., Towards Deep Learning Models Resistant to Adversarial Attacks
    Wong et al., Fast is better than Free: Revisiting Adversarial Training
    Zhang et al., Theoretically Principled Trade-off between Robustness and Accuracy
Fri 3/3 | Training-time integrity (backdoor attack) | slides
    Liu et al., Trojaning Attack on Neural Networks
    Shafahi et al., Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
    Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Mon 6/3 | Training-time integrity (defense) | slides
    Wang et al., Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
    Wang et al., Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
Fri 10/3 | Test-time integrity (verification) part 1 | slides
    Wong and Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope
    Zhang et al., Efficient Neural Network Robustness Certification with General Activation Functions
    Zhang et al., General Cutting Planes for Bound-Propagation-Based Neural Network Verification
Mon 13/3 | Test-time integrity (verification) part 2 | slides
    Cohen et al., Certified Adversarial Robustness via Randomized Smoothing
Fri 17/3 | Training-time integrity (poisoning attack)
    Koh and Liang, Understanding Black-box Predictions via Influence Functions
    Carlini and Terzis, Poisoning and Backdooring Contrastive Learning
    Carlini, Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Mon 20/3 | Confidentiality (data) attack | slides
    Carlini et al., Extracting Training Data from Large Language Models
    Kahla et al., Label-Only Model Inversion Attacks via Boundary Repulsion
Fri 24/3 | Privacy attacks | slides
    Shokri et al., Membership Inference Attacks against Machine Learning Models
    Fredrikson et al., Model inversion attacks that exploit confidence information and basic countermeasures
    Choquette-Choo et al., Label-Only Membership Inference Attacks
Mon 27/3 | Confidentiality (model) | slides
    Jagielski et al., High Accuracy and High Fidelity Extraction of Neural Networks
    Tramer et al., Stealing Machine Learning Models via Prediction APIs
Fri 31/3 | Confidentiality defense | slides
    Huang et al., Unlearnable Examples: Making Personal Data Unexploitable
    Maini et al., Dataset Inference: Ownership Resolution in Machine Learning
Mon 3/4 | Fairness | slides
    Zhao et al., Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
    Dwork et al., Fairness Through Awareness
    Caliskan et al., Semantics derived automatically from language corpora contain human-like biases
Fri 7/4 | Study break
Mon 10/4 | Study break
Fri 14/4 | Differential privacy part I | slides
    Dwork et al., Calibrating Noise to Sensitivity in Private Data Analysis
    Abadi et al., Deep Learning with Differential Privacy
Mon 17/4 | Differential privacy part II | slides
    Papernot et al., Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
    Mironov, Renyi Differential Privacy
Fri 21/4 | Interpretability (XAI) part 1 | slides
    Simonyan et al., Deep inside convolutional networks: Visualising image classification models and saliency maps
    Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Mon 24/4 | Interpretability (XAI) part 2 | slides
    Ribeiro et al., “Why Should I Trust You?”: Explaining the Predictions of Any Classifier
    Lundberg and Lee, A unified approach to interpreting model predictions
Fri 28/4 | Safety
    Athalye et al., Synthesizing Robust Adversarial Examples
    Xu et al., Adversarial T-shirt! Evading Person Detectors in A Physical World
Mon 1/5 | Labor Day
Fri 5/5 | Uncertainty | slides
    Guo et al., On Calibration of Modern Neural Networks
    Minderer et al., Revisiting the Calibration of Modern Neural Networks
Mon 8/5 | Project Presentation


There is no required textbook for this course. Some recommended readings are