IST 597: Trustworthy Machine Learning [Spring 2026]
Overview
Announcements
Description
This is an intensive graduate seminar on trustworthy machine learning. The course covers topics from emerging research areas related to the broader study of security and privacy in machine learning. Students will learn about attacks against computer systems that leverage machine learning, as well as defense techniques to mitigate such attacks.
Prerequisites
The course assumes students already have a basic understanding of machine learning. Students will familiarize themselves with the emerging body of literature from different research communities investigating security and privacy questions in machine learning. The class is designed to help students explore new research directions and applications. Most of the course readings will come from both seminal and recent papers in the field.
Grading Policy
Grades will be computed as a weighted combination of the following components (a short worked example follows the grade cutoffs below):
- Paper presentation (30%)
- Paper summaries (10%)
- Class notes & participation (10%)
- Research project (50%)
Final grade cutoffs:
- A [93%, 100%]
- A- [90%, 93%)
- B+ [87%, 90%)
- B [83%, 87%)
- B- [80%, 83%)
- C+ [77%, 80%)
- C [70%, 77%)
- D [60%, 70%)
- F [0%, 60%)
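The weights and cutoffs above amount to a simple weighted average followed by a table lookup. The sketch below is only an illustration of that arithmetic, not official grading code; the component scores in the example are hypothetical.

```python
# Illustration only: weighted final score and letter-grade lookup
# using the weights and cutoffs listed above. Example scores are hypothetical.

WEIGHTS = {
    "paper_presentation": 0.30,
    "paper_summaries": 0.10,
    "notes_and_participation": 0.10,
    "research_project": 0.50,
}

# (lower bound, letter grade), checked from highest to lowest
CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (70, "C"), (60, "D"), (0, "F"),
]

def final_grade(scores: dict) -> tuple:
    """Weighted average of component scores (each on a 0-100 scale) and its letter grade."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    letter = next(grade for bound, grade in CUTOFFS if total >= bound)
    return total, letter

# Hypothetical example: 0.3*90 + 0.1*95 + 0.1*100 + 0.5*88 = 90.5 -> "A-"
print(final_grade({
    "paper_presentation": 90,
    "paper_summaries": 95,
    "notes_and_participation": 100,
    "research_project": 88,
}))
```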
Assignments
- Reading summary: A one-page summary of the assigned reading is due each class (from week 2 onwards). A physical copy should be turned in before the beginning of class. The summary should address the following: (a) what did the papers do well? (b) where did the papers fall short? (c) what did you learn from these papers? and (d) what questions do you have about the papers?
- Paper presentation: Starting from week 2, each student will present the papers assigned for reading each week. The student may choose an appropriate format for the presentation (e.g., slides, interactive demos or code tutorials, …), with the only requirements being that the presentation should (a) involve the class in active discussion, (b) cover all papers assigned for reading, and (c) last no more than one hour, including discussion.
- Class notes: A separate team of students will be responsible for writing notes that synthesize the content of the presentation and the class discussion.
Research Projects
Students are required to complete a project in this class. The goal of the course project is to provide students with an opportunity to explore research directions in trustworthy machine learning. The project should be related to the course content. A project is expected to include:
- A novel and sound solution to an interesting problem
- Comprehensive literature review and discussion
- Thorough theoretical/experimental evaluation and comparisons with existing approaches
Late Submission Policy
- All reports are due on Thursday at 11:59 pm (EST).
- Late submissions incur a 25% deduction for every 12 hours late, for up to 2 days (a small worked example follows this list).
- After 2 days, late submissions are no longer accepted.
- Extensions may be granted in special cases (email the instructor).
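The deduction schedule above is simple arithmetic; the sketch below illustrates one way to apply it, assuming each started 12-hour block counts as a full deduction. It is not official grading code.

```python
import math

def apply_late_penalty(score: float, hours_late: float) -> float:
    """Illustrative late-penalty calculation: 25% off per 12 hours late, up to 2 days."""
    if hours_late <= 0:
        return score                      # on time: no penalty
    if hours_late > 48:
        return 0.0                        # past 2 days: submission not accepted
    blocks = math.ceil(hours_late / 12)   # assumption: a started 12-hour block counts in full
    return max(0.0, score * (1 - 0.25 * blocks))

# Hypothetical example: a 90/100 report submitted 13 hours late
# falls into the second 12-hour block -> 50% deduction -> 45.0
print(apply_late_penalty(90, 13))
```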
Tentative Schedule and Material
| Date | Topic | Slides | Readings & links | Assignments |
|---|---|---|---|---|
| | Overview of Trustworthy Machine Learning | lecture_0 | | |
| | Machine learning basics part 1 | lecture_1 | | |
| | Machine learning basics part 2 | lecture_2 | | |
| | Machine learning basics part 3 | lecture_3 | | |
| | Machine learning basics part 4 | lecture_4 | | |
| | Generative Models | | | |
| | Test-time integrity (attack) | slides | White-box attack: • Goodfellow et al., Explaining and Harnessing Adversarial Examples • Carlini and Wagner, Towards Evaluating the Robustness of Neural Networks • Moosavi-Dezfooli et al., Universal adversarial perturbations. Hard-label black-box attack: • Brendel et al., Decision-based adversarial attacks: reliable attacks against black-box machine learning models • Cheng et al., Query-efficient hard-label black-box attack: an optimization-based approach • Chen et al., HopSkipJumpAttack: A Query-Efficient Decision-Based Attack | |
| | Test-time integrity (defense) | slides | • Madry et al., Towards Deep Learning Models Resistant to Adversarial Attacks • Wong et al., Fast is better than Free: Revisiting Adversarial Training • Zhang et al., Theoretically Principled Trade-off between Robustness and Accuracy | |
| | Training-time integrity (backdoor attack) | slides | • Liu et al., Trojaning Attack on Neural Networks • Shafahi et al., Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks • Gu et al., BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain | |
| | Training-time integrity (defense) | slides | • Wang et al., Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks • Wang et al., Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases | |
| | Test-time integrity (verification) part 1 | slides | • Wong and Kolter, Provable defenses against adversarial examples via the convex outer adversarial polytope • Zhang et al., Efficient Neural Network Robustness Certification with General Activation Functions. Optional: • Zhang et al., General Cutting Planes for Bound-Propagation-Based Neural Network Verification | |
| | Test-time integrity (verification) part 2 | slides | • Cohen et al., Certified Adversarial Robustness via Randomized Smoothing | |
| | Training-time integrity (poisoning attack) | | • Koh and Liang, Understanding Black-box Predictions via Influence Functions • Carlini and Terzis, Poisoning and Backdooring Contrastive Learning • Carlini, Poisoning the Unlabeled Dataset of Semi-Supervised Learning | |
| | Confidentiality (data) attack | slides | • Carlini et al., Extracting Training Data from Large Language Models • Kahla et al., Label-Only Model Inversion Attacks via Boundary Repulsion • Shokri et al., Membership Inference Attacks against Machine Learning Models • Fredrikson et al., Model inversion attacks that exploit confidence information and basic countermeasures • Choquette-Choo et al., Label-Only Membership Inference Attacks | |
| | Confidentiality (model) | slides | • Jagielski et al., High Accuracy and High Fidelity Extraction of Neural Networks • Tramer et al., Stealing Machine Learning Models via Prediction APIs | |
| | Confidentiality defense | slides | • Huang et al., Unlearnable Examples: Making Personal Data Unexploitable • Maini et al., Dataset Inference: Ownership Resolution in Machine Learning | |
| | Fairness | slides | • Zhao et al., Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints • Dwork et al., Fairness Through Awareness • Caliskan et al., Semantics derived automatically from language corpora contain human-like biases | |
| | Watermarking | | | |
| | Differential privacy part I | slides | • Dwork et al., Calibrating Noise to Sensitivity in Private Data Analysis • Abadi et al., Deep Learning with Differential Privacy | |
| | Differential privacy part II | slides | • Papernot et al., Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data • Mironov, Rényi Differential Privacy | |
| | Interpretability (XAI) part 1 | slides | • Simonyan et al., Deep inside convolutional networks: Visualising image classification models and saliency maps • Selvaraju et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization | |
| | Interpretability (XAI) part 2 | slides | • Ribeiro et al., “Why Should I Trust You?”: Explaining the Predictions of Any Classifier • Lundberg and Lee, A unified approach to interpreting model predictions | |
| | LLM security | | | |
| | Unlearning | | | |
| | Uncertainty | slides | • Guo et al., On Calibration of Modern Neural Networks • Minderer et al., Revisiting the Calibration of Modern Neural Networks | |
| | Project Presentation | | | |
References
There is no required textbook for this course. Some recommended readings are:
- Deep Learning (by Ian Goodfellow, Yoshua Bengio, Aaron Courville)
- Adversarial Robustness for Machine Learning (by Pin-Yu Chen and Cho-Jui Hsieh)