BMI / CS 771 Learning Based Methods for Computer Vision

Fall 2025, MW 2:30PM - 3:45PM, 3345 Engineering Hall
Instructor: Yin Li

Computer Vision, art by kirkh.deviantart.com

Course Description

The course addresses the problems of representation and reasoning for large amounts of visual data, including images and videos, medical imaging data, and their associated tags or text descriptions. We will introduce deep learning in the context of computer vision and cover topics in visual recognition using deep models, such as image classification, object detection, human pose estimation, action recognition, 3D understanding, and medical image analysis. The course emphasizes the design of vision and learning algorithms and models, as well as their practical implementation.

Prerequisites

Students are strongly encouraged to have knowledge of computer vision (such as CS 566), or machine learning (such as CS 540), or medical image analysis (such as BMI/CS 567). In addition, the following skills are necessary for this course:

Programming: Students should have basic proficiency in programming (Python). Projects are to be completed and graded in Python. The teaching team will answer questions about Python.
Math: Linear algebra, vector calculus, and probability theory.

Textbook

(Optional) Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville (free electronic copy available on the website).
(Optional) Computer Vision: Algorithms and Applications by Rick Szeliski (free electronic copy available on the website).

Canvas and Piazza

We will use Canvas for releasing slides and assignments, and for homework submission and grading.
We will use Piazza for online discussion (also linked to Canvas). Students are encouraged to post questions on the discussion board so that others can learn from those questions.

Requirements

Students will be responsible for participating in class and on Piazza, completing 4 homework assignments, and completing 1 project.

Grading

The final grade will be made up of:

48%: 4 homework assignments (mini-projects) that involve programming
40%: 1 course project with several milestones
10%: Single-page course write-up
2%: Piazza participation
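To make the weighting concrete, here is a minimal sketch of how the percentages above combine into a final grade. It splits the 40% project share into the four milestone weights listed under Course Project (5 + 5 + 15 + 15). The function name, the 0-to-1 score scale, and equal weighting of the four homework assignments are assumptions for illustration only; the syllabus does not specify how individual assignments are scored.

```python
from statistics import mean

# Component weights in percentage points, taken from the grading breakdown.
# The 40% project share is split into the four project milestones (5 + 5 + 15 + 15).
WEIGHTS = {
    "homework": 48,
    "proposal": 5,
    "midterm_report": 5,
    "final_report": 15,
    "presentation": 15,
    "write_up": 10,
    "piazza": 2,
}
assert sum(WEIGHTS.values()) == 100  # sanity check: weights cover the whole grade


def final_grade(homework_scores: list[float], other_scores: dict[str, float]) -> float:
    """Return a hypothetical final grade in percent.

    Assumes every score is a fraction in [0, 1] and that the four homework
    assignments count equally (an assumption; the syllabus does not say).
    """
    scores = dict(other_scores, homework=mean(homework_scores))
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

For example, full marks on every component yields 100, while half credit on all four homework assignments and nothing else yields 24 (half of the 48% homework share).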

Most of the assignments and projects are team-based. We do not accept late homework assignments or projects. However, each student has three "late days" for the whole course. Late time is counted in 24-hour increments: a submission up to 24 hours after the due date and time uses one late day, up to 48 hours uses two, and up to 72 hours uses three. These late days are intended to cover unexpected clustering of due dates, travel commitments, interviews, hackathons, etc. Please do not ask for extensions to due dates, because we are already giving you a pool of late days to manage yourself.
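The late-day rule rounds any partial 24-hour period up to a full day. A minimal sketch of that arithmetic follows; the helper names and the 3-day budget constant are illustrative, and how late days are pooled or charged for team submissions is not specified here, so this only models a single student's count.

```python
import math
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 3  # per-student budget for the whole course


def late_days_used(due: datetime, submitted: datetime) -> int:
    """Hypothetical helper: late days consumed by one submission.

    Any part of a 24-hour window after the deadline counts as a full day.
    """
    late = submitted - due
    if late <= timedelta(0):
        return 0  # on time (or early)
    # timedelta / timedelta gives a float number of days; round up.
    return math.ceil(late / timedelta(hours=24))


def remaining_late_days(used: list[int]) -> int:
    """Late days left after the given per-submission usage."""
    return TOTAL_LATE_DAYS - sum(used)
```

For instance, a submission 30 hours after the deadline uses two late days, leaving one for the rest of the course.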

Homework Assignments

The course will consist of 4 homework assignments. All assignments except the first one are team-based. Teams of 2-3 students are preferred. Permission from the instructor is needed for a single-person team.

Please post all questions on Piazza so that others may learn from those questions as well. Do not email the teaching team directly with homework questions. All homework assignments are to be submitted on Canvas by midnight on the due date. Late submissions should be emailed to the instructor.

Course Project

The final project is research-oriented. It can be a pure computer vision project or an application of existing vision methods in the student's own research area. Students are expected to implement one (or more) related research papers, or to come up with interesting new ideas and implement them using the techniques discussed in class. Teams of 2-3 students are encouraged. Permission from the instructor is needed for a single-person team.

There will be four checkpoints for the final project: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.

Project Proposal (5%): This will be a single-page document. You will explain what problem you are trying to solve, why you want to solve it, what the possible steps toward a solution are, and how you plan to evaluate your solution.
Project Mid-Term Report (5%): This will be a single-page summary of current progress, including your current results, the difficulties that arose during the implementation, and how your proposal may have changed in light of current progress.
Project Final Report (15%): The final report will be a four-page document. You will describe the motivation of the project, previous literature, your method, and the results. You can reuse material presented in your proposal / mid-term report. Please include your source code in the submission.
Project Presentation (15%, in class): Each team will be allocated a 12-minute slot in class, consisting of a 10-minute presentation and a 2-minute Q&A session.

Course Write-up

The course write-up will be a single-page document that captures your reflection on the course work, e.g., what you have learned and what you found most interesting in the course. The write-up must be completed individually.

Academic Integrity

This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.

Students are permitted and encouraged to discuss ideas with others. However, the core components of each assignment / project are expected to be implemented by the individual student or team. Code, except for starter code / helper code that is not related to the core components, should not be posted publicly on Piazza.

Students are permitted and encouraged to use artificial intelligence (AI) tools and applications (such as ChatGPT, Copilot, DALL-E, etc.) as they support the learning objectives of this course. Please be aware that students are responsible for the information submitted based on an AI query (i.e., ensure that AI-generated results do not contain misinformation or unethical content). Students must acknowledge any use of AI, in keeping with this course's expectations.

Contact Info and Office Hours

If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor.

Yin: yin[dot]li[at]wisc[dot]edu

Office Hours

11:00 am - 12:30 pm Tuesday and Wednesday
Morgridge Hall 6538 (walk-in) or via Zoom (by appointment only)

Appointments can also be scheduled outside of normal office hours. Please send me an email if you plan to do so.

Syllabus

Class Date	Topic	Slides	Reading	Assignment
Computer Vision Meets Machine Learning
Wed, Sep 3	Course Introduction / Introduction to Visual Recognition	See Canvas		Sign up for Piazza
Mon, Sep 8	Data Driven Paradigm for Computer Vision		Paper 1, 2
Wed, Sep 10	Image Processing using Python / Introduction to PyTorch (Optional Tutorial)		PyTorch Tutorial	Homework 1 out
Deep Learning
Mon, Sep 15	Introduction to Neural Networks Part I		Ch 6 Goodfellow et al.
Wed, Sep 17	Introduction to Neural Networks Part II		Ch 6 Goodfellow et al.
Mon, Sep 22	Convolutional Neural Networks: Theory		Ch 9 Goodfellow et al.
Wed, Sep 24	Convolutional Neural Networks: Practice		Paper 1, 2, 3	Homework 1 due
Mon, Sep 29	Deep Learning on the Cloud (Tutorial)			Homework 2 out
Wed, Oct 1	Recurrent Neural Networks and Transformers: Part I		Ch 10 Goodfellow et al.
Mon, Oct 6	Recurrent Neural Networks and Transformers: Part II		Paper 1, 2
Wed, Oct 8	Advanced Training: Part I		Ch 8 Goodfellow et al.
Mon, Oct 13	Advanced Training: Part II		Ch 8 Goodfellow et al.	Project Proposal Due
Visual Recognition
Wed, Oct 15	Image Classification and Adversarial Samples		Paper 1, 2
Mon, Oct 20	Object Detection & Instance Segmentation: Part I (Virtual)		Ch 6.3.3 Szeliski; Paper 1, 2	Instructor traveling; Homework 2 due
Wed, Oct 22	Object Detection & Instance Segmentation: Part II (Virtual)		Paper 1, 2	Instructor traveling; Homework 3 out
Mon, Oct 27	Semantic Segmentation & Dense Image Labeling: Part I		Ch 6.4-6.4.3 Szeliski; Paper 1, 2
Wed, Oct 29	Semantic Segmentation & Dense Image Labeling: Part II		Paper 1, 2, 3
Mon, Nov 3	Beyond Classification: Vision & Language: Part I		Ch 6.6 Szeliski; Paper 1, 2
Wed, Nov 5	Beyond Classification: Vision & Language: Part II		Paper 1, 2
Mon, Nov 10	Image and Text Generation: Part I		Paper 1, 2
Wed, Nov 12	Image and Text Generation: Part II		Paper 1, 2	Project Mid-term Report due
Mon, Nov 17	Action Recognition and Video Understanding		Ch 6.5 Szeliski; Paper 1, 2
Wed, Nov 19	3D Scene Understanding: Part I		Ch 13.3, 13.4 Szeliski; Paper 1, 2	Homework 3 due / Homework 4 out on Nov 21
Mon, Nov 24	3D Scene Understanding: Part II		Ch 14.6 Szeliski; Paper 1, 2
Wed, Nov 26	Medical Image Analysis: Part I (Virtual)		Paper	Happy Thanksgiving!
Mon, Dec 1	Medical Image Analysis: Part II		Paper 1, 2
Project Presentations
Wed, Dec 3	Project Presentations			Homework 4 due on Dec 5
Mon, Dec 8	Project Presentations
Wed, Dec 10	Project Presentations and Course Wrap-up			Project report due; Course write-up due
	Final Exam Period - not used

Acknowledgments

The materials for this class rely significantly on slides prepared by other instructors; in particular, many slides are adapted from those of Abhinav Gupta, Svetlana Lazebnik, and Alexei A. Efros, who in turn drew on materials from many others. We also thank the Google Cloud Education Program for its support.