BMI / CS 771 Learning Based Methods for Computer Vision

Fall 2023, MW 2:30PM - 3:45PM, 2534 Engineering Hall
Instructor: Yin Li

TA: Cameron Ruggles

Computer Vision, art by kirkh.deviantart.com

Course Description

The course addresses the problems of representation and reasoning for large amounts of visual data, including images and videos, medical imaging data, and their associated tags or text descriptions. We will introduce deep learning in the context of computer vision, and cover topics on visual recognition using deep models, such as image classification, object detection, human pose estimation, action recognition, 3D understanding, and medical image analysis. The course emphasizes the design of vision and learning algorithms and models, as well as their practical implementations.

Prerequisites

Students are strongly encouraged to have knowledge of computer vision or machine learning (such as CS 540) or medical image analysis (such as BMI/CS 567). In addition, the following skills are necessary for this class:

Textbook

Canvas and Piazza

Requirements

Students will be responsible for participating in class and on Piazza, completing 4 homework assignments, and completing a project.

Grading

The final grade will be made up from:

Most of the assignments and projects are team-based. We do not allow late homework assignments or projects. However, each student has three "late days" for the whole course. The first 24 hours after the due date and time count as one late day, up to 48 hours as two, and up to 72 hours as three. These late days are intended to cover unexpected clustering of due dates, travel commitments, interviews, hackathons, etc. Please do not ask for extensions to due dates, because you already have a pool of late days to manage yourself.
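The late-day accounting described above can be sketched in a few lines of Python. This is only an illustration, not an official grading tool: the function names are hypothetical, and it assumes that any started 24-hour block counts as a full late day, which is how the policy reads. How late days are counted for team submissions is not stated here and is not modeled.

```python
import math

# From the policy above: three late days per student for the whole course.
LATE_DAY_BUDGET = 3


def late_days_used(hours_late: float) -> int:
    """Late days consumed by one submission (hypothetical helper).

    Assumption: every started 24-hour block after the deadline
    counts as one full late day.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def remaining_late_days(hours_late_per_submission: list[float]) -> int:
    """Late days left after the given submissions; negative means over budget."""
    used = sum(late_days_used(h) for h in hours_late_per_submission)
    return LATE_DAY_BUDGET - used
```

For example, a submission that is 2 hours late uses one late day, and one that is 30 hours late uses two, so those two submissions together would use up the whole pool.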

Homework Assignments

The course will consist of 4 homework assignments. All assignments except the first one are team-based. Teams of 2-3 students are preferred. Additional permission from the instructor is needed for a single-person team.

Please post all questions on Piazza so that others may learn from them as well. Do not email the professor or TA directly with homework questions. All homework is to be submitted on Canvas by midnight on the due date. Late submissions should be emailed to the TA (cc the instructor).

Course Project

The final project is research-oriented. It can be a pure computer vision project or an application of existing vision methods in the student's own research area. Students are expected to implement one (or more) related research papers, or to come up with interesting novel ideas and implement them using the techniques discussed in class. Teams of 2-3 students are encouraged. Permission from the instructor is needed for a single-person team.

There will be four checkpoints for the final project: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.

Course Write-up

The course write-up will be a document that captures your reflections on the coursework, e.g., what you have learned and which findings in the course you found most interesting. The write-up must be completed individually.

Academic Integrity

This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.

Students are permitted and encouraged to discuss ideas with others. However, the core components of each assignment / project are expected to be implemented by the individual student or team. Code, except for starter code / helper code that isn't related to the core components, should not be posted to Piazza.

Use of generative AI models (such as ChatGPT) is allowed. Students must acknowledge the use in the assignment / project.

Contact Info and Office Hours

If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor or TA.

Office Hours

Appointments can also be scheduled outside of normal office hours. Please send us an email if you plan to do so.

Syllabus

Date | Topic | Slides | Reading | Assignment

Computer Vision Meets Machine Learning
Wed, Sep 6 | Course Introduction / Introduction to Visual Recognition | See Canvas | - | Sign up for Piazza
Mon, Sep 11 | Data Driven Paradigm for Computer Vision | - | Paper 1, 2 | -
Wed, Sep 13 | Image Processing using Python (Tutorial led by TA) | - | - | Optional Homework 1 out

Deep Learning
Mon, Sep 18 | Introduction to Neural Networks | - | Ch 6 Goodfellow et al. | -
Wed, Sep 20 | Convolutional Neural Networks: Theory | - | Ch 9 Goodfellow et al. | -
Mon, Sep 25 | Convolutional Neural Networks: Practice | - | Paper 1, 2, 3 | -
Wed, Sep 27 | PyTorch and Cloud Computing (Tutorial led by TA) | - | - | Homework 1 due
Mon, Oct 2 | Recurrent Neural Networks and Transformers: Part I | - | Ch 10 Goodfellow et al. | -
Wed, Oct 4 | Recurrent Neural Networks and Transformers: Part II | - | Paper 1, 2 | -
Mon, Oct 9 | Advanced Training | - | Ch 8 Goodfellow et al. | Homework 2 out

Visual Recognition
Wed, Oct 11 | Image Classification and Adversarial Samples: Part I | - | Paper 1, 2 | -
Mon, Oct 16 | Image Classification and Adversarial Samples: Part II | - | Paper 1, 2 | Project Proposal due
Wed, Oct 18 | Object Detection & Instance Segmentation: Part I | - | Ch 6.3.3 Szeliski; Paper 1, 2 | -
Mon, Oct 23 | Object Detection & Instance Segmentation: Part II | - | Paper 1, 2 | -
Wed, Oct 25 | Semantic Segmentation and Dense Image Labeling | - | Ch 6.4-6.4.3 Szeliski; Paper 1, 2 | -
Mon, Oct 30 | Human Pose Estimation | - | Paper 1, 2 | Homework 2 due; Homework 3 out
Wed, Nov 1 | Beyond Classification: Vision & Language | - | Ch 6.6 Szeliski; Paper 1, 2, 3 | -
Mon, Nov 6 | Action Recognition and Video Understanding: Part I | - | Ch 6.5 Szeliski; Paper 1, 2 | -
Wed, Nov 8 | Action Recognition and Video Understanding: Part II | - | Paper 1, 2 | Project Mid-term Report due
Mon, Nov 13 | 3D Scene Understanding: Part I (virtual; instructor traveling) | - | Ch 13.3, 13.4 Szeliski; Paper 1, 2 | -
Wed, Nov 15 | 3D Scene Understanding: Part II (virtual; instructor traveling) | - | Ch 14.6 Szeliski; Paper 1 | -
Mon, Nov 20 | Medical Image Analysis: Part I | - | Paper | Homework 3 due
Wed, Nov 22 | No Class; Happy Thanksgiving! | - | - | -
Mon, Nov 27 | Medical Image Analysis: Part II | - | Paper 1, 2 | -
Wed, Nov 29 | Deep Generative Models: Part I (VAEs and Diffusion Models) | - | Paper 1, 2 | Homework 4 out
Mon, Dec 4 | Deep Generative Models: Part II (VAEs and GANs) | - | Paper | -
Wed, Dec 6 | Self-supervised Visual Representation Learning | - | Paper 1, 2, 3 | -

Project Presentations
Mon, Dec 11 | Project Presentations | - | - | -
Wed, Dec 13 | Project Presentations and Course Wrap-up | - | - | Project report due; Course write-up due; Homework 4 due

Final Exam Period - not used

Acknowledgments

The materials for this class rely significantly on slides prepared by other instructors. In particular, many slides are modified from those of Abhinav Gupta, Svetlana Lazebnik, and Alexei A. Efros, who in turn use materials from many people. Each slide set contains acknowledgments. Feel free to use these slides for academic or research purposes, but please maintain all acknowledgments.