BMI / CS 771 Learning Based Methods for Computer Vision
Fall 2023, MW 2:30PM - 3:45PM, 2534 Engineering Hall
TA: Cameron Ruggles
Instructor: Yin Li
Course Description
The course addresses the problems of representation and reasoning for large amounts of visual data, including images and videos, medical imaging data, and their associated tags or text descriptions. We will introduce deep learning in the context of computer vision, and cover topics on visual recognition using deep models, such as image classification, object detection, human pose estimation, action recognition, 3D understanding, and medical image analysis. The course emphasizes the design of vision and learning algorithms and models, as well as their practical implementations.
Prerequisites
Students are strongly encouraged to have knowledge of computer vision or machine learning (such as CS 540) or medical image analysis (such as BMI/CS 567). In addition, the following skills are necessary for this class:
- Programming: Students should have basic proficiency in programming (Python). Projects are to be completed and graded in Python. Our teaching team will support questions about Python.
- Math: Linear algebra, vector calculus, and probability theory.
Textbooks
- (Optional) Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville (free electronic copy available at the website).
- (Optional) Computer Vision: Algorithms and Applications by Rick Szeliski (free electronic copy available at the website).
Canvas and Piazza
- We will use Canvas for slide and assignment releases, and for homework submission and grading.
- We will use Piazza for online discussion (also linked to Canvas). Students are encouraged to post questions on the discussion board so that others can learn from those questions.
Requirements
Students will be responsible for participating in class and on Piazza, completing 4 homework assignments, and completing a project.
Grading
The final grade will be made up of:
- 48% 4 homework assignments (mini-projects) that involve programming
- 40% 1 course project with several milestones
- 10% Single-page course write-up
- 2% Piazza participation
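The weights above can be combined into a single score as a weighted sum. The following is only an illustrative sketch: it assumes each component is scored on a 0-100 scale and ignores any curving or late penalties, which the syllabus does not specify. The names `WEIGHTS`, `final_score`, and `example` are hypothetical and the example scores are made up.

```python
# Weights (percent of the final grade) taken from the grading breakdown.
WEIGHTS = {
    "homework": 48,       # 4 homework assignments
    "project": 40,        # course project (proposal, mid-term, report, presentation)
    "write_up": 10,       # single-page course write-up
    "participation": 2,   # Piazza participation
}


def final_score(scores):
    """Combine per-component scores (each assumed 0-100) into a weighted total."""
    # Sanity check: the weights must cover the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Made-up component scores, for illustration only.
example = {"homework": 90, "project": 85, "write_up": 100, "participation": 100}
print(final_score(example))
```

Under these assumptions, the homework and project components together determine 88% of the total, so they dominate the result.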
Homework Assignments
The course will consist of 4 homework assignments. All assignments except the first one are team-based. Teams of 2-3 students are preferred. Additional permission from the instructor is needed for a single-person team.
Please post all questions on Piazza so that others may learn from those questions as well. Do not email the professor or TA directly with homework questions. All homework assignments are to be submitted on Canvas by midnight on the due date. Late submissions should be emailed to the TA (with the instructor cc'd).
Course Project
The final project is research-oriented. It can be a pure computer vision project or an application of existing vision methods in the student's own research area. Students are expected to implement one (or more) related research papers, or think of some interesting novel ideas and implement them using the techniques discussed in class. Teams of 2-3 students are encouraged. Permission from the instructor is needed for a single-person team.
There will be four checkpoints for the final project: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.
- Project Proposal (5%): This will be a single-page document. You will explain what problem you are trying to solve, why you want to solve it, and what the possible steps toward a solution are.
- Project Mid-Term Report (5%): This will be a single-page brief summary of current progress, including your current results, the difficulties that arose during the implementation, and how your proposal may have changed in light of current progress.
- Project Final Report (15%): The final report will be a four-page document. You will describe the motivation of the project, the previous literature, your method and the results. You can reuse the materials that are presented in your proposal / mid-term report. Please include your source code in the submission.
- Project Presentation (15%, in class): Each team will be allocated a 12-min slot in class. This slot includes a 10-min presentation and a 2-min Q&A session.
Course Write-up
The course write-up will be a document that captures your reflection on the coursework, e.g., what you have learned and what the most interesting findings in the course were. The write-up must be completed individually.
Academic Integrity
This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.
Students are permitted and encouraged to discuss ideas with others. However, the core components of each assignment / project are expected to be implemented by the individual student or team. Code, except for starter code / helper code that isn't related to the core components, should not be posted to Piazza.
Use of generative AI models (such as ChatGPT) is allowed. Students must acknowledge the use in the assignment / project.
Contact Info and Office Hours
If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor or TA.
- Yin: yin[dot]li[at]wisc[dot]edu
- Cameron: ruggles2[at]wisc[dot]edu
- Yin, 12:30pm - 2:30pm Thursday (in person at MSC 6730 or over Zoom by appointment)
- Cameron, 10:00am - 12:00pm Wednesday (in person at MSC 6749 or over Zoom by appointment)
| Date | Topic | Readings | Assignments |
|---|---|---|---|
| Wed, Sep 6 | Course Introduction / Introduction to Visual Recognition | See Canvas | Sign up for Piazza |
| Mon, Sep 11 | Data Driven Paradigm for Computer Vision | Paper 1, 2 | |
| Wed, Sep 13 | Image Processing using Python (Tutorial led by TA) | Optional | Homework 1 out |
| Mon, Sep 18 | Introduction to Neural Networks | Ch 6 Goodfellow et al. | |
| Wed, Sep 20 | Convolutional Neural Networks: Theory | Ch 9 Goodfellow et al. | |
| Mon, Sep 25 | Convolutional Neural Networks: Practice | Paper 1, 2, 3 | |
| Wed, Sep 27 | PyTorch and Cloud Computing (Tutorial led by TA) | | Homework 1 due |
| Mon, Oct 2 | Recurrent Neural Networks and Transformers: Part I | Ch 10 Goodfellow et al. | |
| Wed, Oct 4 | Recurrent Neural Networks and Transformers: Part II | Paper 1, 2 | |
| Mon, Oct 9 | Advanced Training | Ch 8 Goodfellow et al. | Homework 2 out |
| Wed, Oct 11 | Image Classification and Adversarial Samples: Part I | Paper 1, 2 | |
| Mon, Oct 16 | Image Classification and Adversarial Samples: Part II | Paper 1, 2 | Project Proposal due |
| Wed, Oct 18 | Object Detection & Instance Segmentation: Part I | Ch 6.3.3 Szeliski; Paper 1, 2 | |
| Mon, Oct 23 | Object Detection & Instance Segmentation: Part II | Paper 1, 2 | |
| Wed, Oct 25 | Semantic Segmentation and Dense Image Labeling | Ch 6.4-6.4.3 Szeliski; Paper 1, 2 | |
| Mon, Oct 30 | Human Pose Estimation | Paper 1, 2 | Homework 2 due; Homework 3 out |
| Wed, Nov 1 | Beyond Classification: Vision & Language | Ch 6.6 Szeliski; Paper 1, 2, 3 | |
| Mon, Nov 6 | Action Recognition and Video Understanding: Part I | Ch 6.5 Szeliski; Paper 1, 2 | |
| Wed, Nov 8 | Action Recognition and Video Understanding: Part II | Paper 1, 2 | Project Mid-term Report due |
| Mon, Nov 13 | 3D Scene Understanding: Part I (virtual) | Ch 13.3, 13.4 Szeliski; Paper 1, 2 | |
| Wed, Nov 15 | 3D Scene Understanding: Part II (virtual) | Ch 14.6 Szeliski | |
| Mon, Nov 20 | Medical Image Analysis: Part I | Paper | Homework 3 due |
| Wed, Nov 22 | No Class; Happy Thanksgiving! | | |
| Mon, Nov 27 | Medical Image Analysis: Part II | Paper 1, 2 | |
| Wed, Nov 29 | Deep Generative Models: Part I (VAEs and Diffusion Models) | Paper 1, 2 | Homework 4 out |
| Mon, Dec 4 | Deep Generative Models: Part II (VAEs and GANs) | Paper | |
| Wed, Dec 6 | Self-supervised Visual Representation Learning | Paper 1, 2, 3 | |
| Mon, Dec 11 | Project Presentations | | |
| Wed, Dec 13 | Project Presentations and Course Wrap-up | | Project report due; Course write-up due; Homework 4 due |
| Final Exam Period | Not used | | |