BMI 826 Learning Based Methods in Computer Vision
Fall 2021, MW 2:30PM - 3:45PM, 2309 Engineering Hall
TA: Fangzhou Mu
Instructor: Yin Li
Course Description
The course focuses on the problems of representation and reasoning for large amounts of visual data. These data include images and videos, medical imaging data, and their associated tags or text. The majority of these problems stem from computer vision and machine learning. The content of the course is organized into two main sections. The first section introduces deep learning in the context of computer vision, including its theory, models, practice, and systems. The second section covers topics in visual recognition, such as image classification, object detection, human pose estimation, action recognition, 3D understanding, and medical image analysis. The course emphasizes the design of vision and learning algorithms and models, as well as their practical implementation.
Discussion Group
We will use Piazza. Please post all of your questions on the discussion board so that others may learn from your questions as well. Do not email the professor or TA directly with homework questions.
Prerequisites
Students are strongly encouraged to have knowledge of computer vision (CS 766) or medical image analysis (BMI/CS 767). No prior experience with machine learning is assumed, although previous knowledge of basic machine learning concepts will be helpful. The following skills are necessary for this class:
- Programming: Students should have basic proficiency in programming (Python). Projects are to be completed and graded in Python. The TA will answer questions about Python.
- Math: Linear algebra, vector calculus, and probability theory.
Grading
Your final grade is made up of:
- 48% 4 homework assignments (mini-projects) that involve programming
- 40% 1 course project with several milestones
- 12% Single-page course write-up
Late days are intended to cover unexpected clustering of due dates, travel commitments, interviews, hackathons, etc. Please do not ask for extensions to due dates, because you already have a pool of late days to manage yourself.
Homework Assignments
The course will consist of 4 homework assignments. All assignments except the first one are team based. Teams of 2 students are preferred. In your submission, please clearly identify the contribution of each team member. Please note that members of the same team will not necessarily receive the same grade.
Please post all of your questions on Piazza so that others may learn from your questions as well. Do not email the professor or TA directly with homework questions.
All homework assignments are to be submitted by midnight on the due date. All files should be included in a zip file named hwX_yourNetID.zip (where X is the homework number) and uploaded to Canvas. Late submissions should be emailed to the TA, with the instructor cc'd. Please attach the zip file to your email.
All starter code and assignments will be in Python and will use various third-party libraries. We will make an effort to support macOS, Windows, and Linux. The course includes a quick Python tutorial (optional) and assumes you have enough familiarity with procedural and object-oriented programming languages to complete the projects.
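The homework naming convention (hwX_yourNetID.zip) can be sketched in Python as follows. This is only an illustrative helper, not part of the course starter code: the function name `make_submission`, its parameters, and the NetID `bbadger` used in the usage note are all assumptions made for the example.

```python
import zipfile
from pathlib import Path


def make_submission(hw_number, net_id, src_dir, out_dir="."):
    """Zip every file under src_dir into hw<hw_number>_<net_id>.zip.

    Hypothetical helper: only the archive name follows the syllabus;
    the rest is one possible way to package the files.
    """
    src = Path(src_dir)
    archive = Path(out_dir) / f"hw{hw_number}_{net_id}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself (if it lives in src_dir).
            if path.is_file() and path.resolve() != archive.resolve():
                # Store paths relative to src_dir so the zip has no absolute paths.
                zf.write(path, path.relative_to(src))
    return archive
```

For example, `make_submission(1, "bbadger", "hw1_code")` would write `hw1_bbadger.zip` to the current directory, ready to upload to Canvas.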
Projects
The final project is research-oriented. It can be a pure computer vision project or an application of vision methods in the student's own research area. You are expected to implement the methods from one (or more) related research papers, or to develop novel ideas of your own and implement them using the techniques discussed in class. Students are encouraged to propose their own project topics. You should work on the project in groups of 2-3. In your submission, please clearly identify the contribution of each group member.
There will be four checkpoints for the final project: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.
- Project Proposal (5%): This will be a single-page document. You will explain what problem you are trying to solve, why you want to solve it, and what the possible steps toward a solution are.
- Project Mid-Term Report (5%): This will be a single-page summary of current progress, including your current results, the difficulties that arose during implementation, and how your proposal may have changed in light of that progress.
- Project Final Report (15%): The final report will be a four-page document. You will describe the motivation of the project, the previous literature, your method, and the results. You can reuse material from your proposal / mid-term report. Please include your source code in the submission.
- Project Presentation (15%, in class): Each team will be allocated a 15-min slot in class. This slot includes a 12-min presentation and a 3-min Q&A session.
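As a quick sanity check on the weights listed in this syllabus, the four project checkpoints (5% + 5% + 15% + 15%) add up to the 40% project share, and together with homework (48%) and the course write-up (12%) they total 100%. A minimal Python sketch of the tally is below. It assumes the final grade is a plain weighted sum of per-component scores in [0, 1]; the syllabus does not state how scores are combined, and the sketch ignores late days. The names `COMPONENT_WEIGHTS` and `final_grade` are hypothetical.

```python
# Weights in percent of the final grade, copied from the syllabus.
COMPONENT_WEIGHTS = {
    "homework": 48,         # 4 homework assignments, taken together
    "proposal": 5,          # project proposal
    "midterm_report": 5,    # project mid-term report
    "final_report": 15,     # project final report
    "presentation": 15,     # project presentation
    "write_up": 12,         # single-page course write-up
}

PROJECT_PARTS = ("proposal", "midterm_report", "final_report", "presentation")


def final_grade(scores):
    """Return a percentage given component scores in [0, 1].

    Assumption (not from the syllabus): a simple weighted sum.
    """
    missing = set(COMPONENT_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in COMPONENT_WEIGHTS.items())


# The project checkpoints account for the 40% project share,
# and all components together account for 100%.
assert sum(COMPONENT_WEIGHTS[p] for p in PROJECT_PARTS) == 40
assert sum(COMPONENT_WEIGHTS.values()) == 100
```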
Course Write-up
The course write-up will be a document that captures your reflection on the course work, e.g., what you have learned and what you found most interesting in the course. The write-up must be completed individually. It can be submitted as a PDF file or a link to a webpage.
Academic Integrity
This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.
You are permitted and encouraged to discuss ideas with other students. However, you are expected to implement the core components of each assignment / project on your own. You should not view or edit anyone else's code. You should not post code to Piazza, except for starter code / helper code that isn't related to the core project.
Contact Info and Office Hours
If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor or TA.
- Yin: yin[dot]li[at]wisc[dot]edu
- Fangzhou: fmu2[at]wisc[dot]edu
- Yin, Thursday 1-5pm, by appointment only (MSC 6730).
- Fangzhou, Friday 1-2pm, by appointment only (MSC 6729).
|Wed, Sep 8||Introduction to Visual Recognition||See Canvas||Sign up for Piazza|
|Mon, Sep 13||Theories of Visual Perception|
|Wed, Sep 15||Data Driven Paradigm for Computer Vision||Paper 1, 2|
|Mon, Sep 20||Image Processing using Python (Tutorial)||Homework 1 out|
|Wed, Sep 22||Introduction to Neural Networks||Ch 6, Deep Learning|
|Mon, Sep 27||Convolutional Neural Networks: Theory||Ch 9, Deep Learning|
|Wed, Sep 29||Convolutional Neural Networks: Practice||Paper 1, 2, 3|
|Mon, Oct 4||Recurrent Neural Networks and Transformers: Part I||Ch 10, Deep Learning||Homework 1 due|
|Wed, Oct 6||Recurrent Neural Networks and Transformers: Part II||Paper|
|Mon, Oct 11||Advanced Training||Ch 8, Deep Learning||Project Proposal Due|
|Wed, Oct 13||Image Classification and Adversarial Samples||Paper 1, 2|
|Mon, Oct 18||Object Detection & Instance Segmentation: Part I||Paper 1, 2, 3||Homework 2 out|
|Wed, Oct 20||Object Detection & Instance Segmentation: Part II||Paper 1, 2, 3|
|Mon, Oct 25||Semantic Segmentation and Dense Image Labeling||Paper 1, 2|
|Wed, Oct 27||Human Pose Estimation||Paper 1, 2|
|Mon, Nov 1||Beyond Classification: Vision & Language||Paper 1, 2|
|Wed, Nov 3||Action Recognition and Video Understanding: Part I||Paper 1, 2||Homework 2 due; Homework 3 out|
|Mon, Nov 8||Action Recognition and Video Understanding: Part II||Paper 1, 2|
|Wed, Nov 10||3D Scene Understanding||Paper 1, 2|
|Mon, Nov 15||Mid-term Project Presentation||Mid-term report due|
|Wed, Nov 17||Generating Images and Beyond||Paper 1, 2|
|Mon, Nov 22||Medical Image Analysis: Part I||Paper||Homework 3 due|
|Wed, Nov 24||No Class; Happy Thanksgiving!|
|Mon, Nov 29||Medical Image Analysis: Part II||Paper||Homework 4 out|
|Wed, Dec 1||Self-supervised Visual Representation Learning||Paper 1, 2|
|Mon, Dec 6||First Person Vision|
|Wed, Dec 8||Project Presentations||Homework 4 due on Dec 10|
|Mon, Dec 13||Project / Demo Presentations|
|Wed, Dec 15||Project Presentations and Course Wrap-up||Project report due|
|Final Exam Period - not used|