BMI 826 / CS 838 Learning Based Methods in Computer Vision

Fall 2019, MW 2:30PM - 3:45PM, 3355 Engineering Hall
Instructor: Yin Li

TA: Zixuan Huang

Course Description

The course focuses on the problems of representation and reasoning for large amounts of visual data. These data include images and videos, medical imaging data and their associated tags or text. The majority of these problems stem from computer vision and machine learning. The content of the course is organized into two main sections. The first section introduces deep learning in the context of computer vision, including its theory, models, practice and systems. The second section covers topics in visual recognition, such as image classification, object detection, human pose estimation, action recognition, 3D understanding and medical image recognition.

Discussion Group

We will use Piazza. Please post all of your questions on the discussion board so that others may learn from your questions as well. Do not email the professor or TA directly with homework questions.


Students are strongly encouraged to have knowledge of computer vision (CS 766) or medical image analysis (BMI/CS 767). No prior experience with machine learning is assumed, although previous knowledge of basic machine learning concepts will be helpful. The following skills are necessary for this class:


Your final grade will be made up from

Most of the assignments and projects are team based. We do not accept late homework assignments or projects. However, you have three "late days" for the whole course: the first 24 hours after the due date and time count as one late day, up to 48 hours counts as two, and up to 72 hours counts as three.

These late days are intended to cover unexpected clustering of due dates, travel commitments, interviews, hackathons, etc. Don't ask for extensions to due dates because we are already giving you a pool of late days to manage yourself.
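
As a rough illustration of how the late-day count works, the short Python sketch below rounds the lateness up to whole 24-hour blocks. The function name `late_days_used`, the constant `MAX_LATE_DAYS`, and the example due date and time are hypothetical and not part of the course materials. The sketch does not model what happens once the pool of three days is used up, since the policy above does not specify it.

```python
import math
from datetime import datetime, timedelta

MAX_LATE_DAYS = 3  # pool of late days for the whole course, per the policy above


def late_days_used(due: datetime, submitted: datetime) -> int:
    """Late days consumed by one submission.

    Every started 24-hour block after the due date and time counts as one
    late day: up to 24 hours is 1, up to 48 hours is 2, up to 72 hours is 3.
    """
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0  # on time, no late days used
    return math.ceil(seconds_late / (24 * 3600))


# Hypothetical due date and time, for illustration only.
due = datetime(2019, 9, 30, 23, 59)
used = late_days_used(due, due + timedelta(hours=30))  # 30 hours late
remaining = MAX_LATE_DAYS - used
print(f"late days used: {used}, remaining in pool: {remaining}")
```

In this example a submission 30 hours late falls in the second 24-hour block, so it uses two of the three late days.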

Homework Assignments

The course will consist of 3 homework assignments. The second and third assignments are team based. Teams of 2 students are preferred. In your submission, please clearly identify the contribution of all the team members. Please note that members in the same group will not necessarily get the same grade.

All homework is to be submitted by midnight on the due date. All files should be included in a zip file named (where X is the homework number) and uploaded to Canvas. Late submissions should be emailed to the TA, with the instructor copied (cc). Please attach the zip file to your email.

All starter code and assignments will be in Python and will use various third-party libraries. We will make an effort to support macOS, Windows, and Linux. The course includes a quick Python tutorial (optional) and assumes you have enough familiarity with procedural and object-oriented programming languages to complete the projects.


The final project is research-oriented. It can be a pure vision project or an application of vision techniques in the student's own research area. You are expected to implement one (or more) related research papers, or to develop novel ideas of your own and implement them using the techniques discussed in class. Students are encouraged to propose their own project topics. You should work on the project in groups of 2-3. In your submission, please clearly identify the contribution of each group member.

There will be four checkpoints for the final project: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.

Course Write-up

The course write-up is a document that captures your reflection on the course work, e.g., what you have learned and which findings in the course you found most interesting. The write-up must be completed individually. It can be submitted as a PDF file or as a link to a webpage.

Academic Integrity

This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.

You are permitted and encouraged to discuss ideas with other students. However, you are expected to implement the core components of each assignment / project on your own. You should not view or edit anyone else's code. You should not post code to Piazza, except for starter code / helper code that isn't related to the core project.

Contact Info and Office Hours

If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor or TA.

Office Hours


Class Date | Topic | Slide | Reading | Assignment

Computer Vision Meets Machine Learning
Wed, Sep 4 | Introduction to Visual Recognition | See Canvas | | Sign up for Piazza
Mon, Sep 9 | Theories of Visual Perception | | |
Wed, Sep 11 | Image Processing using Python (Tutorial led by TA) | | | Homework 1 out
Mon, Sep 16 | Data Driven Paradigm | | Paper 1, 2 |

Deep Models for Visual Learning
Wed, Sep 18 | Introduction to Neural Networks | | Ch 6, Deep Learning |
Mon, Sep 23 | Convolutional Neural Networks: Theory | | Ch 9, Deep Learning |
Wed, Sep 25 | Convolutional Neural Networks: Practice | | Paper 1, 2, 3 |
Mon, Sep 30 | Recurrent Neural Networks | | Ch 10, Deep Learning | Homework 1 due
Wed, Oct 2 | Advanced Training | | Ch 8, Deep Learning |
Mon, Oct 7 | Deep Learning Systems (Tutorial) | | | Quiz 1

Visual Recognition
Wed, Oct 9 | Image Classification & Adversarial Samples | | Paper 1, 2 | Project proposal due; Homework 2 out
Mon, Oct 14 | Object Detection & Instance Segmentation: Part I | | Paper 1, 2, 3 |
Wed, Oct 16 | Object Detection & Instance Segmentation: Part II | | Paper 1, 2, 3 |
Mon, Oct 21 | Semantic Segmentation | | Paper 1, 2 |
Wed, Oct 23 | Human Pose Estimation | | Paper 1, 2 |
Mon, Oct 28 | Beyond Classification: Vision & Language | | Paper 1, 2 |
Wed, Oct 30 | Action Recognition | | Paper 1, 2 | Homework 2 due
Mon, Nov 4 | 3D Scene Understanding | | Paper 1, 2 |
Wed, Nov 6 | Deep Generative Models: Part I | | Paper 1, 2 | Mid-term report due; Homework 3 out
Mon, Nov 11 | Deep Generative Models: Part II | | Paper 1, 2 |
Wed, Nov 13 | Medical Image Recognition | | Paper |
Mon, Nov 18 | Deep Learning for Medical Imaging (Guest lecture: Prof. Guanghong Chen) | | |
Wed, Nov 20 | Introduction to Deep Reinforcement Learning | | Paper |
Mon, Nov 25 | Self-supervised Visual Learning | | Paper 1, 2 | Quiz 2
Wed, Nov 27 | No Class; Happy Thanksgiving! | | |
Mon, Dec 2 | First Person Vision | | |

Project Presentations
Wed, Dec 4 | Project Presentations | | | Homework 3 due Dec 6
Mon, Dec 9 | Project / Demo Presentations | | |
Wed, Dec 11 | Project Presentations and Course Wrap-up | | | Project report due

Final Exam Period - not used


We thank Google Cloud for providing the computing resources. The materials for this class rely significantly on slides prepared by other instructors. In particular, many slides are adapted from those of Abhinav Gupta, Svetlana Lazebnik and Alexei A. Efros, who in turn use materials from many others. Each slide set contains acknowledgments. Feel free to use these slides for academic or research purposes, but please maintain all acknowledgments.