BMI 826 / CS 838 Learning Based Methods in Computer Vision

Spring 2019, MW 1:05PM - 2:20PM, 3534 Engineering Hall
Instructor: Yin Li

TA: Zixuan Huang

Computer Vision, art by kirkh.deviantart.com

Course Description

The course focuses on the problems of representation and reasoning for large amounts of visual data. These data include images and videos, medical imaging data, and their associated tags or text. The majority of these problems stem from computer vision and machine learning. The content of the course is organized into two main parts. The first part introduces deep learning in the context of computer vision, including its theory, models, practice and systems. The second part covers topics in visual recognition, such as image classification, object detection, human pose estimation, action recognition, 3D understanding and medical image recognition.

Discussion Group

We will use Piazza. Please post all of your questions on the discussion board so that others may learn from your questions as well. Do not email the professor or TA directly with homework questions.

Prerequisites

Students are strongly encouraged to have knowledge of computer vision (CS 766) or medical image analysis (BMI/CS 767). No prior experience with machine learning is assumed, although previous knowledge of basic machine learning concepts will be helpful. The following skills are necessary for this class:

Programming: Students should have basic proficiency in programming (Python). Projects are to be completed and graded in Python. The TA will answer questions about Python.
Math: Linear algebra, vector calculus, and probability.

Grading

Your final grade will be made up of:

45% 3 homework assignments (mini-projects) that involve programming
40% 1 course project with several milestones
5% Single-page course write-up
10% 2 in-class quizzes
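As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the components are combined as a plain weighted sum; the function name and the dictionary keys are hypothetical and not part of any course tooling.

```python
# Weights copied from the grading breakdown above (fractions of the final grade).
WEIGHTS = {
    "homework": 0.45,   # 3 homework assignments
    "project": 0.40,    # course project with milestones
    "write_up": 0.05,   # single-page course write-up
    "quizzes": 0.10,    # 2 in-class quizzes
}


def final_grade(scores):
    """Combine per-component scores (assumed 0-100 each) into one percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Plain weighted sum; this is an assumption about how the parts are combined.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, component scores of 80 (homework), 90 (project), 100 (write-up) and 70 (quizzes) contribute 36 + 36 + 5 + 7 points respectively.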

Most of the assignments and projects are team-based. We do not accept late homework assignments or projects. However, you have three "late days" for the whole course: a submission up to 24 hours after the due date and time uses one late day, up to 48 hours uses two, and up to 72 hours uses three.

These late days are intended to cover unexpected clustering of due dates, travel commitments, interviews, hackathons, etc. Don't ask for extensions to due dates because we are already giving you a pool of late days to manage yourself.
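The late-day accounting can be sketched in a few lines of Python. This is only an illustration of the rule as stated: it assumes that any started 24-hour block counts as a full late day, and the function name and the example timestamps are hypothetical.

```python
import math
from datetime import datetime, timedelta

LATE_DAY_POOL = 3  # total late days available for the whole course


def late_days_used(due, submitted):
    """Return how many late days a submission consumes (0 if on time)."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    # Assumption: every started 24-hour block counts as one full late day.
    return math.ceil(seconds_late / timedelta(days=1).total_seconds())


# Hypothetical deadline and submission time, for illustration only.
due = datetime(2019, 2, 18, 23, 59)
used = late_days_used(due, due + timedelta(hours=30))
remaining = LATE_DAY_POOL - used
```

Under this reading, a submission 30 hours past the deadline consumes two of the three late days.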

Homework Assignments

The course includes 3 homework assignments. The second and third assignments are team-based. Teams of 2 students are strongly preferred. In your submission, please clearly identify the contribution of each team member. Please note that members of the same team will not necessarily get the same grade.

Please post all of your questions on Piazza so that others may learn from your questions as well. Do not email the professor or TA directly with homework questions.

All homework is to be submitted by midnight on the due date. Include all the files in a zip file named hwX_yourNetID.zip (where X is the homework number) and upload the zip file to Canvas. Late submissions should be emailed to the TA (cc the instructor). Please attach the zip file to your email.
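If you prefer to build the archive from Python, the following is a minimal sketch using only the standard library. The helper name make_submission_zip and its parameters are hypothetical; only the hwX_yourNetID.zip naming pattern comes from the instructions above, and uploading to Canvas remains a manual step.

```python
import zipfile
from pathlib import Path


def make_submission_zip(src_dir, hw_number, net_id, out_dir="."):
    """Zip every file under src_dir into hw<hw_number>_<net_id>.zip."""
    src = Path(src_dir)
    out_path = Path(out_dir) / f"hw{hw_number}_{net_id}.zip"
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself if it lives inside src_dir.
            if path.is_file() and path.resolve() != out_path.resolve():
                # Store paths relative to src_dir so the zip has no absolute paths.
                zf.write(path, arcname=path.relative_to(src).as_posix())
    return out_path
```

For instance, make_submission_zip("my_hw1_folder", 1, "yourNetID") would write hw1_yourNetID.zip to the current directory, where the folder name is a placeholder for wherever your files live.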

All starter code and assignments will be in Python and will use various third-party libraries. We will make an effort to support macOS, Windows, and Linux. The course includes a quick, optional Python tutorial and assumes you have enough familiarity with procedural and object-oriented programming languages to complete the projects.

Projects

The final project is research-oriented. It can be a pure vision project or an application of vision techniques in the student's own research area. You are expected to implement one (or more) related research papers, or to come up with interesting novel ideas and implement them using the techniques discussed in class. Students are encouraged to propose their own project topics. You should work on the project in groups of 2-3. In your submission, please clearly identify the contribution of each group member.

There will be four checkpoints: a project proposal, an intermediate milestone report, a final project report and a project presentation. The details are listed below.

Project Proposal (5%, Due Mar 11th): This will be a single-page document. You will explain what problem you are trying to solve, why you want to solve it, and what the possible steps toward a solution are.
Project Mid-Term Report (5%, Due Apr 3rd): This will be a single-page summary of current progress, including your current results, the difficulties that arose during implementation, and how your proposal may have changed in light of that progress.
Project Final Report (15%, Due May 1st): The final report will be a four-page document. You will describe the motivation of the project, the previous literature, your method and the results. You can reuse material presented in your proposal / mid-term report. Please include your source code in the submission.
Project Presentation (15%, in class): Each team will be allocated a 15-min slot in class. This slot includes a 12-min presentation and a 3-min Q&A session.

Academic Integrity

This course follows the University of Wisconsin-Madison Code of Academic Integrity. Unless specifically authorized by the instructor, all coursework is to be done by the student working alone. Violations of the rules will not be tolerated.

You are permitted and encouraged to discuss ideas with other students. However, you are expected to implement the core components of each assignment / project on your own. You should not view or edit anyone else's code. You should not post code to Piazza, except for starter code / helper code that isn't related to the core project.

Contact Info and Office Hours

If possible, please use Piazza to ask questions and seek clarifications before emailing the instructor or TA.

Yin: yin[dot]li[at]wisc[dot]edu
Zixuan: zhuang356[at]wisc[dot]edu

Office Hours

Yin: by appointment, please email (MSC 6730).

Syllabus

Class Date	Topic	Slide	Reading	Assignment
Computer Vision Meets Machine Learning
Wed, Jan 23	No Class, Instructor Travel
Mon, Jan 28	Introduction to Visual Recognition (Guest Lecture by Prof. Vikas Singh)	See Canvas		Sign up for Piazza
Wed, Jan 30	Class Cancelled, Campus Partial Closure
Mon, Feb 4	Image Processing in Python Tutorial led by the TA			Homework 1 out
Wed, Feb 6	Data Driven Paradigm (Guest Lecture by Prof. Yingyu Liang)		Paper 1, 2
Mon, Feb 11	No Class, Instructor Travel
Deep Models for Visual Learning
Wed, Feb 13	Introduction to Deep Learning Systems (PyTorch/TensorFlow) Led by the TA
Mon, Feb 18	No Class, Instructor Travel			Homework 1 due
Wed, Feb 20	Introduction to Computational Cameras (Guest Lecture by Prof. Mohit Gupta)		Paper
Mon, Feb 25	Introduction to Neural Networks
Wed, Feb 27	Convolutional Neural Networks: Theory
Mon, Mar 4	Convolutional Neural Networks: Practice		Paper 1, 2, 3
Wed, Mar 6	Recurrent Neural Networks		Reading	Quiz 1
Visual Recognition
Mon, Mar 11	Image Classification		Paper	Project proposal due; Homework 2 out
Wed, Mar 13	Object Detection: Part I		Paper 1, 2
Mon, Mar 18	No class, Spring Recess
Wed, Mar 20	No class, Spring Recess
Mon, Mar 25	Object Detection: Part II		Paper 1, 2
Wed, Mar 27	Semantic Segmentation		Paper 1, 2
Mon, Apr 1	Human Pose Estimation		Paper 1, 2	Homework 2 due
Wed, Apr 3	Beyond Classification: Vision & Language		Paper 1, 2
Mon, Apr 8	Action Recognition		Paper 1, 2	Project mid-term report due
Wed, Apr 10	3D Scene Understanding		Paper 1, 2
Mon, Apr 15	Deep Generative Models: Part I		Paper 1, 2	Homework 3 out
Wed, Apr 17	Deep Generative Models: Part II		Paper 1, 2
Fri, Apr 19	Introduction to Visual Perception
Mon, Apr 22	Medical Image Recognition		Paper
Wed, Apr 24	Introduction to Deep Reinforcement Learning		Paper	Quiz 2
Project Presentations
Mon, Apr 29	Project Presentations			Homework 3 due
Wed, May 1	Project Presentations
Fri, May 3	Project Presentations and Course Wrap-up			Project report due
	Final Exam Period - not used

Acknowledgments

The materials for this class rely significantly on slides prepared by other instructors. In particular, many slides are modified from those of Abhinav Gupta, Svetlana Lazebnik and Alexei A. Efros, who in turn use materials from many people. Each slide set contains acknowledgments. Feel free to use these slides for academic or research purposes, but please maintain all acknowledgments.