

CS2770: Computer Vision, Spring 2017

Location: Sennott Square 5313
Time: Tuesday and Thursday, 9:30am - 10:45am
Instructor: Adriana Kovashka (email: kovashka AT cs DOT pitt DOT edu; use "CS2770" at the beginning of the subject line)
Office: Sennott Square 5325
Office hours: Tuesday and Thursday, 3:30pm - 5:30pm
TA: Keren Ye (email: yekeren AT cs DOT pitt DOT edu; use "CS2770" at the beginning of the subject line)
TA's office hours: Tuesday 4pm - 6pm and Wednesday 10am - 12pm, Sennott Square 5501


Course description: In this class, students will learn about modern computer vision. The first part of the course will cover fundamental concepts such as image formation and filtering, edge detection, texture description, feature extraction and matching, grouping and clustering, model fitting, and combining multiple views. A crash course in machine learning will be included, in preparation for the second part of the course, on visual recognition. We will study classic and modern approaches in object detection, deep learning, mid-level representations, active, transfer, and unsupervised learning, tracking, and human pose and activity recognition. The format will include lectures, homework assignments, exams, and a course project.

Prerequisites: CS1501 and MATH 0280 (or equivalent). The expectation is that you can program and analyze the efficiency and performance of programs. Further, some experience with linear algebra (matrix and vector operations) is recommended. For some parts of the course, it would be useful to remember basic calculus and how to compute derivatives. Some experience with probability and statistics would also be useful.

Piazza: Sign up for it here. Note that we will use Piazza for two main purposes: (1) for announcements, and (2) for classmate-to-classmate discussion of homework problems, etc. The instructor will monitor Piazza infrequently; the time to ask the instructor or TA questions is during office hours.

Programming languages: For homework assignments, you can use Matlab or Python. For the course project, you can use any language of your choice.




Grading

Grading will be based on the following components:
  • Homework assignments (3 assignments x 10% each = 30%)
  • Paper reviews (5%)
  • Course project (30%)
  • Midterm and final exam (20% midterm + 10% final = 30%)
  • Participation (5%)

Homework Submission Mechanics

You will submit your homework using CourseWeb. Navigate to the CourseWeb page for CS2770, then click on "Assignments" (on the left) and the corresponding homework ID. Your written answers should be a single .pdf/.doc/.docx file. Your source code should be a single zip file (also including images/results if requested). Name the file YourFirstName_YourLastName.[extension]. Please comment your code! Homework is due at 11:59pm on the due date. Grades will be posted on CourseWeb.

Paper reviews

During the second part of the course, you will be asked to write 1-page paper reviews for three papers (of your choice) from the assigned reading. Paper reviews are due at 11:59pm on CourseWeb, and should follow the same naming conventions as homework assignments. Please submit only .pdf or .doc/.docx files, zipped in a .zip file. Please answer the following questions in each paper review:
  1. Summarize what this paper aims to do, and what its main contribution is.
  2. Summarize the proposed approach.
  3. Summarize the experimental validation of the approach.
  4. What are three advantages of the proposed approach?
  5. What are three disadvantages or weaknesses of the approach or experimental validation?
  6. Suggest one possible extension of this approach, i.e. one idea for future work.


Exams

There will be one in-class midterm exam, and a final exam which will focus on material from the latter part of the course. There will be no make-up exams unless you or a close relative is seriously ill!


Participation

Students are expected to regularly attend the class lectures and to actively engage in in-class discussions. Attendance will not be taken, but keep in mind that if you don't attend, you cannot participate. You can actively participate by, for example, responding to the instructor's or others' questions, asking questions or making meaningful remarks and comments about the lecture, and answering others' questions on Piazza. You are also encouraged to bring in relevant articles you saw in the news.

Late Policy

On your programming assignments only, you get 3 "free" late days counted in minutes, i.e., you can submit a total of 72 hours late. For example, you can submit one homework 12 hours late, and another 60 hours late. Once you've used up your free late days, you will incur a penalty of 25% from the total assignment credit possible for each late day. A late day is anything from 1 minute to 24 hours. Note this policy does not apply to components of the project.
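The bookkeeping above can be sketched as a small calculation. This is a hypothetical helper for illustration only (the function name and input format are made up), not an official grading script:

```python
# Illustrative sketch of the late policy above: 72 free late hours in
# total across assignments; beyond that, each started 24-hour period
# ("late day", anything from 1 minute to 24 hours) costs 25% of the
# assignment's possible credit.
import math

FREE_HOURS = 72

def late_penalty(hours_late_per_hw):
    """Return the penalty fraction charged on each homework, in order."""
    free_left = FREE_HOURS
    penalties = []
    for hours in hours_late_per_hw:
        covered = min(hours, free_left)   # spend free hours first
        free_left -= covered
        over = hours - covered            # hours not covered by the quota
        late_days = math.ceil(over / 24) if over > 0 else 0
        penalties.append(0.25 * late_days)
    return penalties

print(late_penalty([12, 60, 30]))  # -> [0.0, 0.0, 0.5]
```

In this made-up scenario, the first two submissions (12 h and 60 h late) exactly use up the 72 free hours, so the third submission's 30 late hours count as two late days and cost 50% of that assignment's credit.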

Collaboration Policy and Academic Honesty

You will do your work (exams and homework) individually. The only exception is the project, which can be done in pairs. The work you turn in must be your own. You are allowed to discuss the assignments with your classmates, but do not look at code they might have written for the assignments, or at their written answers. You are also not allowed to search for code on the internet, to use solutions posted online unless you are explicitly allowed to look at those, or to use Matlab's or Python's implementation if you are asked to write your own code. When in doubt about what you can or cannot use, ask the instructor! Plagiarism will cause you to fail the class and receive a disciplinary penalty. Please consult the University Guidelines on Academic Integrity.

Note on Disabilities

If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources and Services (DRS), 140 William Pitt Union, (412) 648-7890, (412) 228-5347 for P3 ASL users, as early as possible in the term. DRS will verify your disability and determine reasonable accommodations for this course.

Note on Medical Conditions

If you have a medical condition which will prevent you from doing a certain assignment, you must inform the instructor of this before the deadline. You must then submit documentation of your condition within a week of the assignment deadline.

Statement on Classroom Recording

To ensure the free and open discussion of ideas, students may not record classroom lectures, discussion and/or activities without the advance written permission of the instructor, and any such recording properly approved in advance can be used solely for the student's own private use.



Course Project

A project can be:
  • Type A: design of a new method for an existing problem or an application of techniques we studied in class (or another method) to a new problem that we have not discussed in class
  • Type B: experimental comparison of a number of existing techniques on a known problem and detailed discussion and analysis of the results
  • Type C: an extensive literature review and analysis on one of the topics covered in class
Milestones for the project:
  • Proposal -- Not for a grade but should be submitted. Aim for at least 2 pages. The better thought-out this is, the more feedback the instructor can give. Also think about what data and/or code you will use.
  • Draft (10% of final grade) -- This should read like a version of your final report, and should have all the sections that the final report will have (although some of them may still be incomplete), showing as much progress as you can. The expectation is that at this point you have done 1/3 to 1/2 of the required work for the project.
  • Presentation (5% of final grade) -- Aim to be clear, enthusiastic and concise. You need to submit (on CourseWeb) the presentation file on the day of your presentation for the instructor's reference.
  • Final report (15% of final grade) -- Use the CVPR latex template. The final report should resemble a conference paper and should include (as applicable) clear problem definition and argumentation of why this problem is important, overview of related work, detailed explanation of the approach, well-motivated experimental evaluation, including setup description, and a description of what each team member did.
General rules:
  • Students are encouraged to work in groups of two for their final project. The only exception is the literature review, which can only be done by students working individually.
  • The project should include some amount of novelty.
  • You are encouraged to use any external expertise you might have (e.g. biology, physics, etc.) so that your project makes the best use of areas you know well, and is as interesting as possible.
  • Combining your final project for this class and another class is generally permitted, but the project proposal and final report should clearly outline what part of the work was done to get credit in this class, and the instructor should approve the proposed breakdown of work between this and another class.
  • The final report should be self-contained, i.e. the instructor should not have to read any other papers to understand what you did.
  • All project written items are due at 11:59pm on CourseWeb.
If you are proposing a new problem or a new solution to an existing problem:
  • The project should include some amount of novelty. For example, you cannot just re-implement an existing paper or project. You should come up with a new method, or apply an existing method for a new problem.
  • Do not rely on data collection to be the novel component of your work. If you are proposing to tackle a new problem, you might need to collect data, but while this is a contribution, it will not be enough to earn a good project grade. You still have to come up with a solid method idea, i.e. your project has to have sufficient technical novelty.
  • You must show that your method is in some sense better (quantitatively) than at least some relatively recent existing methods. For example, you can show that your method achieves superior accuracy on some prediction task compared to prior methods, or that it achieves comparable accuracy but is faster. Such an outcome is not guaranteed within the limited timespan of a course project, so whether or not you outperform the state of the art will only be a small component of your grade. Further, if you propose a sufficiently interesting method, rather than an extremely simple one, it will be less of a problem if your method does not outperform other existing approaches to the problem.
  • Each of the following components will be graded: how well you introduced and motivated the problem in your presentation and final report; how well you researched and presented the relevant work in the area you are tackling; how technically solid and novel your method is; how well you experimentally tested your method, and analytically discussed your experimental findings; how well you were able to draw conclusions from your work and discuss potential future work to further improve on the problem you proposed to tackle.
  • You are allowed to use existing code for known methods, but again, notice that your project is expected to be a significant amount of work and not just a straight-up run of some package.
  • This type of project has the highest chance of turning into a published workshop or conference paper.
  • Even for this type of project, you should present a very brief literature review during your presentation, so your classmates know the space in which you are working.
  • Good sources for learning about what work has been done in your domain of interest include search engines and Google Scholar.
If you are proposing a literature review, or are proposing to experimentally compare existing solutions to a known problem:
  • What you are proposing to do should not already have been done in another published paper.
  • You have to properly introduce and motivate the problem you chose to study (i.e. why it is important, and why it is challenging).
  • For experimental comparisons, you still need to present a detailed literature review for the topic at hand. You must review and include detailed descriptions (in your final report) of at least 10 papers. You need to experimentally compare at least 3 papers, even if code is not available for most of the papers you chose. In your implementation of papers without code, you do not have to follow the papers in every detail, but your implementation should be faithful to the paper you are implementing "in spirit". You should implement (rather than use existing code for) at least one of the methods you compare against; e.g., you might use code for 3 papers and implement 1 additional paper. Make sure to include a careful justification of why these are the papers you chose to implement and compare. Make sure to include a detailed analysis of the strengths and weaknesses of each paper you chose to compare, based on both the published papers and the experimental findings you collected over the course of the project.
  • For literature reviews, your final report should include at least 20 references. It should show a sensible organization of these references, and at least one paragraph of detail about each paper, including at least 5 sentences describing the method in each of the referenced works. Make sure to describe both the technical details and the experimental techniques used in each of the papers you present. Make sure to discuss some strengths and weaknesses of each paper you include in your review. Also include a synthesis/summary of what the community has accomplished on the problem you chose to study, grouped by the themes of the papers, and what future work might be.
  • Literature reviews can only be done in teams of one.
For sources of ideas:
  • Look at the datasets and tasks below.
  • Read some paper abstracts on this page.
  • Look at the topics in the programs of some of the recent computer vision conferences: CVPR 2016 (with papers downloadable here), ECCV 2016, and ICCV 2015.


Date | Chapter | Topic | Readings | Lecture slides | Due
1/5 | Basics | Introduction | Szeliski Sec. 1.1-1.2 | pptx, pdf |
1/10 | | Intro (cont'd); Linear algebra; Intro to recognition | | |
1/17 | | Support vector machines | Bishop PRML Sec. 1.1, Sec. 7.1 | |
1/19 | | Neural networks | Karpathy Module 1 | pptx, pdf | HW1 out
1/24 | | Neural networks (cont'd) | | |
1/26 | | Convolutional neural networks | Karpathy Module 2, Krizhevsky NIPS 2012, Zeiler ECCV 2014 | pptx, pdf |
1/31 | | Understanding convolutional neural networks | | pptx, pdf |
2/2 | Low-level tasks | Filters and texture | Szeliski Sec. 3.2, 10.5, 4.1.1 | pptx, pdf |
2/9 | | Feature detection and description | Szeliski Sec. 4.1, Grauman/Leibe Ch. 3; feature survey Ch. 1, 3.2, 7; SIFT paper by David Lowe | pptx, pdf | HW1 due, HW2 out
2/16 | | Feature matching | Szeliski Sec. 14.3; Grauman/Leibe Sec. 4.2; Video Google | pptx, pdf |
2/21 | | Edge detection, segmentation and clustering | Szeliski Sec. 4.2, 5.3-4; Hariharan CVPR 2015 | pptx, pdf |
2/23 | | | | | HW2 due, HW3 out
2/28 | | Hough transform and RANSAC | Szeliski Sec. 4.3.2, 6.1.4; Grauman/Leibe Sec. 5.2 | pptx, pdf |
3/2 | | Midterm exam | | |
3/7 | | Spring break (no class) | | |
3/14 | | Multiple views | Szeliski Sec. 2.1, 3.6.1, 7.2, 11.1.1; Grauman/Leibe Sec. 5.1 | pptx, pdf | project proposal due
3/16 | High-level tasks | Object recognition and detection | Szeliski Sec. 14.1, 14.4; Grauman/Leibe Sec. 8, 9, 10.3.3, 11.1, 2, 5; Viola Jones CVPR 2001, Felzenszwalb PAMI 2010, Girshick CVPR 2014 | |
3/23 | | | | | HW3 due
3/28 | | Sequential predictions; Recurrent neural networks | blog1, blog2, Karpathy CVPR 2015, Wu CVPR 2016 | |
4/4 | | Motion: Tracking, pose and actions | Shotton CVPR 2011, Laptev CVPR 2008, Pirsiavash CVPR 2012 | |
4/11 | | Unsupervised learning, active learning, our research | Doersch ICCV 2015, Lee ICCV 2013, Branson ECCV 2010 | pptx, pdf | project draft due
4/13 | | | | | paper reviews due
4/18 | | Final exam | | |
4/20 | Projects | Project presentations (schedule) | | |
4/27 | | | | | project final report due



This course was inspired by several other courses. Some code of interest:
  • LIBSVM (by Chih-Chung Chang and Chih-Jen Lin)
  • SVM Light (by Thorsten Joachims)
  • VLFeat (feature extraction, tutorials and more, by Andrea Vedaldi)
  • Caffe (deep learning code by Yangqing Jia et al.)
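As a minimal sketch of how one of these packages might be used from Python: scikit-learn's `SVC` classifier is backed by LIBSVM, so a tiny fit/predict example (with made-up toy data) looks like this. This is an illustration of the API, not course-provided code:

```python
# Train a kernel SVM (scikit-learn's SVC wraps LIBSVM) on a tiny,
# linearly separable toy set, then classify two new points.
from sklearn import svm

X_train = [[0, 0], [0, 1], [2, 2], [2, 3]]   # hypothetical 2-D features
y_train = [0, 0, 1, 1]                        # class labels

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

print(clf.predict([[0, 0.5], [2, 2.5]]))      # one point near each class
```

In a homework setting the rows of `X_train` would be image features (e.g., the descriptors covered in the low-level part of the course) rather than hand-typed 2-D points.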

CS698N (3-0-0-0-9) - Recent Advances in Computer Vision

Semester: Sem I, July - Nov 2016
Instructor: Gaurav Sharma, CSE, IITK
Contact: grv AT ([CS698N] in email subject; otherwise ignored)
Lectures: Wed and Fri, 10h30--12h00
Office hours: Wed 15h00--16h00 or by appointment
TA: Saptarshi Gan (sapgan AT


Summary

In this course, we will look at a subset of topics in the following exciting sub-areas of research in Computer Vision. This list of topics is adaptable depending on the level and interests of the students actually taking the course.

  • Human Analysis, e.g. actions, pose estimation, facial analysis, attribute recognition, pedestrian detection
  • Language and Vision, e.g. image captioning, visual question answering
  • Image Segmentation, e.g. semantic segmentation, multi-resolution edge estimation, instance segmentation

There will be a significant project component -- you are expected to mainly learn by doing.

I will also try to organize ~4 guest lectures (probably over video conferencing) where researchers who are working at the forefront of a research problem will present their recent work.

The Department's anti-cheating policy applies to your participation in the course.


Grading

  • Project (total 50%)

    • State-of-the-art presentation 10%
    • Proposal 10%
    • Mid-sem progress presentation 10%
    • Final report and presentation/demo 20%

  • Assignments (total 30%)

    • 1 page extended abstract of a research paper 10%
    • Review a research paper 10%
    • Seminar presentation 10%

  • End-sem 20%


Announcements

  • Final project submissions are due on the 7th November 2016, 23:59h IST
  • Assignment 3 is due on the 31st October 2016, 23:59h IST
  • Assignment 2 is due on the 21st October 2016, 23:59h IST
  • Assignment 1 is due on the 20th August 2016, 23:59h IST
  • Auditors: please send me an email so that I can add your email id to the course-related emails.

Guest Lectures

The following people have kindly agreed to give guest lectures. The lecture schedules are to be decided (unless given). Scroll to the end of the page for details about the talks which have been scheduled so far.
  • Chetan Arora, IIIT Delhi. Topic: Activity Recognition in First Person Videos [details]
  • Hakan Bilen, University of Oxford. Topic: Weakly supervised object detection [details]
  • Makarand Tapaswi, University of Toronto. Topic: Understanding Stories by Joint Analysis of Language and Vision [details]
  • Omkar Parkhi, Zoox Inc., Topic: Face recognition in still images and videos [details]
  • Jan Hosang, Max Planck Institute for Informatics. Topic: Detection Proposals and Learning Non-Maximum Suppression [details]


Assignments

The first two assignments are to be done in groups of 2-3 people. Only one file per group needs to be uploaded, by any one of the group members. Please use the IITK CSE moodle website for submitting the assignments. The time stamp of the moodle upload will be considered your submission time.
Any submission by mail will not be considered.

  1. Pick a paper (within the areas given in Summary above) which catches your attention from a recent computer vision conference (ECCV 2016, ICCV 2015, CVPR 2015 or 2016 -- not workshop papers) and write a one-page extended abstract in the BMVC 2015 Latex template, following the section on "Instructions for submission of camera ready papers" (in groups of 2-3; need not be the same as your project group). The idea here is to understand the paper and highlight the contributions made (problem solved, novelty and other advantages of the approach, and experimental results).

    The proceedings for recent ICCVs and CVPRs are open access. For ECCV 2016 (which will happen in Sep 2016, but the acceptance decisions are out) you will have to search the internet, look at the webpages of some prominent researchers, or search arXiv (the X is read as chi, so arXiv is pronounced "archive"; keep checking every few days, as new papers keep appearing).

    Some examples -- object detection, deep face recognition, and many more at BMVC 2015 website.

    Due 20th August 2016, 23:59h IST
  2. Pick a paper (within the areas given in Summary above) which catches your attention from a recent computer vision conference (ECCV 2015 or 2016; ICCV 2015; CVPR 2015 or 2016 -- not workshop papers) and write a review for the paper (in groups of 2-3; need not be the same as your project group). In particular, you should

    • Summarize the paper
    • Point out the strengths of the paper
    • Point out the weaknesses of the paper
    • Say if you would like to accept it or reject it with justifications
    • Optional -- Contrast and compare it with other recent papers/methods

    For reference, reviewer guidelines for CVPR 2015 are here. Also, you can find numerous articles by different academics on how to review papers. I will put up some light formatting requirements soon.
    Due 21st October 2016, 23:59h IST
  3. One ECCV 2016 paper each has been assigned to your group (see the mailing list). You have to prepare a presentation explaining the paper in sufficient detail in 15 minutes. You are also expected to read some of the other papers and contribute to class discussions. There will be time for questions in addition to the 15 min presentation time.
    Due 31st October 2016, 23:59h IST

Late Submission Policy

You have a total of 4 late submission days (for the assignments, the project proposal and the project end-sem reports). After that, you will be penalized 5 marks for each late day (beyond the 4 late days allowed). The late days accumulate over the different deadlines, i.e. if you were late by 1 day for assignment 1 and then late again by 3 days for assignment 2, you would have used up the total quota of 4 late days. The project presentations (state-of-the-art, mid-sem and end-sem) have to be uploaded two days prior to the presentation day (e.g. if the presentation is on the 5th, the deadline for uploading presentations would be 23:59h of the 3rd), and this is a hard deadline -- if you fail to do so, you will not be allowed to give the respective presentation. The idea here is that you should practice giving the presentation at least a day in advance. All the deadlines will be announced on the webpage in due time.

Project Details

The project will be a significant component of the course and will be continuously evaluated. The project should not overlap with any project, e.g. for any other course, that you might have done previously or are doing currently. If there is any such overlap, it should be declared clearly; failure to do so will be considered cheating, and the Department's anti-cheating policy will be used to deal with such cases.

You are expected to choose a problem for your project, approximately reproduce the results of an existing paper using open-source or available code, and finally either (i) implement a key algorithm from that paper yourself or (ii) go beyond that work by identifying some weakness and improving on it.

Milestones for the project (see the course calendar below for exact deadlines):

  • Make groups of 2-3
  • Proposal and State-of-the-art presentation - the expectation is that you will read at least one paper on which you want to base your project, and submit a proposal (i) describing the problem, (ii) discussing a few methods which try to solve the problem and (iii) giving your plan for the project towards solving that problem.
  • Mid-sem project evaluation (presentation and short report) - reproduce some results using open-source libraries or available code (released by authors). Present or demo your results.
  • End-sem project evaluation (presentation and report) - Implement one key algorithm yourself or go beyond the base method. Summarize your contribution and present or demo your results.

Only one file per group needs to be uploaded by any one of the group members. Please use the IITK CSE moodle website for submitting the project related presentations and report. The time stamp of the moodle upload will be considered your submission time.
Any submission by mail will not be considered.

Course Calendar

All deadlines below are at 23:59 IST of the respective dates.
The course material (slides etc.) is freely usable for educational and non-commercial research purposes, with due attribution. The material is available as is, without any warranty, expressed or implied, whatsoever. Any commercial use requires prior written permission from the author. If you are the owner of any of the content included (e.g. images) and feel that it has been unfairly used, kindly let me know and I will either attribute it to you as you specify or take it off, depending on your request.

# | Date | Content | Remarks/Deadlines
1 | 27 Jul 2016 | Introduction | slides
2 | 29 Jul 2016 | Preliminaries I (Classification, CNN, ...) | CNN tutorial (first draft), slides
3 | 03 Aug 2016 | Human Actions I & Segmentation I | slides
4 | 05 Aug 2016 | Preliminaries II (Backprop, RNN, LSTM, ...) | slides, Rojas-NN-Chap7
a | 09 Aug 2016 | GL: Chetan Arora, IIITD | details
5 | 10 Aug 2016 | Preliminaries III (Backprop, RNN) | Submit groups for project
6 | 17 Aug 2016 | Prelim. IV (LSTM) & Vision and Language I | slides
7 | 19 Aug 2016 | Image Classification | slides
  | 20 Aug 2016 | | Assignment 1 due
8 | 24 Aug 2016 | Object Detection I | slides
  | 25 Aug 2016 | | SoA presentation file due
9 | 26 Aug 2016 | Object Detection II | slides
i | 31 Aug 2016 | Project presentations I | Project proposal due
ii | 01 Sep 2016 | Project presentations II |
iii | 02 Sep 2016 | Project presentations III |
b | 07 Sep 2016 | GL: Hakan Bilen, Oxford | details
10 | 09 Sep 2016 | Unsupervised Representation Learning | slides
  | 14 Sep 2016 | No class | Mid-sem exams
  | 16 Sep 2016 | No class | Mid-sem exams
c | 21 Sep 2016 | GL: Makarand Tapaswi, Toronto | details
11 | 23 Sep 2016 | Metric Learning and Applications | slides
12 | 28 Sep 2016 | On depth of networks | slides
  | 02 Oct 2016 | | Mid-sem eval. files due
iv | 05 Oct 2016 | Mid-sem evaluation |
v | 06 Oct 2016 | Mid-sem evaluation |
d | 07 Oct 2016 | GL: Omkar Parkhi, Zoox Inc. | details
  | 12 Oct 2016 | No class | Mid-sem break
  | 14 Oct 2016 | No class | Mid-sem break
  | 21 Oct 2016 | No class | Cultural festival
  | 21 Oct 2016 | | Assignment 2 due
13 | 26 Oct 2016 | Vision and Language II -- VQA | slides
e | 27 Oct 2016 | GL: Jan Hosang, MPI Inf | details
vi | 28 Oct 2016 | A3 Seminar I |
vii | 31 Oct 2016 | | Assignment 3 due
viii | 02 Nov 2016 | A3 Seminar II |
ix | 04 Nov 2016 | A3 Seminar III |
  | 07 Nov 2016 | | Project files due
x | 09 Nov 2016 | Project evaluations | Final project evaluation
xi | 11 Nov 2016 | Project evaluations | Final project evaluation

Details of Guest Lectures


Chetan Arora
Assistant Professor
Indraprastha Institute of Information Technology Delhi (IIIT Delhi)
Activity Recognition in First Person Videos
Time and Venue
Date: 09 August 2016, Tuesday
Time: 1100--1200 (tea at 1045)
Venue: RM101, CSE
(In person)
Wearable cameras like the GoPro are among the best selling cameras these days. The always-on nature and the first-person point of view are the unique characteristics of such egocentric cameras, giving access to information not possible with traditional point-and-shoot cameras. Recognizing the wearer's activity is one of the core tasks in many egocentric applications. In this talk I will present some of our work in this area, starting with our earlier work on long-term activity recognition using traditional machine learning methods. I will then go on to explain how deep learning helped us generalize the recognition to a much larger class of activities for which designing hand-tuned features was unthinkable.
Speaker Bio
Chetan Arora received his Bachelor's degree in Electrical Engineering in 1999 and his Ph.D. degree in Computer Science in 2012, both from IIT Delhi. From 2000 to 2009 he was an entrepreneur involved in setting up companies working on various computer vision based products. From 2012 to 2014 he was a post-doctoral researcher at Hebrew University, Israel. He is currently an Assistant Professor at IIIT Delhi. His broad areas of research include computer vision and image processing.

Hakan Bilen
Visual Geometry Group
University of Oxford
Weakly Supervised Object Detection
Time and Venue
Date: 07 September 2016, Wednesday
Time: 1930--2030
Venue: KD101, CSE
(By video conferencing)
Weakly supervised learning of object detection is an important problem in image understanding that still does not have a satisfactory solution. In this talk, we address this problem by improving different aspects of the standard multiple instance learning based object detection. We first present a method that can represent and exploit presence of multiple object instances in an image. Second we further improve this method by imposing similarity among objects of the same class. Finally we propose a weakly supervised deep detection architecture that can exploit the power of deep convolutional neural networks pre-trained on large-scale image-level classification tasks.
Speaker Bio
Hakan Bilen received his PhD degree in Electrical Engineering in 2013 and spent a year as a postdoctoral researcher at the University of Leuven in Belgium. He has been a postdoctoral researcher at the University of Oxford since 2015. His research areas include computer vision and machine learning.

Makarand Tapaswi
University of Toronto
Understanding Stories by Joint Analysis of Language and Vision
Time and Venue
Date: 21 September 2016, Wednesday
Time: 1400--1500 IST
Venue: RM101, CSE
(In person)
Humans spend a large amount of time listening, watching, and reading stories. We argue that the ability to model, analyze, and create new stories is a stepping stone towards strong AI. We thus work on teaching AI to understand stories in films and TV series. To obtain a holistic view of the story, we align videos with novel sources of text such as plot synopses and books. Plots contain a summary of the core story and allow to obtain a high-level overview. On the contrary, books provide rich details about characters, scenes and interactions allowing to ground visual information in corresponding textual descriptions. We also work on testing machine understanding of stories by asking it to answer questions. To this end, we create a large benchmark dataset of almost 15,000 questions from 400 movies and explore its characteristics with several baselines.
Speaker Bio
Makarand Tapaswi received his undergraduate education from NITK Surathkal in Electronics and Communications Engineering. Thereafter he pursued an Erasmus Mundus Masters program in Information and Communication Technologies from UPC Barcelona and KIT Germany. He continued with the Computer Vision lab at Karlsruhe Institute of Technology in Germany and recently completed his PhD. He will be going to University of Toronto as a post-doctoral fellow starting in October.

Omkar Parkhi
Zoox Inc.
Face recognition in still images and videos
Time and Venue
Date: 7 October 2016
Time: 1030--1200 IST
Venue: KD101, CSE
(By video conferencing)
In this talk I will describe feature representations for face recognition, and their application to various tasks involving image and video datasets.

First, we will look at different "shallow" representations for faces in images and videos. The objective is to learn compact yet effective representations for describing faces. Specifically, we will see the effectiveness of "Fisher Vector" descriptors for this task. We show that these descriptors are well suited for face representation tasks both in images and in videos. I will also look at various approaches to effectively reduce their dimension while further improving their performance. These "Fisher Vector" features are also amenable to extreme compression, and work equally well when compressed by over 2000 times compared to their non-compressed counterparts. These features have achieved state-of-the-art results on challenging public benchmarks.

More recently, Convolutional Neural Networks (CNNs) have come to dominate face recognition, as they have the other fields of computer vision. Much of the public research on CNNs for face recognition has been contributed by Internet giants such as Facebook. At the same time, in the academic world, increasingly complex network architectures were introduced specifically for face recognition; one such proposal used 200 trained networks for the final score prediction. We aim to provide a simple yet effective solution to this problem and investigate the use of ``Very Deep'' architectures for face representation. To train these networks, we collected one of the largest annotated public datasets of celebrity faces while requiring minimal manual annotation. We bring out the specific details of these network architectures and their training objective functions that are essential to their performance, and achieve state-of-the-art results on challenging datasets.
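Once such a network is trained, face verification typically reduces to comparing embedding vectors. A minimal sketch (the function name, threshold value, and use of cosine similarity are illustrative assumptions, not details from the talk):

```python
import numpy as np

def same_identity(emb_a, emb_b, threshold=0.5):
    """Decide whether two face embeddings depict the same person.

    emb_a, emb_b: 1-D descriptor vectors, e.g. taken from a CNN's last
    layer. Both are L2-normalised, so their dot product equals the
    cosine similarity; the decision is a simple threshold on it.
    """
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(a @ b) >= threshold
```

In practice the threshold is chosen on a validation set, and the quality of the verdict rests almost entirely on how well the embedding was trained.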

Having described these representations, I will explain their application to various problems in the field. We will look at a method for labeling faces in the challenging environment of broadcast videos using their associated textual data, such as subtitles and transcripts. We show that our CNN representation is well suited for this task, and also propose a scheme to automatically differentiate the primary cast of a TV series or movie from the background characters. We improve existing methods for collecting supervision from textual data and show that careful alignment of video and textual data significantly increases the amount of training data collected automatically, which has a direct positive impact on the performance of the labeling mechanisms. We provide extensive evaluations on different benchmark datasets, again achieving state-of-the-art results.

Further, we show that both the shallow and the deep features described above transfer well between photos and paintings. We propose a system that, given a photograph, retrieves paintings of similar-looking people, and investigate the use of facial attributes for this task. Finally, I will show an "on-the-fly" real-time search system built to search through thousands of hours of video data starting from a text query. To ensure real-time performance, we propose product quantization schemes that make the face representations memory efficient. We also present a demo system based on this design, built for the British Broadcasting Corporation (BBC) to search through their archive.
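The product quantization mentioned above can be sketched in a few lines: split each descriptor into sub-vectors, learn a small codebook per sub-space, and store one code per sub-vector. The sketch below (function names, the number of sub-spaces, and the tiny built-in k-means are illustrative assumptions; production systems use optimised k-means and asymmetric distance computation) compresses a 16-dimensional float descriptor, 64 bytes, down to 4 one-byte codes:

```python
import numpy as np

def pq_train(X, n_sub=4, k=16, iters=10, seed=0):
    """Train product-quantisation codebooks: split each descriptor in X
    (N, D) into n_sub sub-vectors and run a small k-means per sub-space."""
    rng = np.random.default_rng(seed)
    d_sub = X.shape[1] // n_sub
    codebooks = []
    for s in range(n_sub):
        part = X[:, s * d_sub:(s + 1) * d_sub]
        centers = part[rng.choice(len(part), k, replace=False)]
        for _ in range(iters):
            # assign each sub-vector to its nearest centroid, then update
            d = ((part[:, None, :] - centers[None]) ** 2).sum(-1)
            assign = d.argmin(1)
            for c in range(k):
                if (assign == c).any():
                    centers[c] = part[assign == c].mean(0)
        codebooks.append(centers)
    return codebooks

def pq_encode(X, codebooks):
    """Compress each descriptor to n_sub one-byte centroid indices."""
    d_sub = codebooks[0].shape[1]
    codes = []
    for s, centers in enumerate(codebooks):
        part = X[:, s * d_sub:(s + 1) * d_sub]
        d = ((part[:, None, :] - centers[None]) ** 2).sum(-1)
        codes.append(d.argmin(1))
    return np.stack(codes, axis=1).astype(np.uint8)
```

Distances to a query can then be computed from per-sub-space lookup tables without ever decompressing the database, which is what makes real-time search over an archive feasible.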

All of these contributions have been designed with a keen eye on real-world application. As a result, most of the discussed contributions have an associated code release and a working online demonstration.

Additionally, I will briefly describe some of our previous work on detecting deformable animals (cats and dogs) and their sub-categorization.
Speaker Bio
Omkar is an alumnus of CVIT, IIIT Hyderabad, and did his PhD under Andrew Zisserman at Oxford. He is currently with the autonomous driving startup Zoox Inc.

Jan Hosang
PhD candidate
Max Planck Institute for Informatics
Detection Proposals and Learning Non-Maximum Suppression
Time and Venue
Date: 26 October 2016
Time: 1930--2030 IST
Venue: KD101, CSE
(By video conferencing)
The talk will focus on the very first and the very last step of the common object detection pipeline. Proposals are a common technique for cutting down the search space compared to typical sliding-window detection while keeping detection quality high. I will talk about the implications of this search-space reduction and about how proposals are evaluated.
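The standard way to evaluate such proposals is recall: the fraction of ground-truth objects covered by at least one proposal above an IoU threshold. A minimal sketch (box format, function names, and the 0.5 threshold are illustrative; the talk's evaluation is considerably more detailed):

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def proposal_recall(proposals, gt_boxes, thresh=0.5):
    """Fraction of ground-truth boxes hit by at least one proposal."""
    hits = sum(any(iou(p, g) >= thresh for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)
```

Any object missed by every proposal cannot be recovered by later stages, which is why this single number bounds the detector's achievable performance.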

Non-maximum suppression is a hand-crafted post-processing step that persists even though we like to think of object detectors as end-to-end trained systems. In its typical form it forces a trade-off between how many occluded objects can be detected and how many false detections are generated. I will present how non-maximum suppression can be learned with a ConvNet by posing it as a rescoring task.
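For context, the hand-crafted step that the learned rescoring replaces is greedy NMS: keep the highest-scoring box, discard everything that overlaps it too much, repeat. A minimal NumPy sketch (the 0.5 IoU threshold and `[x1, y1, x2, y2]` box format are illustrative conventions):

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) detection scores.
    Returns the indices of the boxes that survive suppression.
    """
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # drop every remaining box that overlaps the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```

The fixed `iou_thresh` is exactly the occlusion/false-positive trade-off mentioned above: a low threshold suppresses genuinely distinct but occluded objects, a high one lets duplicate detections through.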
Speaker Bio
Jan Hosang received his Diploma in computer science from RWTH Aachen University in 2011. Since then, he has interned with the handwriting recognition group at Google and joined the UdS Computer Science Graduate School in 2012. He is currently pursuing a PhD in computer science in the Computer Vision and Multi-modal Computing group at the Max Planck Institute for Informatics, Saarbrücken. His research interests are computer vision and machine learning, in particular object detection.

