MILE: A minimally interactive learning framework for visual data analysis

Document Type

Conference Proceeding

Date of Original Version



Recent progress in computer vision has made researchers more receptive to object tracking and scene understanding, specifically in large-scale surveillance video streams. Unfortunately, the commonly adopted approach to training such systems, which is to fully annotate the events of interest in every image and video of a large dataset, is onerous and expensive. Instead of relying on small supervised datasets for complex visual tasks, an incremental learning strategy allows us to use large, inexpensive datasets in a computationally intelligent scheme. In this paper, we propose MILE, a minimally interactive learning framework that incrementally learns complex visual models for robust long-term tracking and high-level behavior analysis. The core idea of our incremental learning is to constantly update the adaptive feature models of the targets and the background based on reliable detection and tracking results. We employ a Gaussian mixture method to dynamically model the background and detect all moving targets in the scene, and we apply a modified mixture particle filter to capture the multi-modal distribution that arises in multi-target tracking. This combination not only improves detection accuracy but also helps to resolve target occlusion and merging problems. As a proof of concept, a prototype implementation of MILE has been designed and validated on several challenging large-scale surveillance datasets, yielding satisfactory tracking performance on large-scale video streams.
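
The background-modeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified per-pixel Gaussian mixture in the general style of adaptive background subtraction, written with NumPy for grayscale frames. The class name `PixelGMM` and all parameters (`k`, `alpha`, `match_sigma`, `bg_weight`, `init_var`, `min_var`) are hypothetical choices made for illustration, and the mixture particle filter used for tracking is not shown.

```python
import numpy as np


class PixelGMM:
    """Simplified per-pixel Gaussian mixture background model (illustrative only)."""

    def __init__(self, shape, k=3, alpha=0.05, match_sigma=2.5,
                 bg_weight=0.25, init_var=225.0, min_var=4.0):
        # All parameter names and defaults are assumptions, not taken from the paper.
        self.k, self.alpha = k, alpha
        self.match_sigma, self.bg_weight = match_sigma, bg_weight
        self.init_var, self.min_var = init_var, min_var
        self.mean = np.zeros((k, *shape))
        self.var = np.full((k, *shape), init_var)
        self.weight = np.full((k, *shape), 1.0 / k)
        self.initialized = False

    def apply(self, frame):
        """Update the model with one grayscale frame; return a boolean foreground mask."""
        frame = np.asarray(frame, dtype=float)
        if not self.initialized:
            self.mean[:] = frame  # seed every component with the first frame
            self.initialized = True
            return np.zeros(frame.shape, dtype=bool)

        # A pixel matches a component if it lies within match_sigma std devs of its mean.
        dist2 = (frame - self.mean) ** 2
        matches = dist2 < (self.match_sigma ** 2) * self.var
        any_match = matches.any(axis=0)

        # Among matching components, keep the one with the largest weight.
        best = np.where(matches, self.weight, -1.0).argmax(axis=0)
        components = np.arange(self.k)[:, None, None]
        matched = (components == best) & any_match

        # Foreground: no component matched, or the matched one is too weak to be background.
        best_weight = np.take_along_axis(self.weight, best[None], axis=0)[0]
        foreground = ~any_match | (best_weight < self.bg_weight)

        # Incremental update of weights, means and variances with learning rate alpha.
        a = self.alpha
        self.weight = (1 - a) * self.weight + a * matched
        self.mean = np.where(matched, (1 - a) * self.mean + a * frame, self.mean)
        self.var = np.where(matched, (1 - a) * self.var + a * dist2, self.var)
        self.var = np.maximum(self.var, self.min_var)

        # Where nothing matched, replace the weakest component with a new one.
        weakest = self.weight.argmin(axis=0)
        replace = (components == weakest) & ~any_match
        self.mean = np.where(replace, frame, self.mean)
        self.var = np.where(replace, self.init_var, self.var)
        self.weight = np.where(replace, a, self.weight)
        self.weight /= self.weight.sum(axis=0, keepdims=True)
        return foreground
```

Calling `apply` once per frame yields a mask of candidate moving pixels. In a pipeline like the one the abstract describes, such a mask would be grouped into detections and passed to the tracker, whose reliable outputs then feed back into the model updates.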

Publication Title

2018 IEEE 8th Annual Computing and Communication Workshop and Conference, CCWC 2018