The problem of generating representative sets and summaries of image collections has been studied by many researchers. We report here some of the most closely related work. In Scene Summarization for Online Image Collections, Simon et al. examine the distribution of images in a collection to select a set of canonical views that form the scene summary, using clustering techniques on SIFT-based visual features. The authors summarize the images based on likelihood, coverage, and orthogonality. Our approach is similar in its clustering technique and selection criteria. The key difference between that work and ours is that we generate windows before the clustering phase.

Another work summarizes images based on spatial patterns in photo sets, as well as textual-topical patterns and user (photographer) identity cues. The key difference between that work and our approach is that we do not use geo-referenced images in our experiments. We focus on low-level image features.

A further work generates diverse and representative image search results for landmarks using context- and content-based tools. To do so, the authors use location and other metadata, the tags associated with the images, and the images' visual features. This work differs from that of Simon et al. in that it starts from the tags that represent landmarks, although it also uses SIFT for visual feature comparison. In contrast, our approach uses neither the metadata nor the tags associated with images. We concentrate only on the visual features of images.
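To make the selection criteria mentioned above for scene summarization more concrete, the following is a minimal sketch of a greedy selection that trades coverage against redundancy among the chosen views. It is an illustration only, not the formulation of any cited work or of our method: the function names, the `beta` weight, and the toy similarity matrix are all assumptions introduced here, and in practice the similarities would come from matching visual features such as SIFT.

```python
def coverage(sim, selected):
    """Sum over all images of their similarity to the closest selected view."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def redundancy(sim, selected):
    """Sum of pairwise similarities among the selected views."""
    return sum(
        sim[a][b]
        for i, a in enumerate(selected)
        for b in selected[i + 1:]
    )


def greedy_summary(sim, k, beta=1.0):
    """Greedily pick up to k views maximizing coverage - beta * redundancy.

    `beta` is a hypothetical weight on the redundancy penalty. Selection
    stops early when no remaining candidate improves the score.
    """
    selected = []
    score = 0.0
    for _ in range(k):
        best_gain, best_idx = 0.0, None
        for c in range(len(sim)):
            if c in selected:
                continue
            trial = selected + [c]
            gain = coverage(sim, trial) - beta * redundancy(sim, trial) - score
            if gain > best_gain:
                best_gain, best_idx = gain, c
        if best_idx is None:
            break
        selected.append(best_idx)
        score += best_gain
    return selected


# Toy symmetric similarity matrix (made-up values): images 0-2 show one
# view of a scene, images 3-4 show another.
toy_sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
summary = greedy_summary(toy_sim, k=3)
```

On this toy input the sketch selects one image from each group and then stops, because adding a third view would raise redundancy more than it raises coverage.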

A related problem is that of selecting iconic images to summarize general visual categories. The authors define iconic images as high-quality representatives of a large group of images that are consistent in both appearance and semantics. Their approach to finding such groups is to perform joint clustering in the space of global image descriptors and latent topic vectors of the tags associated with the images. They also use a ranking mechanism learned from a large collection of labeled images. That work assumes that there is one iconic view of the scene, rather than a diverse set of representative views as we show in this work.

In the absence of location metadata, temporal metadata has also been considered for photo collection summarization. Graham et al. describe an algorithm that heuristically selects representative photos for a given time period in a personal collection, exploiting patterns in human photo-taking habits. Additional time-based work aims to detect events in personal collections, which could serve as the basis for collection summarization. However, all of these projects considered single-photographer collections only. Several projects use geographic data to organize photo collections in novel ways, for example by detecting significant events and locations in a photo collection. Such structures could indeed be the basis for collection summarization. However, geographic data is not available for our images.

In general, our work differs from the work above in its window generation technique at the initial stage and in its ranking mechanism.