Exploring low-level vision models. Case study: saliency prediction.

Ivet Rafegas

Master thesis from Universitat Autònoma de Barcelona - 2013

After a decade of huge progress in computer vision using flat processing schemes, new architectures based on deep hierarchies are gaining strength in the field. This new paradigm is in line with how visual information is processed hierarchically in the human visual system. Several low-level vision models have proposed hierarchical schemes to simulate the first stages of the ventral stream. In this work, we analyse three of these models (ID, HMAX, MP) and give a unifying overview of them. We then focus on what we call the Induction-Derived (ID) model. The choice is based on the generalisation properties it has shown in predicting both colour induction effects and saliency maps. Its main stages can be summarised as a linear filtering (L1) followed by a centre-surround mechanism (L2), a divisive normalisation (L3) and the application of a weighting function (ECSF) (L4). Our aim is to explore the function of each layer by proposing alternative implementations that achieve more accurate responses.
As a case study, we work on saliency prediction, since large standard datasets are available for testing the effects of the studied alternatives. We analyse the DOOG filter family vs. the multi-resolution wavelet, a shaped centre-surround vs. the previous rectangular window, different divisive normalisation functions vs. the rational quadratic, and a null or random weighting function vs. the ECSF. We conclude that extending the family of filters improves feature selectivity, that a shaped centre-surround can provide a more accurate response, and that divisive normalisation functions need to be better fitted to the task at hand. After analysing performance measurements of saliency prediction, we propose a new measurement, WARP, and we compare and evaluate all the proposed alternatives in a large set of experiments that provides a wide analysis of the L1, L2 and L3 model stages. The experiments performed appear to diminish the effect of the L4 weighting stage (ECSF).
Additionally, we begin to explore how these hierarchies can be scaled up with a view to a more complex task such as object recognition. We derive a new representation from the model output that can serve as the starting point for a trainable layer that could give a visual code for object recognition.
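
The four stages named in the abstract (linear filtering, centre-surround, divisive normalisation, weighting) can be illustrated with a short Python sketch. It is not the thesis implementation, and every concrete choice in it is an assumption made for illustration: the names `box_mean` and `saliency_sketch` are hypothetical, the linear filter is a crude single-scale band-pass rather than a wavelet or DOOG family, the centre-surround uses square windows, the normalisation is one possible rational quadratic form, r^2 / (1 + r^2), and the weighting is a constant placeholder rather than the ECSF.

```python
import numpy as np


def box_mean(x, radius):
    """Mean over a (2*radius+1)^2 square window, using edge padding."""
    k = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = x.shape
    total = (sat[k:k + h, k:k + w] - sat[:h, k:k + w]
             - sat[k:k + h, :w] + sat[:h, :w])
    return total / (k * k)


def saliency_sketch(image, centre_radius=1, surround_radius=4,
                    weight=1.0, eps=1e-8):
    """Illustrative four-stage pipeline; all stage choices are assumptions."""
    image = np.asarray(image, dtype=float)

    # Stage 1, linear filtering: image minus its local mean (stand-in band-pass).
    response = image - box_mean(image, surround_radius)

    # Stage 2, centre-surround: ratio of local energy in a small centre window
    # to the energy in the surrounding ring (large window minus centre).
    energy = response ** 2
    kc = (2 * centre_radius + 1) ** 2
    ks = (2 * surround_radius + 1) ** 2
    centre = box_mean(energy, centre_radius)
    surround = (box_mean(energy, surround_radius) * ks - centre * kc) / (ks - kc)
    ratio = centre / (np.maximum(surround, 0.0) + eps)

    # Stage 3, divisive normalisation: squash the ratio into the range [0, 1].
    normalised = ratio ** 2 / (1.0 + ratio ** 2)

    # Stage 4, weighting: constant placeholder, NOT the ECSF.
    return weight * normalised


# Usage: a single bright pixel on a dark background.
image = np.zeros((21, 21))
image[10, 10] = 1.0
saliency = saliency_sketch(image)
```

In this sketch a uniform image yields a zero map, while an isolated bright spot produces a strong response around it. Swapping any one stage for an alternative, as the thesis does for its own stages, only requires replacing the corresponding lines.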

BibTeX references

@MastersThesis{Raf2013,
author = "Ivet Rafegas",
title = "Exploring low-level vision models. Case study: saliency prediction.",
school = "Universitat Aut\`onoma de Barcelona",
year = "2013",
keywords = "bio-inspired, centre-surround, deep hierarchies, divisive normalisation, early vision models, HMAX, induction, Malik-Perona, saliency estimation, saliency estimation measurement, saliency map, ventral stream, visual codes, visual cortex, V1-like",
abstract = "After a decade of huge progress in computer vision using flat processing schemes, new architectures based on deep hierarchies are gaining strength in the field. This new paradigm is in line with how visual information is processed hierarchically in the human visual system. Several low-level vision models have proposed hierarchical schemes to simulate the first stages of the ventral stream. In this work, we analyse three of these models (ID, HMAX, MP) and give a unifying overview of them. We then focus on what we call the Induction-Derived (ID) model. The choice is based on the generalisation properties it has shown in predicting both colour induction effects and saliency maps. Its main stages can be summarised as a linear filtering (L1) followed by a centre-surround mechanism (L2), a divisive normalisation (L3) and the application of a weighting function (ECSF) (L4). Our aim is to explore the function of each layer by proposing alternative implementations that achieve more accurate responses.
As a case study, we work on saliency prediction, since large standard datasets are available for testing the effects of the studied alternatives. We analyse the DOOG filter family vs. the multi-resolution wavelet, a shaped centre-surround vs. the previous rectangular window, different divisive normalisation functions vs. the rational quadratic, and a null or random weighting function vs. the ECSF. We conclude that extending the family of filters improves feature selectivity, that a shaped centre-surround can provide a more accurate response, and that divisive normalisation functions need to be better fitted to the task at hand. After analysing performance measurements of saliency prediction, we propose a new measurement, WARP, and we compare and evaluate all the proposed alternatives in a large set of experiments that provides a wide analysis of the L1, L2 and L3 model stages. The experiments performed appear to diminish the effect of the L4 weighting stage (ECSF).
Additionally, we begin to explore how these hierarchies can be scaled up with a view to a more complex task such as object recognition. We derive a new representation from the model output that can serve as the starting point for a trainable layer that could give a visual code for object recognition.",
advisor1 = "1",
url = "http://www.cat.uab.cat/Public/Publications/2013/Raf2013"
}
