A Bayesian inference theory of attention
Sharat Chikkerur, Thomas Serre & Tomaso Poggio
CBCL, McGovern Institute for Brain Research, MIT
Attention
Computational role: •Filter theory (Broadbent) •Biased competition (Desimone) •Feature integration theory (Treisman) •Guided search (Wolfe) •Scanpath theory (Noton) •Bayesian surprise (Itti) •Bottleneck (Tsotsos)
Biology: •V1 •V4 •MT •LIP •FEF
Effects: •Contrast gain •Response gain •Modulation under spatial attention •Modulation under feature attention •Pop-out •Serial vs. parallel search •Bottom-up vs. top-down
Role of attention
Invariant recognition: a mixed blessing
Large pooling in the higher regions leads to invariance
(Kreiman, Hung, Poggio & DiCarlo, SUNS '06)
Limitations of the feedforward model
•IT: Zoccolan, Kouh, Poggio & DiCarlo 2007
•V4: Reynolds, Chelazzi & Desimone 1999
•Psychophysics: Serre, Oliva & Poggio 2007
Feedforward vs. attentive processing
Attention is needed to recognize objects under clutter
A theoretical framework
Perception as Bayesian inference
•Lee and Mumford, "Hierarchical Bayesian Inference in the Visual Cortex", JOSA A, 20(7), 2003
•Recurrent feedforward/feedback loops integrate bottom-up information with top-down priors
•Bottom-up signals: data dependent
•Top-down signals: task dependent
•Top-down signals provide context information and help to disambiguate bottom-up signals (see the formulation sketched below)
[Figure: hierarchical generative model with variables x_IT → x_V4 → x_V2 → x_V1 → x_0 (the image)]
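As a concrete rendering of this idea, the Lee and Mumford hierarchy over the stage variables shown above can be sketched as a Markov chain; the factorization below is a minimal sketch under that chain assumption, not the full model of the paper:

```latex
% Generative hierarchy (Markov chain assumption):
%   x_{IT} \to x_{V4} \to x_{V2} \to x_{V1} \to x_0 \; (\text{image})
P(x_0, x_{V1}, x_{V2}, x_{V4}, x_{IT}) =
  P(x_0 \mid x_{V1})\, P(x_{V1} \mid x_{V2})\, P(x_{V2} \mid x_{V4})\,
  P(x_{V4} \mid x_{IT})\, P(x_{IT})

% The belief at an intermediate stage combines a bottom-up (data-dependent)
% likelihood with a top-down (task/context-dependent) prior:
P(x_{V1} \mid x_0, x_{V2}) \;\propto\;
  \underbrace{P(x_0 \mid x_{V1})}_{\text{bottom-up}}\;
  \underbrace{P(x_{V1} \mid x_{V2})}_{\text{top-down}}
```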
Attention as Bayesian inference (MIT, Chikkerur, Serre, Poggio)
[Figure: cortical hierarchy (PFC, LIP/FEF, IT, V4, V2); Desimone, MIT (unpublished)]
Spatial attention: What is at location L?
Feature-based attention: Where is object O?
Model description
[Figure: graphical model linking the "what" variable (object), the "where" variable (location L), N feature units F_i, their feature maps F_i^l, and the image; a runnable sketch of this structure follows below]
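Because the figure itself is not reproduced here, the following is a minimal runnable sketch of a network with this structure (all names, object labels, and probabilities are illustrative assumptions, not the authors' implementation): an object ("what") variable, a location ("where") variable, N latent feature units F_i, and observed feature-map responses, with the joint posterior over (object, location) computed by enumeration.

```python
# Minimal sketch (not the authors' code) of a "what" x "where" Bayesian
# network: object O, location L, binary features F_i, and binary
# feature-map units F_i^l observed from the image. All numbers are
# made-up illustrative parameters.
import itertools
import numpy as np

OBJECTS = ["car", "pedestrian"]          # values of the "what" variable O
LOCATIONS = [0, 1, 2, 3]                 # values of the "where" variable L
N_FEATURES = 3                           # number of feature units F_i

# P(F_i = 1 | O): which features each object tends to activate (assumed).
P_F_GIVEN_O = {"car": [0.9, 0.7, 0.1], "pedestrian": [0.1, 0.6, 0.9]}

def p_map_unit(active, f_i, at_attended_loc):
    """P(F_i^l | F_i, L): a feature-map unit responds strongly only when
    its feature is present AND the object is at that location."""
    p_on = 0.9 if (f_i and at_attended_loc) else 0.1   # assumed noise levels
    return p_on if active else 1.0 - p_on

def posterior(evidence):
    """Joint posterior P(O, L | feature maps) by brute-force enumeration.
    evidence[i][l] is the observed binary response of map i at location l."""
    joint = {}
    for o, l in itertools.product(OBJECTS, LOCATIONS):
        p = 1.0 / (len(OBJECTS) * len(LOCATIONS))      # uniform priors on O, L
        for i in range(N_FEATURES):
            p_fi = P_F_GIVEN_O[o][i]
            like = 0.0
            # Marginalize over the latent feature unit F_i in {0, 1}.
            for f_i, p_f in ((1, p_fi), (0, 1.0 - p_fi)):
                term = p_f
                for loc in LOCATIONS:
                    term *= p_map_unit(evidence[i][loc], f_i, loc == l)
                like += term
            p *= like
        joint[(o, l)] = p
    z = sum(joint.values())
    return {k: v / z for k, v in joint.items()}

# Fake "image": feature maps 0 and 1 respond at location 2 (car-like evidence).
evidence = np.zeros((N_FEATURES, len(LOCATIONS)), dtype=int)
evidence[0, 2] = evidence[1, 2] = 1
post = posterior(evidence)
print(max(post, key=post.get))   # most probable (object, location) pair
```

Conditioning this posterior on the location variable or on the object variable yields the two attention modes shown in the next slides.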
Model properties: invariance
[Figure: the same graphical model ("what", "where" L, feature units F_i, feature maps F_i^l), illustrating invariance]
Model properties: crowding
[Figure: the same graphical model, illustrating crowding]
Model: spatial attention
What is at location X?
[Figure: the graphical model with the location variable conditioned on the attended location]
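Under the factorization assumed in the sketch above, spatial attention corresponds to clamping the location variable and reading out the posterior over objects:

```latex
% Spatial attention: "what is at location l*?"
P\!\left(O \mid L = l^{*}, \{F_i^{l}\}\right) \;\propto\;
  P(O)\, \prod_{i=1}^{N} \sum_{F_i} P(F_i \mid O)\,
  \prod_{l} P\!\left(F_i^{l} \mid F_i, L = l^{*}\right)
```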
Model: feature-based attention
Where is object X?
[Figure: the graphical model with the object variable conditioned on the target object]
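Symmetrically, under the same assumed factorization, feature-based attention clamps the object variable and reads out a posterior map over locations, which can serve as an attentional priority map:

```latex
% Feature-based attention: "where is object o*?"
P\!\left(L \mid O = o^{*}, \{F_i^{l}\}\right) \;\propto\;
  P(L)\, \prod_{i=1}^{N} \sum_{F_i} P(F_i \mid O = o^{*})\,
  \prod_{l} P\!\left(F_i^{l} \mid F_i, L\right)
```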
Effects of attention
•Spatial invariance
•Spatial attention
•Feature attention
•Feature pop-out
•Parallel vs. serial search
•Recognition under clutter: feature + spatial
Spatial attention: model vs. McAdams and Maunsell '99
[Figure: orientation tuning curves (0–180 degrees), unattended vs. attended conditions; model compared with McAdams and Maunsell '99 data]
Feature-based attention: model vs. Bichot and Desimone '05
[Figure: responses for the four cue/stimulus conditions (preferred/non-preferred stimulus × preferred/non-preferred cue); model compared with Bichot and Desimone '05 data]
Contrast gain vs. response gain
Martinez-Trujillo and Treue '02; McAdams and Maunsell '99
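For readers unfamiliar with the distinction, the two modulation regimes are conventionally described with a Naka-Rushton contrast-response function (standard definition, added for clarity; not taken from the slides):

```latex
R(c) = R_{\max}\,\frac{c^{\,n}}{c^{\,n} + c_{50}^{\,n}}
% Response gain: attention scales R_max (responses multiplied at all contrasts).
% Contrast gain: attention effectively lowers c_50 (a leftward shift of the
% contrast-response function, i.e., an increase in effective contrast).
```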
Psychophysics
(joint work with Cheston Tan)
The model can predict human eye movements
Top-down (spatial attention and feature attention) vs. bottom-up
[Table: ROC areas (cars, pedestrians, absolute) for predicting human fixations: Bruce and Tsotsos '06, Itti et al. '01, Torralba et al., the proposed model, and human observers]
Recognition performance improves with attention
Chikkerur, Serre, Tan & Poggio (in prep)
Relation to prior work
Thank you
Examples
Quantitative evaluation: ROC
Integrating (local) feature-based + (global) context-based cues accounts for 92% of inter-subject agreement.
[Figure: ROC area (0–1) for car and pedestrian search: humans, bottom-up, top-down (feature-based), and feature-based + contextual cues]
Chikkerur, Tan, Serre & Poggio (SFN '09, VSS '09)
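A minimal sketch of this kind of evaluation (assumed data shapes and values, not the authors' pipeline): a model's attention map is scored by the ROC area it achieves at separating human-fixated pixels from non-fixated ones, and one simple way to report it is relative to the inter-subject agreement ceiling (the authors' exact normalization may differ).

```python
# Sketch of ROC-based evaluation of an attention/saliency map against
# human fixations (illustrative shapes and numbers only).
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_area(attention_map, fixation_mask):
    """ROC area for separating fixated from non-fixated pixels."""
    return roc_auc_score(fixation_mask.ravel().astype(int),
                         attention_map.ravel())

rng = np.random.default_rng(0)
H, W = 32, 32
fixations = np.zeros((H, W), dtype=bool)
fixations[10:14, 20:24] = True                      # fake human fixations
model_map = rng.random((H, W)) + 2.0 * fixations    # fake model attention map
inter_subject = 0.878                               # example human-agreement ceiling

auc = roc_area(model_map, fixations)
print(f"model ROC area: {auc:.3f}  "
      f"({100 * auc / inter_subject:.0f}% of inter-subject agreement)")
```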
Effect of clutter on detection
[Figure: recognition without attention vs. recognition under attention]
Scale and location prediction
Performance improves under attention
[Figure: detection performance (d', 0–3) with no attention vs. one shift of attention, for model and humans]
Tan, Chikkerur, Serre & Poggio (VSS '09)
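For reference, d' in the plot above is the standard signal-detection sensitivity index (definition added for clarity):

```latex
d' = \Phi^{-1}(\text{hit rate}) - \Phi^{-1}(\text{false-alarm rate}),
\qquad \Phi^{-1} = \text{inverse standard normal CDF}
```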