
ASU Active Perception Group Seminar Series

Key Facts

Seminar Date: Biweekly, Fridays at 9 am
Location: Brickyard, 5th-floor conference room
Organizer: Yezhou Yang (open for volunteers)
Everyone is welcome to sit in for the presentation part.
April 20th, 2018: Shuai Li and Maverick Chung

Congratulations to Maverick on concluding his senior project with ASU APG.

April 13th: Xin Ye - Robot with vision that finds objects


April 6th: Venka and Diptanchu - Partially Observable Decision Making

March 30th, Houpu Yao, Recognition by Imagination


March 23rd, Kong Shu from UCI


The topic is “Scene parsing through per-pixel labeling: a better and faster way”, which partially combines the following two write-ups:
Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR 2018.
Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, submitted to ECCV 2018.

Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size (by fusing multi-scale pooled features) in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We further integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations.
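The depth-aware gating idea above can be illustrated with a minimal NumPy sketch. The function name, the softmax-over-scales weighting, and the `scale_depths` parameter are illustrative assumptions for exposition, not the authors' implementation (which operates inside a CNN with learned gating):

```python
import numpy as np

def depth_aware_fusion(pooled, depth, scale_depths, tau=1.0):
    """Fuse multi-scale pooled features with per-pixel weights from depth.

    pooled:       (S, H, W, C) features pooled at S field sizes
    depth:        (H, W) per-pixel depth (larger = farther)
    scale_depths: (S,) hypothetical depth each scale is 'tuned' to, so
                  distant pixels favor small fields, nearby ones large
    tau:          temperature of the soft gating
    Returns (H, W, C) fused features.
    """
    # Affinity between each pixel's depth and each scale's preferred depth.
    diff = -np.abs(depth[None, :, :] - scale_depths[:, None, None]) / tau
    # Softmax over the scale axis -> per-pixel gating weights (S, H, W).
    w = np.exp(diff - diff.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)
    # Weighted sum of the pooled feature maps at every pixel.
    return (w[:, :, :, None] * pooled).sum(axis=0)
```

A pixel whose depth matches a scale's preferred depth draws its fused features almost entirely from that scale; in the paper this gating signal comes from stereo disparity or a monocular depth estimate rather than a fixed `scale_depths` table.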

Moreover, rather than fusing multi-scale pooled features based on estimated depth, we show that the “correct” pooling field size for each pixel can be learned in an attentional fashion by our Pixel-wise Attentional Gating unit (PAG), which learns to selectively process a subset of spatial locations at each layer of a deep convolutional network. PAG enables us to achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget. PAG is a generic, architecture-independent, problem-agnostic mechanism that can be readily “plugged in” to an existing model with fine-tuning. We utilize PAG in two ways: 1) learning spatially varying pooling fields that improve model performance without the extra computation cost associated with multi-scale pooling, and 2) learning a dynamic computation policy for each pixel to decrease total computation while maintaining accuracy. We extensively evaluate PAG on a variety of per-pixel labeling tasks, including semantic segmentation, boundary detection, and monocular depth and surface normal estimation. We demonstrate that PAG achieves competitive or state-of-the-art performance on these tasks. Our experiments show that PAG learns a dynamic spatial allocation of computation over the input image, which provides better performance trade-offs than related approaches (e.g., truncating deep models or dynamically skipping whole layers). Generally, we observe that PAG can reduce computation by 10% without noticeable loss in accuracy, and performance degrades gracefully when stronger computational constraints are imposed.
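The per-pixel skipping behind the computation savings can be sketched in a few lines of NumPy. This is a toy stand-in, assuming a hard threshold gate and a 1x1-conv-style transform in place of PAG's learned attentional components:

```python
import numpy as np

def pag_layer(x, weight, gate_logits):
    """Toy sketch of pixel-wise gating: process only selected pixels.

    x:           (H, W, C) input features
    weight:      (C, C) transform standing in for one layer's computation
    gate_logits: (H, W) hypothetical learned per-pixel logits; > 0 = process
    Returns (H, W, C) features and the fraction of pixels processed.
    """
    mask = gate_logits > 0                 # hard per-pixel selection
    out = x.copy()                         # skipped pixels pass through
    # Apply the transform only at the selected subset of locations --
    # this subset being small is where the computation is saved.
    out[mask] = np.maximum(x[mask] @ weight, 0.0)   # ReLU(x W)
    return out, mask.mean()
```

In the actual method the gate is learned end-to-end (with a differentiable relaxation at training time) rather than thresholded, and the pass-through for skipped pixels is what lets accuracy degrade gracefully as the computational budget tightens.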

March 09 2018: Kevin Luck, Differentiable Neural Computer


Feb 17th, Varun: Effects of AR/VR in HRI


Feb 2nd, Kausic: Hallucination


Jan 19th, Mohammad: AHCNN

Jan 12, 2018: Duo Lu on Visual Recognition and Security


Dec 1st, 2017, Jacob Zhiyuan Fang on Capsule Networks and ASU APG end-of-semester gathering.


Oct 19th and Nov 3rd: Fall Research Expo
Oct 5th, Mo Izady, Key Evidence Localization in Medical Images
September 21st 2017, Shuai Li: "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
September 8th, Zhiyuan (Jacob) Fang


April 24th, 2017, Aman Verma: Structure from Motion and its CNN modeling


Aug 18th, 2017: Summer Research Expo


External Speakers: Kowshik Thopalli and Perikumar Mukundbhai Javia

Topic: Visual Question Answering and its Adversarial Modeling



ASU APG Memory of 2016-2017

April 24th, 2017, Ramu Ponneganti: Rationalizing Neural Predictions


April 3rd, Xin Ye, Hand Movement Prediction from Vision for Human Robot Interaction


Feb 27th 2017, Khimya Khetarpal, Learning Visual Representations

Feb 20th 2017, Rudra Saha, InfoVAE

Feb 1st, Mo Izady, Deep learning in medical image processing

Jan 23rd 2017, Divyanshu Bandil, Visual Question Categorization

Jan 9th 2017, Mohammad Farhadi, Meta-modeling for deep learning


Nov 21st, 2016, Stephen McAleer: Generative Adversarial Networks (GANs)



Slides from Stephen:


Nov 7th 2016, Ramu Ponneganti: Event Recounting for Video processing
Nov 7th 2016, Yantian Zha: Differentiable Neural Computer