ASU Active Perception Group Seminar Series
- Seminar Date
- Biweekly Friday 1pm to 3pm
- Workshop Location
- Brickyard 5th floor conference room (580)
- Yezhou Yang; open for volunteers
- Everyone is welcome to sit in for the presentation part
ASU APG seminar end of 2019 Fall gathering
ASU APG End-of-summer-2018 Research Expo (Aug 13 2018)
April 20th 2018, Shuai Li and Maverick Chung
Congratulations Maverick for concluding his Senior project with ASU APG.
April 13th: Xin Ye - Robot with vision that finds objects
April 06th: Venka and Diptanchu - Partially observable decision making
March 30th, Houpu Yao, Recognition by Imagination
March 23rd, Kong Shu from UCI
The topic is “scene parsing through per-pixel labeling: a better and faster way”, which partially combines the following two write-ups:
Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR 2018.
Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, submitted to ECCV2018.
Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size (by fusing multi-scale pooled features) in a convolutional network architecture according to the object scale (inversely proportional to the depth) so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We further integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations.
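The depth-aware gating idea above can be sketched numerically: pool a feature map at several window sizes, then fuse the results with per-pixel weights driven by depth, so distant (high-depth) pixels favor small windows and nearby pixels favor large ones. This is an illustrative hand-crafted gate, not the paper's learned module; the function names and the gating formula are hypothetical:

```python
import numpy as np

def multi_scale_pool(feat, sizes):
    """Average-pool a (H, W) feature map at several window sizes,
    keeping spatial resolution via edge padding. Returns (S, H, W)."""
    H, W = feat.shape
    pooled = []
    for k in sizes:
        r = k // 2
        padded = np.pad(feat, r, mode="edge")
        out = np.empty_like(feat)
        for i in range(H):
            for j in range(W):
                out[i, j] = padded[i:i + k, j:j + k].mean()
        pooled.append(out)
    return np.stack(pooled)

def depth_gated_fusion(feat, depth, sizes=(1, 3, 5)):
    """Fuse multi-scale pooled features with per-pixel soft gates.
    The desired window size is taken as inversely proportional to depth
    (a hand-crafted stand-in for the learned gating signal)."""
    pooled = multi_scale_pool(feat, sizes)                 # (S, H, W)
    scales = np.array(sizes, dtype=float)
    target = scales.max() / np.maximum(depth, 1e-6)        # per-pixel preferred window
    logits = -np.abs(scales[:, None, None] - target[None]) # closer scale -> higher score
    gates = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return (gates * pooled).sum(axis=0)                    # (H, W)
```

In the paper the gating signal comes from stereo disparity or a monocular depth estimate and the fusion weights are learned end-to-end; here the softmax over scale distances merely illustrates the selection behavior.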
Moreover, rather than fusing multi-scale pooled features based on estimated depth, we show that the “correct” pooling field size for each pixel can be learned in an attentional fashion by our Pixel-wise Attentional Gating unit (PAG), which learns to selectively process a subset of spatial locations at each layer of a deep convolutional network. PAG enables us to achieve parsimonious inference in per-pixel labeling tasks with a limited computational budget. PAG is a generic, architecture-independent, problem-agnostic mechanism that can be readily “plugged in” to an existing model with fine-tuning. We utilize PAG in two ways: 1) learning spatially varying pooling fields that improve model performance without the extra computation cost associated with multi-scale pooling, and 2) learning a dynamic computation policy for each pixel to decrease total computation while maintaining accuracy. We extensively evaluate PAG on a variety of per-pixel labeling tasks, including semantic segmentation, boundary detection, monocular depth and surface normal estimation. We demonstrate that PAG allows competitive or state-of-the-art performance on these tasks. Our experiments show that PAG learns dynamic spatial allocation of computation over the input image, which provides better performance trade-offs compared to related approaches (e.g., truncating deep models or dynamically skipping whole layers). Generally, we observe that PAG can reduce computation by 10% without noticeable loss in accuracy, and performance degrades gracefully when imposing stronger computational constraints.
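A minimal sketch of the pixel-wise gating idea, assuming soft (softmax) selection over candidate per-pixel outputs. In the paper the PAG unit learns the gating scores and can make hard, sparse selections to skip computation; here the logits are supplied by the caller and the function name is hypothetical:

```python
import numpy as np

def pixel_attentional_gate(branches, logits, temperature=1.0):
    """Soft per-pixel selection over candidate branches.
    branches: (B, H, W) stack of candidate outputs (e.g., different pooling fields)
    logits:   (B, H, W) gating scores (learned in the paper; caller-supplied here)
    Returns the (H, W) per-pixel weighted combination."""
    z = logits / temperature
    z = z - z.max(axis=0, keepdims=True)               # numerical stability
    w = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return (w * branches).sum(axis=0)
```

For the parsimonious-inference use case, the soft weights would be replaced by a hard per-pixel argmax over branches, so only the winning branch needs to be computed at each location; the soft version above shows the selection mechanism itself.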
March 09 2018: Kevin Luck, Differentiable Neural Computer
Feb 17th, Varun: Effects of AR/VR in HRI
Feb 2nd, Kausic: Hallucination
Jan 19th, Mohammad: AHCNN
Jan 12, 2018: Duo Lu, on Visual Recognition and Security
Dec 1st, 2017, Jacob Zhiyuan Fang on Capsule Networks and ASU APG end-of-semester gathering.
Oct 19th and Nov 3rd Fall research expo
Oct 5th, Mo Izady, Key Evidence Localization in Medical Images
September 21st 2017, Shuai Li: "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
September 8th, Zhiyuan (Jacob) Fang
April 24th, 2017, Aman Verma: Structure from Motion and its CNN modeling
Aug 18 2017 Summer Research Expo
External Speakers: Kowshik Thopalli and Perikumar Mukundbhai Javia
Topic: Visual Question Answering and its Adversarial Modeling
ASU APG Memory of 2016-2017
April 24th, 2017, Ramu Ponneganti: Rationalizing Neural Predictions
April 3rd, Xin Ye, Hand Movement Prediction from Vision for Human Robot Interaction
Feb 27th 2017, Khimya Khetarpal, Learning Visual Representations
Feb 20th 2017, Rudra Saha, InfoVAE
Feb 1st, Mo Izady, Deep learning in medical image processing
Jan 23rd 2017, Divyanshu Bandil, Visual Question Categorization
Jan 9th 2017, Mohammad Farhadi, Meta-modeling for deep learning
Nov 21st 2016 Stephen McAleer, Generative Adversarial Networks (GAN)
Slides from Stephen: