Shen Yan

shenyan at google dot com

I am a Research Scientist at Google DeepMind, where I work on video-text modeling and its applications.

I did my PhD in the Computer Science Department at Michigan State University. During my PhD, I was fortunate to work with people at Bosch Research, ByteDance AML, Abacus.AI, Argo AI, and Google Research (Perception and Brain teams).

Google Scholar / LinkedIn / Twitter

Research

VideoPrism: A Foundational Visual Encoder for Video Understanding
Google Research, VFFM
ICML, 2024
blog / arXiv / bibtex

A general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model.

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan*, Boyu Wang*
arXiv, 2024
arXiv / bibtex

We employ a progressively aligned tiny PaLM-2 as the vision-language adapter.

Pixel Aligned Language Models
Jiarui Xu, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
CVPR, 2024
arXiv / code / bibtex

We propose PixelLLM to equip LLMs with pixel-aligned localization capability.

Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
CVPR, 2024
arXiv / code / bibtex

An online video captioner based on token clustering and streaming decoding.

UnLoc: A Unified Framework for Video Localization Tasks
Shen Yan*, Xuehan Xiong*, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
ICCV, 2023
arXiv / code / bibtex

UnLoc unifies moment retrieval, temporal localization, and action segmentation with a single-stage model.

VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan*, Tao Zhu*, Zirui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
arXiv, 2023
arXiv / bibtex

VideoCoCa maximally reuses pretrained CoCa and minimizes additional training cost.

Soft Augmentation for Image Classification
Yang Liu, Shen Yan, Laura Leal-Taixé, James Hays, Deva Ramanan
CVPR, 2023  
arXiv / code / bibtex

Soft augmentations produce better-calibrated models on occluded examples.

Multiview Transformers for Video Recognition
Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
CVPR, 2022  
arXiv / code / bibtex

A simple method for capturing multiresolution temporal context in transformers.

Deep AutoAugment
Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang
ICLR, 2022  
arXiv / code / bibtex / slides

Build a data augmentation policy progressively based on regularized gradient matching.

NAS-Bench-x11 and the Power of Learning Curves
Shen Yan*, Colin White*, Yash Savani, Frank Hutter
NeurIPS, 2021  
arXiv / code / bibtex / slides

A surrogate method to create multi-fidelity NAS benchmarks.

CATE: Computation-aware Neural Architecture Encoding with Transformers
Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang
ICML, 2021 (Long Presentation)
video: 17 min / arXiv / code / bibtex

Pre-training computation-aware architecture embeddings can also help with architecture search.

Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?
Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang
NeurIPS, 2020  
video: 3 min / arXiv / code / bibtex

Pre-training structure-aware architecture embeddings help architecture search.

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Wills
ECCV, 2020 (Oral)
video: 10 min / arXiv / code / bibtex

Mutual learning with input resolution and network width improves accuracy-efficiency tradeoffs.

Improve Unsupervised Domain Adaptation with Mixup Training
Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, Liu Ren
arXiv, 2020
arXiv / code / bibtex

Mixup can also help with unsupervised domain adaptation.

Deep Fisher Faces
Harald Hanselmann, Shen Yan, Hermann Ney
BMVC, 2017
bibtex

We extend the center loss with an inter-class loss reminiscent of the popular early face recognition approach Fisherfaces.

Service

PC member, AutoML Workshop, ICML 2021

PC member, NAS Workshop, ICLR 2021

Reviewer, ICML 2020, 2021, 2022, 2023

Reviewer, ICLR 2021, 2022, 2023

Reviewer, NeurIPS 2020, 2021, 2022, 2023

Reviewer, CVPR 2021, 2022, 2023

Reviewer, ICCV 2021, 2023

Reviewer, ECCV 2022

Reviewer, TMLR 2022, 2023

Reviewer, PAMI 2022, 2023

TA for Bachelor, Kinect Programming, Fall 2015
