Yuzhang Shang

Office: UCF Global 216

I am an Assistant Professor in the Department of Computer Science and the Institute of Artificial Intelligence at the University of Central Florida (UCF), where I direct the Efficient, Scalable and Accelerated Learning Laboratory (EXceL-Lab).

I obtained my Ph.D. in Computer Science from the Illinois Institute of Technology (2021-2025), advised by Prof. Yan Yan (now at the University of Illinois Chicago, UIC) and co-advised by Prof. Peng-Jun Wan.

During my Ph.D., I was named an MLCommons Rising Star in 2025. I was a visiting student at the University of Wisconsin-Madison hosted by Prof. Yong Jae Lee (the lab behind LLaVA). I interned as a researcher at Google DeepMind, working with Daniele Moro, Weijun Wang, and Andrew Howard (the team behind MobileNet), and as a research scientist at Cisco Research, working with Gaowen Liu and Ramana Kompella. Before my Ph.D., I was a research assistant at Shandong University and at the Hong Kong University of Science and Technology (HKUST), supervised by Prof. Liqiang Nie and Prof. Dan Xu, respectively. I received dual bachelor's degrees in Applied Mathematics and Economics from Wuhan University.

[Hiring!] I am actively looking for postdocs, Ph.D. students, visiting scholars/students, and master's/undergraduate researchers. If you are interested in working with me, feel free to contact me at yuzhang.hire@gmail.com. UCF@CSRankings: Computer Vision (8th), AI (44th), Overall (54th).

Besides research, I am a contract photographer for Shutterstock Images and Getty Images (my portfolio).

Selected Publications

For a full list of publications, please refer to this page.

  1. LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
    Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, and Yan Yan
    International Conference on Computer Vision (ICCV) 2025
    The first token reduction method for accelerating multimodal LLMs.
  2. PB-LLM: Partially Binarized Large Language Models
    Yuzhang Shang, Zhihang Yuan, and Zhen Dong
    International Conference on Learning Representations (ICLR) 2024
    The first binarization exploration for LLMs.
  3. PTQ4DM: Post-training Quantization on Diffusion Models
    Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, and Yan Yan
    Computer Vision and Pattern Recognition (CVPR) 2023
    The first network compression method for diffusion models.
  4. ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
    Zhihang Yuan*, Yuzhang Shang*, Yue Song, Qiang Wu, Yan Yan, and Guangyu Sun
    arXiv Dec. 2023
    The first low-rank decomposition method for LLMs. The concept of low-rank attention was later adopted in DeepSeek-v2 (six months afterward).