Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
Luke J. Huang, Zhuoyang Zhang, Qinghao Hu, Shang Yang, Song Han
Hey, I'm Luke! I'm a Physics + CS student at MIT, where my research has spanned ML systems, RL for reasoning, and generative models. I'm currently on leave as a Member of Technical Staff Resident at OpenAI.
Zhuoyang Zhang*, Luke J. Huang*, Chengyue Wu, Shang Yang, Kelly Peng, Yao Lu, Song Han
Zhuoyang Zhang, Shang Yang, Qinghao Hu, Luke J. Huang, James Hou, Yifu Sun, Yao Lu, Song Han
Shiekh Zia Uddin*, Sachin Vaidya*, Sarthak Choudhary, Zhuo Chen, Ronald K. Salib, Luke J. Huang, Dirk R. Englund, Marin A. Soljacic