2025-02-13 |
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models |
Sina Tayebati, Divake Kumar, Nastaran Darabi, Dinithi Jayasuriya, Ranganath Krishnan, Amit Ranjan Trivedi |
|
2025-02-13 |
Pippo: High-Resolution Multi-View Humans from a Single Image |
Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur Bagautdinov |
|
2025-02-12 |
Hypencoder: Hypernetworks for Information Retrieval |
Julian Killingback, Hansi Zeng, Hamed Zamani |
|
2025-02-12 |
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving |
Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin |
|
2025-02-12 |
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models |
Samuel Stevens, Wei-Lun Chao, Tanya Berger-Wolf, Yu Su |
|
2025-02-12 |
CoS: Chain-of-Shot Prompting for Long Video Understanding |
Jian Hu, Zixu Cheng, Chenyang Si, Wei Li, Shaogang Gong |
|
2025-02-12 |
Gemstones: A Model Suite for Multi-Faceted Scaling Laws |
Sean McLeish, John Kirchenbauer, David Yu Miller, Siddharth Singh, Abhinav Bhatele, Micah Goldblum, Ashwinee Panda, Tom Goldstein |
|
2025-02-12 |
Retrieval-augmented Large Language Models for Financial Time Series Forecasting |
Mengxi Xiao, Zihao Jiang, Lingfei Qian, Zhengyu Chen, Yueru He, Yijing Xu, Yuecheng Jiang, Dong Li, Ruey-Ling Weng, Min Peng, Jimin Huang, Sophia Ananiadou, Qianqian Xie |
|
2025-02-12 |
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More |
Xialie Zhuang, Zhikai Jia, Jianjin Li, Zhenyu Zhang, Li Shen, Zheng Cao, Shiwei Liu |
|
2025-02-12 |
Skill Expansion and Composition in Parameter Space |
Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, Xianyuan Zhan |
|
2025-02-12 |
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents |
Ilia Karmanov, Amala Sanjay Deshmukh, Lukas Voegtle, Philipp Fischer, Kateryna Chumachenko, Timo Roman, Jarno Seppänen, Jupinder Parmar, Joseph Jennings, Andrew Tao, Karan Sapra |
|
2025-02-12 |
Expect the Unexpected: FailSafe Long Context QA for Finance |
Kiran Kamble, Melisa Russak, Dmytro Mozolevskyi, Muayad Ali, Mateusz Russak, Waseem AlShikh |
|
2025-02-12 |
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks |
Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli |
|
2025-02-12 |
Teaching Language Models to Critique via Reinforcement Learning |
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong |
|
2025-02-12 |
Magic 1-For-1: Generating One Minute Video Clips within One Minute |
Hongwei Yi, Shitong Shao, Tian Ye, Jiantong Zhao, Qingyu Yin, Michael Lingelbach, Li Yuan, Yonghong Tian, Enze Xie, Daquan Zhou |
|
2025-02-12 |
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon |
Nurit Cohen-Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, Seffi Cohen |
|
2025-02-12 |
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation |
Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu |
|
2025-02-12 |
CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing |
Yu Yuan, Shizhao Sun, Qi Liu, Jiang Bian |
|
2025-02-12 |
Enhance-A-Video: Better Generated Video for Free |
Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, Yang You |
|
2025-02-12 |
Auditing Prompt Caching in Language Model APIs |
Chenchen Gu, Xiang Lisa Li, Rohith Kuditipudi, Percy Liang, Tatsunori Hashimoto |
|
2025-02-12 |
NatureLM: Deciphering the Language of Nature for Scientific Discovery |
Yingce Xia, Peiran Jin, Shufang Xie, Liang He, Chuan Cao, Renqian Luo, Guoqing Liu, Yue Wang, Zequn Liu, Yuan-Jyue Chen, Zekun Guo, Yeqi Bai, Pan Deng, Yaosen Min, Ziheng Lu, Hongxia Hao, Han Yang, Jielan Li, Chang Liu, Jia Zhang, Jianwei Zhu, Kehan Wu, W |
|
2025-02-12 |
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training |
Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, Zheng Li, Zhengyang Wang, Pei Chen, Ruijie Wang, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang |
|
2025-02-12 |
Scaling Pre-training to One Hundred Billion Data for Vision Language Models |
Xiao Wang, Ibrahim Alabdulmohsin, Daniel Salz, Zhe Li, Keran Rong, Xiaohua Zhai |
|
2025-02-12 |
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction |
Junlong Li, Daya Guo, Dejian Yang, Runxin Xu, Yu Wu, Junxian He |
|
2025-02-12 |
LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters! |
Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G. Patil, Matei Zaharia, Joseph E. Gonzalez, Ion Stoica |
|
2025-02-12 |
Competitive Programming with Large Reasoning Models |
OpenAI, Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat Mc |
|
2025-02-12 |
Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests |
David Noever, Forrest McKee |
|
2025-02-12 |
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE |
Haiduo Huang, Fuwei Yang, Zhenhua Liu, Yixing Xu, Jinze Li, Yang Liu, Xuanwu Yin, Dong Li, Pengju Ren, Emad Barsoum |
|
2025-02-12 |
Towards Internet-Scale Training For Agents |
Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov |
|
2025-02-12 |
Embodied Red Teaming for Auditing Robotic Foundation Models |
Sathwik Karnik, Zhang-Wei Hong, Nishant Abhangi, Yen-Chen Lin, Tsun-Hsuan Wang, Christophe Dupuy, Rahul Gupta, Pulkit Agrawal |
|
2025-02-11 |
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging |
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez |
|
2025-02-11 |
The Curse of Depth in Large Language Models |
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu |
|
2025-02-11 |
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning |
Bidipta Sarkar, Warren Xia, C. Karen Liu, Dorsa Sadigh |
|
2025-02-11 |
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators |
Daniil Moskovskiy, Nikita Sushko, Sergey Pletenev, Elena Tutubalina, Alexander Panchenko |
|
2025-02-11 |
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization |
Zhenglin Zhou, Xiaobo Xia, Fan Ma, Hehe Fan, Yi Yang, Tat-Seng Chua |
|
2025-02-11 |
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation |
Chenkai Xu, Xu Wang, Zhenyi Liao, Yishun Li, Tianqi Hou, Zhijie Deng |
|
2025-02-11 |
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents |
Jiabin Tang, Tianyu Fan, Chao Huang |
|
2025-02-11 |
Matryoshka Quantization |
qianli_cs |
|
2025-02-11 |
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT |
Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao |
|
2025-02-11 |
History-Guided Video Diffusion |
Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, Vincent Sitzmann |
|
2025-02-11 |
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers |
D. She, Mushui Liu, Jingxuan Pang, Jin Wang, Zhen Yang, Wanggui He, Guanghao Zhang, Yi Wang, Qihan Huang, Haobin Tang, Yunlong Yu, Siming Fu |
|
2025-02-11 |
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling |
Runze Liu, Junqi Gao, Jian Zhao, Kaiyan Zhang, Xiu Li, Biqing Qi, Wanli Ouyang, Bowen Zhou |
|
2025-02-11 |
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning |
Chengqi Lyu, Songyang Gao, Yuzhe Gu, Wenwei Zhang, Jianfei Gao, Kuikun Liu, Ziyi Wang, Shuaibin Li, Qian Zhao, Haian Huang, Weihan Cao, Jiangning Liu, Hongwei Liu, Junnan Liu, Songyang Zhang, Dahua Lin, Kai Chen |
|
2025-02-11 |
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding |
Sukmin Cho, Sangjin Choi, Taeho Hwang, Jeongyeon Seo, Soyeong Jeong, Huije Lee, Hoyun Song, Jong C. Park, Youngjin Kwon |
|
2025-02-11 |
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates |
Ling Yang, Zhaochen Yu, Bin Cui, Mengdi Wang |
|
2025-02-11 |
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models |
Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang |
|
2025-02-11 |
Dual Caption Preference Optimization for Diffusion Models |
Amir Saeidi, Yiran Luo, Agneet Chatterjee, Shamanthak Hegde, Bimsara Pathiraja, Yezhou Yang, Chitta Baral |
|
2025-02-11 |
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding |
Xinyu Yang, Tianqi Chen, Beidi Chen |
|
2025-02-11 |
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM |
Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang |
|
2025-02-11 |
LM2: Large Memory Models |
Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis |
|
2025-02-11 |
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile |
Hangliang Ding, Dacheng Li, Runlong Su, Peiyuan Zhang, Zhijie Deng, Ion Stoica, Hao Zhang |
|
2025-02-11 |
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering |
Zhuowei Li, Haizhou Shi, Yunhe Gao, Di Liu, Zhenting Wang, Yuxiao Chen, Ting Liu, Long Zhao, Hao Wang, Dimitris N. Metaxas |
|
2025-02-11 |
Adaptive Semantic Prompt Caching with VectorQ |
Luis Gaspar Schroeder, Shu Liu, Alejandro Cuadron, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez |
|
2025-02-11 |
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs |
Dinithi Jayasuriya, Sina Tayebati, Davide Ettori, Ranganath Krishnan, Amit Ranjan Trivedi |
|
2025-02-11 |
Intelligent Sensing-to-Action for Robust Autonomy at the Edge: Opportunities and Challenges |
Amit Ranjan Trivedi, Sina Tayebati, Hemant Kumawat, Nastaran Darabi, Divake Kumar, Adarsh Kumar Kosta, Yeshwanth Venkatesha, Dinithi Jayasuriya, Nethmi Jayasinghe, Priyadarshini Panda, Saibal Mukhopadhyay, Kaushik Roy |
|
2025-02-11 |
Continuous 3D Perception Model with Persistent State |
Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, Angjoo Kanazawa |
|
2025-02-11 |
Value-Based Deep RL Scales Predictably |
Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar |
|
2025-02-10 |
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs |
Rohit Saxena, Aryo Pradipta Gema, Pasquale Minervini |
|
2025-02-10 |
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces |
Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer |
|
2025-02-10 |
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference |
Zehua Pei, Lancheng Zou, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu |
|
2025-02-10 |
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning |
Yuwei Yin, Giuseppe Carenini |
|
2025-02-10 |
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations |
Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh |
|
2025-02-10 |
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More |
Feng Wang, Yaodong Yu, Guoyizhe Wei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie |
|
2025-02-10 |
YINYANG-ALIGN: Benchmarking Contradictory Objectives and Proposing Multi-Objective Optimization based DPO for Text-to-Image Alignment |
Amitava Das, Yaswanth Narsupalli, Gurpreet Singh, Vinija Jain, Vasu Sharma, Suranjana Trivedy, Aman Chadha, Amit Sheth |
|
2025-02-10 |
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation |
Yue Zhao, Fuzhao Xue, Scott Reed, Linxi Fan, Yuke Zhu, Jan Kautz, Zhiding Yu, Philipp Krähenbühl, De-An Huang |
|
2025-02-10 |
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf |
Lingxiang Hu, Shurun Yuan, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang |
|
2025-02-10 |
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails |
Yihe Deng, Yu Yang, Junkai Zhang, Wei Wang, Bo Li |
|
2025-02-10 |
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation |
Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian Ge, Peize Sun, Yida Zhang, Yi Jiang, Zehuan Yuan, Bingyue Peng, Ping Luo |
|
2025-02-10 |
Fast Video Generation with Sliding Tile Attention |
Peiyuan Zhang, Yongqi Chen, Runlong Su, Hangliang Ding, Ion Stoica, Zhenghong Liu, Hao Zhang |
|
2025-02-10 |
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting |
Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu |
|
2025-02-10 |
Goku: Flow Based Video Generative Foundation Models |
Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobin |
|
2025-02-10 |
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices |
Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee |
|
2025-02-10 |
Linear Correlation in LM's Compositional Generalization and Hallucination |
Letian Peng, Chenyang An, Shibo Hao, Chengyu Dong, Jingbo Shang |
|
2025-02-10 |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach |
timbilt |
|
2025-02-10 |
Generating Symbolic World Models via Test-time Scaling of Large Language Models |
Zhouliang Yu, Yuhuan Yuan, Tim Z. Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge Lin, Weiyang Liu |
|
2025-02-10 |
Agency Is Frame-Dependent |
David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh |
|
2025-02-10 |
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models |
Xiao-Wen Yang, Xuan-Yi Zhu, Wen-Da Wei, Ding-Chu Zhang, Jie-Jing Shao, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li |
|
2025-02-10 |
VideoRoPE: What Makes for Good Video Rotary Position Embedding? |
Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Jian Tong, Haodong Duan, Qipeng Guo, Jiaqi Wang, Xipeng Qiu, Dahua Lin |
|
2025-02-10 |
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance |
Yongchao Chen, Yilun Hao, Yueying Liu, Yang Zhang, Chuchu Fan |
|
2025-02-08 |
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features |
Alec Helbling, Tuna Han Salih Meral, Ben Hoover, Pinar Yanardag, Duen Horng Chau |
|