2024-05-03 |
Customizing Text-to-Image Models with a Single Image Pair |
Maxwell Jones, Sheng-Yu Wang, Nupur Kumari, David Bau, Jun-Yan Zhu |
|
2024-05-03 |
FLAME: Factuality-Aware Alignment for Large Language Models |
Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen |
|
2024-05-03 |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment |
Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev |
|
2024-05-03 |
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models |
Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo |
|
2024-05-03 |
LLM-AD: Large Language Model based Audio Description System |
Peng Chu, Jiang Wang, Andre Abrantes |
|
2024-05-03 |
WildChat: 1M ChatGPT Interaction Logs in the Wild |
Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng |
|
2024-05-03 |
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report |
Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi |
|
2024-05-03 |
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation |
Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou |
|
2024-05-02 |
Automatic Creative Selection with Cross-Modal Matching |
Alex Kim, Jia Huang, Rob Monarch, Jerry Kwac, Anikesh Kamath, Parmeshwar Khurd, Kailash Thiyagarajan, Goodman Gu |
|
2024-05-02 |
Paint by Inpaint: Learning to Add Image Objects by Removing Them First |
Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel |
|
2024-05-02 |
Self-Play Preference Optimization for Language Model Alignment |
Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu |
|
2024-05-02 |
STT: Stateful Tracking with Transformers for Autonomous Driving |
cs.RO ‧ Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Ta |
|
2024-05-02 |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge |
Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui |
|
2024-05-02 |
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound |
Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley |
|
2024-05-02 |
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 |
Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli |
|
2024-05-02 |
A Careful Examination of LLM Performance on Grade School Arithmetic |
andy99 |
|
2024-05-02 |
Spectrally Pruned Gaussian Fields with Neural Compensation |
Runyi Yang, Zhenxin Zhu, Zhou Jiang, Baijun Ye, Xiaoxue Chen, Yifei Zhang, Yuantao Chen, Jian Zhao, Hao Zhao |
|
2024-05-01 |
Lightplane: Highly-Scalable Components for Neural 3D Fields |
Ang Cao, Justin Johnson, Andrea Vedaldi, David Novotny |
|
2024-05-01 |
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model |
Wenxun Dai, Ling-Hao Chen, Jingbo Wang, Jinpeng Liu, Bo Dai, Yansong Tang |
|
2024-05-01 |
KAN: Kolmogorov–Arnold Networks |
wojciem |
|
2024-05-01 |
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting |
Paul Engstler, Andrea Vedaldi, Iro Laina, Christian Rupprecht |
|
2024-05-01 |
MicroDreamer: Zero-shot 3D Generation in ~20 Seconds by Score-based Iterative Reconstruction |
Luxi Chen, Zhengyi Wang, Chongxuan Li, Tingting Gao, Hang Su, Jun Zhu |
|
2024-05-01 |
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation |
Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui |
|
2024-05-01 |
Extending Llama-3's Context Ten-Fold Overnight |
Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou |
|
2024-05-01 |
Octopus v4: Graph of language models |
Wei Chen, Zhiyuan Li |
|
2024-05-01 |
Iterative Reasoning Preference Optimization |
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston |
|
2024-05-01 |
SAGS: Structure-Aware 3D Gaussian Splatting |
Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou |
|
2024-05-01 |
Better & Faster Large Language Models via Multi-token Prediction |
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve |
|
2024-05-01 |
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting |
Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu |
|
2024-05-01 |
DOCCI: Descriptions of Connected and Contrasting Images |
Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge |
|
2024-05-01 |
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation |
Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek |
|
2024-04-30 |
Stylus: Automatic Adapter Selection for Diffusion Models |
Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica |
|
2024-04-30 |
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance |
Kai He, Kaixin Yao, Qixuan Zhang, Jingyi Yu, Lingjie Liu, Lan Xu |
|
2024-04-30 |
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting |
Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang |
|
2024-04-30 |
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models |
PaulHoule |
|
2024-04-30 |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models |
Pat Verga, Sebastian Hofstatter, Sophia Althammer, Yixuan Su, Aleksandra Piktus, Arkady Arkhangorodsky, Minjie Xu, Naomi White, Patrick Lewis |
|
2024-04-30 |
Capabilities of Gemini Models in Medicine |
Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, B |
|
2024-04-30 |
LEGENT: Open Platform for Embodied Agents |
Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun |
|
2024-04-30 |
Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations |
Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang |
|
2024-04-29 |
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes |
Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou |
|
2024-04-29 |
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs |
zerojames |
|
2024-04-29 |
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections |
Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor |
|
2024-04-29 |
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning |
Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng |
|
2024-04-26 |
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings |
Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh |
|
2024-04-26 |
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs |
An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang |
|
2024-04-26 |
NeRF-XL: Scaling NeRFs with Multiple GPUs |
Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams |
|
2024-04-26 |
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving |
Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang |
|
2024-04-26 |
Make Your LLM Fully Utilize the Context |
Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou |
|
2024-04-26 |
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding |
Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu |
|
2024-04-26 |
Tele-FLM Technical Report |
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang |
|
2024-04-26 |
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites |
Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, |
|
2024-04-26 |
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension |
Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan |
|
2024-04-26 |
Interactive3D: Create What You Want by Interactive 3D Generation |
Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu |
|
2024-04-26 |
MoDE: CLIP Data Experts via Clustering |
Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu |
|
2024-04-26 |
MaGGIe: Masked Guided Gradual Human Instance Matting |
Chuong Huynh, Seoung Wug Oh, Abhinav Shrivastava, Joon-Young Lee |
|
2024-04-26 |
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference |
João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian |
|
2024-04-26 |
Editable Image Elements for Controllable Synthesis |
Jiteng Mu, Michaël Gharbi, Richard Zhang, Eli Shechtman, Nuno Vasconcelos, Xiaolong Wang, Taesung Park |
|
2024-04-26 |
BASS: Batched Attention-optimized Speculative Sampling |
Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras |
|
2024-04-26 |
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data |
Sachin Mehta, Maxwell Horton, Fartash Faghri, Mohammad Hossein Sekhavat, Mahyar Najibi, Mehrdad Farajtabar, Oncel Tuzel, Mohammad Rastegari |
|
2024-04-25 |
MotionMaster: Training-free Camera Motion Transfer For Video Generation |
Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma |
|
2024-04-25 |
PuLID: Pure and Lightning ID Customization via Contrastive Alignment |
Zinan Guo, Yanze Wu, Zhuowei Chen, Lang Chen, Qian He |
|
2024-04-25 |
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning |
Weifeng Chen, Jiacheng Zhang, Jie Wu, Hefeng Wu, Xuefeng Xiao, Liang Lin |
|
2024-04-24 |
Transformers Can Represent $n$-gram Language Models |
Anej Svete, Ryan Cotterell |
|
2024-04-24 |
FlashSpeech: Efficient Zero-Shot Speech Synthesis |
Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue |
|
2024-04-24 |
Pegasus-v1 Technical Report |
Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, An |
|
2024-04-24 |
Multi-Head Mixture-of-Experts |
Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei |
|
2024-04-24 |
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework |
Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari |
|
2024-04-24 |
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models |
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis |
|
2024-04-24 |
SnapKV: LLM Knows What You are Looking for Before Generation |
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen |
|
2024-04-23 |
Learning H-Infinity Locomotion Control |
cs.RO ‧ Junfeng Long, Wenye Yu, Quanyi Li, Zirui Wang, Dahua Lin, Jiangmiao Pang |
|
2024-04-23 |
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer |
Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, Áron Monszpart, Daniyar Turmukhambetov, Victor Adrian Prisacariu |
|
2024-04-23 |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions |
tosh |
|
2024-04-23 |
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study |
Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno |
|
2024-04-23 |
FlowMind: Automatic Workflow Generation with LLMs |
Zhen Zeng, William Watson, Nicole Cho, Saba Rahimi, Shayleen Reynolds, Tucker Balch, Manuela Veloso |
|
2024-04-23 |
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis |
Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao |
|
2024-04-23 |
A Multimodal Automated Interpretability Agent |
Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba |
|
2024-04-23 |
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation |
Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan |
|
2024-04-23 |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone |
Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizh |
|
2024-04-23 |
MultiBooth: Towards Generating All Your Concepts in an Image from Text |
Chenyang Zhu, Kai Li, Yue Ma, Chunming He, Li Xiu |
|
2024-04-23 |
Music Consistency Models |
Zhengcong Fei, Mingyuan Fan, Junshi Huang |
|
2024-04-22 |
How Far Can We Go with Practical Function-Level Program Repair? |
Jiahong Xiang, Xiaoyang Xu, Fanchu Kong, Mingyuan Wu, Haotian Zhang, Yuqun Zhang |
|
2024-04-22 |
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation |
Wenhao Huang, Chenghao Peng, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Liqian Wen, Zulong Chen |
|
2024-04-22 |
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency |
Zhaodonghui Li, Haitao Yuan, Huiming Wang, Gao Cong, Lidong Bing |
|
2024-04-22 |
TextSquare: Scaling up Text-Centric Visual Instruction Tuning |
Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang |
|
2024-04-22 |
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models |
Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi |
|
2024-04-22 |
Does Gaussian Splatting need SFM Initialization? |
Yalda Foroutan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi |
|
2024-04-22 |
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation |
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu, Brandon Y. Feng, Changxi Zheng, Noah Snavely, Jiajun Wu, William T. Freeman |
|
2024-04-19 |
MeshLRM: Large Reconstruction Model for High-Quality Mesh |
Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu |
|
2024-04-19 |
EdgeFusion: On-Device Text-to-Image Generation |
Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim |
|
2024-04-19 |
Dynamic Typography: Bringing Words to Life |
Zichen Liu, Yihao Meng, Hao Ouyang, Yue Yu, Bolin Zhao, Daniel Cohen-Or, Huamin Qu |
|
2024-04-19 |
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment |
Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami |
|
2024-04-19 |
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation |
Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman |
|
2024-04-19 |
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing |
milliondreams |
|
2024-04-19 |
Introducing v0.5 of the AI Safety Benchmark from MLCommons |
Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu C |
|
2024-04-19 |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding |
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen |
|
2024-04-19 |
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data |
Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, Yasiru Ratnayake |
|
2024-04-19 |
BLINK: Multimodal Large Language Models Can See but Not Perceive |
Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna |
|
2024-04-19 |
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models |
Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Q |
|
2024-04-19 |
AniClipart: Clipart Animation with Text-to-Video Priors |
Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao |
|
2024-04-18 |
Long-form music generation with latent diffusion |
doodlesdev |
|