AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

Tokenisation via Convex Relaxations

Tokenisation is an integral part of the current NLP pipeline. Current tokenisation algorithms such as BPE and Unigram are greedy algorithms -- they make locally optimal decisions without considering the resulting vocabulary as a whole. We instead formulate tokeniser construction as a linear program ...

Jan Tempus, Philip Whittington, Craig W. Schmidt
May 21, 2026

Integrable Elasticity via Neural Demand Potentials

We propose the Integrable Context-Dependent Demand Network (ICDN), a demand-first neural model for multiproduct retail demand. The model learns log-demand as a smooth, context-conditioned function of log-prices, allowing elasticities to be derived exactly from the learned demand surface. On the Domi...

Carlos Heredia, Daniel Roncel
May 21, 2026

Cambrian-P: Pose-Grounded Video Understanding

Camera pose matters. The position and orientation of each viewpoint define a shared spatial coordinate frame that relates observations across video frames. Yet this signal is largely absent from multimodal LLMs (MLLMs) for video understanding, which process frames as isolated 2D snapshots, instead o...

Jihan Yang, Zifan Zhao, Xichen Pan
May 21, 2026

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To address this, we intro...

Lee Hsin-Ying, Hanwen Jiang, Yiqun Mei
May 21, 2026

Vector Policy Optimization: Training for Diversity Improves Test-Time Search

Language models must now generalize out of the box to novel environments and work inside inference-scaling search procedures, such as AlphaEvolve, that select rollouts with a variety of task-specific reward functions. Unfortunately, the standard paradigm of LLM post-training optimizes a pre-specifie...

Ryan Bahlous-Boldi, Isha Puri, Idan Shenfeld
May 21, 2026

AwareVLN: Reasoning with Self-awareness for Vision-Language Navigation

Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for end-to-end action prediction, they often lack an explicit an...

Wenxuan Guo, Xiuwei Xu, Yichen Liu
May 21, 2026

Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration

Exploration is a prerequisite for learning useful behaviors in sparse-reward, long-horizon tasks, particularly within 3D environments. Curiosity-driven reinforcement learning addresses this via intrinsic rewards derived from the mismatch between the agent's predictive model of the world and reality....

Lily Goli, Justin Kerr, Daniele Reda
May 21, 2026

GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations

Vision-Language-Action (VLA) models have shown strong potential for general-purpose robot manipulation by unifying perception and action. However, existing VLA systems primarily rely on textual instructions and struggle to resolve spatial ambiguity in complex scenes with multiple similar objects. To...

Wenxuan Guo, Ziyuan Li, Meng Zhang
May 21, 2026

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. ...

Jiahao Wang, Bo Sun, Yijing Bai
May 21, 2026

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Autonomous agentic systems are largely static after deployment: they do not learn from user interactions, and recurring failures persist until the next human-driven update ships a fix. Self-evolving agents have emerged in response, but all confine evolution to text-mutable artifacts -- skill files, ...

Qianshu Cai, Yonggang Zhang, Xianzhang Jia
May 21, 2026

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. De...

Ali Hatamizadeh, Yejin Choi, Jan Kautz
May 21, 2026

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can...

Sadia Asif, Mohammad Mohammadi Amiri, Momin Abbas
May 21, 2026

Data from arXiv.org • Updated hourly