AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

The rapid growth of scientific literature has made it increasingly difficult for researchers to efficiently discover, evaluate, and synthesize relevant work. Recent advances in multi-agent large language models (LLMs) have demonstrated strong potential for understanding user intent and are being tra...

Komal Kumar, Aman Chadha, Salman Khan
Apr 7, 2026

In-Place Test-Time Training

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model...

Guhao Feng, Shengjie Luo, Kai Hua
Apr 7, 2026

Action Images: End-to-End Policy Learning via Multiview Video Generation

World action models (WAMs) have emerged as a promising direction for robot policy learning, as they can leverage powerful video backbones to model the future states. However, existing approaches often rely on separate action modules, or use action representations that are not pixel-grounded, making ...

Haoyu Zhen, Zixian Gao, Qiao Sun
Apr 7, 2026

DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful ...

Zhengming Yu, Li Ma, Mingming He
Apr 7, 2026

The Character Error Vector: Decomposable errors for page-level OCR evaluation

The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making e...

Jonathan Bourne, Mwiza Simbeye, Joseph Nockels
Apr 7, 2026

Target Policy Optimization

In RL, given a prompt, we sample a group of completions from a model and score them. Two questions follow: which completions should gain probability mass, and how should the parameters move to realize that change? Standard policy-gradient methods answer both at once, so the update can overshoot or u...

Jean Kaddour
Apr 7, 2026

Exclusive Unlearning

When introducing Large Language Models (LLMs) into industrial applications, such as healthcare and education, the risk of generating harmful content becomes a significant challenge. While existing machine unlearning methods can erase specific harmful knowledge and expressions, diverse harmful conten...

Mutsumi Sasaki, Kouta Nakayama, Yusuke Miyao
Apr 7, 2026

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pi...

Yanis Labrak, David Grünert, Séverin Baroudi
Apr 7, 2026

Shot-Based Quantum Encoding: A Data-Loading Paradigm for Quantum Neural Networks

Efficient data loading remains a bottleneck for near-term quantum machine learning. Existing schemes (angle, amplitude, and basis encoding) either underuse the exponential Hilbert-space capacity or require circuit depths that exceed the coherence budgets of noisy intermediate-scale quantum hardware....

Basil Kyriacou, Viktoria Patapovich, Maniraman Periyasamy
Apr 7, 2026

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety ...

Bowen Ye, Rang Li, Qibin Yang
Apr 7, 2026

Data from arXiv.org • Updated hourly