AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework that jointly understands intrinsic scene properties (e.g., albedo, normal, material, and irradiance), leverages t...

Ye Fang, Tong Wu, Valentin Deschaintre
Dec 12, 2025

Particulate: Feed-Forward 3D Object Articulation

We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulat...

Ruining Li, Yuxin Yao, Chuanxia Zheng
Dec 12, 2025

AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis

The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. While generative models present an attractive solution, exi...

Junjie Ye, Rong Xue, Basile Van Hoorick
Dec 12, 2025

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Reality is a dance between rigid constraints and deformable structures. For video models, that means generating motion that preserves fidelity as well as structure. Despite progress in diffusion models, producing realistic structure-preserving motion remains challenging, especially for articulated a...

Yang Fei, George Stoica, Jingyuan Liu
Dec 12, 2025

Uncertainty-Aware Domain Adaptation for Vitiligo Segmentation in Clinical Photographs

Accurately quantifying vitiligo extent in routine clinical photographs is crucial for longitudinal monitoring of treatment response. We propose a trustworthy, frequency-aware segmentation framework built on three synergistic pillars: (1) a data-efficient training strategy combining domain-adaptive p...

Wentao Jiang, Vamsi Varra, Caitlin Perez-Stable
Dec 12, 2025

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gauss...

Etienne Boursier, Claire Boyer
Dec 12, 2025

Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously

The rapid deployment of Large Language Models (LLMs) has created an urgent need for enhanced security and privacy measures in Machine Learning (ML). LLMs are increasingly being used to process untrusted text inputs and even generate executable code, often while having access to sensitive system cont...

Andrew Adiletta, Kathryn Adiletta, Kemal Derya
Dec 12, 2025

MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator

Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision often leads to segmentation-like mattes lacking fine details. To this end, we introduce a learned Matting Quality...

Peiqing Yang, Shangchen Zhou, Kai Hao
Dec 12, 2025

Agile Flight Emerges from Multi-Agent Competitive Racing

Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide e...

Vineet Pasumarti, Lorenzo Bianchi, Antonio Loquercio
Dec 12, 2025

Conditional Coverage Diagnostics for Conformal Prediction

Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee to produce sets with correct conditional coverage, leaving practitioners with...

Sacha Braun, David Holzmüller, Michael I. Jordan
Dec 12, 2025

Learning Minimal Representations of Fermionic Ground States

We introduce an unsupervised machine-learning framework that discovers optimally compressed representations of quantum many-body ground states. Using an autoencoder neural network architecture on data from $L$-site Fermi-Hubbard models, we identify minimal latent spaces with a sharp reconstruction q...

Felix Frohnert, Emiel Koridon, Stefano Polla
Dec 12, 2025

Data from arXiv.org • Updated hourly