Roadblock · Artificial Intelligence · Progressing

Training and inference efficiency

The computational cost of training and serving large language models is growing faster than hardware improvements can offset. Scaling laws suggest that, without architectural innovation, additional compute yields diminishing returns. Mixture-of-experts, state-space models, linear attention variants, and speculative decoding each offer a path to greater efficiency, but each also introduces new trade-offs in quality, memory, or engineering complexity. Achieving compute-optimal scaling while maintaining capability across diverse tasks is critical for sustainable AI development.
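
To make "compute-optimal scaling" concrete, here is a minimal sketch under two commonly cited rules of thumb that are assumptions here, not claims from this page: training compute is roughly C ≈ 6·N·D FLOPs for N parameters and D tokens, and a compute-optimal dense model uses about 20 tokens per parameter. The function names and the default ratio are hypothetical and chosen only for illustration.

```python
import math


def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Split a training FLOPs budget into (n_params, n_tokens).

    Assumptions (rules of thumb, not exact laws):
      * training compute C ~= 6 * N * D
      * compute-optimal data size D ~= tokens_per_param * N
    Solving both for N gives N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


def training_flops(n_params, n_tokens):
    """Approximate training FLOPs under the same C ~= 6 * N * D assumption."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Print the implied model and data sizes for a few example budgets.
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_split(budget)
        print(f"{budget:.0e} FLOPs -> {n:.2e} params, {d:.2e} tokens")
```

Under these assumptions, parameters and tokens each grow only with the square root of compute, so a 100x larger budget buys a model about 10x larger trained on about 10x more data. Techniques such as mixture-of-experts change the constants in this picture (for example, by activating only part of the parameters per token), which this sketch does not model.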

Knowing the Self, Understanding the World: A Dual-Cognition Benchmark for UAV Spatio-temporal Reasoning with MLLMs

FVAttn: Adaptive Sparse Attention with Runtime Load Balancing for Video Generation

PagedWeight: Efficient MoE LLM Serving with Dynamic Quality-Aware Weight Quantization

A Blueprint for Equilibrium-Based Differentiable Continuous-Variable Thermodynamic Computing

Vision-Language Assistant for Emotional Reactions to Risky Driving

Cluster-Aware Matching via Laplacian Optimal Transport

Physics-enhanced reinforcement learning for real-time optimal control of dynamical systems

Evaluating Open-Weight LLMs for Generating Structured Threat Information for Autonomous Vehicle Vulnerabilities

Vision-Language-Motion Maps: An Open-Vocabulary, Uncertainty-Aware, Queryable Motion Attribute for 3D Scene Maps

When Does Muon Help Agentic Reinforcement Learning?