Back to Roadmap
RoadblockArtificial IntelligenceProgressing

Mechanistic interpretability

Understanding the internal computations of neural networks at the level of individual features and circuits remains extremely challenging. Sparse autoencoders have revealed interpretable features in medium-scale models, but scaling these techniques to frontier models with hundreds of billions of parameters is an open problem. Key questions include whether models represent concepts in superposition, how to extract faithful causal explanations of model behavior, and whether mechanistic understanding can yield practical safety guarantees.

Recent papers / Artificial Intelligence

Uncertainty analysis in digital twins and integration of aleatory uncertainties for virtual entity models

June 10, 2026openalex

G-SENSE: Generalized Sensorless External Force Estimation for Humanoid Robots via Centroidal Dynamics

June 10, 2026openalex