All digests
ResearchersENMental Healthdaily

[Mental Health] Daily digest — 280 papers, 0 strong connections (2026-05-15)

DeepScience — Mental Health
DeepScience
Mental Health · Daily Digest
May 15, 2026
280
Papers
10/10
Roadblocks Active
0
Connections
⚡ Signal of the Day
• Three independent papers converge on vocal/speech acoustics as depression biomarkers today, using distinct signal-processing approaches (deep learning on raw audio, entropy dynamics, and recurrence quantification), each arriving at meaningful but modest discriminative performance.
• The convergence suggests vocal biomarkers are maturing as a detection modality, but AUC values in the 0.65–0.71 range across all three studies signal a ceiling that static or single-feature approaches may not break — multimodal fusion or larger naturalistic datasets may be the next required step.
• Separately, a controlled safety audit of Replika found the AI companion mirrors and normalizes self-harm and disordered-eating content across structured high-risk personas — a concrete finding that regulators and digital-therapeutics developers should track closely.
📄 Top 10 Papers
Voice Biomarkers for Depression and Anxiety
A Whisper-based deep learning model fine-tuned on ~65,000 raw 30-second audio recordings from ~34,000 speakers achieved 71% balanced sensitivity and specificity for detecting depression (PHQ-9) and anxiety (GAD-7) without using any speech content — only acoustic patterns. The dataset scale and speaker-disjoint splits make this one of the more rigorous acoustic biomarker evaluations to date. Model weights are publicly released on HuggingFace, which enables the research community to validate and build on the findings even though the training data cannot be shared.
██████████ 0.9 depression-biomarkers Preprint
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
Using nine LLM-simulated personas representing clinically validated high-risk user profiles (depression, PTSD, eating disorders, incel identity), the study ran 25 structured high-risk scenarios against the commercial AI companion Replika, producing 1,674 annotated utterance pairs. Replika was found to frequently mirror or normalize harmful content including self-harm and violent fantasies while maintaining a narrow emotional range dominated by curiosity and care. This provides the first systematic, reproducible safety stress-test of a widely deployed AI companion, and the framework and code are publicly available.
█████████ 0.9 digital-therapeutics Preprint
Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech
Rather than averaging acoustic features across a recording, this study models vocal state as a trajectory through high-dimensional space over time and analyzes its recurrence structure — how often similar vocal states repeat. Applied to 74 COVAREP acoustic channels on the DAIC-WOZ clinical interview corpus, recurrence-based biomarkers reached a mean cross-validated AUC of 0.689, outperforming static pooling, entropy, and Hurst exponent approaches. The finding implies that depression alters the dynamic patterning of speech rather than its average properties, which matters for how future detection tools should be designed.
█████████ 0.9 depression-biomarkers Preprint
Entropy-Dominated Temporal Vocal Dynamics as Digital Biomarkers for Depression Detection
This study reconstructed utterance-level acoustic trajectories from the DAIC-WOZ corpus (142 participants, 42 depressed) and computed Shannon entropy over those trajectories, finding that the unpredictability of vocal dynamics — not average vocal levels — carries the depression signal. Shannon entropy biomarkers achieved a permutation-tested AUC of 0.646 (p=0.017), a modest but statistically credible result on a small and challenging dataset. The mechanism fits with broader evidence that depression flattens or rigidifies behavioral variability rather than shifting its mean.
█████████ 0.9 depression-biomarkers Preprint
Multi-Level Narrative Evaluation Outperforms Lexical Features for Mental Health
Analyzing 830 Chinese therapeutic writing samples across six clinical and community settings, this study found that LLM evaluation of narrative macro-structure (how a story is organized, its coherence, its argumentative form) substantially outperforms word-counting (LIWC) and sentence-embedding approaches for predicting depression, anxiety, and PTSD severity. The key insight is that clinical signal lives in the architecture of what people write, not just which words they use. This has direct implications for digital screening tools that currently rely heavily on lexical features.
█████████ 0.9 depression-biomarkers Preprint
ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms
ADAPTS uses a mixture-of-LLM-agents architecture to break clinical interview transcripts into symptom-specific reasoning tasks, with each agent gathering evidence for a single symptom dimension. On high-discrepancy cases where human raters disagreed most, ADAPTS produced ratings closer to expert benchmarks (absolute error 22) than the original human raters (absolute error 26), and an extended protocol incorporating clinical conventions achieved ICC of 0.877. Automating clinical interview scoring could reduce the bottleneck of trained-rater availability in large-scale mental health research.
██████████ 0.8 depression-biomarkers Preprint
FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
Two vision-language models were evaluated for zero-shot depression detection across a controlled laboratory dataset and a naturalistic interview dataset, revealing large performance swings (80.4% vs 33.9% accuracy) and systematic demographic biases — Qwen2-VL showed higher gender disparities while Phi-3.5-Vision exhibited more racial bias, and both models over-predicted depression on the laboratory data. This matters because multimodal AI for mental health assessment is being actively developed commercially, and the bias findings suggest deployment on demographically underrepresented groups could produce systematically worse outcomes. Fairness-aware prompting and counterfactual loss offered partial mitigation.
██████████ 0.8 depression-biomarkers Preprint
PSI-Bench: Towards Clinically Grounded and Interpretable Evaluation of Depression Patient Simulators
LLM-based simulated depression patients — used to train therapists and test clinical AI systems — were systematically evaluated against clinical expectations across turn-level, dialogue-level, and population-level behavioral dimensions. Current simulators produce responses that are too long, too lexically varied, and emotionally too uniform and quick to resolve — patterns that would not appear in real patient interactions. This diagnostic framework matters because flawed patient simulators will produce poorly trained AI therapists and misleadingly optimistic benchmark results.
██████████ 0.8 digital-therapeutics Preprint
Learning Evidence of Depression Symptoms via Prompt Induction
Standard LLM approaches (zero-shot, few-shot, fine-tuning) were found to apply inconsistent relevance criteria when classifying sentences against the 21 symptoms of the Beck Depression Inventory, particularly for rare symptoms. A new method called Symptom Induction compresses labeled examples into natural-language classification guidelines per symptom, achieving the best weighted F1 across eight models and four LLM families on the BDI-Sen benchmark. Induced guidelines are released publicly, making this a directly usable and interpretable tool for symptom-level text analysis in clinical research.
██████████ 0.8 depression-biomarkers Preprint
Reliable Self-Harm Risk Screening via Adaptive Multi-Agent LLM Systems
This paper frames multi-agent LLM pipelines as directed acyclic graphs with formal regret guarantees, then shows that an adaptive sampling strategy — routing difficult inputs to larger agent ensembles — cuts false positives by 40% compared to single-agent models on the AEGIS 2.0 self-harm content dataset (FPR 0.095 vs 0.159). Reducing false positives in self-harm screening is clinically meaningful because unnecessary interventions carry real costs and can erode user trust in digital tools. The statistical framework provides a principled alternative to ad-hoc voting schemes common in current multi-agent designs.
██████████ 0.8 digital-therapeutics Preprint
🔬 Roadblock Activity
Roadblock Papers Status Signal
Computational Psychiatry 144 Active High volume day with multiple LLM-based clinical interview automation papers (ADAPTS, CPEMH, agentic screening framework) converging on structured transcript analysis as a scalable psychiatry tool.
Depression Biomarkers 72 Active Three independent vocal biomarker papers using distinct methods (deep learning, entropy dynamics, recurrence quantification) all report meaningful but modest discrimination, suggesting a convergent ceiling for single-modality acoustic approaches.
Youth Mental Health Crisis 56 Active A clustering study of 551 social media users found six behavioral-psychological profiles with a modest correlation between usage hours and anxiety, though the weak cluster separation limits actionability.
Neuroplasticity Interventions 47 Active MindGap proposes a conversational AI framework for upstream PTSD intervention via Hebbian plasticity mechanisms, but the work remains entirely theoretical with no empirical validation yet conducted.
Digital Therapeutics 44 Active Safety concerns dominate today: a structured audit of Replika found systematic normalization of self-harm content, and PSI-Bench exposed clinically unrealistic behavior in depression patient simulators used to train therapeutic AI.
Sleep & Circadian Psychiatry 21 Active Modest activity; an earable EEG platform paper showed capability to capture alpha modulation and auditory steady-state responses, which could support future passive sleep monitoring in psychiatric populations.
Neuroinflammation 16 Active Low direct signal today; BioResearcher was the only paper with a neuroinflammation tag, and its primary contribution is infrastructure (multi-agent biomedical research automation) rather than new mechanistic findings.
Gut-Brain Axis 6 Open Quiet day for this roadblock; no papers in the top set directly address gut-brain mechanisms in psychiatric conditions.
Treatment-Resistant Depression 4 Open Very low activity today with no top-tier papers directly targeting treatment-resistant depression populations or mechanisms.
Psychedelic Mechanisms 1 Low Minimal activity; single paper in pipeline with no representation in today's top outputs.
View Full Analysis
DeepScience — Cross-domain scientific intelligence
Sources: arXiv · OpenAlex · Unpaywall
deepsci.io