DeepScience

DeepScience — Mental Health


Mental Health · Daily Digest

May 25, 2026

279 Papers · 10/10 Roadblocks Active


⚡ Signal of the Day

• Speech-based depression and anxiety detection reaches 71% sensitivity/specificity across ~5,000 test subjects using a Whisper-LoRA model with publicly released weights — the most concrete, deployable result in today's pipeline.

• Multiple independent groups are converging on LLM-based symptom extraction from clinical transcripts (EmoTrack, ADAPTS, CPEMH), suggesting the field is rapidly standardizing around transformer architectures for depression severity scoring, but reliance on proprietary datasets remains a shared bottleneck.

• Watch for reliability and bias caveats: the ASR-robustness study finds that one widely used open model (Llama-3.1-8B) loses most of its reliability under simulated speech-recognition noise, and the LLM psychiatric screening benchmark reports higher accuracy for male than female participants. Deployment readiness is not uniform.

📄 Top 10 Papers

Voice Biomarkers for Depression and Anxiety

A Whisper-small model fine-tuned with LoRA adapters on ~64,800 audio recordings from 34,457 people detects depression and anxiety from raw speech with sensitivity and specificity both at 71%, without analyzing the words spoken. The model was trained and validated on demographically balanced splits, and the weights are publicly released on Hugging Face, making this one of the few voice-biomarker results that others can test immediately. Because the training data is proprietary, full replication is limited, but the released inference model still enables real-world evaluation.
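An "equal sensitivity and specificity" figure is usually read off at the decision threshold where the two rates meet. The sketch below illustrates that idea on hypothetical toy scores; the function names and data are assumptions for illustration, not the paper's evaluation code.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def equal_operating_point(scores, labels):
    """Pick the candidate threshold where sensitivity and specificity are closest."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    _, threshold, sens, spec = best
    return threshold, sens, spec


# Hypothetical toy scores (not data from the paper); 1 = condition present.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = equal_operating_point(toy_scores, toy_labels)
print(threshold, sens, spec)
```

On real model outputs the two rates rarely match exactly, so the closest threshold is reported instead.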

██████████ 0.9 depression-biomarkers Preprint


ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms

ADAPTS breaks down long clinical interviews into smaller symptom-specific reasoning tasks assigned to a team of cooperating LLMs, then recombines their outputs into a depression severity rating. On a subset of high-disagreement interviews, the automated system's absolute error against an expert benchmark (22 points) was lower than that of the original human raters (26 points), and reliability reached ICC=0.877 with an extended protocol. This matters because unstructured clinical interviews are common in real practice but extremely hard to automate; the approach shows LLMs can handle protocol variation without fine-tuning.

██████████ 0.9 depression-biomarkers Preprint


EmoTrack: Robust Depression Tracking from Counseling Transcripts across Session Regimes

EmoTrack extracts structured clinical signals from therapy transcripts using an LLM, pairs them with frozen sentence-level embeddings, and uses a compact attention mechanism to track depression severity across multiple counseling sessions. It reduces prediction error by 13.5% relative to the best single-session baseline on the DAIC-WOZ benchmark. The ability to track changes across sessions rather than scoring each in isolation is important for real therapy monitoring, where trends matter more than snapshots.

██████████ 0.9 depression-biomarkers Preprint


PULSE: Agentic Investigation with Passive Sensing for Proactive Intervention in Cancer Survivorship

Cancer survivors face elevated depression and anxiety but often cannot fill out self-report surveys at their worst moments. PULSE uses an LLM agent that monitors smartphone sensors and autonomously queries relevant data streams to predict when a person wants emotional support. The agentic approach (where the AI chooses what to look at) substantially outperforms a structured single-query approach, reaching 0.743 balanced accuracy for predicting emotion regulation desire. This addresses a genuine measurement gap: passive sensing can reach people when active check-ins fail.
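Balanced accuracy matters here because moments of wanting support are likely rare relative to ordinary moments. A minimal sketch, assuming the standard binary definition (the mean of per-class recall); the labels below are hypothetical and not from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = positive, 0 = negative)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    recall_pos = sum(1 for p in pos if p == 1) / len(pos)
    recall_neg = sum(1 for p in neg if p == 0) / len(neg)
    return (recall_pos + recall_neg) / 2


# Hypothetical imbalanced toy labels: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bal_acc)
```

In this toy case plain accuracy looks strong while balanced accuracy exposes that half of the positive cases were missed.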

██████████ 0.8 digital-therapeutics Preprint


When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

Five LLMs were tested on 555 semi-structured interviews against clinician-confirmed psychiatric diagnoses, with accuracy ranging widely from 0.49 to 0.86 depending on the model and disorder. GPT-4.1 Mini and GPT-5 Mini were most consistent across depression, anxiety, and PTSD, but all models showed higher accuracy for male than female participants — a bias pattern that would be critical to catch before deployment. Data and prompts are posted on OSF, allowing others to replicate the benchmarking.
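A subgroup check of the kind that surfaces this disparity can be sketched as follows. The labels, predictions, and group assignments are hypothetical, and the benchmark's own analysis may differ.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions, groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: four participants per group.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["male"] * 4 + ["female"] * 4

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = per_group["male"] - per_group["female"]
print(per_group, gap)
```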

██████████ 0.8 digital-therapeutics Preprint


Can We Trust LLMs for Mental Health Screening? Consistency, ASR Robustness, and Evidence Faithfulness

When voice transcription errors are introduced (simulating real-world speech recognition noise), Llama-3.1-8B's reliability for estimating anxiety and depression scores collapses from acceptable (ICC=0.82) to poor (ICC=0.36) at just 10% word error rate, while Phi-4 and Gemma-2-9B remain stable above ICC=0.89. This is a practical warning for anyone building screening tools on top of speech-to-text pipelines: model choice matters enormously for robustness, and Llama-3.1-8B in particular may be unsuitable for noisy audio environments.
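To make the 10% word error rate concrete, the sketch below corrupts a transcript with placeholder substitutions and measures WER as word-level edit distance over reference length. The corruption scheme, the `<unk>` token, and the sample sentence are assumptions for illustration; the paper's noise model is not described in this digest.

```python
import random


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def corrupt(transcript, rate, rng):
    """Substitute a fixed fraction of words with a placeholder token."""
    words = transcript.split()
    n_errors = round(rate * len(words))
    for idx in rng.sample(range(len(words)), n_errors):
        words[idx] = "<unk>"
    return " ".join(words)


# Hypothetical 20-word transcript, so 10% WER means two substituted words.
clean = ("i have been feeling tired most days and i do not sleep well "
         "even when i go to bed early")
noisy = corrupt(clean, rate=0.10, rng=random.Random(0))
wer = word_error_rate(clean, noisy)
print(noisy)
print(wer)
```

Scoring the clean and corrupted versions with the same model, then comparing agreement, is one way to reproduce this kind of robustness check on a small scale.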

██████████ 0.8 depression-biomarkers Preprint


Von Economo neurons enable reliable social skill acquisition in recurrent spiking neural networks: a computational account with clinical predictions

Von Economo neurons (VENs) are large, rare brain cells found in social species and depleted in several psychiatric conditions, including schizophrenia and autism. This simulation study shows that removing VENs from a spiking neural network causes training to fail 30% of the time versus 2% when they are present (Fisher's exact OR=21), and that their absence is most harmful during mid-training, when the network is forming cooperative cell assemblies. Because VENs matter most in a defined learning window, the result generates testable clinical predictions about the developmental timing of vulnerability.
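The reported odds ratio can be sanity-checked from the two failure rates alone. This is only a consistency check: the run counts needed for Fisher's exact test itself are not given in this digest.

```python
def odds_ratio(p_exposed, p_control):
    """Odds ratio for an event with probability p_exposed versus p_control."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_control = p_control / (1 - p_control)
    return odds_exposed / odds_control


# Failure rates quoted in the summary: 30% without VENs, 2% with VENs.
or_estimate = odds_ratio(0.30, 0.02)
print(round(or_estimate, 1))
```

The odds are (0.30 / 0.70) against (0.02 / 0.98), which gives a ratio of about 21 and matches the quoted figure.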

██████████ 0.8 computational-psychiatry Preprint


The Complex Brain Hypothesis: Resolving the Entropy-Content Conundrum in Minimal Phenomenal Experience

Deep meditative states and psychedelic states (5-MeO-DMT) both show unusually high brain entropy on neuroimaging, yet they feel completely different — meditation produces minimal, quiet experience while psychedelics produce overwhelming content. This paper argues the contradiction resolves when you measure brain complexity rather than entropy alone: complexity distinguishes the two states even though raw entropy does not. This reframing has implications for understanding why psychedelics produce therapeutic effects: it is not simply about increasing neural disorder, but about the structured richness of that activity.

██████████ 0.8 psychedelic-mechanisms Preprint


Functional Whole-Brain Models: A New Framework for Unifying Brain Structure and Cognitive Function

Two traditions in computational neuroscience have grown in isolation: biophysically detailed brain simulations that look realistic but cannot perform tasks, and deep learning models that perform cognitive tasks but ignore brain anatomy. This perspective paper argues for merging them into 'functional whole-brain models' that are both structurally grounded and task-performing, outlining four criteria and a three-step roadmap. For psychiatric research, such models could eventually simulate what goes wrong in conditions like depression at a mechanistic level — but this paper is a position statement, not an empirical result.

██████████ 0.8 computational-psychiatry Preprint


Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment

Instead of training a classifier on labeled data, this approach maps text responses onto geometric axes in embedding space — axes defined by the meaning of clinical scale items from PHQ-9, GAD-7, and similar tools. In 247 observations from 145 participants, scores from structured word-list tasks correlated meaningfully with validated clinical measures and outperformed VADER sentiment analysis. The method requires no labeled training data and could lower the barrier to deploying language-based screening where annotated datasets are unavailable.
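The projection step can be sketched with toy vectors. How the paper actually builds its axes is not stated here, so the high-pole/low-pole construction, the 3-dimensional "embeddings", and all names below are assumptions; a real pipeline would take its vectors from a sentence-embedding model.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_axis(high_pole, low_pole):
    """Unit vector pointing from the low-pole embedding to the high-pole embedding."""
    diff = [h - l for h, l in zip(high_pole, low_pole)]
    length = math.sqrt(dot(diff, diff))
    return [d / length for d in diff]


def project(embedding, axis):
    """Scalar projection of an embedding onto a unit-length axis."""
    return dot(embedding, axis)


# Hypothetical 3-d vectors standing in for embeddings of scale-item wording.
high_pole = [1.0, 0.0, 0.0]   # e.g. text expressing the symptom strongly
low_pole = [-1.0, 0.0, 0.0]   # e.g. text expressing its absence
axis = build_axis(high_pole, low_pole)

# Hypothetical embeddings of two participants' responses.
response_a = [0.5, 0.5, 0.0]
response_b = [-0.5, 0.5, 0.0]
score_a = project(response_a, axis)
score_b = project(response_b, axis)
print(score_a, score_b)
```

Because the axis comes from the scale items' meaning rather than from labeled examples, no classifier is trained at any point.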

██████████ 0.7 depression-biomarkers Preprint


🔬 Roadblock Activity

Roadblock	Papers	Status	Signal
Computational Psychiatry	143	Active	High activity day with multiple papers pushing AI-based symptom modeling, including a spiking network study providing mechanistic predictions about Von Economo neuron dysfunction and a position paper calling for unified brain models that can both simulate and perform cognitive tasks.
Depression Biomarkers	64	Active	Voice and text biomarker research is the dominant theme, with the voice-biomarker paper (71% sensitivity/specificity, released model weights) and ADAPTS transcript-scoring system providing the most deployment-ready results.
Digital Therapeutics	60	Active	LLM reliability for screening is under scrutiny — the LLM psychiatric screening benchmark and ASR-robustness study both flag model-specific failure modes and demographic disparities that would need to be resolved before clinical deployment.
Neuroplasticity Interventions	42	Active	A theoretical PTSD framework (MindGap) proposes targeting a pre-cognitive 'feeling tone gap' to dissolve trauma pathways rather than managing symptoms, but it lacks empirical support; the Von Economo neuron simulation provides more concrete mechanistic grounding.
Youth Mental Health Crisis	39	Active	No papers in today's top selection directly address youth populations; the high paper count likely reflects general mental health screening work that touches adolescent cohorts peripherally.
Neuroinflammation	17	Active	Minimal direct signal in top papers today; the whole-brain modeling framework paper tangentially mentions neuroinflammation as a downstream application domain but no empirical neuroinflammation work surfaced in the highest-relevance tier.
Sleep & Circadian Psychiatry	11	Active	No papers addressing sleep or circadian mechanisms appeared in the top tier today despite moderate pipeline activity; this roadblock remains underrepresented in the highest-quality outputs.
Treatment-Resistant Depression	6	Open	Low activity; two theoretical emotion taxonomy papers (Friction Theory) reference treatment-resistant depression as a potential application but offer no empirical evidence.
Gut-Brain Axis	5	Open	Very low activity with no papers reaching the top selection; this roadblock remains quiet in today's pipeline.
Psychedelic Mechanisms	2	Low	Only two papers today, but the Complex Brain Hypothesis paper provides a meaningful reframing — brain complexity rather than entropy as the key metric — that could sharpen mechanistic hypotheses about why psychedelics produce therapeutic effects.


DeepScience — Cross-domain scientific intelligence
Sources: arXiv · OpenAlex · Unpaywall
deepsci.io
