[Artificial Intelligence] AI Reads Heart Scans, Storm Damage, and Tax Risk

DeepScience · Artificial Intelligence · Daily Digest

Today's papers show AI stepping into real consequential decisions — in hospitals, disaster zones, and tax offices.
May 21, 2026
Three stories today, all grounded in actual data from actual events. I'd call this a quietly solid day — no fireworks, but three papers that show AI doing genuinely useful things in the world while being honest about what it can't yet do. Let's dig in.
Today's stories
01 / 03

AI reads your heart ultrasound in 94 seconds instead of 8 minutes

Your heart scan takes a trained technician about eight minutes of careful work. A pilot study in South Korea just cut that to 94 seconds without losing accuracy on the measurement that matters most.

An echocardiogram is an ultrasound of your heart: the same basic technology used to check babies before birth, but pointed at your chest to watch your heart pump. A cardiologist or technician normally sits down, scrolls through the footage, measures the chambers, and checks how strongly the heart squeezes. At Seoul National University Bundang Hospital, a team ran a direct head-to-head: they gave the same 40 heart scans to both a trained cardiac sonographer and a fully automated commercial AI system called SONIX Health. The human needed a median of 490 seconds, about eight minutes. The AI needed 94 seconds. The key question was never just speed. Think of it like spell-checking an important letter: fast is worthless if the mistakes slip through. For the most critical measurement, LVEF (left ventricular ejection fraction, the percentage of blood your heart pushes out with each beat), the AI's readings tracked the human's closely, and the mean difference was 0.00 percentage points. That's tight, though a zero average only shows that the AI did not read systematically high or low; individual scans can still differ in either direction. The catch? This is a pilot study of 40 scans at one hospital, judged against a single senior cardiologist as the sole reference point. Forty cases is a first conversation, not a conclusion. Some secondary measurements showed much weaker agreement: one intraclass correlation coefficient (ICC) dropped as low as 0.625 on a scale where 1.0 is perfect. The AI also fell slightly short of the human on grading aortic regurgitation, a heart-valve condition. Real progress, but this needs a much larger multicentre trial before anyone replaces a trained sonographer.
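To make the "mean difference" idea concrete, here is a minimal sketch of a Bland-Altman style comparison between two sets of paired readings. The LVEF values are invented for illustration and are not data from the study, and the function name `bland_altman` is hypothetical. The digest does not say whether the authors used this exact analysis.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)              # average disagreement between the methods
    spread = 1.96 * stdev(diffs)    # approx. 95% limits, assuming normal differences
    return bias, bias - spread, bias + spread

# Hypothetical LVEF readings in percent; NOT data from the study.
human = [55.0, 60.0, 35.0, 48.0, 62.0, 41.0]
ai = [58.0, 57.0, 38.0, 45.0, 64.0, 39.0]

bias, lower, upper = bland_altman(ai, human)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

In this toy data the over- and under-estimates cancel, so the bias is zero even though every single reading differs by two or three points. That is why agreement studies usually report the spread of the differences alongside the mean.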

Glossary
echocardiogram: An ultrasound scan of the heart used to check its structure and how well it pumps blood.
LVEF (left ventricular ejection fraction): The percentage of blood pumped out of the heart's main chamber with each beat, a key indicator of heart health.
intraclass correlation coefficient (ICC): A number, usually between 0 and 1, that measures how closely two measurement methods agree; closer to 1 means stronger agreement.
02 / 03

Street-view photos plus building data let AI map hurricane damage at 93% accuracy

After a hurricane, someone has to look at every damaged house on every street — researchers just trained AI to do it from photos, and it hit 93% accuracy.

When a hurricane tears through a neighbourhood, emergency responders, insurers, and city planners all need the same thing quickly: a map of which buildings are severely damaged, which are moderately damaged, and which are fine. Right now that means humans driving or walking every street, assessing each home by eye. A research team working with post-Hurricane Ian (2022, Florida) data asked: what if AI could classify damage from street-level photos? Their system, called MMST, works like a home inspector with two toolboxes instead of one. The first is the photos: the AI scans visible structural damage, just as a human would. The second is a dossier on each building: its age, its assessed value, how fast the winds were blowing nearby, how close it was to the storm's track. It's the difference between a doctor who glances at you and one who reads your chart before walking in. Combining both streams lifted overall accuracy to 92.67% on the Hurricane Ian test set, about two percentage points better than the best image-only system. The more honest number is the Matthews Correlation Coefficient (MCC), a stricter metric that penalises a model for ignoring rare categories. There, the team beat the previous best by 11 points when MCC is expressed as a percentage, which is a more meaningful signal than the headline accuracy. The catch: the model was trained and tested on a single hurricane in Florida, using field-assessment labels from the StEER reconnaissance database. Solid data, but one storm, one region, one housing stock. A system like this needs to prove it works on different storms, different construction styles, and different climates before emergency managers stake real decisions on it.
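Why MCC is the stricter number can be shown with a small sketch. The labels below are invented (90 undamaged, 8 moderate, 2 severe) and are not the Hurricane Ian data. The `mcc` function implements the standard multi-class form of the coefficient, and the names `lazy` and `careful` are hypothetical stand-ins for two classifiers.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    """Multi-class Matthews Correlation Coefficient (ranges from -1 to 1)."""
    labels = set(y_true) | set(y_pred)
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = {k: y_true.count(k) for k in labels}
    pred_counts = {k: y_pred.count(k) for k in labels}
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    if cov_tt == 0 or cov_pp == 0:
        return 0.0  # by convention when only one class appears
    return cov_tp / sqrt(cov_tt * cov_pp)

# Hypothetical damage labels; NOT data from the paper.
y_true = ["none"] * 90 + ["moderate"] * 8 + ["severe"] * 2

# A classifier that never predicts any damage at all.
lazy = ["none"] * 100

# A classifier that makes four mistakes but finds the rare classes.
careful = (["none"] * 88 + ["moderate"] * 2      # the 90 undamaged homes
           + ["moderate"] * 6 + ["none"] * 2     # the 8 moderate homes
           + ["severe"] * 2)                     # the 2 severe homes

for name, pred in [("lazy", lazy), ("careful", careful)]:
    print(name, f"accuracy={accuracy(y_true, pred):.2f}", f"mcc={mcc(y_true, pred):.2f}")
```

On accuracy the two toy classifiers look only six points apart. On MCC the one that ignores damaged buildings scores zero, which is the behaviour the metric is designed to expose.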

Glossary
Swin Transformer: A type of AI image-recognition model that splits an image into small patches and analyses them in local windows that shift from layer to layer, known for handling large, detailed photos efficiently.
Matthews Correlation Coefficient (MCC): A single score, from -1 to 1, for how well a classifier performs across all categories, including rare ones; harder to game than simple accuracy.
multimodal: Combining more than one type of input (in this case, photos and numerical building data) into a single AI model.
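One simple way to picture a multimodal model is to join the two kinds of input into a single feature vector before classifying. The sketch below does that by plain concatenation. This is an assumption made for illustration: the digest does not describe how MMST actually fuses its inputs, and every number, weight, and function name here is hypothetical.

```python
from math import exp

def standardize(values, means, stds):
    """Put tabular features on a comparable scale (z-scores)."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(image_embedding, tabular_features):
    """Fusion by concatenation: one vector holding both kinds of input."""
    return list(image_embedding) + list(tabular_features)

def classify(features, weights, biases):
    """A single linear layer followed by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

# Hypothetical inputs: a 3-number summary of a photo, plus building age
# (years) and nearby wind speed (m/s). None of these come from the paper.
image_embedding = [0.8, -0.2, 0.5]
building = standardize([45.0, 60.0], means=[30.0, 50.0], stds=[15.0, 10.0])
fused = fuse(image_embedding, building)

# Made-up weights for three classes: none, moderate, severe.
weights = [
    [-1.0, 0.2, -0.5, -0.3, -0.8],
    [0.3, 0.1, 0.2, 0.1, 0.2],
    [1.0, -0.2, 0.6, 0.4, 0.9],
]
biases = [0.5, 0.0, -0.5]

probs = classify(fused, weights, biases)
print(dict(zip(["none", "moderate", "severe"], [round(p, 3) for p in probs])))
```

In a real system the photo summary would come from an image model and the weights would be learned from labelled data; only the shape of the idea carries over.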
03 / 03

Indonesia's tax AI can explain why it flagged you — and that matters

An AI sorted 49,000 Indonesian tax records for fraud risk — and this time, it had to show its working.

Tax compliance sounds dry until you remember what it funds. Every rupiah hidden from a tax authority is a rupiah that doesn't go toward roads or hospitals. Indonesia's Directorate General of Taxes gave researchers 49,159 real administrative records, with a thorny problem buried inside them: only about 1 in 19 entries was a genuine non-complier. That's like trying to spot one slightly-off apple in a crate that's mostly fine. If your AI just calls everyone compliant, it scores 95% accuracy and catches nobody. The team tackled this with two moves. First, they used resampling techniques to give the model more practice examples of the rare bad cases; imagine photocopying the unusual apples so the model gets used to spotting them. Second, they stacked several AI models on top of each other, like a panel of judges all seeing the same case and then comparing notes. The headline result is 97% overall accuracy. The more honest number is a minority-class F1-score of 0.73. F1 blends two things: how many of the flagged taxpayers really are non-compliers (precision) and how many of the real non-compliers get flagged (recall). A score of 0.73 means the system does reasonably well on both, not that it catches exactly 73% of non-compliers. The genuinely interesting part is what happens next: the researchers applied two tools called SHAP and LIME that make the model show which inputs drove each decision. What did they find? Raw financial scale (how much you paid in taxes, how large your assets are) drives the predictions more than profit ratios or margins. In other words, the model leans on how big a taxpayer is more than on how profitable it looks on paper. The catch is significant: there's no temporal validation. The researchers never tested whether a model trained on older records correctly predicts newer ones. In tax behaviour, that gap is exactly where the interesting evasion happens.
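The arithmetic behind those two numbers is short enough to check by hand. The sketch below uses only the figures quoted above (49,159 records, about 1 in 19 non-compliant). The precision and recall pairs are hypothetical, since the digest reports only the F1-score, and are chosen to show how different trade-offs can land on the same 0.73.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

total = 49_159
non_compliers = round(total / 19)   # "about 1 in 19", so an approximation

# A model that labels everyone compliant is right on every compliant record.
baseline_accuracy = (total - non_compliers) / total
print(f"all-compliant baseline accuracy: {baseline_accuracy:.3f}")

# That same baseline flags nobody, so its recall and F1 on non-compliers are 0.
print(f"all-compliant baseline F1: {f1(0.0, 0.0):.2f}")

# Hypothetical precision/recall pairs that all give F1 of about 0.73.
for precision, recall in [(0.73, 0.73), (0.90, 0.614), (0.60, 0.932)]:
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1(precision, recall):.2f}")
```

Because very different precision and recall values can share one F1, the score alone does not say what fraction of non-compliers the system actually catches.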

Glossary
stacking ensemble: A technique where the predictions of several AI models are fed into a final model that learns which combination to trust, like a panel of expert advisors plus one decision-maker.
SHAP and LIME: Two tools that explain an AI's decision by showing which input features pushed the outcome in which direction, making the 'black box' more legible to humans.
F1-score: A single score that balances precision (how many of the cases the model flags are really what it is looking for) and recall (how many of the real cases it manages to flag); a better measure than accuracy when one category is rare.
class imbalance: When one category in a dataset is far rarer than others, making it easy for a model to appear accurate by mostly ignoring the rare class.
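A common first response to class imbalance is random oversampling: repeating rare examples until the classes are the same size. The sketch below is a minimal version under stated assumptions. The digest says only that the researchers used "resampling techniques", so this is not necessarily their method, and the records and the function name `random_oversample` are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate rare-class rows at random until every class is equally common."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        extra = rng.choices(pool, k=target - count)   # sample with replacement
        out_rows.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_rows, out_labels

# Hypothetical training set mirroring the "1 in 19" ratio: 18 compliant, 1 not.
rows = [{"id": i} for i in range(19)]
labels = ["compliant"] * 18 + ["non_compliant"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling like this should touch only the training data. If duplicated records leak into the test set, the reported scores will look better than the model really is.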
The bigger picture

Three different domains (cardiology, disaster response, tax enforcement) but the same underlying pattern. AI is getting better at tasks that used to require a trained human to sit with a pile of data and make a judgment call. In two of today's papers, the improvement comes from combination: street-level photos plus structured building records in the hurricane work, raw prediction plus the ability to explain a decision in the tax work. That last part matters more than it sounds. An AI that flags your business as a tax risk without being able to say why is hard for an auditor to act on, and impossible for you to contest if it's wrong. The interpretability layer in the Indonesian tax paper, the brutally honest MCC score in the hurricane work, the pilot-study caveat in the echocardiography study: these are all signs of a field learning to be honest with itself about where its tools fall short. I find that encouraging, not deflating.

What to watch next

The echocardiography result needs a larger multicentre trial before it moves from interesting to actionable; 40 scans at one hospital is a proof of concept. The hurricane classifier becomes more convincing the moment someone tests it on a storm outside Florida: different housing stock, different wind patterns, different damage signatures. If you're watching the broader AI-in-disaster-response space, CVPR (the main computer vision conference) runs in June; expect several papers on satellite and street-level damage assessment that will help answer the generalisation question.

Thanks for reading — JB.
DeepScience — Cross-domain scientific intelligence
deepsci.io