DeepScience — Artificial Intelligence

DeepScience · Artificial Intelligence · Daily Digest

AI Reads Heart Scans, Storm Damage, and Tax Risk

Today's papers show AI stepping into real, consequential decisions in hospitals, disaster zones, and tax offices.

May 21, 2026

Three stories today, all grounded in actual data from actual events. I'd call this a quietly solid day — no fireworks, but three papers that show AI doing genuinely useful things in the world while being honest about what it can't yet do. Let's dig in.

Today's stories

01 / 03

AI reads your heart ultrasound in 94 seconds instead of 8 minutes

Your heart scan takes a trained sonographer about eight minutes of careful work. A pilot study in Seoul just cut that to 94 seconds while matching the human on the most important measurement.

An echocardiogram is an ultrasound of your heart: the same basic technology used to check babies before birth, but pointed at your chest to watch your heart pump. A cardiologist or sonographer normally sits down, scrolls through the footage, measures the chambers, and checks how strongly the heart squeezes. At Seoul National University Bundang Hospital, a team ran a direct head-to-head: they gave the same 40 heart scans to both a trained cardiac sonographer and a fully automated commercial AI system called SONIX Health. The human needed a median of 490 seconds, about eight minutes. The AI needed 94 seconds. The key question was never just speed. Think of it like spell-checking an important letter: fast is worthless if the mistakes slip through. For the most critical measurement, LVEF (left ventricular ejection fraction, the percentage of blood your heart pushes out with each beat), the AI showed no systematic offset from the human: the mean difference was 0.00 percentage points. A zero mean difference means the AI reads neither high nor low on average; on its own it does not show how far individual readings stray. The catch? This is a pilot study of 40 scans at one hospital, judged against a single senior cardiologist as the sole reference point. Forty cases is a first conversation, not a conclusion. Some secondary measurements showed much weaker agreement: one intraclass correlation coefficient (ICC) dropped as low as 0.625, on a scale where 1.0 is perfect agreement. The AI also fell slightly short of the human on grading aortic regurgitation, a heart-valve condition. Real progress, but this needs a much larger multicentre trial before anyone replaces a trained sonographer.
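To see why a mean difference of zero is only half the story, here is a minimal Python sketch. It assumes the standard ejection-fraction formula (volume ejected divided by end-diastolic volume) and uses Bland-Altman limits of agreement, a common way to compare two measurement methods; the paper's own statistics may differ. The readings are invented for illustration and are not study data, and the names `lvef_percent` and `bland_altman` are my own.

```python
from statistics import mean, stdev


def lvef_percent(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction: share of the end-diastolic volume ejected per beat."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                # average offset between the two methods
    spread = 1.96 * stdev(diffs)      # approximate 95% limits of agreement
    return bias, bias - spread, bias + spread


# Hypothetical LVEF readings (percent) for the same five scans. NOT study data.
human = [55.0, 60.0, 40.0, 65.0, 50.0]
ai_tight = [55.5, 59.5, 40.5, 64.5, 50.0]  # small errors that cancel out
ai_loose = [61.0, 54.0, 46.0, 59.0, 50.0]  # large errors that also cancel out

tight = bland_altman(ai_tight, human)
loose = bland_altman(ai_loose, human)

print("example LVEF:", round(lvef_percent(edv_ml=120.0, esv_ml=50.0), 1))
print("tight reader bias and limits:", tight)
print("loose reader bias and limits:", loose)
```

Both invented AI readers have a bias of exactly zero, yet one stays within about one point of the human and the other wanders by nearly twelve. That is why agreement statistics such as the ICC matter alongside the mean difference.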

Glossary

echocardiogram — An ultrasound scan of the heart used to check its structure and how well it pumps blood.

LVEF (left ventricular ejection fraction) — The percentage of blood pumped out of the heart's main chamber with each beat — a key indicator of heart health.

intraclass correlation coefficient (ICC) — A number, typically between 0 and 1, that measures how closely two measurement methods agree; closer to 1 means stronger agreement.

Source: Fully automated artificial intelligence–based echocardiographic analysis substantially reduces workflow time while preserving measurement accuracy: a pilot study

02 / 03

Street-view photos plus building data let AI map hurricane damage at 93% accuracy

After a hurricane, someone has to look at every damaged house on every street — researchers just trained AI to do it from photos, and it hit 93% accuracy.

When a hurricane tears through a neighbourhood, emergency responders, insurers, and city planners all need the same thing quickly: a map of which buildings are severely damaged, which are moderate, and which are fine. Right now that means humans driving or walking every street, assessing each home by eye. A research team working with post-Hurricane Ian (2022, Florida) data asked: what if AI could classify damage from street-level photos? Their system, called MMST, works like a home inspector with two toolboxes instead of one. The first is the photos — the AI scans visible structural damage, just as a human would. The second is a dossier on each building: its age, its assessed value, how fast the winds were blowing nearby, how close it was to the storm's track. It's the difference between a doctor who glances at you and one who reads your chart before walking in. Combining both streams lifted overall accuracy to 92.67% on the Hurricane Ian test set — about two percentage points better than the best image-only system. The more honest number is the Matthews Correlation Coefficient — a stricter metric that penalises you for skipping rare categories. There, the team beat the previous best by 11 percentage points, which is a more meaningful signal than the headline accuracy. The catch: the model was trained and tested on a single hurricane in Florida, using field-assessment labels from the StEER reconnaissance database. Solid data — but one storm, one region, one housing stock. A system like this needs to prove it works on different storms, different construction styles, different climates before emergency managers stake real decisions on it.
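Here is a rough Python sketch of why MCC is the stricter number. It uses the two-class version of the formula for simplicity, whereas the paper grades several damage levels, so treat it as an illustration of the idea rather than the authors' evaluation code. The building counts are invented, and the function names are my own.

```python
from math import sqrt


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Binary Matthews Correlation Coefficient, from -1 to +1."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical street survey: 1,000 buildings, 50 of them severely damaged.
lazy = dict(tp=0, fp=0, fn=50, tn=950)       # never predicts "severe"
careful = dict(tp=40, fp=30, fn=10, tn=920)  # finds most of the severe cases

for name, counts in (("lazy", lazy), ("careful", careful)):
    print(name, "accuracy:", round(accuracy(**counts), 3),
          "MCC:", round(mcc(**counts), 3))
```

In this made-up survey the two classifiers sit one point apart on accuracy (0.95 versus 0.96), but the lazy one scores an MCC of zero while the careful one lands near 0.66. A gain in MCC says much more about the rare categories than a gain in accuracy does.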

Glossary

Swin Transformer — A type of AI image-recognition model that splits an image into small patches and analyses them within local windows that shift from layer to layer, which keeps the computation manageable on large, detailed photos.

Matthews Correlation Coefficient (MCC) — A single score for how well a classifier performs across all categories, including rare ones — harder to game than simple accuracy.

multimodal — Combining more than one type of input — in this case, photos and numerical building data — into a single AI model.

Source: Post-hurricane building damage assessment using street-view imagery and structured data: a multimodal deep learning approach

03 / 03

Indonesia's tax AI can explain why it flagged you — and that matters

An AI sorted 49,000 Indonesian tax records for fraud risk — and this time, it had to show its working.

Tax compliance sounds dry until you remember what it funds. Every rupiah hidden from a tax authority is a rupiah that doesn't go toward roads or hospitals. Indonesia's Directorate General of Taxes gave researchers 49,159 real administrative records, with a thorny problem buried inside them: only about 1 in 19 entries was a genuine non-complier. That's like trying to spot one slightly-off apple in a crate that's mostly fine. If your AI just calls everyone compliant, it scores about 95% accuracy and catches nobody. The team tackled this with two moves. First, they used resampling techniques to give the model more practice examples of the rare bad cases; imagine photocopying the unusual apples so the model gets used to spotting them. Second, they stacked several AI models on top of each other, like a panel of judges all seeing the same case and then comparing notes. The headline result is 97% overall accuracy. The more honest number is a minority-class F1-score of 0.73. F1 blends precision (how many flagged taxpayers really are non-compliers) with recall (how many real non-compliers get flagged), so 0.73 is not the share of non-compliers caught; it says the balance of the two lands around that level. The genuinely interesting part is what happens next: the researchers applied two tools called SHAP and LIME that force the model to explain each decision in plain terms. What did they find? Raw financial scale (how much you paid in taxes, how large your assets are) drives the predictions more than profit ratios or margins. Your absolute size matters more to this model than how efficiently you're hiding income. The catch is significant: there's no temporal validation. The researchers never tested whether a model trained on older records correctly predicts newer ones. In tax behaviour, that gap is exactly where the interesting evasion happens.
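The arithmetic behind those two numbers is worth a quick sketch in Python. The 1-in-19 ratio and the 0.73 F1-score come from the story above; the precision values are hypothetical, since the digest does not report precision and recall separately, and `recall_for` is a helper name of my own.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_for(f1_target: float, precision: float) -> float:
    """Recall needed to hit f1_target at a given precision (F1 formula rearranged)."""
    return f1_target * precision / (2 * precision - f1_target)


# A "model" that labels everyone compliant when 1 in 19 records is not.
baseline_accuracy = 18 / 19
print("always-compliant accuracy:", round(baseline_accuracy, 3))

# Hypothetical precision levels that would all produce the same F1 of 0.73.
pairs = [(p, recall_for(0.73, p)) for p in (0.60, 0.73, 0.90)]
for precision, recall in pairs:
    print("precision", precision, "recall", round(recall, 3),
          "F1", round(f1(precision, recall), 2))
```

An F1 of 0.73 is compatible with catching roughly 61% of non-compliers at high precision or roughly 93% at low precision. Without the separate precision and recall figures, the score alone cannot say which.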

Glossary

stacking ensemble — A technique where the predictions of several AI models are fed into a final model that learns which combination to trust — like a panel of expert advisors plus one decision-maker.

SHAP and LIME — Two tools that explain an AI's decision by showing which input features pushed the outcome in which direction, making the 'black box' more legible to humans.

F1-score — The harmonic mean of precision (the share of flagged cases that are genuine) and recall (the share of genuine cases that get flagged); a better measure than accuracy when one category is rare.

class imbalance — When one category in a dataset is far rarer than others, making it easy for a model to appear accurate by mostly ignoring the rare class.
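One simple way to counter class imbalance is random oversampling, the "photocopying" idea in its plainest form. The sketch below is an assumption on my part: the digest does not say which resampling technique the authors used, and the record layout, the `non_compliant` field, and the function name are all hypothetical.

```python
import random


def oversample_minority(rows, label_key="non_compliant", seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key]]
    majority = [r for r in rows if not r[label_key]]
    if not minority or len(minority) >= len(majority):
        return list(rows)  # nothing to balance
    extra = rng.choices(minority, k=len(majority) - len(minority))
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy training set with the story's ratio: 1 non-complier in every 19 records.
train = [{"id": i, "non_compliant": i % 19 == 0} for i in range(190)]
balanced = oversample_minority(train)

print("before:", sum(r["non_compliant"] for r in train), "of", len(train))
print("after: ", sum(r["non_compliant"] for r in balanced), "of", len(balanced))
```

Resampling belongs on the training split only. If duplicated rows leak into the test set, the reported scores flatter the model.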

Source: Integration of Stacking Ensemble and Explainable AI for Taxpayer Compliance Risk Profiling

The bigger picture

Three different domains (cardiology, disaster response, tax enforcement) but the same underlying pattern. AI is getting better at tasks that used to require a trained human to sit with a pile of data and make a judgment call. In two of today's papers, the improvement comes from combination: street-view photos plus building records in the hurricane work, raw prediction plus the ability to explain a decision in the tax work. That last part matters more than it sounds. An AI that flags your business as a tax risk without being able to say why is hard for an auditor to act on, and impossible for you to contest if it's wrong. The interpretability layer in the Indonesian tax paper, the brutally honest MCC score in the hurricane work, the pilot-study caveat in the echocardiography study: these are all signs of a field learning to be honest with itself about where its tools fall short. I find that encouraging, not deflating.

What to watch next

The echocardiography result needs a larger multicentre trial before it moves from interesting to actionable: 40 scans at one hospital is a proof of concept. The hurricane classifier becomes more convincing the moment someone tests it on a storm outside Florida, with different housing stock, different wind patterns, and different damage signatures. If you're watching the broader AI-in-disaster-response space, CVPR (the main computer vision conference) runs in June in Denver; expect several papers on satellite and street-level damage assessment that will help answer the generalisation question.