The general idea of an EEG system that posts data to a network?
Very, but there are already tons of them at lots of different price, quality, and openness levels. A lot of manufacturers have their own protocols; there are also quasi-standards like Lab Streaming Layer for connecting to a hodgepodge of devices.
This particular data?
Probably not so useful. While it’s easy to get something out of an EEG set, it takes some work to get good-quality data that’s not riddled with noise (mains hum, muscle artifacts, blinks, etc.). Plus, brain waves on their own aren’t particularly interesting; it’s seeing how they change in response to some external or internal event that tells us about the brain.
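For a flavor of the cleanup involved, here's a minimal sketch (synthetic signal, plain scipy, nothing device-specific) of notching out mains hum; blinks, muscle artifacts, and drift all need their own handling on top of this:

```python
# Minimal sketch: removing 50 Hz mains hum from a noisy EEG-like signal.
# The signal and parameters are illustrative, not from any particular device;
# real pipelines (e.g. MNE-Python) also do re-referencing, artifact rejection
# for blinks/muscle, and so on.
import numpy as np
from scipy import signal

fs = 250.0                      # sampling rate in Hz (typical for consumer EEG)
t = np.arange(0, 10, 1 / fs)    # 10 seconds of data

# Fake "EEG": a 10 Hz alpha-like rhythm plus broadband noise plus 50 Hz hum
eeg = (20e-6 * np.sin(2 * np.pi * 10 * t)
       + 5e-6 * np.random.randn(t.size)
       + 30e-6 * np.sin(2 * np.pi * 50 * t))

# Narrow notch at the mains frequency (use 60 Hz in North America)
b, a = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)
cleaned = signal.filtfilt(b, a, eeg)

# Compare power at 50 Hz before and after
f, p_raw = signal.welch(eeg, fs=fs, nperseg=1024)
_, p_clean = signal.welch(cleaned, fs=fs, nperseg=1024)
idx = np.argmin(np.abs(f - 50.0))
print(f"50 Hz power: raw={p_raw[idx]:.2e}, cleaned={p_clean[idx]:.2e}")
```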
Simplification can be good, but they've removed the wrong half here!
The notifications act as an overall progress bar and give you a general sense of what Claude Code is doing: is it looking in the relevant part of your codebase, or has it gotten distracted by some unused, vendored-in code?
"Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes. You might want to fold long lists of files (5? 15?) but that seems like the perfect place for a user-settable option.
> "Read 2 files" is fine as a progress indicator but is too vague for anything else. "Read foo.cpp and bar.h" takes almost the same amount of visual space, but fulfills both purposes.
Now this is a good, thoughtful response! Totally agree that if you can convey more information using basically the same amount of space, that's likely a better solution regardless of who's using the product.
The idea behind the recent boom in low-field stuff is that you'd like to have small/cheap machines that can be everywhere and produce good-enough images through smarts (algorithms, design) rather than brute force.
The attitude on the research side is essentially "por qué no los dos?" (why not both?): crank up the field strength AND use better algorithms, in the hopes of expanding what you can study.
The link is essentially a press release. The information you want is (sorta) in the actual paper it describes *.
"The images were analyzed using a commercially available AI-CAD system (Lunit INSIGHT MMG, version 1.1.7.0; Lunit Inc.), developed with deep convolutional neural networks and validated in multinational studies [1, 4]."
It's presumably a proprietary model, so you're not going to get a lot more information about it, but it's also one that's currently deployed in clinics, so...it's arguably a better comparison than a SOTA model some lab dumped on GitHub. I'd add that the post headline is also missing the point of the article: many of the missed cases can be detected with a different form of imaging. It's not really meant to be a model shoot-out style paper.
* Kim, J. Y., Kim, J. J., Lee, H. J., Hwangbo, L., Song, Y. S., Lee, J. W., Lee, N. K., Hong, S. B., & Kim, S. (2025). Added value of diffusion-weighted imaging in detecting breast cancer missed by artificial intelligence-based mammography. La Radiologia Medica. Advance online publication. https://doi.org/10.1007/s11547-025-02161-1
"I'm running now" doesn't make you jog if you're sitting down, but it certainly kicks off a campaign if you were considering elected office.
J. L. Austin called these sorts of statements "performative utterances", and there's a lot of linguistic debate about them. Nevertheless, "I declare war", uttered by someone with the power to do so, is pretty unambiguously an example of one.
That depends immensely on the type of effect you're looking for.
Within-subject effects (this happens when one does A, but not when doing B) can be fine with small sample sizes, especially if you can repeat variations on A and B many times. This is pretty common in task-based fMRI. Indeed, I'm not sure why you need >2 participants except to show that the principle is relatively generalizable.
Between-subject comparisons (type A people have this feature, type B people don't) are the problem because people differ in lots of ways and each contributes one measurement, so you have no real way to control for all that extra variation.
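A quick made-up simulation of why this matters so much for power (the effect sizes and noise levels are arbitrary, just for illustration):

```python
# Toy simulation (made-up numbers) of why within-subject contrasts need far
# fewer participants than between-subject comparisons: each person's large
# idiosyncratic offset cancels when you take their own A-minus-B difference.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
effect = 0.5          # true A-vs-B difference, arbitrary units
subj_sd = 2.0         # big between-person differences
noise_sd = 1.0        # trial-to-trial noise
n_subj, n_trials = 10, 40

def one_within_study():
    offsets = rng.normal(0, subj_sd, n_subj)             # per-person baseline
    a = offsets[:, None] + effect + rng.normal(0, noise_sd, (n_subj, n_trials))
    b = offsets[:, None] + rng.normal(0, noise_sd, (n_subj, n_trials))
    diffs = a.mean(axis=1) - b.mean(axis=1)               # baseline cancels here
    return stats.ttest_1samp(diffs, 0).pvalue

def one_between_study():
    a = rng.normal(0, subj_sd, n_subj) + effect + rng.normal(0, noise_sd, n_subj)
    b = rng.normal(0, subj_sd, n_subj) + rng.normal(0, noise_sd, n_subj)
    return stats.ttest_ind(a, b).pvalue

n_sims = 2000
power_within = np.mean([one_within_study() < 0.05 for _ in range(n_sims)])
power_between = np.mean([one_between_study() < 0.05 for _ in range(n_sims)])
print(f"power (within, n={n_subj}):       {power_within:.2f}")
print(f"power (between, n={n_subj}/group): {power_between:.2f}")
```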
Precisely, and agreed 100%. We need far more within-subject designs.
You would still, in general, need many subjects to show the same basic within-subject patterns if you want to claim the pattern is "generalizable" in the sense of "may generalize to most people". But depending on exactly what you're looking at and the strength of the effect, you may not need nearly as many participants as in strictly between-subject designs.
Given the low test-retest reliability of task fMRI in general, even in adults, this also means that strictly one-off within-subject designs aren't enough for certain claims. One sort of has to demonstrate that even the within-subject effect is stable. This may or may not be plausible for certain things, but it really needs to be considered more regularly and explicitly.
Between-subject heterogeneity is a major challenge in neuroimaging. As a developmental researcher, I've found that in structural volumetrics, even after controlling for total brain size, individual variance remains so large that age-brain associations are often difficult to detect and frequently differ between moderately sized cohorts (n=150-300). However, with longitudinal data where each subject serves as their own control, the power to detect change increases substantially—all that between-subject variance disappears with random intercept/slope mixed models. It's striking.
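For concreteness, here's a minimal sketch of that setup with simulated numbers (nothing here is real data) and statsmodels' MixedLM, using a random intercept and slope per subject:

```python
# Minimal sketch (simulated numbers, not real data) of the longitudinal point:
# with repeated measures per subject, a random intercept/slope mixed model
# soaks up the huge between-person differences in brain volume, leaving the
# within-person age trend much easier to detect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subj, n_visits = 200, 3
rows = []
for s in range(n_subj):
    baseline = rng.normal(1200, 100)   # big between-person differences (cm^3, made up)
    slope = rng.normal(-2.0, 0.5)      # made-up within-person change per year
    age0 = rng.uniform(10, 16)
    for v in range(n_visits):
        age = age0 + v * 1.5
        vol = baseline + slope * (age - age0) + rng.normal(0, 3)
        rows.append({"subject": s, "age": age, "volume": vol})
df = pd.DataFrame(rows)
df["age_c"] = df["age"] - df["age"].mean()   # center age for numerical stability

# Random intercept and random slope for age, per subject
model = smf.mixedlm("volume ~ age_c", df, groups=df["subject"], re_formula="~age_c")
result = model.fit()
print(result.summary())
```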
Task-based fMRI has similar individual variability, but with an added complication: adaptive cognition. Once you've performed a task, your brain responds differently the second time. This happens when studies reuse test questions—which is why psychological research develops parallel forms. But adaptation occurs even with parallel forms (commonly used in fMRI for counterbalancing and repeated assessment) because people learn the task type itself. Adaptation even happens within a single scanning session, where BOLD signal amplitude for the same condition typically decreases over time.
These adaptation effects contaminate ICC test-retest reliability estimates when applied naively, as if the brain weren't an organ designed to dynamically respond to its environment. Therefore, some apparent "unreliability" may not reflect the measurement instrument (fMRI) at all, but rather highlights the failures in how we analyze and conceptualize task responses over time.
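A toy illustration of that contamination (simulated numbers, with a hand-rolled two-way ICC rather than any particular package): every subject's "true" activation is identical across sessions, but a session-2 adaptation drop, common plus person-specific, still drags the ICC down.

```python
# Toy sketch (simulated numbers) of how adaptation contaminates naive ICC-based
# test-retest estimates: each subject's "true" activation is perfectly stable,
# but everyone responds less on session 2 (by person-specific amounts), so the
# ICC drops even though the underlying trait has not changed.
import numpy as np

def icc_two_way(x):
    """Return (ICC(2,1) absolute agreement, ICC(3,1) consistency) for an
    n-subjects x k-sessions matrix, via the standard two-way ANOVA decomposition."""
    n, k = x.shape
    grand = x.mean()
    ms_r = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)    # subjects
    ms_c = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)    # sessions
    ss_e = ((x - x.mean(axis=1, keepdims=True)
               - x.mean(axis=0, keepdims=True) + grand) ** 2).sum()
    ms_e = ss_e / ((n - 1) * (k - 1))
    icc21 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
    icc31 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
    return icc21, icc31

rng = np.random.default_rng(2)
n = 60
trait = rng.normal(1.0, 0.5, n)              # stable person-specific activation

def noise():
    return rng.normal(0, 0.2, n)             # measurement noise per session

session1 = trait + noise()
adaptation = 0.6 + rng.normal(0, 0.4, n)     # everyone drops, by varying amounts
session2 = trait - adaptation + noise()

print("no adaptation:  ", icc_two_way(np.column_stack([trait + noise(), trait + noise()])))
print("with adaptation:", icc_two_way(np.column_stack([session1, session2])))
```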
Yeah, when you start getting into this stuff and see your first dataset with over a hundred MRIs, and actually start manually inspecting things like skull-stripping and stuff, it is shocking how dramatically and obviously different people's brains are from each other. The nice clean little textbook drawings and other things you see in a lot of education materials really hide just how crazy the variation is.
And yeah, part of why we need more within-subject and longitudinal designs is to get at precisely the things you mention. There is no way to know if the low ICCs we see now are in fact adaptation to the task or task generalities, if they reflect learning that isn't necessarily task-relevant adaptation (e.g. the subject is in a different mood on a later test, and this just leads to a different strategy), if the brain just changes far more than we might expect, or all sorts of other possibilities. I suspect if we ever want fMRI to yield practical or even just really useful theoretical insights, we definitely need to suss out within-subject effects that have high test-retest reliability, regardless of all these possible confounds. Likely finding such effects will involve more than just changes to analysis, but also far more rigorous experimental designs (both in terms of multi-modal data and tighter protocols, etc).
FWIW, we've also noticed a lot of magic can happen too when you suddenly have proper longitudinal data that lets you control things at the individual level.
They are indeed coupled, but the coupling is complicated and may be situationally dependent.
Honestly, it's hard to imagine many aggregate measurements that aren't. For example, suppose you learn that the average worker's pay increased. Is it because a) the economy is booming, or b) the economy crashed and lower-paid workers have all been laid off (and are no longer counted)?
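A tiny made-up example of the layoff version:

```python
# Tiny made-up illustration: the average pay of the *remaining* workers rises
# simply because the lowest-paid workers were laid off, with no raises at all.
wages = [30_000, 35_000, 60_000, 90_000]
after_layoffs = [w for w in wages if w >= 50_000]   # low-paid workers dropped

print(sum(wages) / len(wages))                  # 53750.0  (before)
print(sum(after_layoffs) / len(after_layoffs))  # 75000.0  (after, nobody got a raise)
```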
I read that paper as suggesting that development, behavior, and fMRI are all hard.
It's not at all clear to me that teenagers' brains OR behaviours should be stable across years, especially when it involves decision-making or emotions. Their Figure 3 shows that sensory experiments are a lot more consistent, which seems reasonable.
The technical challenges (registration, motion, etc.) seem like things that will improve, and there are some practical suggestions as well (counterbalancing items, etc.).
While I agree I wouldn't expect too much stability in developing brains, unfortunately there are pretty serious stability issues even in non-developing adult brains (quote below from the paper, for anyone who doesn't want to click through).
I agree it makes a lot of sense that the sensory experiments are more consistent; somatosensory and sensorimotor localization results generally seem to be the most consistent fMRI findings. I am not sure registration or motion correction is really going to help much here, though; I suspect the reality is just that the BOLD response is a lot less longitudinally stable than we thought (the brain is changing more often and more quickly than we expected).
Or if we do get better at this, it will be through more sophisticated "correction" methods (e.g. deep learners that can predict typical longitudinal BOLD changes and allow them to be "subtracted out", or something like that). But I am skeptical about progress here, given the amount of data needed to develop any kind of corrective improvement when longitudinal reliabilities are this low.
===
> Using ICCs [intraclass correlation coefficients], recent efforts have examined test-retest reliability of task-based fMRI BOLD signal in adults. Bennett and Miller performed a meta-analysis of 13 fMRI studies between 2001 and 2009 that reported ICCs. ICC values ranged from 0.16 to 0.88, with the average reliability being 0.50 across all studies. Others have also suggested a minimal acceptable threshold of task-based fMRI ICC values of 0.4–0.5 to be considered reliable [...] Moreover, Bennett and Miller, as well as a more recent review, highlight that reliability can change on a study-by-study basis depending on several methodical considerations.
fMRI usually measures BOLD, changes in blood oxygenation (well, deoxygenation). The point of the paper is that you can get relative changes like that in lots of ways: you could have more or less blood, or take out more/less oxygen from the same blood.
These can themselves be measured separately (that's exactly what they did here!) and if there's a spatial component, which the figures sort of suggest, you can also look at what a particular spot tends to do. It may also be interesting/important to understand why different parts of the brain seem to use different strategies to meet that demand.
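One standard way to write that flow/metabolism coupling is the Davis calibrated-BOLD model. Here's a rough sketch with typical literature-ish parameter values (M, alpha, beta are my assumptions, not numbers from this paper), just to show how different mixes can yield similar BOLD changes:

```python
# Rough sketch of the Davis calibrated-BOLD model: the same fractional BOLD
# change can come from different mixes of blood-flow (CBF) and oxygen-
# metabolism (CMRO2) changes. M, alpha, beta below are typical literature-ish
# values, not parameters from this particular paper.
def bold_change(cbf_ratio, cmro2_ratio, M=0.08, alpha=0.38, beta=1.5):
    """Fractional BOLD signal change for given CBF and CMRO2 ratios (task/rest)."""
    return M * (1 - cmro2_ratio**beta * cbf_ratio**(alpha - beta))

# Two very different physiological "strategies" giving nearly the same BOLD change:
print(bold_change(cbf_ratio=1.60, cmro2_ratio=1.25))  # big flow increase, big O2 use
print(bold_change(cbf_ratio=1.27, cmro2_ratio=1.05))  # modest flow, little extra O2 use
# -> both around a ~1.4% BOLD change with these made-up inputs
```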