This is great feedback! Thanks for giving it a try.
You're right that we're limited in how much news we can pull. Generally, we can only look about 90 days into the past for news articles.
I'd definitely like to expand the corpus of information that we can pull from. Getting access to reliable historical data is high on that list, as it will dramatically improve base rate estimation.
Basically what I'm describing is "backtesting" from the stock trading space, where a trader comes up with a hypothesis about what will happen in the future. Then they replay that scenario on slices of the past where the same conditions held and see how it would have played out. Importantly, the "algo" only sees data in real time from the point where you start it, so it cannot cheat. It makes it easy to see whether or not your intuition is based on actual data.
It kinda feels like you are using the LLM to assign "weights" to the important properties of an algo and then directly translating the basic arithmetic accounting of those factors into a prediction. What I expect is that the LLM would also be used to read all past news to find similar patterns and then create time slices where its weights could be tested against a control. It can then backtest its own weights to better tune what factors really led to an outcome and expose this refinement as part of the prediction.
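To make the backtesting idea concrete, here is a minimal sketch in Python. It is not how the tool actually works: `Event`, `walk_forward_backtest`, and `base_rate_predictor` are hypothetical names made up for illustration, and the predictor is a trivial base-rate stand-in for whatever the LLM-weighted model would be. The only point is that each forecast is built from events dated strictly before the one being predicted, and the forecasts are then scored against what actually happened (Brier score here, lower is better).

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical record: one resolved question from the past."""
    day: date
    features: dict[str, float]
    outcome: bool  # what actually happened


Predictor = Callable[[Sequence[Event], dict[str, float]], float]


def brier_score(forecasts: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def walk_forward_backtest(events: Sequence[Event], predict: Predictor,
                          min_history: int = 2) -> float:
    """Score `predict` on each event using only strictly earlier events."""
    ordered = sorted(events, key=lambda e: e.day)
    forecasts, outcomes = [], []
    for target in ordered:
        # No lookahead: the predictor never sees the target day or anything after it.
        history = [e for e in ordered if e.day < target.day]
        if len(history) < min_history:
            continue
        forecasts.append(predict(history, target.features))
        outcomes.append(target.outcome)
    if not forecasts:
        raise ValueError("not enough history to score any event")
    return brier_score(forecasts, outcomes)


def base_rate_predictor(history: Sequence[Event], features: dict[str, float]) -> float:
    """Stand-in model: forecast the historical frequency, ignoring features."""
    return sum(e.outcome for e in history) / len(history)


# Made-up events, purely to show the call shape.
events = [
    Event(date(2024, 1, 1), {}, True),
    Event(date(2024, 2, 1), {}, True),
    Event(date(2024, 3, 1), {}, False),
    Event(date(2024, 4, 1), {}, True),
]
score = walk_forward_backtest(events, base_rate_predictor)
```

Swapping `base_rate_predictor` for a model with tunable weights and comparing the two scores would be the "tested against a control" part.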
Interesting. I asked similar questions ("Who will win the Presidential Election?", "Will XXX college football team win the national championship?", etc.) and none of the responses included percentages or yes/no indicators like the examples you link to.
The data was based on behavior between 1999-2014, which means it was behavior that almost certainly was not health motivated since intermittent fasting wasn't popular then. So I think you have to ask: why were these people skipping meals? I think there are a lot of possible explanations and they are all potential confounds.
Like, one obvious reason people skip meals is that they don't have time. If you don't have time to eat, you're probably working long hours and really stressed out. Stress could easily contribute to CVD or otherwise take years off your life.
Another obvious reason to skip meals: you don't have money to buy food. Poverty is well known to lead to worse health outcomes.
Or one more: people skip meals because they have no appetite. What's a big cohort that routinely skips meals because they have no appetite? The elderly.
Maybe they ruled these kinds of things out in the full paper (I've only read the abstract) but so far I'm unimpressed.
I pulled up the full paper. As you speculated, people who skip meals are quite different from people who don't, in ways that might be relevant to overall health outcomes:
> As shown in Table 1, compared with participants with three meals per day, participants eating fewer than three meals per day were more likely to be younger, men, non-Hispanic Black, with less education and lower family income, current smokers, heavy alcohol drinkers, higher physical activity levels, lower total energy intake and lower diet quality, food insecure, and higher frequency of snacks.
They control for these factors in the statistical analysis, to different extents in three different models:
> Model 1 adjusted for age, gender, and race and ethnicity. Model 2 additionally adjusted for education, income, smoking status, alcohol intake, physical activity levels, total energy intake, HEI-2010 score, household adult food insecurity status, and snacks frequency. Model 3 further adjusted for baseline diabetes, hypertension, hypercholesterolemia, CVD, cancer, and BMI status, because these variables may be mediators between meal frequency, intervals and timing, and mortality.
This is an issue. Controlling for the variables in the model helps somewhat, but it's inevitable that the populations are also different in other relevant ways that are not captured in the covariates (e.g. stress levels).
They're also reporting dozens of results (18 in Table 2, another 18 in Table 3, 12 in Table 4) but doing no correction for multiple testing. Given that many of the confidence intervals only barely exclude 1, this is an issue.
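For a sense of what a correction would do, here is a rough sketch in Python. The paper reports hazard ratios with confidence intervals, so the p-values below are made up for illustration and are not from the study; `bonferroni_threshold` and `holm_bonferroni` are just my own helper names. With 18 + 18 + 12 = 48 results, a plain Bonferroni correction at alpha = 0.05 needs each individual p-value to be below roughly 0.001, which is a much higher bar than a confidence interval that only barely excludes 1.

```python
from typing import Sequence


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / num_tests


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> list[bool]:
    """Holm's step-down procedure: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared against alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected


# 18 + 18 + 12 results across Tables 2, 3, and 4.
per_test_alpha = bonferroni_threshold(18 + 18 + 12)  # about 0.00104

# Hypothetical p-values, NOT taken from the paper.
example = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

In the toy example all four p-values are below 0.05 on their own, but only the two smallest survive the correction.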
Finally, and most damning--the effects almost entirely disappear if they exclude people who had cancer or cardiovascular disease at the beginning of the study. This is hidden in the supplementary material of the paper and not discussed at all within the main body. Of the effects cited in the abstract, the only one that survives excluding people who had cancer or cardiovascular disease at baseline is that people who skip breakfast appear to have a higher risk of cardiovascular disease-related mortality.
Given that the study is labelled "prospective" and one of the outcomes studied is death from cardiovascular disease, IMO it is dishonest to not note that the claimed effects disappear if you exclude people who already had cardiovascular disease when the study started.
Ha, similar to how the ketogenic diet was only really used by epileptics and people with similarly severe conditions before about 10 years ago. So eating keto was heavily associated with all kinds of severe problems but it was reverse causation.
Like how "1-2 drinks per day" was healthier than none... because a lot of the people in the "none" category were teetotaling because they were chronically ill, and once you correct for that, the apparently-beneficial effects of that daily glass of wine vanish.
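Here is the arithmetic behind that, as a toy Python sketch. All the counts are invented for illustration (they are not from any study), and `mortality` is just a helper I made up. In this fake cohort drinking has no effect at all within either health stratum, yet the crude comparison makes abstainers look almost twice as likely to die, purely because more of them were already ill.

```python
# Hypothetical cohort rows: (drinking group, chronically ill?, people, deaths).
# Invented numbers: 5% mortality if healthy, 30% if chronically ill, in both groups.
rows = [
    ("none", False, 800, 40),
    ("none", True, 200, 60),
    ("moderate", False, 980, 49),
    ("moderate", True, 20, 6),
]


def mortality(rows, group, ill=None):
    """Death rate for a drinking group, optionally restricted to one health stratum."""
    selected = [
        (people, deaths)
        for g, is_ill, people, deaths in rows
        if g == group and (ill is None or is_ill == ill)
    ]
    total_people = sum(people for people, _ in selected)
    total_deaths = sum(deaths for _, deaths in selected)
    return total_deaths / total_people


# Crude comparison: abstainers look worse (0.10 vs 0.055).
crude_none = mortality(rows, "none")
crude_moderate = mortality(rows, "moderate")

# Stratified by baseline health: the difference vanishes (0.05 vs 0.05, 0.30 vs 0.30).
healthy_none = mortality(rows, "none", ill=False)
healthy_moderate = mortality(rows, "moderate", ill=False)
ill_none = mortality(rows, "none", ill=True)
ill_moderate = mortality(rows, "moderate", ill=True)
```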
I was wondering if anyone would catch that! Yeah, it's definitely part of the "fiction" here. L-glucose can bind to some things, e.g. taste receptors. "Nothing in nature can interact with it" was definitely an over-simplification.
You know, someone else gave me similar feedback as well. I'm kind of reluctant to make it longer, but it might be a better story if it is. Like: maybe have something about the government responses to it and what happens, etc, and work some of the explanations in there. Going to keep this in mind, appreciate it!
Sweet, I actually love that! Upregulating a plant pathogen feels like it could be disastrous. If I had a better grasp of ecology I'd have thought of more examples like that, haha. Might include it in a future iteration if you don't mind!