This seems like a fairly useful tool, but I'd be a bit cautious - the tradition of poring over a carefully collected and curated data set, using tools whose strengths and weaknesses you understand, shouldn't be lightly tossed aside. That process can help researchers spot unusual anomalies that lead to novel discoveries, while an automated tool might just discard all the outliers.
Incidentally, the far more concerning issue is the use of approaches like this to generate data, which opens the door to a plague of hard-to-detect scientific fraud. In the past, many such high-visibility fraudulent efforts were detected because the fraudsters duplicated data (or reversibly processed old data in some manner) and this was spotted by others in the field, e.g. https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal
Often these fraudulent productions are driven by the desire to be first to publish: everyone thinks they know how a system works, and they're all rushing to get credit (and hence Nobel Prizes, patents, etc.) by generating the data from a 'successful experiment' before anyone else can.
100% agree with you. There are two things driving my work with this demo:
(1) A lot of researchers are bad at writing code, but they can audit it. This is true for sociologists, psychologists, etc., so I'm hoping something like this can help.
(2) Philosophically, I disagree with the claim that LLMs can't produce new knowledge. I think there's merit to it if we're talking about whether the LLM's neural network itself synthesizes new knowledge via its weights... However, why can't we have an LLM try to merge multiple data sets, analyze them, and report back to a human?
To your point + concerns, I think a human still needs to be very careful and actually revisit the analysis for any promising findings, but at least some of the grunt work can be taken care of!
> A lot of researchers are bad at writing code, but they can audit it.
Can they really?
It seems to me that users of this would be those that can't write the code they need. How would they be in a position to audit what they get?
I'll use myself as an example. I love Pandas + Scikit Learn, but am by no means an expert. Every time I want to build a logistic regression, I have to go back to the docs to review the API.
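For concreteness, here's roughly the kind of boilerplate I end up re-checking every time (a minimal scikit-learn sketch with made-up data, not anything from the demo):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in data: two features, one binary outcome.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

    # The part I always have to re-look-up: which class, which defaults, which method names.
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)
    print(model.coef_, model.score(X_test, y_test))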
When I was doing my Master's degree in "Social Science of the Internet" at the Oxford Internet Institute (a sociology + data science program, with many students coming from non-STEM backgrounds), everyone was comfortable debating p-values, standard errors in regressions, etc., but many students were extremely intimidated by reading Python docs and/or using the REPL.
It's a really cool area - putting AI in a feedback loop (LangChain-like) with its own tools - which I think is where the magic happens, and where we'll see much more happening in the future. This should really supercharge engineers doing stuff in areas where they're not super-comfortable, but comfortable enough to verify the AI isn't doing anything stupid.
I made something vaguely similar for your local terminal[0] and other locally-available tools.
The idea is to give you a chat with an assistant that can use these local tools. Here it's Python for data analysis; in my case it's more "give it access to your terminal, so it can answer questions / do tasks on your local machine", which is something web-based options can't do right now.
E.g. ask it about your system details (processes, Wi-Fi) or to do things (configure something). Have it automatically run the relevant commands, analyze the output, and respond either in natural language or, say, by plotting a chart.
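A stripped-down sketch of the loop (the ask_llm call is just a placeholder for whichever chat-completion API you plug in):

    import subprocess

    def ask_llm(prompt: str) -> str:
        # Placeholder: swap in your chat-completion call of choice.
        return "(model reply would go here)"

    # Run a local command, then let the model interpret the raw output.
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True)
    answer = ask_llm(
        "Here is the output of `ps aux`. Which processes are using the most memory?\n\n"
        + result.stdout
    )
    print(answer)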
AutoGPT[1] is another very interesting project in this area.
Asking the LLM if it "understands" and only proceeding if it says yes feels very weird to me. Do we really expect the LLM to be able to introspect in that way and give a meaningful answer?
Nope! I'm not trying to suggest the LLM legitimately understands my query from a conceptual perspective via that question.
What I am doing in that prompt is ensuring the LLM can follow the instructions. I specifically ask it to write "yes" if it can. If it can't do that part, then I don't want it to even generate code or try to analyze my data.
That's why I treat it as an assertion failure if it can't follow that instruction, and exit the app.
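Roughly, the check looks like this (a simplified sketch of the idea, not the exact code in the repo):

    def confirm_instructions(chat_fn, instructions: str) -> None:
        """Ask the model to acknowledge the instructions with a literal 'yes' before doing anything else."""
        prompt = instructions + "\n\nIf you can follow these instructions, reply with 'yes' and nothing else."
        reply = chat_fn(prompt)  # chat_fn is a placeholder for your LLM call
        if reply.strip().lower() != "yes":
            # Treat failure to follow even this instruction as an assertion failure and bail out.
            raise SystemExit("LLM did not acknowledge the instructions; not generating any code.")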
Thanks for reading the code + for the very thoughtful question.
The broader package I’m working on (PhaseLLM) is specifically focused on devtools for observability and robustness of LLM-powered products. I agree with you that there are lots of issues with LLMs and taking them to production. I’m hoping products like this, plus making them robust, will help improve the research as well as the UX.
I'm not sure about this approach. From what I have seen, most researchers have no idea how to get their data into a format which can be efficiently analysed.
Once you have that, it's trivial to do any kind of statistical analysis. In R, a regression is simply lm(y ~ x1 + x2 + ... + xn).
You can always look up how an API works; it's the struggle to think about data in terms of structures that hinders effective analysis in most cases.
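To make the contrast concrete: once the data is tidy (one row per observation, one column per variable), the Python side is just as short as that R one-liner. A rough sketch with a tiny stand-in data set:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Tiny stand-in for a tidy data set: one row per observation, one column per variable.
    df = pd.DataFrame({
        "y":  [3.1, 4.0, 5.2, 6.1, 7.0, 8.2],
        "x1": [1, 2, 3, 4, 5, 6],
        "x2": [0, 1, 0, 1, 0, 1],
    })

    # Formula interface mirrors R's lm(y ~ x1 + x2).
    model = smf.ols("y ~ x1 + x2", data=df).fit()
    print(model.summary())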
Totally appreciate the feedback, and I agree with you that a well-structured data set can be trivially analyzed. Heck, at that point you can use drag-and-drop stats packages too.
The data set I used for the demo stores income categories as strings and has a mix of categorical variables that the LLM had to transform (a rough illustration of that kind of step is sketched below), which is incredibly promising.
The insights that Claude generated also imply that it can do follow-up analysis.
This is less of a “hey, write my regression code for me” and more of a “suggest the analysis, do it, find insights, and run follow-up analyses”. That’s way more powerful and interesting.
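For example, the income transformation looks something like this (a hand-written sketch with hypothetical bracket labels, not Claude's actual output):

    import pandas as pd

    # Hypothetical income brackets stored as strings, similar in spirit to the demo data.
    df = pd.DataFrame({"income": ["<25k", "25k-50k", "50k-100k", ">100k", "25k-50k"]})

    # Ordered brackets -> integer codes, so they can feed into a regression.
    order = ["<25k", "25k-50k", "50k-100k", ">100k"]
    df["income_code"] = pd.Categorical(df["income"], categories=order, ordered=True).codes

    # Unordered categoricals can be one-hot encoded instead.
    dummies = pd.get_dummies(df["income"], prefix="income")
    print(df.join(dummies))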
Good points, and I really appreciate your work. You are addressing a real problem.
I'm simply skeptical that someone who is able to make good use of Claude (or any LLM) for data analysis actually needs it. Let's hope I'm too pessimistic here.
Anecdotally, my wife - a researcher in management accounting who does a lot of analysis of corporate data - was very excited about this tool, because it allows her to explore the dataset in almost natural language and gives her a starter Python code base to tinker with.
I have seen her use Python. She uses it like a research notebook: sequential, pipeline-like analysis steps. Any little change to a step, and she will re-run the whole thing :-)
I did something similar to this but got stuck on the fact that the generated code would sometimes work and sometimes not, for identical prompts. I also found that, as an expert in the topic, it was easy to write a prompt that would generally build a reasonable data pipeline, but I can't imagine doing the same if I just had some data and not the expertise. How do you account for these issues?
Fantastic questions! Re: working/not working at times -- this is still an issue. It's why I'm building PhaseLLM more broadly (https://github.com/wgryc/phasellm) -- you need a robust pipeline that can also "reset" parts of itself if the LLM makes errors.
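To give a flavour of what I mean by "reset" (just a generic retry sketch, not PhaseLLM's actual API):

    def run_step_with_retries(generate_code, execute, max_attempts=3):
        # generate_code and execute are placeholders: one asks the LLM for code
        # (optionally passing back the last error), the other runs that code.
        last_error = None
        for _ in range(max_attempts):
            code = generate_code(error=last_error)
            try:
                return execute(code)
            except Exception as exc:
                last_error = str(exc)  # feed the failure back into the next attempt
        raise RuntimeError(f"Step still failing after {max_attempts} attempts: {last_error}")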
I think there’s a lot of value here in empowering business users and more operational folks to use data without needing familiarity with a tool or language meant for data science.