
disgruntledphd2 · on April 30, 2021

> Well, R isn’t the best language when it comes to building systems. Most R code is essentially one file written to produce an output once (for a paper, project, etc.). This means that people want a better language to build systems, which Python provides. That explains why people moved to Jupyter.

I definitely agree that Python is a better general-purpose computing language than R, but R's deployment story (i.e. packages) is much, much better than Python's (pip/poetry/pipenv/conda/whatever came out this week). I honestly don't think that's the reason, though; it's more that Python has much, much, much better developer mindshare.

Jupyter is a whole other world, though. IPython was the best thing ever as a proper REPL for Python, and Jupyter was good for being able to do graphics alongside your code. That was all standard in the R world with Sweave (which I wrote my thesis in), so it didn't appear to add a lot of value (to me, at least).

> I don’t like RStudio for the same reason I don’t like Matlab. I already have my editor and terminal workflow. I don’t want to use/learn a new tool for the privilege to use the language.

I am 100% with you on this, but RStudio is just a nicer interface over the tools for literate programming in R, and how much nicer Rmd is than ipynb is a thing of joy (to me, at least).

> Mainly, running cells out of order is just an incredibly dumb thing to allow. This same problem is present in RStudio which you seem to enjoy (highlight and REPL) and you want it in other languages. If the code isn’t written to run in an order, a tool shouldn’t allow it.

So, this is a tricky one. I agree in principle, and I have a habit of continually re-running my documents to ensure that this doesn't cause problems, but there are definitely valid use cases for out-of-order execution. Consider that you may often fit a model (which can take ages) and then iterate on the visualisation/analysis code; you don't want to re-run the modelling code every time you change a plot, which your solution would require.

Most of these tools claim to let you cache particular blocks, but I've never been able to get caching to work reliably.
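A rough Python sketch of why block caching is fragile: this hypothetical disk_cache decorator keys results on the function name, the arguments and a version tag you bump by hand. Nothing here notices when the function body changes, which is usually exactly where these caches go stale. disk_cache and fit_model are illustrative names, not any tool's API.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir, version="1"):
    """Cache results on disk, keyed by function name, a manual version tag and the arguments.

    Bump `version` whenever the function body changes; nothing detects that
    automatically. Arguments must be picklable (and pickle deterministically).
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Hash the pickled call signature to get a stable file name.
            payload = pickle.dumps((func.__qualname__, version, args, sorted(kwargs.items())))
            key = hashlib.sha256(payload).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records each real execution of the wrapped function


@disk_cache(tempfile.mkdtemp(), version="1")
def fit_model(n):
    """Hypothetical stand-in for a slow model fit."""
    calls.append(n)
    return sum(i * i for i in range(n))
```

Calling fit_model(1000) twice only runs the body once; edit the body without bumping version and you silently get the old result back.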

extr · on April 30, 2021

Yeah, I find that the out-of-order execution issue is common with people who have a software development mindset, but for data analysis/science it is basically the only sensible way to work. The "load data" command might be one line but take 3 minutes to run, while a huge chunk of code that plots the data might take 1 second, and I might want to tweak it 50 different ways before settling on something that I like/delivers insight. Producing a standalone script that develops the same insight you get from "playing" with the data is an afterthought in some cases.

disgruntledphd2 · on April 30, 2021

As long as you're aware of the dangers, it's fine. Personally I try to keep modelling separate from analysis to avoid this issue, and I set :eval no (Org mode's Babel header argument) in those cases where I've built the model inline with the analysis.
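One way to keep modelling offline, sketched in Python under the assumption that the fitted model reduces to a few parameters: a separate step fits and saves the model, and the analysis only ever loads the saved artifact, so re-running the analysis never refits. The least-squares fit is just a stand-in for a slow model, and fit_linear, save_model and load_model are hypothetical names.

```python
import json
import os
import tempfile


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for a slow model fit)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def save_model(model, path):
    # Run once, outside the analysis document (e.g. a separate script or job).
    with open(path, "w") as fh:
        json.dump(model, fh)


def load_model(path):
    # The analysis document only loads the fitted parameters.
    with open(path) as fh:
        return json.load(fh)


def predict(model, x):
    return model["intercept"] + model["slope"] * x


# Usage: fit once offline, then iterate on the analysis against the saved artifact.
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(fit_linear([0, 1, 2, 3], [1, 3, 5, 7]), MODEL_PATH)
model = load_model(MODEL_PATH)
```

Here the loaded model has intercept 1.0 and slope 2.0, and plots or tables can be regenerated from it as often as you like.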

Unfortunately, it generally takes a couple of terrible situations before people learn the problems with this.
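A small Python illustration of the classic trap, simulating notebook cells as source strings executed with exec in one shared namespace (a simplification, not how Jupyter or RStudio actually work): re-running a transforming cell while tweaking it silently changes downstream results, so the live session no longer matches a clean restart-and-run-all.

```python
def run_cells(cells, order):
    """Execute notebook-style cells (source strings) in the given order, sharing one namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


CELLS = [
    "data = [1, 2, 3]",               # cell 0: load the data
    "data = [x * 10 for x in data]",  # cell 1: rescale, overwriting `data`
    "total = sum(data)",              # cell 2: summarise
]

# Restart-and-run-all: each cell runs once, top to bottom.
clean = run_cells(CELLS, [0, 1, 2])
# Live session: cell 1 was re-run while tweaking it, then cell 2 again.
stale = run_cells(CELLS, [0, 1, 1, 2])
```

clean["total"] is 60 while stale["total"] is 600, and nothing in the visible cells tells you which one you are looking at.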

hervature · on April 30, 2021

I agree that data analysis needs a tool to persist data while iterating over certain functions. But in that vein, such a tool should try to ensure the user never has to run the load_data() function more than once, not encourage re-running it by allowing someone to permanently manipulate the output of load_data().

disgruntledphd2 · on April 30, 2021

This is an option in many tools, but it doesn't tend to work that well in practice.

I do agree that this is the ideal, though. (For example, since Pluto is always reactive, this workflow becomes much more difficult: when you change a datapoint that the model depends on, the model is re-run.)
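To make the trade-off concrete, here is a toy reactive notebook in Python (3.9+ for graphlib). Editing a cell re-runs every cell downstream of it, so changing the data refits the model, while changing only a plotting input re-runs just the plot. Pluto infers dependencies from the code itself; in this hypothetical ReactiveNotebook class they are declared by hand, and a cell must be defined after the cells it depends on.

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy reactive notebook: (re)defining a cell re-runs it and everything downstream."""

    def __init__(self):
        self.cells = {}   # name -> (function, tuple of dependency names)
        self.values = {}  # name -> last computed value
        self.log = []     # names of cells in the order they were executed

    def define(self, name, func, deps=()):
        self.cells[name] = (func, tuple(deps))
        self._rerun_from(name)

    def _affected(self, name):
        # Collect `name` plus every cell that transitively depends on it.
        affected = {name}
        changed = True
        while changed:
            changed = False
            for cell, (_, deps) in self.cells.items():
                if cell not in affected and affected.intersection(deps):
                    affected.add(cell)
                    changed = True
        return affected

    def _rerun_from(self, name):
        affected = self._affected(name)
        # Map each affected cell to its affected predecessors, then run in dependency order.
        graph = {c: set(deps) & affected for c, (_, deps) in self.cells.items() if c in affected}
        for cell in TopologicalSorter(graph).static_order():
            func, deps = self.cells[cell]
            self.values[cell] = func(*(self.values[d] for d in deps))
            self.log.append(cell)


nb = ReactiveNotebook()
nb.define("data", lambda: [1.0, 2.0, 3.0])
nb.define("model", lambda data: sum(data) / len(data), deps=["data"])  # stand-in for a slow fit
nb.define("style", lambda: "red")
nb.define("plot", lambda model, style: f"{style} line at {model}", deps=["model", "style"])
```

Redefining "style" runs only style and plot; redefining "data" runs data, model and plot, which is exactly the refit you were trying to avoid.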