used to work in insurance and heavily used it.

disgruntledphd2 · on Sept 14, 2021

> Wait how many companies are actually using R in the wild? As I understand it, R is born of academia, great for statistics/analysis but breaks down on data manipulation and isn't used in production/data engineering.

It depends. I've worked in some places where R was the core part of their data infrastructure. Data manipulation (of non-text data) is far, far better in R.

Integrating with other systems can be tricky though, and you don't have the wide variety of Python libraries available for core SE tasks, so it can often make sense to use Python even though it's not as good for a lot of the core work.

Additionally, R is a very, very flexible language (like Python), but without strong community-led norms (unlike Python) so it's pretty easy to make a mess with it.

Finally, when you need to hand over stuff to software engineers, they vastly tend to prefer Python, so it often ends up being used to make this stuff easier.

Like, in R there's a core tool called broom which will pull out the important features of a model and make it really easy to examine them with your data. There's nothing comparable in Python, and I miss it so so much when I use Python.

That being said, working with strings is much much nicer in Python, and pytest is the bomb, so there are tradeoffs everywhere.

mrtranscendence · on Sept 14, 2021

It's unrelated to your main point, but:

> Additionally, R is a very, very flexible language (like Python)

I'd argue that R is much more flexible than Python syntactically. There's a reason that every attempt at recreating dplyr in Python ends in a bit of a mess (IMO) -- Python just doesn't allow the sort of metaprogramming you'd require for a really nice port. Something as simple as a general pipe operator can't be defined in Python, to say nothing of how dplyr scopes column names within verbs.
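To make the pipe point concrete, here is a minimal sketch of the usual operator-overloading workaround. `Pipe`, `keep_even`, and `scale` are hypothetical names made up for illustration, not from any library. It borrows `>>` through `__rrshift__`, so every step has to be wrapped by hand, which is exactly where it falls short of a general pipe operator.

```python
class Pipe:
    """Wrap a function so that `value >> Pipe(f, ...)` calls f(value, ...)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, left):
        # Called for `left >> self` when `left` does not handle `>>` itself.
        return self.func(left, *self.args, **self.kwargs)


def keep_even(xs):
    """Keep only the even numbers."""
    return [x for x in xs if x % 2 == 0]


def scale(xs, factor):
    """Multiply every element by factor."""
    return [x * factor for x in xs]


# Reads left to right, but each step needs an explicit Pipe(...) wrapper.
result = [1, 2, 3, 4] >> Pipe(keep_even) >> Pipe(scale, 10)
```

Python tries the left operand's own `__rshift__` first, so this only works when the left-hand type declines the operation, and any type that defines `>>` for arbitrary objects would take precedence over the wrapper.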

Arguably this does allow you to go crazy in a way that ends up being detrimental to readability, but I'd say overall it's a net benefit to R over Python. I really miss this stuff and have spent an undue amount of time thinking of the best way to emulate it (only to come up with ideas that just disappoint).

> Finally, when you need to hand over stuff to software engineers, they vastly tend to prefer Python

Indeed, this is maybe 50% of the reason my organization has pushed R to the sidelines over the past few years. We used to be very heavily into R but now it has "you can use it, but don't expect support" status.

mrtranscendence · on Sept 14, 2021

(Replying to disgruntledphd2)

> Well that's just lazy evaluation of function arguments, which can't be done in Python.

"Just lazy evaluation"! :) It's a pretty big deal. This is three-fifths of the way to a macro system.

> But if you take a look at the Python data model, it does seem super, super flexible.

Sure, you can have a lot of control over the behavior of Python objects (some techniques of which remain obscure to me even after using Python for many years). But you don't have anything like syntactic macros. You can define a pipe operator with macropy, though -- it's pretty easy. But macropy is basically dead now I think (and a total hack).

> You'll still need strings for column names in any dplyr port though, because of the function argument issue.

This is major, though, because you can't do this:

    mutate(df, x="y" + "z")

You have to do something like what dfply does, defining an object that defines addition, subtraction, etc.

    mutate(df, x=X.y + X.z)

But that hits corner cases quickly. What if you want to call a regular Python function that expects numeric arguments? This won't work:

    mutate(df, x=f(X.y))

etc. Granted, this only really works in R because it's easy to define functions that accept and return vectors. So in that sense it's kind of a leaky abstraction. But you couldn't even get that far in Python, because X.y isn't a vector ... it's a kind of promise to substitute a vector.
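A rough sketch of that "promise" idea, assuming a plain dict of lists as a stand-in for a data frame. This is not dfply's actual implementation; `Deferred`, `ColumnNamespace`, and this `mutate` are hypothetical names for illustration. It shows both why `X.y + X.z` works and why handing `X.y` to an ordinary function does not.

```python
import math


class Deferred:
    """A promise to compute a column once a table is supplied."""

    def __init__(self, resolve):
        self.resolve = resolve  # callable: table -> list of values

    def __add__(self, other):
        # Assumes `other` is also a Deferred; builds a new promise
        # instead of adding anything right now.
        def resolve(table):
            left = self.resolve(table)
            right = other.resolve(table)
            return [a + b for a, b in zip(left, right)]

        return Deferred(resolve)


class ColumnNamespace:
    """X.y returns a promise to look up column 'y' later."""

    def __getattr__(self, name):
        return Deferred(lambda table: table[name])


X = ColumnNamespace()


def mutate(table, **new_columns):
    """Return a copy of table with each promise resolved against it."""
    out = dict(table)
    for name, expr in new_columns.items():
        out[name] = expr.resolve(table)
    return out


df = {"y": [1, 2, 3], "z": [10, 20, 30]}

# Works: `+` is overloaded, so the sum is itself a promise.
summed = mutate(df, x=X.y + X.z)

# Fails: math.sqrt runs immediately and receives a Deferred, not numbers.
try:
    mutate(df, x=math.sqrt(X.y))
    error = None
except TypeError as exc:
    error = exc
```

Every operation you want to support has to be overloaded on the promise object, and anything that isn't an overloadable operator (like a plain function call) sees the promise rather than the data.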

Give Python macros, I say! To hell with the consequences!

dragonwriter · on Sept 14, 2021

> Sure, you can have a lot of control over the behavior of Python objects (some techniques of which remain obscure to me even after using Python for many years). But you don't have anything like syntactic macros.

Not yet, but there’s a PEP for that:

https://www.python.org/dev/peps/pep-0638/

mrtranscendence · on Sept 14, 2021

Nice, I'd love for this to see the light of day. I suspect it'll see some resistance (even pattern matching caused conflict, and I thought that was terribly innocuous).

(Why can I reply at this level of nesting now, whereas before I couldn't?)

disgruntledphd2 · on Sept 14, 2021

I'm totally with you on these points, and it's one of the places where R's genesis as a Scheme program has led to really, really good consequences.

Fundamentally though, both DS Python and R are abstractions over well-tested Fortran linear algebra routines (I'm sort of kidding, but only sort of).

disgruntledphd2 · on Sept 14, 2021

> I'd argue that R is much more flexible than Python syntactically. There's a reason that every attempt at recreating dplyr in Python ends in a bit of a mess (IMO) -- Python just doesn't allow the sort of metaprogramming you'd require for a really nice port. Something as simple as a general pipe operator can't be defined in Python, to say nothing of how dplyr scopes column names within verbs.

Well that's just lazy evaluation of function arguments, which can't be done in Python. But if you take a look at the Python data model, it does seem super, super flexible. You'll still need strings for column names in any dplyr port though, because of the function argument issue.
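For anyone who hasn't hit the function argument issue, a small sketch using a dict of lists as a stand-in for a data frame (this `mutate` is a hypothetical toy, not a real library function). Python evaluates arguments before the call happens, so bare column names fail as ordinary variable lookups, and you have to delay evaluation yourself, here with a lambda.

```python
def mutate(table, **new_columns):
    """Add columns; each value is a callable that receives the table."""
    out = dict(table)
    for name, compute in new_columns.items():
        out[name] = compute(out)
    return out


df = {"y": [1, 2, 3], "z": [10, 20, 30]}

# Arguments are evaluated eagerly, so `y + z` is looked up in the
# caller's scope and raises before mutate() is ever entered.
try:
    mutate(df, x=y + z)
    failed_with = None
except NameError as exc:
    failed_with = exc

# Delaying evaluation has to be spelled out explicitly.
lazy = mutate(df, x=lambda t: [a + b for a, b in zip(t["y"], t["z"])])
```

Passing column names as strings is the other common way around this, at the cost of the function having to parse or look them up itself.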

Like, both Python/R derive from the CLOS approach (Art of the Metaobject Protocol), but R retains a lot more of the lispy goodness (but Python's implementation is easier to use).