Introducing R to a non-programmer in one hour

jebus989 · on Jan 8, 2014

Per the comments on the article I'd also advise "<-" for assignment rather than "=", reasons (in no particular order):

1) Google says so: http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml

2) It's the accepted convention of the language. When she looks things up on SO, they'll use it; if she looks at a package's source, they'll use it.

3) In RStudio it's "option+hyphen" (Mac) to type the four-character " <- ", so no extra typing required.

4) Statistically speaking, things like x = x + 1 make no sense

5) You need "<<-" for the occasional global

6) 2 -> x
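
A minimal sketch of points 5 and 6. The names `counter`, `bump`, and `shadow` are made up for illustration; only the operators come from the list above.

```r
# Hypothetical names, for illustration only.
counter <- 0

bump <- function() {
  # `<<-` reassigns the existing `counter` found in an enclosing environment
  counter <<- counter + 1
  invisible(counter)
}

shadow <- function() {
  # plain `<-` creates a local `counter`; the outer one is untouched
  counter <- 100
  invisible(counter)
}

bump()
bump()
shadow()
counter   # 2: changed by bump(), not by shadow()

2 -> x    # rightward assignment, same effect as x <- 2
x         # 2
```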

xerula · on Jan 8, 2014

Also "=" is compulsory for setting some named function arguments that have a default value. E.g. format(pi, digits <- 16) only prints with the default number of digits, format(pi, digits=16) works as expected. I think it's nice to have a syntax that distinguishes between object and argument assignment.
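
A rough sketch of what happens in the two calls, assuming the default `options(digits = 7)`. The variable names `named` and `positional` are made up for illustration.

```r
# `=` inside a call matches the value to the parameter named `digits`.
named <- format(pi, digits = 16)
named        # "3.141592653589793"

# `<-` inside a call is an ordinary assignment: it creates a variable called
# `digits` in the calling environment, then passes the value 16 positionally,
# so it never reaches format()'s `digits` parameter.
positional <- format(pi, digits <- 16)
positional   # "3.141593"
digits       # 16, left behind as a side effect
```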

Also you can do things like this when you're bored:

((a <- b <- 1) + 1 -> c) + 1 -> d
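
For anyone puzzling over it, a sketch of what that line leaves behind, followed by the same thing written out one step at a time (the `_step` names are made up for illustration):

```r
((a <- b <- 1) + 1 -> c) + 1 -> d
# Afterwards: a is 1, b is 1, c is 2, d is 3

# Equivalent, one assignment per line:
b_step <- 1
a_step <- b_step
c_step <- a_step + 1
d_step <- c_step + 1
```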

(Someone also once told me that "=" for normal assignment sometimes fails in unexpected ways in tryCatch blocks, but I don't know how or why.)

anaphor · on Jan 8, 2014

Also, a language shouldn't use the same operator for declaration (initialization in dynamic languages) and assignment imho.

alecdbrooks · on Jan 8, 2014

Just to be clear, R doesn't ever use the same operator for both; it uses == for comparison, like C and Python. That being said, extra differentiation is a bonus.

flyingbrotus · on Jan 8, 2014

"The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions."

danso · on Jan 8, 2014

There's not much you can actually teach someone in an hour, especially about programming. So much of it is dependent on other computer skills...I once taught a five-hour introductory programming class, and while some people learned things, we were frequently caught up in things like people saving files with names like "folder/hello.rb"...that is, the forward-slash was part of the filename, not the path...which I didn't even know was possible on any platform.

But what you can get across, and the OP seems to have done this, is how accessible programming can be. I don't know about RStudio, but jumping into a REPL is easy enough in any modern language, as is running a script. We take it for granted, but very very few people who haven't programmed realize that a "program" is simply composed of text. Hell, most people don't understand that about web pages.

So even if you won't teach much in an hour, overcoming that intimidation factor of "where/how do I even start?" is still pretty huge.

minimaxir · on Jan 8, 2014

RStudio is the best IDE I've ever used and I wish something like that existed for other programming languages. The default workspace is split into the script, data, terminal, and meta (files/charts/help), and selecting a data frame variable will display it in a formatted window, making it very, very easy to debug data issues (I'm attempting to switch from R to Python/pandas due to performance/package issues, and I'm having much more difficulty debugging code.)

catinsocks · on Jan 8, 2014

Check out Spyder if you are doing science stuff with Python: https://code.google.com/p/spyderlib/

minimaxir · on Jan 8, 2014

Actually, that looks right up my alley. Thanks!

Fomite · on Jan 8, 2014

RStudio is amazing - and has been key to several of my colleagues stepping away from SAS and toward R.

dded · on Jan 8, 2014

> that is, the forward-slash was part of the filename, not the path...which I didn't even know was possible on any platform

Nor did I. Which platform?

danso · on Jan 9, 2014

Don't remember for sure, might have been a Windows machine. I just tried it on my Mavericks OS X. It was allowed, but instead of:

       sdf/index.html

it auto-translated to this:

       sdf:index.html

(I'm assuming that's not a Sublime Text 3 thing, which is what I just tested on)

dded · on Jan 9, 2014

This fails on a Mac (10.9) and on CENTOS5 (2.6.18):

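    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Assuming no directory named "foo" exists: the slash is treated
           as a path separator, so this fails instead of creating a file
           literally named "foo/bar". */
        int fd = creat("foo/bar", 0644);
        if (fd == -1) {
            perror("creat");    /* print the reason the call failed */
        } else {
            close(fd);
        }
        return 0;
    }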

craigching · on Jan 8, 2014

Nice, brief introduction to R. If I might be so bold, I'd suggest taking the Coursera "Computing for Data Analysis" class. It's only 4 weeks long and the first lesson covers everything in this blog post and more (I'm actually taking this class now), though the first class is a bit longer than one hour (but not by much!).

VLM · on Jan 8, 2014

As an intro, it's well written and documented.

If there is a fail, his sister is a senior in sociology and she "must have" already done significant statistical analysis in class... right? At the very least did some averages using her calculator in high school or something? So there should be substantial opportunity for compare/contrast, assuming his sister is compatible with this learning technique and the author knows anything about what the sister already knows. That would make a remarkably awful non-specific general population introduction.

Also some focus on what she's doing... I did not find generic screwing around with R to be particularly challenging, I found "how do I connect R to my enormous database cluster in an intelligent fast simple way" to be a huge problem when starting out. OK I can average and std dev five hand entered numbers, that's very easy, now how about a complicated multi-site relational database with 1M records? She may have a totally different challenge, like all her raw data is CSV format or something.
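
A minimal sketch of the two easy starting points mentioned above: a handful of hand-entered numbers, and data that arrives as CSV. The values and column names are made up for illustration, and the inline `text =` argument stands in for a real file.

```r
# Five hand-entered numbers (made-up values).
scores <- c(1, 2, 3, 4, 5)
mean(scores)   # 3
sd(scores)     # about 1.58 (sample standard deviation)

# The same kind of summary on CSV data. With a real file this would be
# read.csv("path/to/file.csv"); the path here is hypothetical.
survey <- read.csv(text = "site,age
A,23
A,31
B,40
B,28")
mean(survey$age)                        # 30.5
tapply(survey$age, survey$site, mean)   # A: 27, B: 34
```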

R is fun once you climb that vertical learning curve wall at the start and generate your first live result. The learning curve is very easy after that first result.

flyingbrotus · on Jan 8, 2014

R has dedicated drivers for every database that I've ever had to use and when that doesn't work, there's the RODBC package. Working with "complicated multi-site relational database[s]" is always a pain in the ass, so I don't see why the burden of simplifying this should fall on R.

There are tools (such as RHIPE) to connect R to multi-node Hadoop clusters so that complex statistical algorithms can be written in R and compiled as MapReduce jobs. This type of tool seems like what you wish R had, but we're a ways away from having this for every distributed data source...

vcrash · on Jan 8, 2014

The author of this piece is a woman.

VLM · on Jan 8, 2014

Luckily it doesn't change my response in any way other than in addition to all the "he" translating to "the author" all the "she" should refer to "the author's sister". I'm a "he" who provided software support to my sister, so you can see the likely source of that assumption. No negative implication of her ability to successfully portray her chosen gender role intended.

Also I think it's lame to check out bylines to make sure I run stuff thru the right sexism filter; I clearly stated it's a "good" article; not "good, for a girl" article, or "not up to the manly standards of excellence" article.

pessimizer · on Jan 8, 2014

You don't have to run a "sexism filter" as long as you don't assume that every speaker (whether technical or not) is a man. If you think that the grammatically weird "they" or "he/she" is awkward, and "OP" or "the author" too cold, you've restricted yourself to checking bylines.

Fomite · on Jan 8, 2014

It's not a byline. It's the name of the website, the name of the window when opened in Safari, it's got her picture right next to the text.

You might not have belittled her because she's a woman, but simply assuming "Programming article = male by default" is in and of itself harmful.

on Jan 8, 2014

[deleted]

Fomite · on Jan 8, 2014

Let's try p(female|has authored programming article on a blog called 'http://alyssafrazee.com' with a woman's picture on the side).

VLM made a mistake. They were told, rather gently, of their mistake. "My mistake, edited" would have been fine. Instead they doubled down that they were totally right and fine and cool with blazing past ample evidence of the author's gender in pursuit of "Male as Default".

VLM · on Jan 8, 2014

True, I made a minor grammatical mistake, and when information gathering I read the author's work instead of meditating on her reproductive organs, but that is utterly irrelevant to my claim about the article. I still claim I was correct when I identified the article as a well written general audience tutorial although in my personal experience, tutoring works best as an "extreme customization" presentation not "here's something very general from someone who happens to be your sibling". I don't see any particular reward or advantage in distracting from that analysis of the article.

I do disagree with the interpretation, though: doubling down would have meant changing the focus of the article analysis yet again, to something even more gender oriented and even less article oriented such as "the most important topic to discuss right now is if vcrash is male or female? Because that's apparently important somehow other than to vcrash's partner?"

flyingbrotus's reply about the RODBC package was awesome because it was on the topic of the discussion and relates directly to the article and busting thru the vertical learning curve of R.

I may very well be an idiot; I'm OK with that.

However, being an idiot would (appear to) have nothing to do with "That's a good article; also, here's another strategy that's worked slightly better". If that specific individual claim about the article were dumb, discussion of my idiocy would be completely on topic.

On the other hand, grammar errors and irrelevant research errors may very well indicate I'm an idiot, but it has nothing to do with the topic. That's why I was pretty cool with it. I'm never going to be hired as a proofreader by anyone not totally desperate and I'm cool with that.

Fomite puts one space after each period. That is wrong, and henceforth we will solely discuss the yawn-inducing aspects of that topic. Or maybe not. I'd rather talk about the article, or learning R, or teaching R, or teaching people, or teaching siblings...

Fomite · on Jan 8, 2014

Someone noticed, and gently corrected you. If you had wanted to, a very simple edit would have been the end of it. Instead, you went on some weird tangent about sexism filters and how it's A-OK, and now you're upset that we're on the tangent?

VLM · on Jan 8, 2014

OK Fomite, I think we are repeating ourselves, so agreement is highly unlikely although I'm sure civil coexistence is likely.

The author is a heck of a good writer about a topic I like, and also is female, and we can each be interested in one of those topics, its a big internet and we'll all fit in somehow with plenty of space.

Have a pleasant day, and I hope you enjoyed the article.

ndr · on Jan 9, 2014

its -> it's

pessimizer · on Jan 8, 2014

If p(female) was 5%, would it be safe to assume that everyone is a man?

groby_b · on Jan 8, 2014

It's not checking you against a sexism filter - it is subtly pointing out the pervasive gender assumptions society has.

It's in all likelihood an innocent mistake on your side - but the accumulation of those tiny things is, in the long run, "not fun"(tm). It doesn't matter that they're innocent mistakes, it matters that it is a constant thing.

So, please, don't consider this an attack, but a polite request to double-check the next time you make a gendered reference to somebody. Every time you do check, you'll have made life a tiny bit better for the women in the field.

thrownaway2424 · on Jan 8, 2014

In my experience, non-programmers are the only people who can grok R in the first hour.

Crito · on Jan 8, 2014

I'm not so sure. A few years ago in university I took a stats course (offered by the maths department) that had twice-weekly mandatory labs where we learned and used R. Most people in the course were not programmers at all, and I ended up helping people most of the time during the course.

R is really weird, but non-programmers are typically unaccustomed to precisely typing non-English into a computer. They aren't trained to quickly spot syntax errors or grok error messages.

Fomite · on Jan 8, 2014

My own experience was that I came from statistical languages like SAS, bounced right off R, ended up learning Python and then came back to R and was like "ohhhh…"

It's an odd language that makes assumptions neither group really holds dear to its heart, but it's also intensely useful.

noahmarc · on Jan 8, 2014

For someone with a programming background, this is one of the best introductions I've seen: http://medianetwork.oracle.com/video/player/2623621262001

minimax · on Jan 8, 2014

That's a very good way of putting it!

dangoldin · on Jan 8, 2014

Nice post. I've recently been trying to get back into R with the goal of being able to replace Excel. For the basic stuff R is great, but as you start doing more complicated things it takes quite a while to figure out how to do them in R. This is a part of the learning curve.

dded · on Jan 8, 2014

I really like Excel for adding up columns of numbers, or for any task where you want to see the data (and the quantity of data is low enough to realistically be seen) and the operations performed on the data are relatively simple and straightforward.

But once the operations start getting more involved, I really start disliking Excel (or any spreadsheet). Excel shows the data but hides the formulas. At this point, I very much prefer a script.

dangoldin · on Jan 8, 2014

Something I've been thinking about is an Excel frontend with an R backend. Each worksheet becomes a data frame so you can edit it in Excel and quickly apply some formula but you can also jump into an R console to do something more complicated.

jofer · on Jan 8, 2014

Have a look at DataNitro for a similar idea with Python as the backend: https://datanitro.com/

I've never used it, but it looks quite nifty.

It would certainly be interesting to see something similar for R.