Introducing R to a non-programmer in one hour (alyssafrazee.com)
127 points by mcenedella on Jan 8, 2014 | 36 comments


Per the comments on the article I'd also advise "<-" for assignment rather than "=", reasons (in no particular order):

1) Google says so: http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml

2) It's the accepted convention of the language. When she looks things up on SO, they'll use it; if she looks at a package's source, they'll use it.

3) In RStudio it's "option+hyphen" (Mac) to type the four-character " <- ", so no extra typing required.

4) Statistically speaking, things like x = x + 1 make no sense

5) You need "<<-" for the occasional global

6) 2 -> x
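
Points 5 and 6 can be sketched in a few lines of R. The names `total` and `add_to_total` are made up for this illustration; the sketch only assumes base R.

```r
# Point 5: `<<-` assigns to a variable in an enclosing environment
# (here the global one) instead of creating a local copy.
total <- 0
add_to_total <- function(n) {
  total <<- total + n   # updates the outer `total`
  invisible(total)
}
add_to_total(2)
add_to_total(3)
print(total)

# Point 6: the arrow can also point to the right.
2 -> x
print(x)
```

With a plain `<-` inside the function, only a local `total` would change and the outer one would stay at 0.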


Also "=" is compulsory for setting some named function arguments that have a default value. E.g. format(pi, digits <- 16) only prints with the default number of digits, format(pi, digits=16) works as expected. I think it's nice to have a syntax that distinguishes between object and argument assignment.
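
A minimal sketch of that difference, assuming base R's format(), whose second positional parameter is `trim` rather than `digits`. In the first call `digits <- 16` is an ordinary assignment whose value is then matched by position, so the `digits` parameter keeps its default.

```r
# Positional: `digits <- 16` assigns 16 to a variable in the calling
# environment, then passes that value as the second positional argument.
positional <- format(pi, digits <- 16)

# Named: `=` inside a call binds the value to the `digits` parameter.
named <- format(pi, digits = 16)

print(positional)  # default number of significant digits
print(named)       # 16 significant digits

# Side effect of the first call: a variable called `digits` now exists.
print(digits)
```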

Also you can do things like this when you're bored:

((a <- b <- 1) + 1 -> c) + 1 -> d
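
Unpacked, that one-liner works because every assignment in R is an expression that returns the assigned value, and `->` assigns to the name on its right. A short sketch of what each variable ends up holding (the `a2` to `d2` names are only there for comparison):

```r
# Assignments return their value invisibly, so they can be nested.
((a <- b <- 1) + 1 -> c) + 1 -> d

# Equivalent step-by-step form:
b2 <- 1
a2 <- b2
c2 <- a2 + 1
d2 <- c2 + 1

# a = 1, b = 1, c = 2, d = 3
print(c(a = a, b = b, c = c, d = d))
```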

(Someone also once told me that "=" for normal assignment sometimes fails in unexpected ways in tryCatch blocks, but I don't know how or why.)


Also, a language shouldn't use the same operator for declaration (initialization in dynamic languages) and assignment imho.


Just to be clear, R doesn't ever use the same operator for both; it uses == for comparison, like C and Python. That being said, extra differentiation is a bonus.


"The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions."

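One way to see the restriction described in that passage, as a hedged sketch: R's parser rejects a bare `=` assignment inside an `if` condition, while `<-` is accepted there. The helper `parses()` is a hypothetical name introduced for this illustration; it only checks whether a string is syntactically valid R and never evaluates it.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R, else FALSE.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

print(parses("if (x <- 5) 'ok'"))    # `<-` is allowed inside the condition
print(parses("if (x = 5) 'ok'"))     # a bare `=` is a syntax error here
print(parses("if ((x = 5)) 'ok'"))   # an extra pair of parentheses makes it legal
```

This also means a mistyped `==` inside a condition is caught at parse time instead of silently assigning.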

There's not much you can actually teach to someone in an hour, especially about programming. So much of it is dependent on other computer skills...I once taught a class for five hours of introductory programming, and while some people learned things, we were frequently caught up in things like people saving filenames of "folder/hello.rb"...that is, the forward-slash was part of the filename, not the path...which I didn't even know was possible on any platform.

But what you can get across, and the OP seems to have done this, is how accessible programming can be. I don't know about RStudio, but jumping into a REPL is easy enough in any modern language, as is running a script. We take it for granted, but very very few people who haven't programmed realize that a "program" is simply composed of text. Hell, most people don't understand that about web pages.

So even if you won't teach much in an hour, overcoming that intimidation factor of "where/how do I even start?" is still pretty huge.


RStudio is the best IDE I've ever used and I wish something like that existed for other programming languages. The default workspace is split into the script, data, terminal, and meta (files/charts/help), and selecting a data frame variable will display it in a formatted window, making it very, very easy to debug data issues (I'm attempting to switch from R to Python/pandas due to performance/package issues, and I'm having much more difficulty debugging code.)


Check out Spyder if you are doing science stuff with Python: https://code.google.com/p/spyderlib/


Actually, that looks right up my alley. Thanks!


RStudio is amazing - and has been key to several of my colleagues stepping away from SAS and toward R.


> that is, the forward-slash was part of the filename, not the path...which I didn't even know was possible on any platform

Nor did I. Which platform?


Don't remember for sure, might have been a Windows machine. I just tried it on my Mavericks OS X. It was allowed, but instead of:

       sdf/index.html

it auto-translated to this:

       sdf:index.html


(I'm assuming that's not a Sublime Text 3 thing, which is what I just tested on)


This fails on a Mac (10.9) and on CentOS 5 (kernel 2.6.18):

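    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* "/" is always a path separator, so this asks for a file named
           "bar" inside a directory "foo"; it fails if "foo" does not exist. */
        int fd = creat("foo/bar", 0644);
        if (fd == -1) {
            perror("creat");  /* report why the call failed */
            return 1;
        }
        close(fd);
        return 0;
    }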


Nice, brief introduction to R. If I may be so bold, I'd suggest taking the Coursera "Computing for Data Analysis" class. It's only 4 weeks long, and the first lesson covers everything in this blog post and more (I'm actually taking this class now), though the first class is a bit longer than one hour (but not by much!).


As an intro, it's well written and documented.

If there is a fail, his sister is a senior in sociology and she "must have" already done significant statistical analysis in class... right? At the very least did some averages using her calculator in high school or something? So there should be substantial opportunity for compare/contrast, assuming his sister is compatible with this learning technique and the author knows anything about what the sister already knows. That would make a remarkably awful non-specific general population introduction.

Also some focus on what she's doing... I did not find generic screwing around with R to be particularly challenging, I found "how do I connect R to my enormous database cluster in an intelligent fast simple way" to be a huge problem when starting out. OK I can average and std dev five hand entered numbers, that's very easy, now how about a complicated multi-site relational database with 1M records? She may have a totally different challenge, like all her raw data is CSV format or something.

R is fun once you climb that vertical learning curve wall at the start and generate your first live result. The learning curve is very easy after that first result.


R has dedicated drivers for every database that I've ever had to use and when that doesn't work, there's the RODBC package. Working with "complicated multi-site relational database[s]" is always a pain in the ass, so I don't see why the burden of simplifying this should fall on R.

There are tools (such as RHIPE) to connect R to multi-node Hadoop clusters so that complex statistical algorithms can be written in R and compiled as MapReduce jobs. This type of tool seems like what you wish R had, but we're a ways away from having this for every distributed data source...


The author of this piece is a woman.


Luckily it doesn't change my response in any way other than in addition to all the "he" translating to "the author" all the "she" should refer to "the author's sister". I'm a "he" who provided software support to my sister, so you can see the likely source of that assumption. No negative implication of her ability to successfully portray her chosen gender role intended.

Also I think it's lame to check out bylines to make sure I run stuff thru the right sexism filter; I clearly stated it's a "good" article; not "good, for a girl" article, or "not up to the manly standards of excellence" article.


You don't have to run a "sexism filter" as long as you don't assume that every speaker (whether technical or not) is a man. If you think that the grammatically weird "they" or "he/she" is awkward, and "OP" or "the author" too cold, you've restricted yourself to checking bylines.


It's not a byline. It's the name of the website, the name of the window when opened in Safari, it's got her picture right next to the text.

You might not have belittled her because she's a woman, but simply assuming "Programming article = male by default" is in and of itself harmful.


[deleted]


Let's try p(female|has authored programming article on a blog called 'http://alyssafrazee.com' with a woman's picture on the side).

VLM made a mistake. They were told, rather gently, of their mistake. "My mistake, edited" would have been fine. Instead they doubled down that they were totally right and fine and cool with blazing past ample evidence of the author's gender in pursuit of "Male as Default".


True, I made a minor grammatical mistake, and when information gathering I read the author's work instead of meditating on her reproductive organs, but that is utterly irrelevant to my claim about the article. I still claim I was correct when I identified the article as a well written general audience tutorial although in my personal experience, tutoring works best as an "extreme customization" presentation not "here's something very general from someone who happens to be your sibling". I don't see any particular reward or advantage in distracting from that analysis of the article.

I do disagree with the interpretation. Doubling down would have been changing the focus of the article analysis yet again, to something even more gender-oriented and even less article-oriented, such as "the most important topic to discuss right now is if vcrash is male or female? Because that's apparently important somehow other than to vcrash's partner?"

flyingbrotus's reply about the RODBC package was awesome because it was on the topic of the discussion and relates directly to the article and busting thru the vertical learning curve of R.

I may very well be an idiot; I'm OK with that.

However, being an idiot would (appear to) have nothing to do with "That's a good article; also, here's another strategy that's worked slightly better". If that specific individual claim about the article were dumb, discussion of my idiocy would be completely on topic.

On the other hand, grammar errors and irrelevant research errors may very well indicate I'm an idiot, but it has nothing to do with the topic. That's why I was pretty cool with it. I'm never going to be hired as a proofreader by anyone not totally desperate and I'm cool with that.

Fomite puts one space after each period. That is wrong, and henceforth we will solely discuss the yawn-inducing aspects of that topic. Or maybe not. I'd rather talk about the article, or learning R, or teaching R, or teaching people, or teaching siblings...


Someone noticed, and gently corrected you. If you had wanted to, a very simple edit would have been the end of it. Instead, you went on some weird tangent about sexism filters and how it's A-OK, and now you're upset that we're on the tangent?


OK Fomite, I think we are repeating ourselves, so agreement is highly unlikely although I'm sure civil coexistence is likely.

The author is a heck of a good writer about a topic I like, and also is female, and we can each be interested in one of those topics, its a big internet and we'll all fit in somehow with plenty of space.

Have a pleasant day, and I hope you enjoyed the article.


its -> it's


If p(female) was 5%, would it be safe to assume that everyone is a man?


It's not checking you against a sexism filter - it is subtly pointing out the pervasive gender assumptions society has.

It's in all likelihood an innocent mistake on your side - but the accumulation of those tiny things is, in the long run, "not fun"(tm). It doesn't matter that they're innocent mistakes, it matters that it is a constant thing.

So, please, don't consider this an attack, but a polite request to double-check the next time you make a gendered reference to somebody. Every time you do check, you'll have made life a tiny bit better for the women in the field.


In my experience, non-programmers are the only people who can grok R in the first hour.


I'm not so sure. A few years ago in university I took a stats course (offered by the maths department) that had twice-weekly mandatory labs where we learned and used R. Most people in the course were not programmers at all, and I ended up helping people most of the time during the course.

R is really weird, but non-programmers are typically unaccustomed to precisely typing non-English into a computer. They aren't trained to quickly spot syntax errors or grok error messages.


My own experience was that I came from statistical languages like SAS, bounced right off R, ended up learning Python and then came back to R and was like "ohhhh…"

It's an odd language that makes assumptions neither group really holds dear, but it's also intensely useful.


For someone with a programming background, this is one of the best introductions I've seen: http://medianetwork.oracle.com/video/player/2623621262001


That's a very good way of putting it!


Nice post. I've recently been trying to get back into R with the goal of being able to replace Excel. For the basic stuff R is great, but as you start doing more complicated things it takes quite a while to figure out how to do them in R. This is a part of the learning curve.


I really like Excel for adding up columns of numbers. It suits any task where you want to see the data (and the quantity of data is low enough to realistically be seen) and the operations performed on the data are relatively simple and straightforward.

But once the operations start getting more involved, I really start disliking Excel (or any spreadsheet). Excel shows the data but hides the formulas. At this point, I very much prefer a script.


Something I've been thinking about is an Excel frontend with an R backend. Each worksheet becomes a data frame so you can edit it in Excel and quickly apply some formula but you can also jump into an R console to do something more complicated.


Have a look at DataNitro for a similar idea with Python as the backend: https://datanitro.com/

I've never used it, but it looks quite nifty.

It would certainly be interesting to see something similar for R.



