crowding's comments

I've used Matlab for about 15 years and R for about 3. Matlab burned me pretty badly during my graduate career; two weeks after deciding to learn R, I'd recreated a data analysis/modeling/graphing problem I'd previously done in MATLAB, using about one third the amount of code in R. Perhaps R and its available libraries better suit the way I think about problems, but most people in my department agree I code circles around them in Matlab, and I've since repeated the experiment with similar results.

So, some reasons a Matlab user might want to consider learning R:

-- R is currently the lingua franca for academic statisticians. New methods papers, textbooks, and toolkits are much more likely to ship with R libraries and implementations than Matlab or anything else.

-- Speaking of statistics, the MathWorks' most recent revamping of the Statistics Toolbox is an obvious imitation and pale shadow of base R, giving you a more verbose way to do half of what base R does for statistics, and then R has everything available on CRAN to add to that.

-- R is a smaller, yet more expressive language. To be fair, it has about the same density of WTF and non-orthogonality as Matlab (for which, see Patrick Burns' "The R Inferno") but makes up for it by being much better at functional programming, and by having more of the Lisp nature in general (R is homoiconic; you can write R code that manipulates R code). If you want object-oriented programming you're about equally screwed in both languages though. R has syntactic support for named/optional arguments to functions, as opposed to Matlab's horrible inputParser/nargin hacks.

-- and in general the idea of giving names to things (rows and columns of a matrix, individual elements, function arguments) is pervasively supported in R. Matlab doesn't even have a decent approximation of a hashtable.

-- R is not quite as insistent on being its own universe. For example, you can write R scripts invokable directly from the shell, without jumping through awful "expect" style hoops and waiting 30 seconds for system startup / license server failure every invocation. For reproducible analysis you really want a build tool ( http://archive.nodalpoint.org/2007/03/18/a_pipeline_is_a_mak... ); so having your analysis scripts being callable by Make or SCons is a no-brainer.

-- Speaking of reproducibility, much of the "reproducible research" movement (which basically says, "hey, maybe scientific data analyses and papers should be done with version control, build automation, and maybe even testing, like software people have been doing since forever") is centered around R. I'm currently doing a project with "knitr," an R library that helps writing reproducible reports; if I want to talk about a particular graph or cite a p-value, I don't manually copypasta the data into a word processor; I write in my document the command to compute the value or plot the graph, and it gets updated whenever I render to a PDF. That ensures that results keep track of any changes in dataset or analysis.

-- The R community in general is more frank about its shortcomings and limitations, which might only be possible in a free software project. For both systems, you can say, word for word, that "there are a lot of awful decisions that (R/MATLAB) inherited due to (S/MATLAB)'s 1970s origins in (John Chambers/Cleve Moler)'s attempt to build a useful sort of interactive shell over Fortran numerics libraries, which it turns out should not be what you build a real programming language on top of." The difference is that the R folks will talk openly about the ways in which R sucks, but you won't get any such acknowledgement from the Mathworks.

-- R-help is both more active and contains smarter inhabitants than comp.soft-sys.matlab. Similarly for R/Matlab questions on StackOverflow.

-- R has a much better packaging system. Actually I should change the emphasis: R even has a packaging system. Libraries install from CRAN/Bioconductor/R-Forge with one command, and installing them doesn't tromp all over the global namespace. The code is much higher quality than you find on the Mathworks File Exchange; most of the time when I look on the File Exchange, anything that looks like it solves my immediate problem hasn't been updated for 5 years and no longer works. CRAN on the other hand has maintainers who will remove packages that stop working. Consequently, people take more ownership of their packages.

-- ggplot2 is the best library in any language for taking your data and making a useful 2d plot out of it. I've written hundreds of lines of Matlab to build graphics that are a couple of phrases in ggplot. On the other hand, Matlab is better at 3d graphing (which I hardly use) and interactive graphics (but there's a lot of people attacking that on the R side.)

-- Similarly, I've written hundreds of lines of Matlab to do data manipulation operations that are like breathing air with Hadley's other great library, plyr.

-- Downsides? R is somewhat slower (if you want to compare two laughably slow languages; we're talking roughly CPython vs Ruby). Matlab has a better IDE with better debugging facilities. Depending on your field you might have more colleagues that are familiar with Matlab (true for engineering, definitely false for statistics.) R's online help is harder to navigate, which lends it a somewhat more difficult learning curve. Actually creating a package to distribute your code is pretty hard to figure out.


> R is somewhat slower

Actually it's MUCH slower, to the point of being entirely unusable for very large datasets (even ~100GB). True, much of MATLAB's speed comes from using a highly optimized BLAS (Intel's Math Kernel Library), but that's not the whole story. R lacks JIT optimization, and numerous attempts to add it were unsuccessful. In fact it's so bad that Ross Ihaka, one of R's creators, proposed to "simply start over and build something better". See http://xianblog.wordpress.com/2010/09/13/simply-start-over-a...

TL;DR don't use R if you work with large datasets


They're both designed around completely in-memory arrays, which are passed around by-value with a copy-on-write scheme.

For R there is the bigmemory package for mmapped arrays. And the "compiler" JIT package has been included since R 2.13.

I've seen that link before. See above re: one group's willingness to talk about the shortcomings versus another organization's preference to paper over it with marketing.
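To make the "by-value with copy-on-write" point concrete, here is a toy sketch in Python (used purely as a neutral illustration; neither R nor MATLAB is implemented this way internally, and the class `CowArray` and its methods are hypothetical names). A logical copy is cheap because it shares the buffer; the real copy only happens on the first write, which is exactly when memory use doubles on a big array.

```python
class CowArray:
    """Toy array with by-value semantics implemented via copy-on-write."""

    def __init__(self, values):
        self._buf = list(values)
        # Holder count lives in a one-element list so that every logical
        # copy sharing the buffer also shares (and can update) the count.
        self._holders = [1]

    def copy(self):
        """Return a logical copy; no element data is copied yet."""
        other = CowArray.__new__(CowArray)
        other._buf = self._buf
        other._holders = self._holders
        self._holders[0] += 1
        return other

    def __getitem__(self, i):
        return self._buf[i]

    def __setitem__(self, i, value):
        if self._holders[0] > 1:
            # Another holder still sees this buffer: detach before writing.
            self._holders[0] -= 1
            self._buf = list(self._buf)
            self._holders = [1]
        self._buf[i] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf


a = CowArray([1, 2, 3])
b = a.copy()                              # cheap: buffer is shared
shared_before = b.shares_buffer_with(a)
b[0] = 99                                 # first write triggers the real copy
shared_after = b.shares_buffer_with(a)
```

After the write, `a` still holds its original values and `b` owns a private buffer, so the caller sees plain pass-by-value behaviour.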


"They're both designed around completely in-memory arrays, which are passed around by-value with a copy-on-write scheme." True, but that doesn't invalidate my point. The datasets I'm typically working with are quite large, 300GB-1TB (I have 2TB of RAM on my main server). I've tried both R and MATLAB and R has been a disaster. Even plotting, say, 10 million points on a graph is a pain.


> R is currently the lingua franca for academic statisticians. New methods papers, textbooks, and toolkits are much more likely to ship with R libraries and implementations than Matlab or anything else.

Is this the same in industry? I've heard a professor say that SAS is more widely used.


The idea appears to be that array languages have builtin functions for operations that would require map/reduce/etc in the applicative programming paradigm. I agree that is convenient, but the downside is that only array operations that are supported this way are the ones that are anticipated by the language designer. If you want to split up an array in some way other than intended, you're back to applicative programming, so your language needs good support for that too.

I would argue that the more important thing that makes something an "array language" is that scatter/gather operations on arrays have syntactic support, as described here:

http://prog21.dadgum.com/141.html
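For anyone unfamiliar with the terms, here is a rough sketch of what gather and scatter mean, spelled out in plain Python (a neutral illustration only; the helper names `gather` and `scatter` are mine). In R and MATLAB these are a single indexing expression, `x[idx]` / `x(idx)` for gather and `x[idx] <- v` / `x(idx) = v` for scatter, with 1-based rather than 0-based indices.

```python
def gather(values, indices):
    """Pick out values at the given positions, in the order requested."""
    return [values[i] for i in indices]


def scatter(values, indices, updates):
    """Return a copy of values with updates written at the given positions."""
    out = list(values)
    for i, u in zip(indices, updates):
        out[i] = u
    return out


data = [10, 20, 30, 40, 50]
picked = gather(data, [4, 0, 0])             # indices may repeat or reorder
patched = scatter(data, [1, 3], [-1, -2])    # data itself is left untouched
```

The point of syntactic support is that none of this needs helper functions or explicit loops in an array language.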

As a personal anecdote: In my thesis work I switched most of my data analysis workflow from MATLAB to R. In terms of paradigms I'd say R is a slightly worse array-language (although it does have builtin syntax for things that require messing with sub2ind() in MATLAB) but a much better functional language than MATLAB. The empirical result is that I finished my first analysis project -- including the time it took to learn R -- faster than it had taken me to do a similar task in MATLAB, using 1/3 the code. Add to that that there is presently more active development of open graphics and statistical analysis libraries in R, and I haven't really looked back, though I occasionally think of picking up an APL-derivative like J to play with.


> the downside is that only array operations that are supported this way are the ones that are anticipated by the language designer. … I occasionally think of picking up an APL-derivative like J to play with.

You really should -- it provides a nice counterexample to the "downside" you worry about. Having mapping over arrays (even with mismatched dimensionality) built into the mechanics of function application means that any function the programmer writes is automatically supported by the array mapping.
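A loose imitation of that idea in Python, as a sketch only: J's actual rank mechanism is more general than this, and `lift` and `sq_sum` are hypothetical names. The wrapper makes any two-argument scalar function map over nested lists, pairing items along the leading axis and extending a scalar against a list when the dimensionality does not match.

```python
def lift(fn):
    """Wrap a two-argument scalar function so it maps over nested lists."""
    def lifted(x, y):
        x_is_list = isinstance(x, list)
        y_is_list = isinstance(y, list)
        if not x_is_list and not y_is_list:
            return fn(x, y)                      # two scalars: just apply
        if x_is_list and y_is_list:
            if len(x) != len(y):
                raise ValueError("length mismatch")
            return [lifted(a, b) for a, b in zip(x, y)]
        if x_is_list:
            return [lifted(a, y) for a in x]     # extend scalar y over x
        return [lifted(x, b) for b in y]         # extend scalar x over y
    return lifted


@lift
def sq_sum(a, b):
    # Written with scalars in mind; the decorator supplies the mapping.
    return a * a + b * b


vector_result = sq_sum([1, 2, 3], 2)
matrix_result = sq_sum([[1, 2], [3, 4]], [10, 20])
```

Here `sq_sum` never mentions arrays, yet it works on a vector with a scalar and on a 2x2 table with a length-2 vector, which is the property being described: user-written functions get the same treatment as built-ins.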


I can see that from this distance -- I should have been more specific about its being a downside of the way MATLAB is only half-assedly array-oriented, not necessarily a feature of all array-oriented languages.


That sounds really nice. I must try J.


>The idea appears to be that array languages have builtin functions for operations that would require map/reduce/etc in the applicative programming paradigm. I agree that is convenient, but the downside is that only array operations that are supported this way are the ones that are anticipated by the language designer. If you want to split up an array in some way other than intended, you're back to applicative programming

I can't see why this should hold for all array languages in general.

If the language has some mechanism to let you add new operations that are on the same "class / level" as the built-ins, then the above does not hold.


It also allows the file to be widely disseminated and mirrored before revealing what it contains, as an anti-DDOS measure.

