
In R this is rather straightforward, and you can also achieve it without additional libraries:

  weight <- read.csv("weight.csv")
  person <- read.csv("person.csv")
  merge(weight, person, all=TRUE)
Of course, nowadays you would use data.table, but the merging logic would be exactly the same.
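For reference, the data.table version is nearly identical — a sketch, assuming the same weight.csv and person.csv files with at least one identically named key column:

  library(data.table)
  weight <- fread("weight.csv")   # fread is data.table's fast CSV reader
  person <- fread("person.csv")
  # full outer join on the shared column(s), same semantics as base merge()
  merge(weight, person, all = TRUE)

fread is also considerably faster than read.csv on large files, which is most of the reason to reach for data.table here.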


That loads all the data into memory, though. Fine if you're joining all of it, but with the Postgres example you can throw the whole SQL language at the data, and it will be streamed and pipelined efficiently just like with normal Postgres tables.


I never got deep into R, as we had Python or R as options in school for data projects, so forgive me — but where is the database join in this scenario?


The original article doesn't say anything about a "database join". It's about joining two datasets by some common ID.

R fits the bill here and even allows for some relational algebra (e.g. an INNER JOIN would be merge(X, Y, all=FALSE), a LEFT OUTER JOIN merge(X, Y, all.x=TRUE), etc.)
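Spelled out with a small self-contained example (the data frames and column names here are made up for illustration — merge() joins on whatever columns the two inputs share, "id" in this case):

  X <- data.frame(id = 1:3, weight = c(70, 80, 90))
  Y <- data.frame(id = 2:4, name = c("ann", "bob", "cay"))
  merge(X, Y, all = FALSE)   # inner join: only ids 2, 3
  merge(X, Y, all.x = TRUE)  # left outer join: every row of X kept
  merge(X, Y, all.y = TRUE)  # right outer join: every row of Y kept
  merge(X, Y, all = TRUE)    # full outer join: ids 1 through 4

Rows with no match on the other side get NA in the missing columns, just as an SQL outer join fills with NULL.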


merge() joins the two datasets on their identically named columns; all=TRUE makes it a full outer join.



