
A great article addressing a difficult idea. The article slightly overgeneralizes for simplicity.

One omission is that the author doesn't mention the "data frame" concept when he talks about SQL tables. Data frames in R or Python/pandas map directly onto the SQL table idea.
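For concreteness, a minimal sketch of that correspondence, using an in-memory SQLite table (the table and column names are invented for illustration):

    import sqlite3
    import pandas as pd

    # An in-memory SQLite table and the DataFrame read back from it.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    con.executemany("INSERT INTO people VALUES (?, ?)",
                    [("Ada", 36), ("Grace", 45)])

    df = pd.read_sql_query("SELECT * FROM people", con)
    print(df)         # same rows and named columns as the SQL table
    print(df.dtypes)  # each column has one type, like a SQL column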

And thus the article explains, without stating it explicitly, why Python is such a good data analysis language. You get the LISP "everything is an object pointer" data model in the core language[1], and FORTRAN arrays (numpy) and SQL tables (pandas) are for practical purposes part of the language. (One could also argue that pandas covers COBOL-style records, but in practice people use both namedtuples and pandas tables for large named records, and neither is a perfect fit.)
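A small sketch of the three models side by side (the variable names are just for illustration):

    import numpy as np
    import pandas as pd

    # LISP-style model: a heterogeneous collection of object pointers.
    record = ["widget", 42, {"colour": "red"}]

    # FORTRAN-style model: a contiguous, homogeneous numpy array.
    grid = np.zeros((3, 4), dtype=np.float64)
    grid[1, 2] = 7.5

    # SQL-table-style model: named, typed columns in a pandas DataFrame.
    table = pd.DataFrame({"part": ["widget", "gadget"], "qty": [42, 7]})
    print(table.dtypes)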

Having those "Big Three" data models at your fingertips in Python gives great power, and even though Python itself is not fast, those models make it useful for almost all forms of data analysis.

[1]. When Python says "everything is an object", it really means "everything is a pointer to an object". You can tell because arguments are passed as references to objects, never as copies, and print(genericObject) on a class with no custom __repr__ shows the object's memory address (in CPython).
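A quick illustration of both points (the default repr showing an address is a CPython detail, not a language guarantee):

    def mutate(xs):
        xs.append(99)      # modifies the caller's list; nothing was copied

    a = [1, 2, 3]
    mutate(a)
    print(a)               # [1, 2, 3, 99]

    class Generic:
        pass

    obj = Generic()
    print(obj)             # <__main__.Generic object at 0x...> in CPython
    print(hex(id(obj)))    # id() is that same address in CPython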



Thank you for the compliment.

That's a good point about R and pandas.

Nowadays numpy supports the nested-record style of memory organization, too: https://docs.scipy.org/doc/numpy/user/basics.rec.html
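For example (a minimal sketch; the field names are made up):

    import numpy as np

    # A structured dtype: each element is a fixed-layout record,
    # stored contiguously, roughly like a C struct or COBOL record.
    person = np.dtype([("name", "U16"), ("age", np.int32), ("height", np.float64)])
    people = np.array([("Ada", 36, 1.70), ("Grace", 45, 1.65)], dtype=person)

    print(people["name"])   # access one field across all records
    print(people[0])        # one whole record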

Whether numpy and pandas are part of the language depends greatly on how you use Python. There are still probably lots of people writing Django webapps without importing either of them.



