
Early on, pandas made some unfortunate design decisions that are still biting hard. One example is representing datetimes (pandas.Timestamp) as a 64-bit int with a fixed nanosecond resolution. This choice gives a dynamic range of about +-292 years around 1970-01-01 (the epoch). That range is too small to represent the works of William Shakespeare, never mind human history. Using pandas in these areas becomes a royal pain in the neck, because one constantly needs to work around the pandas datetime limitations.
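
To make the +-292 year figure concrete, here is a rough sketch using only the Python standard library. It assumes a signed 64-bit count of nanoseconds since 1970-01-01 and truncates to microseconds, since that is the finest step `datetime.timedelta` supports; the names `INT64_MAX`, `EPOCH` and `shakespeare_death` are purely illustrative.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are a signed 64-bit count of nanoseconds since the epoch.
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# timedelta only goes down to microseconds, so drop the last three digits.
span = timedelta(microseconds=INT64_MAX // 1000)

lower = EPOCH - span  # earliest representable instant (approximately)
upper = EPOCH + span  # latest representable instant (approximately)
span_years = span.days / 365.2425  # mean Gregorian year length

# Shakespeare died in April 1616, which falls before the lower bound.
shakespeare_death = datetime(1616, 4, 23)

print("lower bound:", lower.date())
print("upper bound:", upper.date())
print("half-span in years:", round(span_years, 1))
print("Shakespeare out of range:", shakespeare_death < lower)
```

The lower bound lands in 1677 and the upper bound in 2262, so anything from the 16th or early 17th century cannot be stored at nanosecond resolution.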

OTOH, in numpy one can choose the time resolution unit (anything from an attosecond to a year), tailoring the resolution to the task (from high-energy physics all the way to astronomy). Pandas' choice is only good for high-frequency stock traders.
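
The range-versus-resolution trade-off behind that can be sketched with plain arithmetic. This is not numpy code: it only assumes a signed 64-bit counter per unit, and the keys are the usual numpy unit codes (`as`, `fs`, `ps`, `ns`, `us`, `ms`, `s`, `D`). The helper `half_span_years` is a made-up name for illustration.

```python
# Assumption: each unit is counted in a signed 64-bit integer around the epoch.
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year

# Length of one tick of each unit, in seconds.
UNIT_SECONDS = {
    "as": 1e-18,
    "fs": 1e-15,
    "ps": 1e-12,
    "ns": 1e-9,
    "us": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "D": 86400.0,
}


def half_span_years(unit: str) -> float:
    """Approximate +/- range, in years, of an int64 counter at this unit."""
    return INT64_MAX * UNIT_SECONDS[unit] / SECONDS_PER_YEAR


for unit in UNIT_SECONDS:
    print(f"{unit:>2}: +/- {half_span_years(unit):.3g} years")
```

Each step to a coarser unit multiplies the reachable range by the same factor that the resolution loses, which is why a single fixed unit cannot suit both laser physics and geology.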



Pandas was started by a quant working for AQR Capital, so it's not surprising if "Pandas' choice is only good for high-frequency stock traders".


An illustrative example of how reasonable short-term, narrow-scope considerations can turn out really badly in the long term and/or at a larger scope.


This assumes that all projects should be built with the larger scope in mind.

Sometimes you just need a shovel, not a Bagger 288.


Why should he care about other use-cases?

It’s not his responsibility to make sure his package is as widely applicable as possible before open-sourcing it.


The problem is not with Wes's original decision but with the fact that it was never revisited, even when pandas took off at a much larger scope. It should have been fixed before the 1.0 release.


This belief is quite common in the open-source space.

It’s far easier to criticize than it is to submit a pull request.


It's like there is some strange belief now that software should be "finished" before a 1.0 version. When did that start?


I'm glad you posted about this because I didn't know, but my reflexive response was 'well guess that won't work for [project idea], guess I'll roll my own or just use the NumPy version.'

I personally don't mind the lack of one-size-fits-all. If Pandas were to be part of the Python Standard Library I think you'd have a stronger argument, since the unspoken premise of a SL is that you can leave for a desert island with only that and your IDE and still get things done.


Most data is not 300 years old or in the distant future; in fact, ranges of 1970 +-292 years are very common. That is to say, pandas' choice is good for lots of people, including many outside high-frequency stock trading.


> Most data is not 300 years old or in the distant future; in fact, ranges of 1970 +-292 years are very common.

In what domains? Astronomy, geology, and history call for a larger time range. Laser and high-energy physics need femtosecond rather than nanosecond resolution. My point is that a fixed time resolution, whatever it is, is a bad choice. NumPy explicitly lets you select the time resolution unit, and this is the right approach. BTW, NumPy is a pandas dependency and predates it by several years.



