Maybe there is no real need for supporting large CSV files? Typically, large amounts of data will be stored in a database (in which case you can query it with SQL), or you will be using a large-data-oriented file format like Parquet. Excel's CSV support is good enough for 99% of real-world use cases.
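For instance, DuckDB can run SQL directly over a Parquet file, no database server needed. A minimal sketch (untested; 'events.parquet' and its columns are made up for illustration):

    import duckdb  # pip install duckdb

    # Aggregate straight from the Parquet file with ordinary SQL.
    top_users = duckdb.sql("""
        SELECT user_id, COUNT(*) AS n
        FROM 'events.parquet'
        GROUP BY user_id
        ORDER BY n DESC
        LIMIT 10
    """).fetchall()
    print(top_users)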



Large CSV files do occur 'in the wild'. Whether they should or not is beside the point. Sometimes CSV is the only option for importing or exporting data from ancient 'Enterprise' horror systems, purely because it was easy for the original developers to implement. Excel's CSV support has been shown not to be fit for purpose, as one of the other commenters here points out.

I'd not heard of Parquet before today, but a cursory glance reveals it to be a stupid format. It's sold as 'smaller than CSV', but size isn't the problem CSV is solving. It's that with the CSV format it's trivial to output or read data; with Parquet it's not.
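To make the contrast concrete (a rough sketch; file names and contents are made up, and Parquet needs a third-party library such as pyarrow):

    import csv

    # CSV: standard library only, a few lines to write (csv.reader to read back).
    rows = [["id", "name"], [1, "alice"], [2, "bob"]]
    with open("out.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Parquet: external dependency plus a schema-aware, columnar API.
    import pyarrow as pa            # pip install pyarrow
    import pyarrow.parquet as pq

    table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
    pq.write_table(table, "out.parquet")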

I'd imagine if you were storing data on a server it would be better to import it into a proper database rather than storing it as a file on something like S3. Even compressing a CSV file with gzip would shrink it by a similar amount, and in a more standardized way, if size is what you really care about.
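The gzip route needs nothing beyond the standard library. A sketch (untested; 'data.csv' is a placeholder name):

    import gzip
    import shutil

    # Compress data.csv to data.csv.gz without loading it all into memory.
    with open("data.csv", "rb") as src, gzip.open("data.csv.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

Most tools can consume the result directly, e.g. pandas' read_csv infers the compression from the .gz extension.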


You'd hope so, but the UK government used Excel to manage some COVID test data and then lost thousands of records: the data went through the legacy .xls format, which caps a worksheet at 65,536 rows, so everything past that limit was silently dropped.



