Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A database is typically an alive, running program that requires maintenance. There is a strong coupling for most “databases” between disk format and executable code. It’s not easy to read from a random Postgres database sitting on disk. You could not do in Python, “import postgres” and then “postgres.open(‘mydb’)”.

I’m no data scientist, and have only worked with data lakes a couple times, but I can see why data science tends to be done with very predictable (if inefficient) data formats such as CSV, JSON, and JSONL.

Edit: SQLite is the best of both worlds. It’s a database, but it’s also “just a file.” It’s easy to work with, and many languages & frameworks are getting good support for it. SQLite’s reliability-first approach means many of the kinks that arise from involving databases (so much complexity!!) are ironed out and don’t arise as issues. (Things like auto-indexing, auto-vacuuming, avoiding & dealing with corruption, backwards & forwards compatibility, …)



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: