
Great approach. I ctrl-F'd for databases; good info there generally. The only thing that gave me pause: a startup doesn't need to focus on SQL vs. NoSQL in 2025, with such good JSON support in the most popular SQL databases. Just use PostgreSQL or MySQL (whichever your engineers have more experience with), use Cloud SQL or RDS, which will take care of the hard stuff like backups and replication for you, and use read replicas for BI with a good visualization tool. You'll be good with that for a good while before you need to fork over 5/6 figures for Snowflake or anything else.
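To make the "just use a relational DB with JSON columns" point concrete, here's a minimal sketch. SQLite (via Python's stdlib) stands in for Postgres so the snippet runs anywhere; Postgres would use a `jsonb` column and operators like `->>` instead of `json_extract`, and the table/field names are made up for illustration.

```python
import sqlite3

# A relational table with a JSON payload column: no document store needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute(
    "INSERT INTO events (payload) VALUES (?)",
    ('{"type": "signup", "plan": "pro"}',),
)
conn.commit()

# Query inside the JSON document with plain SQL.
row = conn.execute(
    "SELECT json_extract(payload, '$.plan') FROM events "
    "WHERE json_extract(payload, '$.type') = 'signup'"
).fetchone()
print(row[0])  # pro
```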



> use read replicas for BI with a good visualization tool

Put up two or three read replicas, split your queries so writes go to the primary and reads come from the replicas (supported out of the box by many modern ORMs), and you can scale to millions of daily active users for most startup workloads.
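The write/read split above can be sketched in a few lines. This is illustrative, not any particular ORM's implementation: the `Router` class is hypothetical, and two shared-cache SQLite connections stand in for a primary and a replica so the snippet is self-contained.

```python
import itertools
import sqlite3

# Minimal sketch of write/read splitting, the pattern many modern ORMs
# support out of the box. Router and the SQLite setup are stand-ins.
class Router:
    WRITE_PREFIXES = ("CREATE", "INSERT", "UPDATE", "DELETE", "DROP")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin the reads

    def execute(self, sql, params=()):
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        target = self.primary if is_write else next(self._replicas)
        return target.execute(sql, params)

# Two connections to one shared in-memory DB play primary and replica.
primary = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)
replica = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)
router = Router(primary, [replica])

router.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
router.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
primary.commit()  # replicas only see committed writes

name = router.execute("SELECT name FROM users").fetchone()[0]
print(name)  # alice
```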

Really, the hard part of BI is that the folks who need the info don't wanna learn SQL, and the ones who can do SQL will struggle to keep up with your changing schema.


I give them Metabase, pointed at read-replica-3. Via the Metabase API you can add lots of metadata about tables and fields, so the BI folks can point and click to build reports and keep up with schema changes (which I mostly resolve with views anyway).


The hard part of BI is application developers not wanting to support a stable data model and changing the schema all the time, often made harder by BI people not knowing what they want and being stuck with a brittle integration.

Adding analytics reporting views in your app database as the 'API' is the way to go.
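A sketch of what the views-as-API idea looks like, using SQLite from Python so it's runnable as-is (Postgres view syntax is nearly identical for simple cases; the `orders`/`report_revenue` names are made up):

```python
import sqlite3

# The view is the stable contract: the underlying tables can be renamed or
# restructured, and BI tools keep querying the same view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cents INTEGER, status TEXT);
    INSERT INTO orders (cents, status) VALUES (1000, 'paid'), (250, 'refunded');
    CREATE VIEW report_revenue AS
        SELECT status, SUM(cents) / 100.0 AS dollars
        FROM orders GROUP BY status;
""")
rows = list(conn.execute("SELECT * FROM report_revenue ORDER BY status"))
print(rows)  # [('paid', 10.0), ('refunded', 2.5)]
```

If the app later stores amounts differently, only the view definition changes; the reports pointed at `report_revenue` don't.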


> Really the hard part of BI is that folks who need the info don’t wanna learn SQL.

Data analysts are fine with SQL though. Every "get into data analysis as a career" course will teach you SQL (about 70% of what the querynomicon teaches [1]).

[1] https://github.com/gvwilson/sql-tutorial


> Data analysts are fine with SQL though.

Yes! I haven’t seen startups hiring these though. Somehow I always end up doing this as a side-gig on my engineering job.


Definitely - I've been surprised at some very complex pipelines built with pandas, etc. because someone didn't want to use SQL...


I was just commenting to a colleague recently about the significant improvements RDBMSs have made in JSON support over the last decade. For instance, keys below the first level of Postgres jsonb fields weren't indexable a decade ago. Now you can use GIN indexes and other rather sophisticated options.
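A small illustration of indexing inside a JSON document. In Postgres you'd create a GIN index on the jsonb column; here SQLite (so the snippet runs anywhere) gets a similar effect for a single nested key with an expression index, which the query planner then uses. Table and key names are made up.

```python
import sqlite3

# Postgres equivalent (not run here):
#   CREATE INDEX ON docs USING GIN (data);
# SQLite version: an expression index on one nested JSON key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
conn.execute(
    "CREATE INDEX idx_docs_country "
    "ON docs (json_extract(data, '$.address.country'))"
)
conn.execute(
    "INSERT INTO docs (data) VALUES (?)",
    ('{"address": {"country": "NZ"}}',),
)

# The query plan confirms the nested key is served by the index,
# not a sequential scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM docs "
    "WHERE json_extract(data, '$.address.country') = 'NZ'"
).fetchall()
print(plan)
```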


Agreed. I can't think of anything that would convince me today to use a document store over Postgres as the primary (or likely only) database. Most of the time JSON fields augmenting the RDBMS seems like the way to go.


My default position nowadays is "Postgres", and engineering should have to justify why it's insufficient if they want to use something else. It's worked pretty well.


Hahaha, that is good: don't justify why to use a certain tech, but rather justify why not to just use Postgres.


This should be the default decision-making process. If your proposition is to move the compromise scale, provide not only the benefits but also the drawbacks and an analysis of the transition. This forces the proposer to analyze the existing status quo and the reasons behind it, which is often enough for the proposal to be withdrawn.

A distant relative of the 5 whys. We need a NoSQL document store -> so we can store JSON blobs -> so we can do databasing at the app level -> because DBAs with their insistence on schemas are slow. Oh, so we can solve the problem by hiring one DBA and maybe training two devs, instead of hiring a full dev team and refactoring stuff for a year?


As someone working with datastore/firestore in a product first created around 10 years ago I wish you could have been there at the time. Running a migration to add a boolean field to all existing documents of a certain type took ~40 hours.

Funny thing is we are now migrating stuff out of datastore (and new stuff is not in datastore to begin with) into an RDBMS, but we are doing it microservice style, with each microservice having its own separate database. So relationships are now cross-service concerns...

Not that having EVERYTHING in a single DB is always the best approach, but IMO we should default to keeping everything in one single DB.


Yep, this is a sneaky great feature. Where previously you'd have a sequential scan unless you put in multiple indexes or a bloom filter, you can now get great performance and ease of maintenance at the same time.


> use read replicas for BI with a good visualization tool

Ugh. That sounds good on paper, but in practice it can become a problem. You're making your _database_ schema a part of the public API. It's an example of Hyrum's Law: people will, sooner rather than later, start depending on internal details of the data representation.

And your development velocity will crater, as you'll now need to update all the reports (that are not necessarily even tracked in version control!).

Investing some time early to add code to pull out the data relevant for analytics can be worthwhile.

There's also the question of personal information.
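One way to invest that time early: a small extraction step that copies only whitelisted, non-personal fields into a reporting table, so analytics never touches the live schema or PII directly. A minimal sketch; all table and column names here are hypothetical.

```python
import sqlite3

# Live table (with PII) and a decoupled analytics table (without it).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, email TEXT, plan TEXT, created_at TEXT
    );
    INSERT INTO users VALUES (1, 'a@example.com', 'pro', '2025-01-02');
    CREATE TABLE analytics_users (user_id INTEGER, plan TEXT, signup_date TEXT);
""")

# The whitelist is the contract: email (PII) is deliberately excluded, and
# columns are renamed so the live schema can change behind it.
ALLOWED = "id AS user_id, plan, created_at AS signup_date"

conn.execute(f"INSERT INTO analytics_users SELECT {ALLOWED} FROM users")
row = conn.execute("SELECT * FROM analytics_users").fetchone()
print(row)  # (1, 'pro', '2025-01-02')
```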


It can definitely become a problem. But if you’re at that point, you don’t need a guide that explains SQL databases to you. :p

Realistically this guide should be bifurcated in terms of scale.


>with such good json support in the most popular SQL databases

Wait, was that the reason people were doing NoSQL? JSON support? I thought it was about sharding, write scalability, etc.


Ah yeah, the old "web scale" phase. I think everyone's more or less accepted that very, very few startup-level (or even SMB-level) workloads need more scalability than Postgres/MySQL gives.

My favorite example is that Twitter used MySQL for all tweets, writing ~5k/s 24/7/365, until about 2016ish. Well into being a public company with billions in revenue and 300mm+ MAUs.


Has everyone accepted that?

3 out of 4 Bay Area senior software engineer interview loops include a system design interview where they'll ask "what if you had 10m users" and expect a distributed, write-heavy sharding answer.


You’re not wrong in the literal sense. But the “inside baseball” of that question is just that it’s a prompt to talk about how you would horizontally scale a system should the need arise. It’s not a prompt to start questioning whether 10mm or 200mm is the specific limit.


Well that's the thing. You don't need a NoSQL database to design a data tier that scales to accommodate distributed write-heavy workloads.


Lots of people were mad that my employer developed a new distributed NoSQL database engine, but it was literally just an API to encapsulate what an application doing "sharded MySQL" would do in its own data tier. A lot of this is a question of framing and storytelling.
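What "sharded MySQL in the application's data tier" boils down to can be sketched in a few lines: pick a shard deterministically from the key. The shard names here are placeholders, and a real system would also handle resharding (e.g. consistent hashing), which this ignores.

```python
import hashlib

# Placeholder shard identifiers; in practice these would be connection
# strings or pool handles for each MySQL shard.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(user_id: int) -> str:
    # Stable hash so the same user always lands on the same shard,
    # regardless of which app server does the routing.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(42) == shard_for(42))  # True: routing is deterministic
```

Wrap that routing behind an API and you have, in essence, the "distributed NoSQL engine" described above.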


Sharding, write scalability, and similar are the technical advantages that can matter at scale (and they mattered a lot more before SSDs became so common), but I think for most users the only tangible "benefit" was the schemaless nature.


> use read replicas for BI

Yes, this is good advice. Until you get to really large scale, you don't need anything fancier than some SQL on a read replica.



