While it's truly a great ongoing effort and I am grateful to all the contributors, it's not nearly complete. You may think you're using the correct type until, surprise, you are not.
> like binding a macro to the MS Word save button to make it conditional
You have no idea how much I'd love that feature. Inasmuch as "save" is still a thing anyway. I don't miss explicit saves in IDEA; I see commit as the "real" save operation now, and I don't mind being able to hook that in an IDE-independent way.
I think the UX of git hooks has been sub-par for sure, but tools like the confusingly named pre-commit are helping there.
Because if you haven’t auto-formatted, linted, etc., then it’s a very easy way to do that so you don’t waste time watching CI fail for something stupid like trailing comma placement.
I don’t want to think about formatting, I just want everything to be consistent. A pre-commit hook can run those tools for me, and if any changes occurred, it can add them to the commit.
There's a long set of steps to making a tool mandatory in a development environment, but the final step should always, always be, "And you will find yourself on a PIP if you refuse to use the mandatory tools."
If people want to die on a hill that is demonstrably causing problems for all of their coworkers then let em.
We have 100,000s of tables per database (and 1,000s of those databases), think sensor/IoT data with some magic sauce that none of our competitors offer, and they're heavy on changes. And yes, maybe it's the wrong tool for the job (is it, though, if it works without hiccups?), but migrating would be severe, so we would only attempt that if we were 100% sure it would work and the end result would be cheaper; remember, we're talking decades here, not a startup. MySQL has been taking this without any issues for decades with us (including the rapid growth of the past decade), while far smaller setups on Postgres have been really painful, and all because of vacuum. We were on Postgres in 1999, when we ran many millions of records through it, but that was when we could do a full vacuum at night without anyone noticing. The internet grew a little bit, so that's not possible anymore. Vacuum improved too, like everyone says here, and I'm not spreading the gospel or whatever; it's just that fans (what other word is there?) blindly stating it can handle loads 'now' that they never considered is, well, weird.
I'd generally call this number of tables an antipattern: doing this basically implies that there's information stored in the table names that should be in rows instead, like IDs. But I'll admit that sensor-related use cases have a tendency to stress the system in unusual ways, which may have forced this design.
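For illustration, a purely hypothetical sketch of what I mean: instead of encoding the sensor in the table name, the sensor becomes a column (and the table can still be partitioned if the volume demands it).

```sql
-- hypothetical: one-table-per-sensor encodes data in the table name, e.g.
--   CREATE TABLE sensor_000123_readings (ts timestamptz, value double precision);
-- the conventional shape keeps that information in rows instead
CREATE TABLE readings (
    sensor_id bigint           NOT NULL,
    ts        timestamptz      NOT NULL,
    value     double precision,
    PRIMARY KEY (sensor_id, ts)
);
```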
Especially back when we started. Now we would've done it differently, but I still think Postgres wouldn't really work. Guess we will never know, as even far smaller data sets don't work in the way we need them to.
If you have 100 services in your org, I don't have to have all 100 running at the same time on my local dev machine. I only run the 5 I need for the feature I'm working on.
We have 100 Go services (with Redpanda) and a few databases in docker-compose on dev laptops. It works well, and we buy the biggest-memory MacBooks available.
Your success with this strategy correlates more strongly with ‘Go’ than with ‘100 services’, so it's more anecdotal than generally applicable that you can run 100 services locally without issues. Of course you can.
Buying the biggest MacBook available as a baseline criterion for being able to run a stack locally with Docker Compose does not exactly inspire confidence.
At my last company we switched our dev environment from Docker Compose to Nix on those same MacBooks, and CPU usage went from 300% to <10% overnight.
Have any details on how you've implemented Nix? For my personal projects I use Nix without Docker and the results are great. However, I was always fearful that Nix alone wouldn't quite scale as well as Nix + Docker for complicated environments.
Hi Jason! Like many others here I'm looking forward to that blog post! :-)
For now, could you elaborate on what exactly you mean by transitioning from docker-compose to Nix? Did you start using systemd to orchestrate services? Were you still using Docker containers? If so, did you build the images with Nix? Etc.
When we used docker-compose, we had a CLI tool that developers put on their PATH which could start/stop/restart services using the regular compose commands. This didn’t accomplish much at the time other than being easy to remember and not requiring folks to know where their docker-compose files were located. It also took care of layering in other compose files for overriding variables or service definitions.
Short version of the Nix transition: the CLI tool would instead start services using nix-shell invocations behind pm2. So devs still had a way to start services from anywhere, get logs or process status with a command… but every app was running 100% natively.
At the time I was there, containers weren’t used in production (they were doing “App” deploys still) so there was no Docker target that was necessary/useful outside of the development environment.
Besides the performance benefit, microservices owning their development environment in-repo (instead of in another repo where the compose configs were defined) was a huge win.
Several nixy devtools do some process management now.
Something we're trying in Flox is per-project services run with process-compose.
They automatically shut down when all your activated shells exit, and it feels really cool.
I've been down this path, and as soon as you work on a couple of concurrent branches you end up having 20 containers on your machine, and setting these up to run successfully ends up being its own special PITA.
What exactly are the problems created by having a larger number of containers? Since you’re mentioning branches, these presumably don’t all have to run concurrently, i.e., you’re not talking about resource limitations.
Large features can require changing protocols or altering schemas in multiple services. Different workflows can require different services, etc. Keep track of different service versions in a couple of branches (not unusual IMO) and it just becomes messy.
You could still run the proxy they have that lazily boots services; that’s a nice optimisation.
I don’t think that many places are in a position where the machines would struggle. They didn’t mention that in the article as a concern, just that they struggled to keep environments consistent (brew install implies some are running on macOS, etc.).
I think it’s safe to assume that for something with the scale and complexity of Stripe, it would be a tall order to run all the necessary services on your laptop, even stubs of them. They may not even do that on the dev boxes, I’d be a little surprised if they didn’t actually use prod services in some cases, or a canary at any rate, to avoid the hassles of having to maintain on-call for what is essentially a test environment.
I don’t know that that’s safe to assume. Maybe it is an issue, but it was not one of the issues they talk about in the article and not one of the design goals of the system. They have the proxy / lazy-start system exactly so they can limit the services running. That suggests to me that they don’t end up needing them all the time to get things done.
Technologies: Python, Django, Terraform, Docker, all things AWS, Jenkins, Github Actions, Postgres/RDS/Aurora
Resume: upon request
Email: glik22@gmail.com
Hi! I bring over a decade of experience as a software engineer. I spent a solid portion of that time as a backend engineer building out APIs in Django. In my current position, I have built upon those skills and acquired plenty of new ones as a platform engineer. A significant part of my role has involved evolving and optimizing our infrastructure (AWS) with a focus on cost savings and maintainability with improved Terraform practices. In addition, I have built and improved multiple internal libraries and services to make our engineers' lives easier. In my next role, I'm looking to leverage these skills and help advance a team to the next level.
If you have pgstattuple [0], you can check the bloat of indices. Otherwise, you can just make it a cron job on a monthly / quarterly / semi-annual / whatever basis. Since PG12 you can do `REINDEX INDEX CONCURRENTLY` with zero downtime, so it really doesn't hurt to do it more often than necessary. Even before PG12, you can do an atomic version of it:
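Roughly, one common recipe (table and index names here are made up) builds a replacement index concurrently and then swaps it in within a single transaction:

```sql
-- build the replacement without blocking writes (this is the slow part;
-- CREATE INDEX CONCURRENTLY can't run inside a transaction block)
CREATE INDEX CONCURRENTLY readings_ts_idx_new ON readings (ts);

-- swap it for the bloated original; only a brief exclusive lock is needed
BEGIN;
DROP INDEX readings_ts_idx;
ALTER INDEX readings_ts_idx_new RENAME TO readings_ts_idx;
COMMIT;

-- on PG12+ all of the above is just:
-- REINDEX INDEX CONCURRENTLY readings_ts_idx;
```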
I find average leaf density to be the best metric of them all. Most btree indexes with default settings (fill factor 90%) will converge to 67.5% leaf density over time. So anything below that is bloated and a candidate for reindexing.
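For what it's worth, that number is easy to check with pgstatindex (the index name below is made up):

```sql
-- part of the pgstattuple extension mentioned upthread
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- with the default fillfactor of 90, avg_leaf_density well below ~67%
-- marks the index as a reindex candidate
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('readings_ts_idx');
```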
You can measure "bloat" in the index. It's essentially the wasted space in pages.
You can also have bloat in the heap for the same reasons.
You may also want to cluster if your `pg_stats.correlation` is low, since that indicates your heap isn't in the same order as your index anymore. pg_repack can do all of this without blocking, but on version >= 12 you can reindex just an index concurrently.
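A rough sketch of both the check and the fix, with hypothetical table/column/index names; note that plain CLUSTER locks the table for the duration, which is why pg_repack tends to get used on live systems (flags below per its docs):

```sql
-- correlation near 1.0 (or -1.0) means the heap is still roughly in index order;
-- values near 0 mean range scans will touch many more heap pages than needed
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'readings';

-- rewrite the table in index order (takes an exclusive lock while it runs)
CLUSTER readings USING readings_ts_idx;

-- or do the same rewrite online via the pg_repack CLI, e.g.:
--   pg_repack --table=readings --order-by=ts mydb
```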
How did you go about stopping and restarting applications which reach out to the database? We have a number of tasks running in ECS which can take a minute to spin down and a few minutes to spin back up.
For our web service, we didn't stop anything. It had a few seconds of errors, though it seems like some sessions were just buffered or paused and experienced high latency.
We also had background worker services. For the very high throughput ones, we spun down the # of tasks to a bare minimum for <5 minutes and let the queue build up, rather than have a massive amount of errors and retries. For the other ones where throughput wasn't high, we just let them be, and during the downtime they errored and retried and the retries mostly succeeded.