Hacker News | yeswecatan's comments

While it's truly a great ongoing effort and I am grateful to all the contributors, it's not nearly complete. You may think you're using the correct type until, surprise, you are not.


Unfortunately people will use --no-verify to bypass hooks.


I don't understand commit hooks - they're like binding a macro to the MS Word save button to make it conditional.


> like binding a macro to the MS Word save button to make it conditional

You have no idea how much I'd love that feature. Inasmuch as "save" is still a thing anyway. I don't miss explicit saves in IDEA, I see commit as the "real" save operation now, and I don't mind being able to hook that in an IDE-independent way.

I think the UX of git hooks has been sub-par for sure, but tools like the confusingly named pre-commit are helping there.


Because if you haven’t auto-formatted, linted, etc., then it’s a very easy way to do that so you don’t waste time watching CI fail for something stupid like trailing comma placement.

I don’t want to think about formatting, I just want everything to be consistent. A pre-commit hook can run those tools for me, and if any changes occurred, it can add them to the commit.


There's a long set of steps to making a tool mandatory in a development environment, but the final step should always, always be, "And you will find yourself on a PIP if you refuse to use the mandatory tools."

If people want to die on a hill that is demonstrably causing problems for all of their coworkers then let em.


Oh how I wish engineering leadership would actually mandate certain things such as this.


They always pick the wrong things to mandate don't they.


Enforce on CI. Autofix in pre-commit hooks. Lefthook is fantastic for this.

Example config: https://github.com/anttiharju/vmatch/blob/9e64b0636601c236a5...


You can put hooks on the server side of git. It can do pretty much anything that CI/CD can.


That requires Github Enterprise (if using GH, of course), no?


Well it requires a server and 5 minutes of your time :) I guess you can always have it as a mirror for your GH repository. GitLab has push mirroring, not sure about GH: https://docs.gitlab.com/user/project/repository/mirror/push/


Then they'll just lose time when the same verifications fail in the PR?


What is your use case?


We have 100,000s of tables per database (and 1,000s of those databases), think sensor/IoT data with some magic sauce that none of our competitors offer, and they are heavy on changes. And yes, maybe it's the wrong tool for the job (is it, though, if it works without hiccups?), but migrating would be severe, so we would only attempt it if we were 100% sure it would work and the end result would be cheaper; remember, we are talking decades here, not a startup. MySQL has been taking this without any issues for decades with us (including the rapid growth of the past decade), while far smaller setups on Postgres have been really painful, and all because of vacuum.

We were on Postgres in 1999, when we ran many millions of records through it, but that was when we could do a full vacuum at night without anyone noticing. The internet grew a little bit, so that's not possible anymore. Vacuum has improved too, like everyone says here, and I'm not spreading the gospel or whatever; it's just that fans (what other word is there?) blindly stating it can handle loads 'now' that they never considered is, well, weird.


I'd generally call this number of tables an antipattern: it basically implies that there's information stored in the table names (IDs etc.) that should be in rows instead. But I'll admit that sensor-related use cases have a tendency to stress the system in unusual ways, which may have forced this design.


Especially back when we started. Now we would've done it differently, but I still think Postgres wouldn't really work. Guess we will never know, as even far smaller data sets don't work the way we need them to.


I came to ask the same thing. We use docker-compose to describe all our services which works fine.


This does not scale to a large number of services, since each service needs its own chunk of RAM and processing.


If you have 100 services in your org, you don't have to have all 100 running at the same time on your local dev machine. I only run the 5 I need for the feature I'm working on.


We have 100 Go services (with redpanda) and a few databases in docker-compose on dev laptops. It works well when we buy the biggest-memory MacBooks available.

https://moov.io/blog/education/moovs-approach-to-setup-and-t...


Your success with this strategy correlates more strongly with ‘Go’ than with ‘100 services’, so it's more an anecdote than a generally applicable claim that you can run 100 services locally without issues. Of course you can.

Buying the biggest MacBook available as a baseline criteria for being able to run a stack locally with Docker Compose does not exactly inspire confidence.

At my last company we switched our dev environment from Docker Compose to Nix on those same MacBooks, and CPU usage went from 300% to <10% overnight.


Have any details on how you've implemented Nix? For my personal projects I use nix without docker and the results are great. However I was always fearful that nix alone wouldn't quite scale as well as nix + docker for complicated environments.

I've used the FROM scratch strategy with Nix:

https://mitchellh.com/writing/nix-with-dockerfiles

Is that how you implemented it?


Buying the biggest Macs also lets developers run an Electron app or three (Slack, IDE, Spotify, browser, etc.) while running the docker-compose stack.


You're right. My coworkers remarked that they could run Slack and do screensharing while running the apps locally when we removed docker-compose.


That's a huge win -- has your team written about or spoken on this anywhere?


No but I'd be happy to (I maintained the docker-compose stack, our CLI, and did the transition to Nix).


I'm curious about the # of svc's / stack / company / team size -- if you have your own blog -- would love to read it when you publish

could be a cool lightning talk (or part of something longer)

maybe it's a good piece for https://nixinthewild.com/ ?

I'm @capileigh on twitter and hachyderm.io if you wanna reach out separately -- here is good tho too


Hey, I plan to reach out when I get some time :)

I can see there’s interest in the topic.


sounds good! it's a cool anecdote :)


Hi Jason! Like many others here I'm looking forward to that blog post! :-)

For now, could you elaborate on what exactly you mean by transitioning from docker-compose to Nix? Did you start using systemd to orchestrate services? Were you still using Docker containers? If so, did you build the images with Nix? Etc.


When we used docker-compose we had a CLI tool, which developers put in their PATH, that could start/stop/restart services using the regular compose commands. This didn’t accomplish much at the time other than being easy to remember and not requiring folks to know where their docker-compose files were located. It also took care of layering in other compose files for overriding variables or service definitions.

Short version of the Nix transition: the CLI tool would instead start services using nix-shell invocations behind pm2. So devs still had a way to start services from anywhere, get logs or process status with a command… but every app was running 100% natively.

At the time I was there, containers weren’t used in production (they were doing “App” deploys still) so there was no Docker target that was necessary/useful outside of the development environment.

Besides the performance benefit, microservices owning their development environment in-repo (instead of in another repo where the compose configs were defined) was a huge win.


Thanks for elaborating!

By pm2 you mean https://www.npmjs.com/package/pm2 ?


Yep!


the pm2 thing via a custom CLI is interesting

several nixy devtools do some process management now

something we're trying in Flox is per-project services run with process-compose. they automatically shut down when all your activated shells exit, and it feels really cool


I’d like to learn more about switching compose to nix. We will hit a wall with compose at some point.


I've been on this path, and as soon as you work on a couple of concurrent branches you end up with 20 containers on your machine, and setting these up to run successfully ends up being its own special PITA.


What exactly are the problems created by having a larger number of containers? Since you’re mentioning branches, these presumably don’t have to all run concurrently, i.e, you’re not talking about resource limitations.


Large features can require changing protocols or altering schemas in multiple services. Different workflows can require different services, etc. Keep track of different service versions across a couple of branches (not unusual IMO) and it just becomes messy.


What does this have to do with running locally vs. on a dev server? You have to properly manage versions in any case.


You could still run the proxy they have that lazily boots services - that’s a nice optimisation.

I don’t think that many places are in a position where the machines would struggle. They didn’t mention that in the article as a concern - just that they struggled to keep environments consistent (brew install implies some are running on osx etc).


I think it’s safe to assume that for something with the scale and complexity of Stripe, it would be a tall order to run all the necessary services on your laptop, even stubs of them. They may not even do that on the dev boxes, I’d be a little surprised if they didn’t actually use prod services in some cases, or a canary at any rate, to avoid the hassles of having to maintain on-call for what is essentially a test environment.


I don’t know that that’s safe to assume. Maybe it is an issue, but it was not one of the issues they talk about in the article and not one of the design goals of the system. They have the proxy / lazy start system exactly so they can limit the services running. That suggests to me that they don’t end up needing them all the time to get things done.


Location: San Francisco Bay Area

Remote: Preferred

Willing to relocate: No

Technologies: Python, Django, Terraform, Docker, all things AWS, Jenkins, Github Actions, Postgres/RDS/Aurora

Resume: upon request

Email: glik22@gmail.com

Hi! I bring over a decade of experience as a software engineer. I spent a solid portion of that time as a backend engineer building out APIs in Django. In my current position, I have built upon those skills and acquired plenty of new ones as a platform engineer. A significant part of my role has involved evolving and optimizing our infrastructure (AWS) with a focus on cost savings and maintainability with improved Terraform practices. In addition, I have built and improved multiple internal libraries and services to make our engineers' lives easier. In my next role, I'm looking to leverage these skills and help advance a team to the next level.


How do you know when you need to reindex?


If you have pgstattuple [0], you can check the bloat of indices. Otherwise, you can just make it a cron on a monthly / quarterly / semi-annually / whatever basis. Since PG12 you can do `REINDEX INDEX CONCURRENTLY` with zero downtime, so it really doesn't hurt to do it more often than necessary. Even before PG12, you can do an atomic version of it:

`CREATE INDEX CONCURRENTLY new_<index_name> ON <table> (<columns>);`

`ALTER INDEX <index_name> RENAME TO old_<index_name>;`

`ALTER INDEX new_<index_name> RENAME TO <index_name>;`

`DROP INDEX CONCURRENTLY old_<index_name>;`

[0]: https://www.postgresql.org/docs/current/pgstattuple.html
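On PG12+ the rebuild itself is a single statement you can just run from the cron job mentioned above (the index name here is made up):

    -- PostgreSQL 12+: rebuilds the index without blocking writes
    REINDEX INDEX CONCURRENTLY my_index;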


I find average leaf density to be the best metric of them all. Most btree indexes with default settings (fill factor 90%) will converge to 67.5% leaf density over time. So anything below that is bloated and a candidate for reindexing.
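For anyone wanting to check this, a minimal sketch using pgstattuple's pgstatindex (the index name is hypothetical):

    -- Requires the pgstattuple extension
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- avg_leaf_density is a percentage; values well below ~67.5 suggest bloat
    SELECT avg_leaf_density, leaf_fragmentation FROM pgstatindex('my_index');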


You can measure "bloat" in the index. It's essentially the wasted space in pages.

You can also have bloat in the heap for the same reasons.

You may also want to cluster if your pg_stats.correlation is low, since that indicates your heap isn't in the same order as your index anymore. pg_repack can do all of this without blocking, but on version >= 12 you can reindex just an index concurrently.

https://wiki.postgresql.org/wiki/Show_database_bloat
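If you want to eyeball the correlation part, something along these lines works (table and column names are made up):

    -- correlation near +1/-1: heap order still roughly matches the column's sort order
    -- correlation near 0: a candidate for CLUSTER or pg_repack
    SELECT tablename, attname, correlation
    FROM pg_stats
    WHERE tablename = 'events' AND attname = 'created_at';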


How does one find these auctions?


how large is your cache and how long does the pull/push take?


i haven't looked into setting up a buildkit server. would it be easier to just attach an ebs volume?


you mean run an EC2 instance with EBS as buildkit server storage dir?

sure, it should work nicely. (I just prefer the local disk, it's just a cache after all.)


How did you go about stopping and restarting applications which reach out to the database? We have a number of tasks running in ECS which can take a minute to spin down and a few minutes to spin back up.


For our web service, we didn't stop anything. There were a few seconds of errors, though it seems like some sessions were just buffered or paused and experienced high latency.

We also had background worker services. For the very high throughput ones, we spun down the # of tasks to a bare minimum for <5 minutes and let the queue build up, rather than have a massive amount of errors and retries. For the other ones where throughput wasn't high, we just let them be, and during the downtime they errored and retried and the retries mostly succeeded.


Presumably you don't stop them, and they throw errors during the cutover.


You aren’t supposed to have to change anything in the application code. The same database URL should work.

