
PostHog cofounder here. This affected users that did not have a specific version of the JS library pinned and deployed a new version, or were using the snippet, and that also had network capture enabled (a feature we introduced very recently that is only enabled on 3% of projects) and had recordings enabled for that particular session (for most customers, only a small percentage of sessions are recorded due to sampling or billing limits).

This outage was definitely disruptive, and we shouldn't have let this happen. We will be doing a full post-mortem write-up, but this affected a small percentage of our users, so the comparison with CrowdStrike isn't fair.




Random guy here. This affected users that

- Used your recommended way of implementing PostHog [1]

- Used a feature of the product

- Used a feature of the product

The comparison to CrowdStrike is not fair, you're right. But this attempt to shed responsibility still leaves a sour taste.

[1] See "This is the simplest way to get PostHog up and running. It only takes a few minutes." from your website, which is the first method suggested when clicking the "installation" tab


Just to be clear, those are AND not OR conditions.
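To make the AND/OR grouping explicit, here is a minimal TypeScript sketch of the conditions as stated in the cofounder's comment. The `SessionConfig` type, its field names and `isAffected` are hypothetical illustrations, not PostHog's API, and the grouping assumes "or were using the snippet" is an alternative only to the unpinned-and-redeployed condition.

```typescript
// Hypothetical model of one session's setup; field names are illustrative only.
interface SessionConfig {
  versionPinned: boolean;         // JS library pinned to a specific version
  deployedNewVersion: boolean;    // a new build was deployed
  usesSnippet: boolean;           // loaded via the HTML snippet instead
  networkCaptureEnabled: boolean; // network capture feature turned on
  sessionRecorded: boolean;       // this session was sampled for recording
}

// Every condition must hold at once (AND); only the way the new library
// version arrived is an OR (unpinned + redeploy, or the snippet).
function isAffected(s: SessionConfig): boolean {
  const gotNewLibrary =
    (!s.versionPinned && s.deployedNewVersion) || s.usesSnippet;
  return gotNewLibrary && s.networkCaptureEnabled && s.sessionRecorded;
}
```

Each extra AND condition can only shrink the affected set, which is the point being made; it does not change how badly the affected sessions broke.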

Definitely not trying to shed responsibility here, we messed up and we'll make sure this doesn't happen again.


Not your customer, just a random person on the Internet, but I hope you can see that a lot of that is through luck more than judgement.

I personally would have liked to see a bit more contrition rather than an attempt to minimise the issue.


>This affected users that did not have a specific version of the JS library pinned and deployed a new version

Par for the course, honestly. The amount of garbage that gets called "production" these days is mind-boggling. No blue/green or canary deployments, shipping code with nothing pinned, no clear rollback, etc. This is what happens when anyone can become an Engineer™ after a two-week JavaScript boot camp.
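For readers unfamiliar with canary deployments, a minimal TypeScript sketch of deterministic percentage routing: each user ID is hashed (here with 32-bit FNV-1a over UTF-16 code units, a choice made for illustration) into a stable bucket from 0 to 99, and only buckets below the canary percentage get the new build. `fnv1a32`, `chooseBuild` and the version strings are hypothetical.

```typescript
// 32-bit FNV-1a hash over UTF-16 code units; deterministic, so a given
// user always lands in the same bucket.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime, keeping only the low 32 bits.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical router: users in buckets below canaryPercent get the canary
// build; everyone else stays on the known-good stable build.
function chooseBuild(
  userId: string,
  canaryPercent: number,
  stable: string,
  canary: string,
): string {
  const bucket = fnv1a32(userId) % 100; // stable bucket in [0, 100)
  return bucket < canaryPercent ? canary : stable;
}
```

Because the bucket depends only on the user ID, a user does not flip between builds from one request to the next, and rolling back amounts to setting `canaryPercent` to 0.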


No, actually, it's because PostHog explicitly recommends that as the way to do it, makes their standard npm package unpinnable (it always lazy-loads the most recent version of its modules), and calls version pinning via npm an "advanced" installation [1].

The ecosystem has plenty of versioning and best practices, but they do jack squat when you recommend to your customers to bypass them and trust that you'll never break your latest build.

[1] https://posthog.com/docs/libraries/js
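A minimal TypeScript sketch of the kind of pinning check discussed here, assuming dependencies are read from a `package.json`-style map of package names to version specs. `isPinned` and `unpinnedDependencies` are hypothetical helpers, and the regex is a simplification: it treats only a bare `MAJOR.MINOR.PATCH` (with an optional prerelease or build suffix) as pinned, so it also flags some exact forms npm accepts, such as `=1.2.3`. As the comment above argues, a pinned package spec does not help if the library lazy-loads its newest modules at runtime anyway.

```typescript
// Exact version only: ranges ("^1.2.3", "~1.2.3", "1.x", "*") and
// dist-tags ("latest") let the installed version drift.
const EXACT_SEMVER =
  /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Hypothetical helper: true if the spec names one exact version.
function isPinned(spec: string): boolean {
  return EXACT_SEMVER.test(spec.trim());
}

// Hypothetical helper: names of dependencies whose spec is not exact.
function unpinnedDependencies(deps: Record<string, string>): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !isPinned(spec))
    .map(([name]) => name);
}
```

A check like this could run in CI so that a floating range is caught before deploy; committing a lockfile is the other common way to freeze what actually gets installed.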


Sure, but just because _they_ suggest that you set your website to depend on https://us.i.posthog.com/static/array.js doesn’t mean you’re off the hook for following that (bad) advice.


>No, actually, it's because Posthog explicitly recommends that as the way to do it

Just because a project recommends "curl whatever | bash" to get started doesn't mean it's something you should productionize. You need an engineer that's done more than a bootcamp to understand code pinning, packaging, and deploying in order to ship a supportable, observable system. You're making my point for me.


You're trying to phrase this as if those conditions make it any less bad, but they don't. This affected users that were using the latest version and used... features? Give me a break. Every product has bugs, but downplaying the issue right after reading a distressed user of yours describe their struggle with it is definitely not what you should be doing.


There's certainly a testing failure on PostHog's part: they have production features that aren't being tested before a release.

On the other hand the author of the article did the exact same thing. They either pushed a release without testing, or they automatically just pull in the latest version of an external library, without any testing or verification. Now I lean towards this being the latter, as if they pushed a release and then the site broke, they would have considered a rollback. Kinda hard to blame others for failing to do testing that you also didn't do.

Edit: Others have pointed out that PostHog will pull down the latest version on its own unless you actively disable that feature. That seems like a brave move.


Yeah, honestly not a good look to come in and “well… actually”. It’s certainly far from a “crowdstrike moment” but tact is still needed when you’ve clearly affected multiple people and their customers with your bug.



