Reddit Migrates Comment Back End from Python to Go

p2detar · 2025-11-29T19:53:05 1764445985

> What was unexpected was the underlying differences in how Go and Python communicate with the database layer. Python uses an ORM to make querying and writing to our Postgres store a bit simpler. We don’t use an ORM for our Golang services at Reddit, and some unknown underlying optimizations on Python’s ORM resulted in some database pressure when we started ramping up our new Go endpoint. Luckily, we caught on early and were able to optimize our queries in Go.

A few weeks ago I was evaluating Hibernate for Java again, since my product is expected to support 2 different databases next year. In the end I decided to keep the codebase ORM-free, because it’s much easier for me to debug our SQL queries directly than to figure out exactly what Hibernate is doing and why. I think I’m done using ORMs for the foreseeable future.

Zizizizz · 2025-11-30T15:25:13 1764516313

I don't think "Python" uses an ORM; rather, they did. You can write the SQL the same way there. It's why there is SQLAlchemy Core vs. SQLAlchemy ORM.
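As a minimal sketch of hand-written SQL in Python with no ORM involved, here is the standard-library sqlite3 module standing in for a Postgres driver. The `comments` table, its columns, and `fetch_comments` are made up for illustration and say nothing about Reddit's actual schema.

```python
import sqlite3


def fetch_comments(conn, post_id):
    """Run a hand-written, parameterized query; no ORM layer in between."""
    cur = conn.execute(
        "SELECT id, body FROM comments WHERE post_id = ? ORDER BY id",
        (post_id,),
    )
    return cur.fetchall()


# Hypothetical in-memory table, only here so the query has something to read.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "first"), (1, "second"), (2, "other thread")],
)

print(fetch_comments(conn, 1))
```

The query text is exactly what reaches the database, which is the property that makes it easy to debug.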

p2detar · 2025-12-01T09:19:59 1764580799

That’s actually poorly written on their part. I did assume they were using an ORM Python library, not that ORM is somehow implicit in Python. But after rereading it, I can see how someone without any Python experience might think otherwise.

dzonga · 2025-11-30T14:40:16 1764513616

yeah - that's largely why people went to nosql: the performance hit you get with ORMs. in the ruby-verse you can see the performance gap between active-record and sequel (a better orm).

now there are better tools like sqlc that have orm ergonomics without the performance hit.

at times just use nosql

whinvik · 2025-11-29T22:11:50 1764454310

The actual lesson I learnt from this post is that Python was good enough for Reddit from 2005 until 2024.

rvz · 2025-11-29T19:38:53 1764445133

> By 2024, the legacy Python service had a history of reliability and performance issues. Ownership and maintenance of this service had become more cumbersome for all involved teams. Due to this, we decided to move forward into modern and domain-specific Go microservices.

Using Python for a backend system meant to "scale" really is just pure cope; it was unscalable in the first place and in the long run, as Reddit just found out. They knew they needed a lot more than fake optimizations in an interpreted language to improve performance, and a Golang rewrite unsurprisingly solved those issues.

This once again clearly shows that other than in the prototyping stage of an MVP, it really makes no sense to scale with backends written in these interpreted languages in 2025.

Switching to a safe, highly performant, mature language such as Golang not only generally improves performance but also correctly handles concurrency, which Python has always struggled with, especially at scale. That is why, in Reddit's case, the race conditions now revealed themselves more clearly before the rewrite.

watchful_moose · 2025-11-29T21:51:57 1764453117

Reddit was founded in 2005; Go was not publicly released until 2009.

They picked the tech that was available and mature at the time, and enabled them to scale for 20 years (to 100M+ DAUs + IPO) - seems like a pretty good choice to me.

You know which other platform was built on Python? Youtube.

Python isn't a bad choice if you're building [certain kinds of] billion-dollar businesses.

array_key_first · 2025-11-30T01:43:44 1764467024

Python is pretty much universally a bad choice for a backend. It's not even "easier" than other backend frameworks - Java Spring and dotnet are right there, and are specifically optimized for this use case.

If you want to build an app that's easy to maintain, Python is a bad choice because of its dynamic typing and struggles around tooling. If you want to build an app that's performant enough to scale, Python is a bad choice because Python. If you want to build an app that will get you a website ASAP, Python is still not the best choice, because other languages have larger batteries-included frameworks.

In 2005, even PHP would have been a better choice and would probably still be performant enough to run today.

Today, the story is much worse. Python has use cases! Experimentation, anything science related, "throwaway" code. Applications are just not one of the use cases IMO.

duped · 2025-12-01T00:57:06 1764550626

They actually started in LISP and rewrote it in Python (and also apparently, did not pick any of the "mature" web frameworks).

http://www.aaronsw.com/weblog/rewritingreddit

diath · 2025-11-29T20:08:47 1764446927

> unscalable

Instagram, which is significantly bigger than Reddit, disagrees.

Nextgrid · 2025-11-29T21:27:28 1764451648

Here we go, someone's throwing around the S word again. Based on my experience, every time someone mentions "scalability" in a web application context I smell red flags.

Now we don't have details on what the comments service in Reddit entails - maybe it does indeed do a lot of CPU-intensive processing, in which case moving to Golang will definitely help.

But maybe it's also just a trivial "read from DB, spit out JSON", in which case the bottleneck will always be the DB, and "scalability" is just an excuse to justify the work.

The fact this is part of a move off a "legacy" system to "modern" "microservices" suggests that a large number of developers are having fun and are incentivized to justify continuing to get paid to have fun replacing a perfectly functioning system, rather than there being an actual hard blocker to scalability that can't be solved in a simpler way, like throwing more hardware at it.

watchful_moose · 2025-11-29T21:54:27 1764453267

> The fact this is part of a move off a "legacy" system to "modern" "microservices" suggests there's a huge amount of developers having fun ...

I don't think it suggests that at all. This is their press release, so of course they're going to spin it that way.

gnaman · 2025-11-29T21:02:14 1764450134

reddit is like one of the most visited websites and they've now felt the need to migrate off python. the average user's website is fine on python.

whattheheckheck · 2025-11-29T19:41:31 1764445291

At what scale do you think Python breaks down?

knodi · 2025-11-29T19:52:38 1764445958

Reddit scale. For most of the world Python is fine. Go is also so easy to work with that it makes sense it was the go-to after Python.

I've been saying it for almost 10 years: Go is the future for backends.

rasz · 2025-11-30T03:01:09 1764471669

At the very moment the person in charge says "ok this works, now make it not slow". Python is the modern-age BASIC. Easy to write and good for prototypes, scripting, gluing together libraries, fast iterations. If you want performance and heavy data processing, anything else will be better. PHP, Java, even JavaScript.

For example, Python is struggling to reach real-time performance decoding RLL/MFM data off of ancient 40-year-old hard drives (https://github.com/raszpl/sigrok-disk). 4GHz CPU and I can't break 500KB/s in a simple loop:

    for i in range(len(data)):
        decoder.shift = ((decoder.shift << data[i]) + 1) & 0xffffffffff
        decoder.shift_index += data[i]
        if decoder.shift_index >= 16:
            decoder.shift_index -= 16
            decoder.shift_byte = (decoder.shift >> decoder.shift_index) & 0x5555
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 1)) & 0x3333
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 2)) & 0x0F0F
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 4)) & 0x00FF

Too · 2025-11-30T09:02:55 1764493375

To optimize that code snippet, use temporary variables instead of member lookups to avoid slow getattr and setattr calls. It still won’t beat a compiled language; number crunching is the worst sport for Python.
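A rough sketch of what that could look like, assuming a hypothetical `Decoder` class with the three fields from the snippet above (the real sigrok-disk decoder has more state). `decode_attrs` mirrors the original loop, while `decode_locals` keeps the state in local variables and writes it back once at the end.

```python
class Decoder:
    """Hypothetical stand-in for the real decoder state."""
    __slots__ = ("shift", "shift_index", "shift_byte")

    def __init__(self):
        self.shift = 0
        self.shift_index = 0
        self.shift_byte = 0


def decode_attrs(decoder, data):
    """Original style: every read and write goes through an attribute."""
    for width in data:
        decoder.shift = ((decoder.shift << width) + 1) & 0xffffffffff
        decoder.shift_index += width
        if decoder.shift_index >= 16:
            decoder.shift_index -= 16
            decoder.shift_byte = (decoder.shift >> decoder.shift_index) & 0x5555
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 1)) & 0x3333
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 2)) & 0x0F0F
            decoder.shift_byte = (decoder.shift_byte + (decoder.shift_byte >> 4)) & 0x00FF


def decode_locals(decoder, data):
    """Same arithmetic, but state lives in locals inside the loop."""
    shift = decoder.shift
    shift_index = decoder.shift_index
    shift_byte = decoder.shift_byte
    for width in data:
        shift = ((shift << width) + 1) & 0xffffffffff
        shift_index += width
        if shift_index >= 16:
            shift_index -= 16
            b = (shift >> shift_index) & 0x5555
            b = (b + (b >> 1)) & 0x3333
            b = (b + (b >> 2)) & 0x0F0F
            shift_byte = (b + (b >> 4)) & 0x00FF
    # Write the state back to the object once, after the loop.
    decoder.shift = shift
    decoder.shift_index = shift_index
    decoder.shift_byte = shift_byte


# Made-up pulse widths, just to exercise both versions.
sample = [2, 3, 4, 2, 3] * 1000
a, b = Decoder(), Decoder()
decode_attrs(a, sample)
decode_locals(b, sample)
print(a.shift_byte == b.shift_byte)
```

Timing both with `timeit` on real capture data is the only way to know whether the difference matters for a given interpreter version.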

Spivak · 2025-11-30T10:47:11 1764499631

Which is why in Python in practice you pay the cost of moving your data to a native module (numpy/pandas/polars) and do all your number crunching over there and then pull the result back.

Not saying it's ideal but it's a solved problem and Python is eating good in terms of quality dataframe libraries.

rasz · 2025-11-30T22:14:10 1764540850

All those class variables are already in __slots__, so in theory it shouldn't matter. Your advice is good

     self.shift_index -= 16
     shift_byte = (self.shift >> self.shift_index) & 0x5555
     shift_byte = (shift_byte + (shift_byte >> 1)) & 0x3333
     shift_byte = (shift_byte + (shift_byte >> 2)) & 0x0F0F
     self.shift_byte = (shift_byte + (shift_byte >> 4)) & 0x00FF


but it only saves 2-4 milliseconds per 1 million pulses :) Declaring a local variable in a tight loop forces Python into a cycle of memory allocations and garbage collection, negating potential gains :(

    SWAR                           :     0.288 seconds  ->    0.33 MiB/s
    SWAR local                     :     0.284 seconds  ->    0.33 MiB/s

This whole snippet is maybe what, 50-100 x86 opcodes? Native code runs at >100MB/s while Python 3.14 struggles at around 300KB/s. Python 3.4 (a hardcoded Sigrok requirement) is even worse:

    SWAR                           :     0.691 seconds  ->    0.14 MiB/s
    SWAR local                     :     0.648 seconds  ->    0.14 MiB/s

You can try your luck at https://github.com/raszpl/sigrok-disk/tree/main/benchmarks and I would appreciate pull requests if anyone manages to speed this up. I gave up at ~2 seconds per RLL HDD track.

This is what I get right now decoding single tracks on an i7-4790 platform:

    fdd_fm.sr 0.9385 seconds
    fdd_mfm.sr 1.4774 seconds
    fdd_fm.sr 0.8711 seconds
    fdd_mfm.sr 1.2547 seconds
    hdd_mfm_RQDX3.sr 1.9737 seconds
    hdd_mfm_RQDX3.sr 1.9749 seconds
    hdd_mfm_AMS1100M4.sr 1.4681 seconds
    hdd_mfm_WD1003V-MM2.sr 1.8142 seconds
    hdd_mfm_WD1003V-MM2_int.sr 1.8067 seconds
    hdd_mfm_EV346.sr 1.8215 seconds
    hdd_rll_ST21R.sr 1.9353 seconds
    hdd_rll_WD1003V-SR1.sr 2.1984 seconds
    hdd_rll_WD1003V-SR1.sr 2.2085 seconds
    hdd_rll_WD1003V-SR1.sr 2.2186 seconds
    hdd_rll_WD1003V-SR1.sr 2.1830 seconds
    hdd_rll_WD1003V-SR1.sr 2.2213 seconds
    HDD_11tracks.sr 17.4245 seconds <- 11 tracks, 6 RLL + 5 MFM interpreted as RLL
    HDD_11tracks.sr 12.3864 seconds <- 11 tracks, 6 RLL + 5 MFM interpreted as MFM

drowning_sushi · 2025-11-29T20:03:54 1764446634

This is very subjective. Using Python influences your architecture in ways you would not encounter with other languages.

I maintain a critical service written in Python and hosted in AWS and with about 40 containers it can do 1K requests/sec with good reliability. But we see issues with http libraries and systemic pressure within the service.

Nextgrid · 2025-11-30T00:29:25 1764462565

1k requests/sec over 40 containers means 25 RPS per container. Are you using synchronous threads by any chance (meaning that while you're waiting on IO or a network call you are blocked, yet your CPU is actually idle)? If so, you might benefit from moving to gevent and handling that load with just a handful of containers.
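A rough sketch of why that matters, using asyncio from the standard library rather than gevent (a different mechanism, but the same idea of not blocking a worker while it waits on IO). `fake_io_call` and its 50 ms delay are made up; a real handler would be waiting on a database or another service.

```python
import asyncio
import time


async def fake_io_call(delay=0.05):
    """Hypothetical IO-bound call: the CPU is idle while we wait."""
    await asyncio.sleep(delay)
    return "ok"


async def handle_requests(n):
    # All n waits overlap instead of running one after another.
    return await asyncio.gather(*(fake_io_call() for _ in range(n)))


def timed(n):
    """Return (number of completed requests, wall-clock seconds)."""
    start = time.perf_counter()
    results = asyncio.run(handle_requests(n))
    return len(results), time.perf_counter() - start


count, elapsed = timed(100)
print(f"{count} requests in {elapsed:.2f}s")
```

Handled one at a time, 100 such calls would take about 5 seconds in a single worker; overlapped, the total stays close to the duration of one call.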

nly · 2025-11-29T21:37:59 1764452279

1K requests/sec doing what?

That's a really low rate in my world.

I write software handling a couple of million messages per second on a single core on a single machine.

tkfoss · 2025-11-30T00:56:42 1764464202

We get 1K on a single small hetzner VPS, with Flask behind Nginx ¯\_(ツ)_/¯

Spivak · 2025-11-30T10:24:22 1764498262

Yeah, 25 req/sec/process is abysmally slow. You can write slow code in any language.

You don't end up seeing these kinds of complaints about Ruby backends and Ruby is the same order of magnitude in terms of speed.

Nextgrid · 2025-11-29T21:31:55 1764451915

The scale at which throwing more hardware at the CPU-intensive part of your Python (not the part that just waits on a DB, IO, or another networked service; that won't change with Golang) starts costing more than paying developers to rewrite it in a new language, incurring the downside of introducing another language into the stack, throwing away all the "tribal knowledge" of the existing app, and so on.

Modern hardware is incredibly fast, so if you wait for said scale it may never actually happen. It's likely someone will win the push for a rewrite based on politics rather than an actual engineering constraint, which I suspect happened here.