Challenging algorithms and data structures every programmer should try (austinhenley.com)
439 points by whack on Dec 24, 2022 | 111 comments



> Google asked me two interview problems involving these back when I was in college. Have I ever needed this in practice? Nope!

This is the shame of Big Tech. I've used Bloom filters, tries, and many others of these multiple times on side projects with constraints, but would something like this ever make it into a deployed system at FAANG by all the regular devs?

No, just use dynamo, spanner or a bigger cluster + more workers, more queues, add more k8s nodes, plus whatever cloud tech you should be using to "horizontally scale" and be "distributed".

Not that there aren't awesome devs at these companies who do code responsibly, but I don't often see it.

I can't believe how unperformant the systems I've often seen in Big Tech are. I want more people to read the article about fitting the whole level into a single byte[1] and stop using big budgets as a way to be irresponsible.

1: https://news.ycombinator.com/item?id=34095954


Bloom filters, skip lists, tries, recursive descent, etc. are actually used in many systems, and I think it's important to understand how they work at a high level.

Now, implementing one of them on a whiteboard or without access to the internet, knowing all of their intricacies and edge cases… that is stuff I doubt anyone needs to know except a very specific type of developer. If I want an optimized trie I use a library, and when it breaks and I need to fix the low-level implementation, I look it up online.


Recursive descent isn't an oddity; it's a universally useful parsing approach. It isn't the kind of thing where you can "just throw more k8s nodes" at the problem instead; rather, it's one of the simplest ways to throw a parser together quickly.
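For the curious, a minimal sketch of what "throwing a parser together quickly" can look like: a recursive descent parser/evaluator for arithmetic expressions. The grammar and names here are just illustrative, not anything from the article.

    # Minimal recursive descent parser/evaluator for expressions like "1 + 2 * (3 - 4)".
    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> NUMBER | '(' expr ')'
    import re

    def tokenize(src):
        return re.findall(r"\d+|[-+*/()]", src)

    class Parser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def eat(self, tok=None):
            cur = self.peek()
            if cur is None or (tok is not None and cur != tok):
                raise SyntaxError(f"expected {tok!r}, got {cur!r}")
            self.pos += 1
            return cur

        def expr(self):                      # one method per grammar rule
            value = self.term()
            while self.peek() in ("+", "-"):
                op = self.eat()
                value = value + self.term() if op == "+" else value - self.term()
            return value

        def term(self):
            value = self.factor()
            while self.peek() in ("*", "/"):
                op = self.eat()
                value = value * self.factor() if op == "*" else value / self.factor()
            return value

        def factor(self):
            if self.peek() == "(":
                self.eat("(")
                value = self.expr()
                self.eat(")")
                return value
            return int(self.eat())

    print(Parser(tokenize("1 + 2 * (3 - 4)")).expr())  # prints -1

Build an AST instead of evaluating on the fly and you have the skeleton of most hand-written parsers.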

I'm not sure I've ever seen a skip list outside of an algorithms book.


Having been in the industry for literally decades, I can’t remember ever having reason to write a parser (except for basic CSV type data or web scraping) in real life. shrug

I still know how to do it, but I’d have to really stop and ask myself why I thought it was a good idea first. Because it probably wouldn’t be.


Having been in the industry literally decades, I have worked on three compilers, two assemblers, a couple of DSLs (before we had some of the newer, fancier tools), a few linkers, a couple of tools that needed to figure out dependencies for assembling a build, a couple of databases, and a number of video games that specifically required a recursive descent tree walking algorithm. I consider this class of algorithms to be essential in the areas where I have worked, but I also know there are algorithms in use in other areas of software development that I have absolutely no practical use for, even though, to reiterate, I have literally been in software for decades. Just because an algorithm is of no use to me doesn't mean it is of no use to you. And vice versa.


I think you get what you look for. If you don't feel you want to push the boundaries of computing and instead prefer working with people, then there are way more opportunities there; you don't even have to be good, because as long as you can connect clients to databases people will pay you. However, to me that isn't why I got into programming, that sort of job isn't anything like the jobs I've had, and it is a bit sad to see people in those jobs drowning out most other opinions about programming.

Conversely, if you love algorithms and are sufficiently good at them, then you can spend your entire career on that and never ever talk to non-technical people. Some people here think those jobs basically don't exist, but they do exist and are really important and well paid. And once you start working there you can just continue forever, since there are so few who have experience with that kind of work. Like, maybe Google asks algorithm questions since they want more people to improve their algorithms? I know they had low-hanging fruit everywhere; they just lacked people who were competent enough at algorithms to solve them, and their hiring bar is way too low to solve their hard problems. But they can't really raise it either, since there aren't enough people who could pass it then.


> Conversely if you love algorithms and are sufficiently good at them then you can spend your entire career with that and never ever talk to non-technical people.

I think there are two types of programmers: ones like you, who like algorithms for algorithms' sake, and others who like to solve real-world problems. I got into software development because I want to solve problems for people. And not all of us like algorithms. We prefer to learn stuff that we do use day to day, because that has direct impact. For example, I just spent two months understanding functional programming. I use it in my code almost every day, but it would be worthless in an interview where you are judged by your algorithm understanding. It's frustrating when you realize how this standard is forced on every technical role, at companies that don't even need algorithm experts because they are too small.


I've never written a parser because I love parsing. I actually kind of hate it, and have always been baffled by the people who nerd out on it. But, like, when I'm working on a problem that needs a parser, I'm glad I can bang one out! Like I said elsewhere: that's happened a bunch of times in my career.


There's another aspect to this: the knowledge and comfort of being able to put together a parser properly means you're more likely to identify places where one would be useful, and to consider it worth the trouble to do so.

That is, to one without a hammer, nothing looks like a nail.


> That is, to one without a hammer, nothing looks like a nail.

The irony of this is that for one with a hammer, everything looks like a nail.

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." - Donald Knuth

Knowing when to apply something is just as important as knowing where.


I think the metaphor still works.

A lack of a hammer implies only having a screwdriver.

The point is you should have a tool box (and garage and workshop and spare bedroom) full of tools so you have the exact tool you actually need.

What the Phillips head and pozidrive represent in this metaphor are left as an exercise for the reader. As is the adjustable spanner.


That's super interesting. I had to write one within my first couple of years of starting my career (I got tasked with writing a scripting language for a product). I'd guess I've parsed something-or-other --- firewall rules, some weird config file or other, C --- about every couple of years. It seems super routine to me. But that's just me!


I skipped that by choosing Lisp as my scripting language. Took a day to implement; no complex parser required.

For config I used INI files, super simple to parse (and now JSON, so I can always use an existing solution).


There are plenty of users for whom having to write configuration in JSON is a significant barrier, either because they find the syntax obtuse or because they want things like comments, multiline strings, or expressions. Using a DSL can be a huge UX improvement in many cases.


I would do it differently today than I did in 1997.


I think it's one of those things, where if you have a hammer in your toolset you'll probably bang on a lot more things :)

I've seldom implemented parsing, since we had Lisp, XML, and now YAML. However, I use language translation all the time.


Generally, whenever I've had to do that, there has been some pre-existing parser or a basic regex that took 30 seconds to write. Maybe I'm blessed?


Could be! I've never had to draw a Bresenham line, but I'm sure lots of people have. :)

In-memory skiplists, I'm still skeptical of.


There are lots of cases where people use regexes to do some kind of validation where simple parsers would work better, if you know how to write them.


Huh, I did, many times. I think it’s more where we end up working, than universal applicability of these things.

(And I didn’t know about any of the data structures or algos mentioned in the post - I’d never have been hired at a FAANG)


I’ve written a few tens of parsers during my career using recursive descent…along with lots of type checkers. There are lots of tricks here.

Ever since I became a SWE rather than a researcher, I haven't found a case to write another one, however. It's much easier just to embed the DSL in Kotlin.


When we wrote our compiler, it would have been kinda hard to do so without writing a parser.


Skip lists are a common way to store unflushed data (the in-memory memtable) in persistent LSM trees, for example.


Shows to go you! I'm super weak on durable data structures.

This field we're working in, it's pretty neat.


Redis's sorted set is built on top of a skip list: https://github.com/redis/redis/blob/unstable/src/t_zset.c.


AFAIK skip lists are used in some high-performance systems. See: https://en.wikipedia.org/wiki/Skip_list#Usages

I also thought there were skip-lists used in some part of the kernel or a driver but I don't remember: searching brings up https://lwn.net/Articles/551896/


There was one at work in the 90s. Few of us had heard of it, but the search properties against our data in memory were practical and useful (it wasn't just spinning the propeller on our beanies). For years, whenever I'd mention it, no one knew what it was... I suppose it's become better known in the past 10 years.


Tries are not mentioned in the OP and I have yet to see an example where they actually work well.


They are very useful and widely used in blockchains.


They're different problems; you've got programming in the small, medium, and large.

Computer Science tends to focus on programming in the small to medium, and software engineering tends to focus more on programming in the medium to large.

It's one thing to optimize a small piece of functionality fully, but it's another thing to put together a product that has thousands of live users, hundreds of features, a multitude of configuration modes, is distributed globally, etc.

If I ask you to clean a bathroom in 3 hours, and then I ask you to clean the whole house in 3 hours, your approach to the cleaning will be very different between the two, and the first thing that will be cut is your attention to details.


I think the problem with slow user interfaces is that the programmers who know how to make things fast usually want to work on more interesting algorithms instead of optimising UI interactions. I built model evaluators at Google, so I used a lot of these algorithms: topological sort in particular is useful in so many places, and you can't really make a generic library solution for it, plus different parsing methods. When doing this I had to look at how many nanoseconds different ways of doing things cost, because they run billions of times on large models in production, so it is very expensive.
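For anyone who hasn't bumped into it, here is a minimal sketch of one common way to do a topological sort (Kahn's algorithm); the dependency graph is a made-up example, not anything from a real Google system:

    from collections import deque

    def topological_sort(graph):
        # graph: dict mapping each node to the list of nodes that depend on it
        indegree = {node: 0 for node in graph}
        for node in graph:
            for succ in graph[node]:
                indegree[succ] = indegree.get(succ, 0) + 1
        # start with the nodes nothing depends on
        queue = deque(n for n, d in indegree.items() if d == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for succ in graph.get(node, ()):
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    queue.append(succ)
        if len(order) != len(indegree):
            raise ValueError("graph has a cycle")
        return order

    # made-up dependency graph: inputs must be evaluated before their consumers
    deps = {"input": ["features"], "weights": ["model"], "features": ["model"], "model": []}
    print(topological_sort(deps))  # e.g. ['input', 'weights', 'features', 'model']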

The people on that team were really good at making things run fast, but I haven't seen that kind of person doing user interfaces. Instead, even at Google, some internal pages took minutes to load with barely any data; I could run a batch job spinning up thousands of servers over terabytes of data in the time it took just to view my account in that service. The silver lining is that Google has many alternatives for everything, so I could use something else instead.

I guess a part of it is that Google pays for server compute but not client compute, so your computer running slowly isn't a big deal compared to their server costs, and so the optimizing programmers get placed on backends. I did work on message routers as well; they have to be blazing fast, so I know how to make network requests fast. The only reason the UIs are slow is that the companies aren't prioritizing it.


> system at FAANG by all the regular devs?

Dynamo, Spanner etc. were written by devs at FAANG. Do they have special interview channels for 'regular devs' and another for people who can write Dynamo?


When I interviewed for a high level engineering position at Facebook, the questions I got were definitely harder than the ones some friends of mine got when interviewing at lower levels. I also had a "System Design Interview" which they did not have.

Ironically I failed the System one because I think the interviewer didn't actually understand the algorithms or solutions and was working off a checklist of expected buzzwords.

I spent about 30 minutes working through a design with him, putting down all my decisions. Halfway through, he tells me I don't seem to be aware of a certain data structure which is necessary, so he'll tell me what it is: a quadtree.

My original choice was an R-tree; I tried to tell him it's faster for windowed queries and more accurate when it comes to large locations. The only real downside relative to a quadtree is slower inserts and deletes [0]. He just kept looking at his second monitor and telling me "Well, this says a quadtree is the most efficient method". He couldn't tell me why, just that his notes say so.

[0] we had discussed at the start of the question, expectations of 1B queries/day, with an acceptable update latency of 24 hours.


> Do they have special interview channels for 'regular devs' and another for people who can write dynamo ?

Not special interviews, but certainly different pieces of headcount. There are Googlers and there are Googlers. Being the second is a lot more work, but a lot more interesting.


I’d love to know more about how one gets to be in the second group at all as a non-Googler - might make working at a FAANG worth it.


The way I went: First get good at algorithms in some way. Then you join an unsexy infrastructure team at Google, there are tons of hard problems and potential for impact there. Then you use your accomplishments from the unsexy infrastructure team to join a sexy team, even at Google the kind of people who can actually improve the hard parts of Google are rare so that move isn't hard to make.

Some people directly join sexy teams, but that probably means they have some accomplishments from before; the hard part is getting a foot in that door, and the unsexy infrastructure teams are a great place to start, since there are tons of problems to solve there and not much competition for that kind of work. Most people just want to work with data models or publicly visible products, not the invisible services that move millions of requests per second, but anything that moves millions of requests per second will be easy to have impact with.


I took the point to be that maybe the scale of the database engine can get you past a lot of nuance.

Not much different from databases of old. Writing a good sorting algorithm feels excessive in many applications. The number of such algorithms that a database's planner picks between is rather large.


> I can't believe how unperformant the systems I've often seen in Big Tech are. I want more people to read the article about fitting the whole level into a single byte[1] and stop using big budgets as a way to be irresponsible

Time to market and solving business problems don't care about performance you'll never see. This is so obvious, I can't relate to your writing.

If I see a developer pack a data structure as tightly as possible, I will not sign off on the CR until it's done in a clearer, more maintainable way. Unless you're working on high-performance applications or similar, you're pissing in the wind.


I mean, you aren't wrong. But there is more diversity and challenge in five minutes of many modern games than in all of Pitfall.

Same goes for many of these algorithms. They are amazing, but also require a simplicity of problem that is hard to pull back to. Worse, they often solidify the program into something that is very rigidly defined and not conducive to change.


Bloom filters have several applications in bioinformatics: https://en.wikipedia.org/wiki/Bloom_filters_in_bioinformatic...


Somebody implements that infra, and in many cases you need to use these techniques to build it, so it's important to understand them.

I'm not asking you to implement one from memory, but at least have some knowledge of its properties and the reasoning behind its design.

For example, Bloom filters are a probabilistic data structure; even if you don't use one as is, it's representative of a whole class of techniques that can be used when exact answers are not required. At scale, if you have something that can cheaply shed 90% of the load without falling back to a more expensive precise algorithm, that is a big win.
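To make that concrete, here's a toy Bloom filter sketch; the sizes and the hash construction are arbitrary choices for illustration, and a real one would derive the bit count and hash count from the expected number of elements and the target false-positive rate:

    import hashlib

    class BloomFilter:
        def __init__(self, m_bits=8192, k_hashes=4):
            self.m = m_bits
            self.k = k_hashes
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item):
            # derive k bit positions from one digest; good enough for a toy
            digest = hashlib.sha256(item.encode()).digest()
            for i in range(self.k):
                yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.m

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item):
            # False means definitely absent; True means "probably present"
            return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("user:42")
    print(bf.might_contain("user:42"))   # True
    print(bf.might_contain("user:999"))  # almost certainly False

The "shed 90% of the load" case is exactly the negative answer: might_contain returning False is definitive, so you skip the expensive lookup entirely, and only the (tunable) fraction of false positives ever hits the precise path.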

Or splay trees, binary trees, etc.: the notion of self-balancing structures and the concept of "divide and conquer" as a problem-solving strategy.

All these add something to your toolbox and in many cases can make a huge difference in the kind of solutions you can come up with, beyond "throw more money at it".

Many people miss the point of these. It's like learning math: it's the way you learn to think that matters, not the particular piece of math you learn.


I work on a database team at Google; tricky algorithms are definitely used. Now, did I write any of those? No, they were already implemented, in the places where they make sense, before I joined the team.

I'm sure the Dynamo team, the Spanner team, etc. are using all these techniques to make their databases performant.


"Makes sense" or "made sense"? Non-native speaker here, sorry.


You can do either if those choices still make sense. Those choices made sense in the past, and they might still make sense currently.


Although it is somewhat embarrassing to admit this, when I wrote JACK [0] I was unaware of topological sort, mostly due to a lack of any formal CS education. This was a major error/failing on my part, because that codebase was more or less made for TS. There have been few other instances where a lack of formal CS training has been an impediment, but not knowing that there was a well-known algorithm for ordering node execution in a directed graph was ... bad.

[0] https://jackaudio.org/


Wikipedia is sometimes enough to implement an algorithm. I implemented multiversion concurrency control and B-trees from Wikipedia articles that do not have pseudocode. Being able to implement an algorithm from an English description is really useful. It relies on the algorithm being well explained.

I also suggest trying to implement a B-tree; they're really useful data structures for data retrieval, and many database products use them extensively for indexes.

With any algorithm, I think there is a critical insight (that might be different for each person) where the algorithm clicks.

I didn't understand mergesort properly until I worked on implementing it but I understood quicksort. When the underlying mental model of the problem being solved becomes clear, we can reason about the algorithm and write code to implement what we want to do.


I also recommend this. It can really help when trying to understand why a query may be slow even though you’re using indices.

For example, one index was on a Boolean field and the query took several seconds. Most rows were “true” and very rarely did we query for false, and when we did query for false, latency was acceptable. If you know a btree is being used under the hood, you can imagine what the tree looks like and that it is very lopsided. From there, as soon as you see the EXPLAIN, you know how to fix it: drop the index.


At this point, everyone knows about bloom filters simply because it’s the thing to write about any time someone asks “lesser known data structures/algorithms?” - it’s hard to find a list that doesn’t feature them!

Still some good entries in that list. I would add skip lists, not because they’re structurally anything special but because they open your mind to the possibly entirely-new-to-the-reader world of probabilistic data structures.
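For anyone who hasn't seen one, a bare-bones skip list sketch (insert and lookup only; the level cap and the p = 0.5 promotion probability are arbitrary illustrative choices):

    import random

    class Node:
        def __init__(self, key, level):
            self.key = key
            self.forward = [None] * (level + 1)

    class SkipList:
        MAX_LEVEL = 16   # arbitrary cap for this sketch
        P = 0.5          # chance of promoting a node to the next level

        def __init__(self):
            self.head = Node(None, self.MAX_LEVEL)
            self.level = 0

        def _random_level(self):
            lvl = 0
            while random.random() < self.P and lvl < self.MAX_LEVEL:
                lvl += 1
            return lvl

        def insert(self, key):
            update = [None] * (self.MAX_LEVEL + 1)
            x = self.head
            for i in range(self.level, -1, -1):      # walk down the levels
                while x.forward[i] and x.forward[i].key < key:
                    x = x.forward[i]
                update[i] = x                        # last node before key on level i
            lvl = self._random_level()
            if lvl > self.level:
                for i in range(self.level + 1, lvl + 1):
                    update[i] = self.head
                self.level = lvl
            node = Node(key, lvl)
            for i in range(lvl + 1):                 # splice the node in on each level
                node.forward[i] = update[i].forward[i]
                update[i].forward[i] = node

        def contains(self, key):
            x = self.head
            for i in range(self.level, -1, -1):
                while x.forward[i] and x.forward[i].key < key:
                    x = x.forward[i]
            x = x.forward[0]
            return x is not None and x.key == key

    sl = SkipList()
    for k in [30, 10, 20]:
        sl.insert(k)
    print(sl.contains(20), sl.contains(25))  # True False

The randomness does the balancing work for you: no rotations, no rebalancing code, and the expected search cost is still O(log n).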

As someone that’s been around the block, something I’ve learned to be less surprised by is how often simply adding an element of randomness to a difficult problem can sometimes give you results comparable to a 10- or 100-times more complicated solution, with the advantage of being more maintainable, making better use of the i$, and avoiding a lot of work. (Eg compare a cache dropping randomly selected items to LFU or LRU caches.)


> I’ve learned to be less surprised by is how often simply adding an element of randomness to a difficult problem can sometimes give you results comparable to a 10- or 100-times more complicated solution

We tend to use "Las Vegas" and "Monte Carlo" to classify randomised algorithms, where the former means using randomness to get a speedup, and the latter means using randomness to get an approximately correct solution.

An example of the former is quicksort, where we shuffle the elements, giving the expected O(n log n) time complexity.* An example of the latter is Bloom filters. Other examples are found all over approximation algorithms, like taking random cuts of graphs.

* Assuming something like Fisher-Yates shuffling with O(n) complexity is used.
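For what it's worth, a tiny sketch of the Las Vegas flavour: shuffle up front (random.shuffle is a Fisher-Yates shuffle), then run a plain quicksort, so the randomness only affects the running time, never the correctness of the result.

    import random

    def quicksort(items):
        # Las Vegas: the result is always correct, randomness only affects run time.
        items = list(items)
        random.shuffle(items)            # Fisher-Yates, O(n)
        _qsort(items, 0, len(items) - 1)
        return items

    def _qsort(a, lo, hi):
        while lo < hi:
            p = _partition(a, lo, hi)
            # recurse into the smaller side first to keep stack depth O(log n)
            if p - lo < hi - p:
                _qsort(a, lo, p - 1)
                lo = p + 1
            else:
                _qsort(a, p + 1, hi)
                hi = p - 1

    def _partition(a, lo, hi):
        pivot = a[hi]                    # last element; fine once the input is shuffled
        i = lo
        for j in range(lo, hi):
            if a[j] <= pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]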


And once in a while, the shuffle puts the elements in a pattern that makes quicksort hit its worst case (n^2). But I guess that gets rarer the larger the list is! So in practice, it works.


The larger the list, the exponentially less likely this is to occur, if the shuffling happens at every recursion level.

The shuffle puts the list in sorted (or reverse-sorted) order with probability 2/n!. The probability that it happens again at the next level is 2/(n-1)!.

So the probability that it happens twice is 4/(n! * (n-1)!) = 4/(1^2 * 2^2 * ... * (n-1)^2 * n),

and carrying that product down all the levels gives a denominator that grows roughly like prod_{i=1}^{n} i^i.

It's so astronomically unlikely that it just never does.


> Eg compare a cache dropping randomly selected items to LFU or LRU caches

I think it's not that randomly (no pun intended) adding randomness to an existing algorithm or solution could improve it, but that sometimes a random sampling process mimics the underlying problem.

For caching, the assumption that the LRU item should be evicted is just that: an assumption. If you checked this assumption and found that it was wrong, and that instead the access patterns are random, then dropping random items _does_ mimic the problem domain, and that makes it a better solution!


It's not quite that simple. It's also probabilistically arriving at the asymptote of the ideal solution in many cases. If an item is accessed often, it'll be re-inserted quickly when it is accidentally removed, and it is only at risk of eviction at a rate of 1/n. On average, the items you use often will be in the cache and the items you don't, won't. It has nothing to do with the access pattern actually being random. Because you access it a lot, the fact that it got evicted on one of those n accesses is completely amortized by the other accesses (even more so when you take into account that you never evict the item you are currently accessing).

The solution fails for specific non-random access scenarios. Eg you access an item exactly every n-1 accesses: in an LRU, it’ll always be in the cache but in a probabilistic eviction scenario, it’s most likely not going to be in there.


Another place where randomness comes in handy is testing.

Suppose you partition your test input set into N different partitions of equal size, and you want, for each partition, to have a test input from that partition. If you just generate random tests then on average after generating N log N tests you will have hit each partition at least once. And this is true not just of the partitions you choose, but ANY N partitions of equal size.
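A quick simulation of that claim (it's the coupon collector argument: on the order of N log N uniform draws to hit all N equal-sized partitions), assuming tests land in partitions uniformly at random:

    import math
    import random

    def tests_until_all_partitions_hit(n_partitions):
        hit, tests = set(), 0
        while len(hit) < n_partitions:
            tests += 1
            # a uniformly random test lands in each partition with probability 1/N
            hit.add(random.randrange(n_partitions))
        return tests

    N = 100
    trials = [tests_until_all_partitions_hit(N) for _ in range(200)]
    print(sum(trials) / len(trials))  # empirical mean, about 519 for N = 100 (N times the harmonic number)
    print(N * math.log(N))            # about 461: the same N log N order of growth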


Randomness is very useful for handling edge cases; quicksort is the poster child here. It has some edge cases where it runs extremely slowly, but randomising how you apply the algorithm means you can ignore them, since they are so unlikely. If you use deterministic code, those rare edge cases often come up in the real world.


Fun list! Other things I'd recommend trying:

    Huffman coding 
    Binary decision diagrams
    Simplex method
    General modeling of a problem to SAT
The last one there is particularly fun. Finding a way to solve a problem fast by mapping it to a silly large set of variables is surprisingly satisfying.
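Since modelling to SAT is the fun one, here is a toy sketch of what "map it to a silly large set of variables" looks like, using graph 3-colouring; a brute-force loop stands in for a real SAT solver, and the whole encoding is just illustrative:

    from itertools import product

    # Toy "model it as SAT": 3-colour a small graph. Variable x[v][c] means
    # "vertex v gets colour c"; a clause is a list of (variable index, polarity).
    vertices = range(4)
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    colours = range(3)

    def var(v, c):
        return v * len(colours) + c          # flatten (vertex, colour) to one index

    clauses = []
    for v in vertices:
        clauses.append([(var(v, c), True) for c in colours])        # at least one colour
        for c1 in colours:                                          # at most one colour
            for c2 in colours:
                if c1 < c2:
                    clauses.append([(var(v, c1), False), (var(v, c2), False)])
    for (u, w) in edges:                                            # endpoints must differ
        for c in colours:
            clauses.append([(var(u, c), False), (var(w, c), False)])

    n_vars = len(vertices) * len(colours)

    def solve():
        # brute force stands in for a real SAT solver; fine at toy sizes (2^12 assignments)
        for assignment in product([False, True], repeat=n_vars):
            if all(any(assignment[i] == pos for i, pos in clause) for clause in clauses):
                return assignment
        return None

    model = solve()
    for v in vertices:
        print(v, "->", next(c for c in colours if model[var(v, c)]))

Swap the brute-force loop for a real solver and the same encoding scales to problems you could never search by hand.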


Suffix trees


Yes! Most of the examples of “challenging” algorithms in the blog post are actually fairly elementary stuff covered in undergrad algorithms. Suffix arrays though (especially linear-time construction) are the real deal.


I often find myself thinking: "But how would I do any of these without reaching for mutation, so that I can later parallelize them?" And then I come up empty. I still need to work through "Purely Functional Data Structures" by Okasaki. Hopefully then I will have an idea of how to come up with functional versions of these mostly single-core algorithms, or at least of the ones that require locking.

I skipped ahead once and looked at functional red-black trees, and the solution was to involve one more color! How ingenious is that?! I have no idea how people came up with these solutions.

In many places algorithms are described with mutation implicitly assumed. However, this can be very hard to parallelize later and might hit you in the future. In the age of many cores, this no longer seems appropriate to me. I wish we spent more time learning how to get things done without mutation.

Another great post that was linked at some point on HN was this: https://nullprogram.com/blog/2020/04/30/ -- A comparatively simple change in approach and yet it allows for massive parallelization without locks! This is the kind of stuff we should learn more about and then make more use of our many available cores, moving on from the single core world of algorithms.


Okasaki’s book is generally not practical and he even expressed some surprise at its popularity due to that fact. There are some practical purely functional data structures but you’ll have to find them elsewhere.


Can you go into a bit more detail about why it would not be practical? Are there better implementations of the data structures in the book? The book I believe is from 1995 or so, so I could well imagine that in the meantime progress has been made. But what exactly makes the algorithms in the book impractical for use within a functional program?


Often mutation on a single core is faster than persistence on multiple cores.


This is the problem I'm working on.

Independent threads are each as fast as a single core, and they slow down overall when synchronization is introduced. Amdahl's law means that the cost of the sequential portion approaches the majority of the time, compared to the parallel speed-up.

My design is to shard by thread. Each thread or machine maintains a sorted shard of the data. Erlang takes a similar approach. If you want a totally ordered view, then you do an N-way mergesort.

I have a lock free algorithm that can do about 1.2 million synchronizations a second across 11 threads. My lock benchmark does 3,444,152 (about 3.4 million) requests per second with 11 threads, so my lock free algorithm isn't as good as locks. My lock free algorithm doesn't block, though.

I'm thinking of calibrating message rate by increasing latency to increase throughput and limit interference for the lock. If I increase message rate to 2500-25000 messages then more messages get transferred per synchronization event.


I did some tweaking with my benchmark.

I picked 11 threads due to my CPU having 12 logical cores.

With 100 threads and incrementing 10 at a time, the LockBenchmark achieves 3,482,053 requests per second.

With 100 threads the lock free benchmark achieves 17,754,858 requests per second, 10 messages sending per synchronization event at a time.

In other words, the lock free algorithm scales better.

LockBenchmark code https://github.com/samsquire/multiversion-concurrency-contro...

Lock free actor2 algorithm https://github.com/samsquire/multiversion-concurrency-contro...


I think this is to be expected. If you have a lock anywhere, the risk is that the lock will become the bottleneck at some point when scaling to more and more concurrently running processes wanting to acquire it.


Why merge them at all? Can't a lazy merge increase performance, given that you are usually iterating over them, and at that point most of the work will be done by the iteration and not by the merge? That gives you some time to work out which thread/machine to pull from next while the main thread works on the result of the last iteration (i.e., pre-empt the iteration). You could even buffer/stream them from each thread much faster than any code could do real work.

In my experience, lock-free solutions tend to not do well when threads share a physical core (vs virtual cores), but ymmv.


How'd you pick 11 threads? How do the algorithms scale as you change # of threads? T=2, 10, 100, 1000, 10k


https://news.ycombinator.com/item?id=34126789

I replied to your comment here.


You can parallelize with mutations: just split the data and do a tiny bit of extra work when the data overlaps. There aren't that many real-world cases where you want to run expensive computations on highly connected data, and in those cases you mostly just write GPU code instead of bothering with functional CPU languages. That is what the engineers who work on the extreme end of performance do. If you want to learn functional programming that is efficient and performant, then go and learn GPU programming; the techniques you learn there also work for many CPU applications, and there are decades of top-end industry practice behind them.


There is certainly some kind of middle ground between single core CPU and massive parallelization on a GPU. What am I going to use my 16 CPU cores for? Should they sit idle forever, only ever handing parallel work over to the GPU?

Also, not all work is parallelizable in the way graphics are parallelizable, or follows the relatively simple fork-join structure. What about things like actor systems, where each actor may run on another core, each doing a different kind of work?

Sometimes one might want to also have good single core performance in parallelized settings, so that the work for a single core gets finished quickly.

I am not convinced the picture is really as black and white as "do parallelized things on GPU, single core things on CPU".


I am deeply interested in parallelism. Would enjoy talking to you about it. I think the parallelism provided by Haskell is really interesting. And its software transactional memory.

I think parallelisation can be generalised, but we need to study it further to understand it better. There must be a set of primitives that parallelise very well but haven't been found yet. Rather than trying to parallelise something that was invented to be sequential, we should parallelise something that is inherently parallelisable.

Today I'm working on parallelising a totally ordered list view over X sorted lists that are maintained by separate threads or independent servers. My idea is to use an N-way merge to retrieve the next N items in sorted order. Servicing requests is extremely fast, since there is no synchronization needed within a thread or server, but if you need total order, you need the N-way merge. I'm thinking you have a cluster for ordered views and a cluster for the within-a-thread view.
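A minimal single-process sketch of that idea, with plain Python lists standing in for the per-thread shards (heapq.merge does the lazy N-way merge):

    import heapq

    # each "shard" is kept sorted by its owning thread/server; merging lazily
    # yields a totally ordered view without materialising the whole thing
    shard_a = [1, 4, 9, 12]
    shard_b = [2, 3, 10]
    shard_c = [5, 6, 7, 11]

    ordered_view = heapq.merge(shard_a, shard_b, shard_c)

    # pull only as many items as the caller asks for
    print([next(ordered_view) for _ in range(5)])  # [1, 2, 3, 4, 5]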

The other problem I'm working on is how to parallelise integer updates. What if you have 8,000,000,000 accounts (which DOES fit on a single machine) but more operations overall than a single machine can process? You could shard by account, but I think there's an approach that shards the integer itself across servers. You store a portion of the integer on each machine, hopefully enough to handle most operations. Then, when you need to check whether the account integer > the deduction amount, I just need to learn how to do a strongly consistent read cheaply. Essentially we load-balance the integer in the account.

For example, fibonacci is often used as an example of something to parallelise, but it's never implemented in a parallelisable manner; it just does the same calculation in different threads. If we knew a formula for fibonacci that didn't depend on previous iterations, we could parallelise fibonacci. But I'm not a mathematician, so I'm not sure how to find that formula.


Sorting is a standard distributed algorithm; it is basically quicksort, but you move some data between servers between the steps. The aim is to create ordered buckets of data that you then send to servers. The first step is to find good pivots, so sort the data and select data points at the median and at different percentiles from each server, then fan out those pivot candidates to every server. Now, since all servers have the same pivots, they can create the same buckets and send the data they have for each bucket to the corresponding server. The buckets won't have perfect balance, so you typically want a few times more buckets than you have servers to ensure that you don't overload one of them; that is the fast way. Alternatively you can do another fan-out step computing the number of items in each bucket among the servers, and split large ones into smaller buckets using the same technique as before; this will always work but requires a bit more communication between the servers.
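A toy, in-memory sketch of that scheme, with Python lists standing in for servers; the percentile sampling and bucket counts are arbitrary illustrative choices:

    import random
    from bisect import bisect_right

    servers = [[random.randrange(10_000) for _ in range(1_000)] for _ in range(4)]
    n_buckets = 8   # a few times more buckets than servers, as described above

    # 1. each server sorts locally and proposes pivot candidates at fixed percentiles
    candidates = []
    for data in servers:
        data.sort()
        candidates.extend(data[int(len(data) * p / n_buckets)] for p in range(1, n_buckets))

    # 2. "fan out" the candidates so every server agrees on the same global pivots
    candidates.sort()
    step = len(candidates) // n_buckets
    pivots = [candidates[i * step] for i in range(1, n_buckets)]

    # 3. every server routes each of its items to the bucket its value falls in
    buckets = [[] for _ in range(n_buckets)]
    for data in servers:
        for x in data:
            buckets[bisect_right(pivots, x)].append(x)

    # 4. each bucket sorts locally; concatenating the buckets gives the global order
    result = [x for bucket in buckets for x in sorted(bucket)]
    assert result == sorted(x for data in servers for x in data)
    print([len(b) for b in buckets])  # roughly balanced bucket sizes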

As for distributing heavy computations, Google has done that to compute your search results since almost the beginning. Fan out a request to a huge cluster of servers that all have different parts of the data in RAM, then they filter and return the parts of the data that are relevant to you. It is totally possible to do this and return a response to the user in a fraction of a second. Typically it takes a few milliseconds to send data between servers in a data center; you can easily fan out a request to a few hundred servers, let them compute, and return something 15 milliseconds later. It costs a bit to run that, though. Google can afford it since search is worth so much per request, so you've got to figure out if your use case is worth that much compute. But it doesn't cost that much more than running the same work on a single server; it depends a bit on how well your servers are colocated.

Btw, frameworks for distributed computing like Apache Spark are way too slow for this; you have to hand-roll it. But it really isn't that hard to do. There are programming competitions involving this, and a correct solution to these kinds of things takes like 30 minutes to write if you know what to do and you have done all the boring infrastructure work to ensure you can contact servers and access the data. That is why most programmers never work with complex algorithms: the boring and tedious work on all the other parts takes up most of the time and manpower.


Assuming that local access is cheaper than distribution, wouldn't a merge sort be better?


I assumed the data is spread out in a set of databases. Reading even just a few terabytes of data into a single server is slow and expensive, and then you'd still need to spend a few days of CPU time to sort it locally. And for somewhat larger datasets you won't be able to handle it with most computers; I think most large companies have some data sets at about a petabyte, so I don't think assuming that amount of data is unreasonable. I worked with data at about an exabyte; not many have that much data, but it starts to get hard to do it locally at just one millionth of that amount. But maybe I'm overestimating how common that is.

Another question is how much time it takes. Let's say we have small data, a few gigabytes or so. Sorting that would still take about 10 seconds to a minute, which is annoyingly slow for users. If that data is distributed over many computers that can each read a subset, then the distributed sorting will go much, much quicker, so you can do a full sort over the entire dataset within a normal 100 ms request SLO. But this assumes that the user needs to sort using arbitrary comparisons; otherwise it is of course much better to just pre-sort the dataset and serve it that way.

And even if you have all the data locally, if it doesn't fit in memory, distribution would still help. The network within a server cluster is typically faster than local persistent storage, so distributed you can read the data once and send it to the different workers, instead of having to read and write it over and over as you'd have to do if you sorted it locally. In other words, making a computer read X data from disk takes about the same time as getting that data to a distributed set of computers, since you can send data faster than you can read it, and this is the worst case for the distributed approach.

Note: This assumes that we can't precompute the sort, meaning the user asks for some ordering they want the items sorted in and we have to provide that on the fly. Pre-sorting data once is easy, so I'm not sure why we would discuss that case. Also, networking tends to cost a ton more when you run on other people's computers like AWS or Azure or Google Cloud, so it might not be as viable there. Don't they charge like 10x the cost of the networking? But when I worked at Google we got to run distributed jobs juggling a petabyte of data without anyone asking questions, so it can't be that expensive. As long as you didn't make an infinite loop in your distributed program or something, nobody cared; and even when a guy did that, his job ran for a couple of weeks before people noticed that our compute quota had run out, so I'm pretty sure it can't be that expensive. But that amount of networking would probably cost a fortune in GCloud.


> If we knew the formula of fibonacci that didn't depend on previous iterations, we could parallelise fibonacci. But I'm not a mathematician so I I'm not sure how to find out that formula.

You find it on Wikipedia! The insight that a closed-form solution might exist - "knowing what to Google" - is really all the mathematical insight you need.

But maybe Fibonacci is a stand in for something else in your explanation. I don't follow the definition of the integer updates problem.


Thank you for your kind words and that critical insight - "closed-form solution".

Sorry for my poor explanation.

When you shard data, you could shard by the record identifier and keep each record's data on a particular server -OR- you can shard the data itself and map-reduce it.

For example, a file system locally shards the file data into inodes. The sum of all the inodes = the data of the file.

I am trying to split an integer into shards, the sum of the parts is the value.

If account balance is defined as the SUM of all server's amounts, then each server stores a proportion of the data for that user.

If there are 8 servers and a user has £8000 in their bank balance, then you try to store £1000 on each server, so any server can serve requests without coordination. If the user wants to spend £25, then that's fine, just deduct it from the local share. If the user wants to spend £1001, then you need a globally consistent read, since that would drop the balance below 0 for the local replica, and you need to check the other replicas to see if the user has enough.

I think it is related to the ID generation problem. If you have 1000 servers and you need to generate monotonically increasing ID numbers without clashes between servers, how do you do it without synchronization? Do you split the integer range into 1000 batches and give each server a batch to assign ID numbers out of?

Writing this out has led me to think that you could run two clusters. One cluster serves sharded data. The other serves a stream of updates from the other servers and keeps a globally consistent value.

Probabilistically, most transactions will be below the server's share and will be served locally without coordination. Coordination is only necessary rarely - for large purchases.
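A rough single-process sketch of that fast-path/slow-path split, with in-memory objects standing in for servers; everything here is hypothetical, and in a real system the cross-shard read would have to be an actual coordination protocol rather than a simple sum:

    class BalanceShard:
        # one server's local share of a user's balance
        def __init__(self, share):
            self.share = share

        def try_spend_locally(self, amount):
            # fast path: no coordination needed if the local share covers it
            if amount <= self.share:
                self.share -= amount
                return True
            return False

    class ShardedAccount:
        def __init__(self, total, n_shards):
            per = total // n_shards
            self.shards = [BalanceShard(per) for _ in range(n_shards)]
            self.shards[0].share += total - per * n_shards   # remainder

        def spend(self, amount, via_shard=0):
            if self.shards[via_shard].try_spend_locally(amount):
                return True
            # slow path: the "globally consistent read" - here just a sum, but in a
            # real system this is the step that needs coordination across servers
            if sum(s.share for s in self.shards) < amount:
                return False
            remaining = amount
            for s in self.shards:
                take = min(s.share, remaining)
                s.share -= take
                remaining -= take
            return True

    acct = ShardedAccount(total=8000, n_shards=8)   # roughly 1000 per shard
    print(acct.spend(25))     # True, served locally by shard 0
    print(acct.spend(1001))   # True, but needed the cross-shard path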


What I would be interested in is some kind of resource (book, video series, etc.) where someone presents a reasonably "real-world" problem, discusses the thought process in choosing an implementation, and steps through the process of implementing, testing, and iterating on the solution. It's easy to find sources for hard problems to solve, and it's easy to find sources for explanations of algorithms and data structures, but I haven't found much to bridge the gap.


I thought of something similar, but going over how early computers were used, from precalculating ballistics tables, to solving cut-and-cover for building roads. I always loved the story problems because they showed how math was useful for figuring things out, they gave examples of predictive power.

You might really like the lectures from https://www.csail.mit.edu/person/erik-demaine

Or the AOSA book, https://aosabook.org/en/


That is a very good idea. I think I have the germ of an idea for a new set of videos.


This is related to one of my fun stories. During an interview at one company, I got asked a topological sort question. I passed, but within a couple of months I needed to troubleshoot why the host app had a problem loading add-ons. I started digging and found that their implementation of topological sort was wrong…


Sounds like they made the right choice in hiring the expertise they were missing. ;)



This is an interesting list. I'd add hashing and hash tables to the list. There are lots of interesting variations of these as well.

I would put hash tables ahead of piece tables because of their generality. For text editors I've always thought that gap buffers were so straightforward that ropes and piece tables seemed like extra unnecessary work. I see that VS Code uses piece tables so perhaps someone can explain the performance differences to me.

I've never sat down and benchmarked various implementations of text buffers. However, it seems to me that memmove(3) on modern processors would be so fast that moving a buffer gap on even a 1GB file wouldn't be perceptible, and in return operations over the whole buffer like string searching or parsing would be much more efficient in a gap buffer because the program could count on the buffer being in contiguous memory locations (after closing the gap). In buffers implemented using a piece table or ropes, every character access involves extra layers of indirection. Furthermore, gap buffers are going to be more cache friendly for the way that most editing operations take place in a text editor.

Perhaps someone that has actually done some comparisons could enlighten me.
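For reference, this is the mental model I have of a gap buffer, as a bare-bones sketch (character lists for clarity; a real editor would hold bytes and move the gap with memmove-style bulk copies):

    class GapBuffer:
        def __init__(self, text="", gap_size=16):
            self.buf = list(text) + [None] * gap_size
            self.gap_start = len(text)       # the gap sits at the cursor
            self.gap_end = len(self.buf)

        def _move_gap(self, pos):
            # this is the memmove: shift the characters between the old and new
            # cursor position across the gap
            while self.gap_start > pos:
                self.gap_start -= 1
                self.gap_end -= 1
                self.buf[self.gap_end] = self.buf[self.gap_start]
            while self.gap_start < pos:
                self.buf[self.gap_start] = self.buf[self.gap_end]
                self.gap_start += 1
                self.gap_end += 1

        def _grow(self, needed):
            extra = max(needed, len(self.buf))                 # double-ish growth
            self.buf[self.gap_end:self.gap_end] = [None] * extra
            self.gap_end += extra

        def insert(self, pos, text):
            self._move_gap(pos)
            if self.gap_end - self.gap_start < len(text):
                self._grow(len(text))
            for ch in text:
                self.buf[self.gap_start] = ch
                self.gap_start += 1

        def text(self):
            return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    gb = GapBuffer("hello world")
    gb.insert(5, ",")        # one gap move, then O(1) inserts at the cursor
    gb.insert(6, " dear")
    print(gb.text())         # hello, dear world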


I think the author excluded hashing because it is too well known.


That makes sense, thanks.


Piece tables give you fast and easy undo.


I like algorithms and data structures. One of the things I personally find hard about algorithms and data structures is that I don't know when to use which. For example, I could never have come up on my own with the idea that the traveling salesman problem can have a good solution with ant colony optimization or simulated annealing. How do I learn that skill? I still don't know when to use a graph for which kind of problem.

I hope someone can shed some light on this.


The Algorithm Design Manual, which maybe should have been named The Algorithm Selection Manual.

https://www.goodreads.com/book/show/425208.The_Algorithm_Des...


You get good at this from a long life of curiosity about the subject. Basically, if you have spent your life testing many different ways to do things (think thousands of different ways to compute or parse or view data, etc.), then you have that huge knowledge bank to aid you when you see a new problem, and you compose a few of those things to solve it.

It is the same process as learning elementary algebra in math. You learn a few basic techniques, like moving numbers and variables from one side to the other, and then you compose those to solve expressions you haven't seen before. The only difference is that algorithms are a much larger space; there are way more techniques and problems, so it takes longer to get to the point where it is useful to you.


You need to solve hard problems. Code Jam is an amazing source for this. Just don't expect to knock them out in 5 minutes. Pick a hard problem, stay with it for a week trying different ideas, and compare the time complexities.


So far, most of the resources I've found for this kind of thing don't really provide enough guidance on which algorithms to choose and why, mostly because they're usually designed as competitions. I'm not interested in competing with other programmers; I want to learn from them.


Because guidance can't be given; it's learned. You don't have to compete.

Go on LC, try various approaches, and compare time complexities, and it will be obvious why you have to use a certain algorithm.


Nicely written. Hits a spot that a lot of instructional and documentary material misses - telling the reader up front what common problem each technique solves.


In my most recent job, my first real task was to implement a persistent (i.e. flash-disk-based) queue. In (Micro)Python. Seems simple enough, until real-world performance constraints kick in, and suddenly things get much, much more interesting. You need to keep the flash happy, so wear-levelling is a concern; not a problem using LittleFS (as it's designed for flash disks), but that ends up taking massive amounts of time for seeking. And we have an upper bound on the allowable time for an operation, which throws out any naive solution.

Implementing algorithms is easy. Implementing algorithms with constraints... not so much (but still fun though)


Surprised HyperLogLog didn't get mentioned considering Bloom filters were such a popular suggestion.

https://en.wikipedia.org/wiki/HyperLogLog


I'd put Fourier transforms on the list of algorithms that are extremely useful to know. (Well, not exactly the algorithm itself as much as the concept.) Many things make a lot of sense if you understand Fourier transforms, such as computer graphics, audio processing, signal processing, image processing, data analysis, filtering, and electronics. But Fourier transforms get hardly any attention on HN.


Agreed, here is a good introduction with many examples: http://www.dspguide.com/pdfbook.htm


The OP's list is decent. Here are my suggestions for moderately challenging algorithms to understand and implement:

    B-tree
    Cooley-Tukey radix-2 FFT
    CRC-32
    DEFLATE decoder
    Gauss-Jordan elimination
    JSON parser
    SHA-1 hash
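As one worked example from the list above, a bit-at-a-time CRC-32 (the reflected polynomial used by zlib and PNG; table-driven implementations are just an optimisation of this inner loop):

    def crc32(data: bytes) -> int:
        crc = 0xFFFFFFFF
        for byte in data:
            crc ^= byte
            for _ in range(8):
                # if the low bit is set, shift and fold in the reflected polynomial
                crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        return crc ^ 0xFFFFFFFF

    # matches the standard check value (and Python's zlib.crc32)
    print(hex(crc32(b"123456789")))  # 0xcbf43926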


Thanks for the list. The first time I coded a trie I thought it was super cool. Topological sort is such a useful one as well.


Tries are probably the one I've used the most in the real world, so I guess if I were making such a list it'd be on the main one, not in the honorable mention section. I'm not sure which one I'd remove to make space for it, though.

I second the recommendation for Jeff Erickson's book. He is an awesome in-person instructor, too.


> Segment tree

Note that there are at least two data structures with the same name (three if you also count interval tree): the one used in competitive programming for fast queries on subsegments of a changing array, and two used for storing segments/intervals on a line in slightly different ways. Wikipedia describes the latter.


Thank you. I never really figured out how a parser works, but maybe I should try and write one.


I've always found writing parsers very rewarding. There are different parsing methods. The article mentions top-down recursive descent parsing. That's the easiest to do by hand, but you can't parse everything that way (or not easily). Bottom-up parsing is another strategy, which works (in principle) for all unambiguous grammars. The lazy solution is backtracking (try a rule, back up if it fails), with memoization added to keep the runtime acceptable. This is the basis of the so-called PEG parsers.

There's a lot of formal literature. E.g., top-down recursive descent is a class known as LL(k), bottom-up is LR(k), and there are algorithms to derive parsers from grammars. It is however not the easiest starting point.


> Alright, so we are all spending our leisure time reading about algorithms, right?

Obviously, right?


If you aren't preparing for your next interview this holiday period, you are seriously screwing up.


Not exactly challenging; they're covered in a decent undergrad algorithms course.


They were challenging when you learned them. It doesn't really matter when or where that was.


Segment trees are the foundation upon which all of competitive programming is built.


I don't have a problem with the article. I do have a problem with the "every programmer". Not every programmer writes sophisticated algorithms. In fact, in my experience working on web sites, most programmers are front-end developers coding web sites or back-end developers shuffling data around. The most sophisticated algorithm they need is a for loop. Occasionally one needs an algorithm developer to work on scale-out.

Next time try "Challenging algorithms and data structures every algorithm programmer should try".


Call me cynical, but every time I see interesting threads like these, I can't help but think that somewhere a hiring manager is thinking to himself, "Nice, I'll add these to the technical (interview) questions."


I would add KD-trees and BSP trees as generalizations of binary search trees to higher dimensions.



