Ask HN: What's the best paper you've read in 2020?
705 points by luizfelberti on Dec 8, 2020 | 192 comments
I know there are classics that get posted every time this question comes around, so please bias your suggestions towards more recent ones :)



Here's a wonderful one I read a little over a year ago:

"Estimating the number of unseen species: A bird in the hand is worth log(n) in the bush" https://arxiv.org/abs/1511.07428 https://www.pnas.org/content/113/47/13283

It deals with the classic, and wonderful, question of "If I go and catch 100 birds, and they're from 20 different species, how many species are left uncaught?" There's more one can say about that than it might first appear and it has plenty of applications. But mostly I just love the name. Apparently PNAS had them change it for the final publication, sadly.


> But mostly I just love the name. Apparently PNAS had them change it for the final publication, sadly.

Big game from an organization with that acronym.


We used to joke in grad school that PNAS stands for Paper Not Accepted in Science.


We used to joke it was Probably Not Actual Science


When you get to pick your own reviewers, like the academy members do, it might as well just be an opinion column. lol.


Yeah they're a real P-NAS.


Big game or fowl play?


Ruffled feathers


I bet they were just too chicken to publish it.


I really gobble up this kind of word play.


You're such a turkey


This sounds like a similar problem to an exercise in the famous probability textbook by William Feller about estimating the total number of fish in a lake by catching N of them and tagging them, and then throwing them back in the lake and coming back later to catch another N fish. You check how many of those fish are tagged and derive your estimate from that using the hypergeometric distribution. See the pages here:

https://imgur.com/oRF0OD3 https://imgur.com/UtCZL5A
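For anyone who wants to play with it, here's a small sketch of that tagged-fish estimate (my own toy illustration, not code from Feller's book): scanning the hypergeometric likelihood over the unknown population size recovers the familiar Lincoln-Petersen estimate n1*n2/k.

    from scipy.stats import hypergeom

    n1, n2, k = 100, 100, 7   # tagged on first visit, caught on second visit, tagged among them (toy numbers)

    # The number of tagged fish in the second catch is hypergeometric in the
    # unknown lake population M; scan the likelihood to find the MLE.
    candidates = range(n1 + n2 - k, 5000)
    mle = max(candidates, key=lambda M: hypergeom.pmf(k, M, n1, n2))

    print(mle)            # ~1428
    print(n1 * n2 // k)   # 1428, the Lincoln-Petersen shortcut gives the same answer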


I appreciate the footnote that basically says "Whoa, this is more practically useful than we realized!"


Nice. My math-fu is very weak. I dimly recall a notion for estimating the number of unfound bugs for a code base. Is this similar?


Yeah, exactly. If you wanted to know that your code was bug-free, how could you do it? Set a team of experts to each independently scour for bugs. But when do you stop? The quick answer is that you should keep going until every bug you've found has been found at least twice. I think of this as being that you "just barely" found a bug if only one person identified it, so there are probably still bugs you have "just barely" not found remaining.


I thought it was more like:

Set a team of experts to find bugs independently for some specified amount of time. Then look at how many of the same bugs were found by multiple experts.

If most of the bugs were found by multiple experts, then there are probably not that many more bugs than the total number that they found. If most of the bugs were found by only one expert, then there are probably a lot more than the total that they found.

With some math you can pin down the 'probablies' to numeric ranges.
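A minimal sketch of one standard way to pin those 'probablies' down, treating two experts' bug lists as two independent "catches" (a Lincoln-Petersen style estimate; my illustration, and the bug IDs are hypothetical):

    def estimate_total_bugs(found_by_a, found_by_b):
        a, b = len(found_by_a), len(found_by_b)
        overlap = len(found_by_a & found_by_b)
        return a * b / overlap   # undefined (rightly) if the two lists don't overlap at all

    expert_a = {"BUG-1", "BUG-2", "BUG-3", "BUG-4"}   # hypothetical bug IDs
    expert_b = {"BUG-2", "BUG-4", "BUG-5"}
    print(estimate_total_bugs(expert_a, expert_b))    # 6.0, vs. 5 distinct bugs actually found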


Is that a direct application of the paper or something else? (sorry I didn't read it)

Just wondering because this rule of thumb sounds intuitively wrong to me. Depending on the difficulty of the bugs and the skill levels of the experts, it seems possible for them to find every "easy" bug at least twice while none of them finds the hardest bug even once. (A real-world example would be some obscure zero-day security bug.)


I should have said "at least until" rather than "until", sorry. It's from a result that predates the paper, due to Fisher (using butterflies, not code bugs).

The actual guarantee from the result is not that the number of unobserved "species" is small, but that the total population of all unobserved species is small. If you go back to the birds example, then you could say something like "at most 0.1% of all birds are from species that we haven't identified" but maybe those 0.1% of birds are from a million different species each with incredibly tiny populations. In the code bug example, the very rare species would be the bugs that are very unlikely to be found, i.e. it's more about estimating how many more bugs you will find if you continue to analyze it than how many are really there.
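If it helps make that concrete, here's a toy sketch of that kind of guarantee as I understand it (a Good-Turing style "missing mass" estimate; this is my reading, not necessarily the exact result being referred to): the share of the population belonging to unseen species is roughly the fraction of singletons in the sample.

    from collections import Counter

    # Toy sample: 100 birds caught, five species seen exactly once.
    birds_caught = ["robin"] * 40 + ["sparrow"] * 35 + ["wren"] * 20 + \
                   ["heron", "kite", "owl", "swift", "crake"]

    counts = Counter(birds_caught)
    singletons = sum(1 for c in counts.values() if c == 1)
    print(singletons / len(birds_caught))   # 0.05: ~5% of all birds belong to species not yet seen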


agreed, I thought of the same issue. Seems like there has to be some assumption about even difficulty of finding bugs / catching species of birds


Does that not assume equal probability of finding bugs? Seems to me that both teams would find the same easily identified bugs (say '=' instead of '==' or revealed by compiler warnings).


Is it similar to the German tank problem?



The German tank problem is easier because it assumes the tanks have sequential serial numbers.
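For reference, the standard (minimum-variance unbiased) estimator for that problem is tiny; a quick sketch with made-up serial numbers:

    def german_tank_estimate(serials):
        k, m = len(serials), max(serials)   # sample size, largest serial observed
        return m + m / k - 1

    print(german_tank_estimate([19, 40, 42, 60]))   # 74.0 estimated tanks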


Build systems à la carte: Theory and practice

https://www.cambridge.org/core/services/aop-cambridge-core/c...

I've always hated build systems. Stuff cobbled together that barely works, yet a necessary step towards working software. This paper showed me there's hope. If we take build systems seriously, we can come up with something much better than most systems out there.


Thanks for sharing!

I've recently started learning C++ and have had to grapple with the complexities of CMake. I'm only a few pages into the paper but it's already done a great job at distilling the problem domain and the core components of a build system.

I also found the beginning to be a great introduction to `make` and the build dependency tree.


This is my favorite paper now for 2 years in a row :)


I’ve loved build and deploy systems. Sure they are complex but when you have a working system, they are quite a joy.


This Nature paper,

Non-invasive early detection of cancer four years before conventional diagnosis using a blood test

https://www.nature.com/articles/s41467-020-17316-z

Major breakthrough in cell-free diagnostics. The methylation pattern of DNA can be used to identify early-stage cancer, i.e. circulating tumor DNA (ctDNA) has a distinct methylation pattern.

The results are based on data from a ten year study which must have cost a fortune to run.


Small correction: the paper is in Nature Communications and not Nature.


So we could, in principle, selectively demethylate those regions of DNA and cure some cancers?


Cancer isn't caused by abnormal methylation patterns.

The ctDNA are just fragments of DNA which have been shed by the tumor and are now circulating in the blood stream.


That's great news!

So now that we know that, what are the treatment options at that stage?


Probably many of the same ones that are used in the later stages.

My understanding is that the prognosis is much better the earlier one begins treatment.


[flagged]


What's with the sugar?


Combine that with the breakthrough in protein folding and mRNA vaccines and we could have a rapid pipeline for custom, targeted immunotherapies for not just new bugs, but new cancers.


The pair of these papers: (Don't read them in full.)

1.Attention is not explanation (https://arxiv.org/abs/1902.10186)

2.Attention is not not Explanation (https://arxiv.org/abs/1908.04626)

Goes to show the complete lack of agreement among researchers in the explainability space. The most popular packages (AllenNLP, Google LIT, Captum) use saliency-based methods (e.g. integrated gradients) or attention. The community has fundamental disagreements on whether they capture anything equivalent to importance as humans would understand it.

An entire community of fairness, ethics and computational social science is built on top of conclusions drawn using these methods. It is a shame that so much money is poured into these fields, but there does not seem to be as strong a thrust to explore the most fundamental questions themselves.

(my 2 cents: I like SHAP and the stuff coming out of Bin Yu and Tommi Jakkola's labs better... but my opinion too is based on intuition without any real rigor)


As a layman, I don't understand how "attention" and "explanation" are used here. Would you be able to summarize the terms and the contention?


These are deep neural net papers, specifically in NLP.

Explanation: why the model (the deep neural net, that is) is doing what it's doing.

Attention: a particular technique used in certain deep net models, invented a few years ago, that originally showed remarkable performance improvements in NLP (natural language processing) tasks, but has recently been applied in vision and elsewhere.


Recently, a lot of neural network models, especially those in NLP (like GPT-3, BERT, etc.), use "attention", which is basically a way for neural networks to focus on a certain subset of the input (the neural network can focus its "attention" on a particular part of the input). Explanation just refers to ways of explaining the predictions of the neural networks.


Some more links provided at a NeurIPS tutorial this week

https://explainml-tutorial.github.io/papers

I saw the first paper mentioned in the tutorial chat.

What is your impression of anchor? https://github.com/marcotcr/anchor


I am following Tommi Jakkola in this space too. Also would recommend Ameet Talwalkar's group, esp the papers with Gregory Plumb like MAPLE [1].

[1] [PDF] https://papers.nips.cc/paper/2018/file/b495ce63ede0f4efc9eec...


Jaakkola – with two a's. Finnish names are weird.


I think one of the most interesting papers I read this year was Hartshorne & Germine, 2015, "When does cognitive functioning peak? The asynchronous rise and fall of different cognitive abilities across the life span": https://doi.org/10.1177/0956797614567339

There are lots of good bits, such as: 'On the practical side, not only is there no age at which humans are performing at peak on all cognitive tasks, there may not be an age at which humans perform at peak on most cognitive tasks. Studies that compare the young or elderly to “normal adults” must carefully select the “normal” population.' (italics in original)

This seems to me to comport with the research suggesting that most or all of the variance in IQ across the life span can be accounted for by controlling for mental processing speed; i.e., you are generally faster when you are younger, but you are not more correct when you are younger.


Apologies for pasting all of this, but the excerpt has always stuck with me. It seems correct. Are there alternative explanations other than mental processing speed and so on? (For example, later in life, you're less likely to be in a position to do the same sort of work. But that seems to have been tested by e.g. the Institute for Advanced Study.)

As far as I can tell, this section might be one of those facts that people try not to think about too much. I don't worry about it, but I end up thinking about it a lot.

There was recently some headline news about an older mathematician that made a significant breakthrough. Other than that one outlier, have there been many important contributions made by people after the age of, say, 45?

--

"I had better say something here about this question of age, since it is particularly important for mathematicians. No mathematician should ever allow himself to forget that mathematics, more than any other art or science, is a young man's game. To take a simple illustration at a comparatively humble level, the average age of election to the Royal Society is lowest in mathematics.

We can naturally find much more striking illustrations. We may consider, for example, the career of a man who was certainly one of the world's three greatest mathematicians. Newton gave up mathematics at fifty, and had lost his enthusiasm long before; he had recognized no doubt by the time that he was forty that his great creative days were over. His greatest ideas of all, fluxions and the law of gravitation, came to him about 1666, when he was twenty-four—'in those days I was in the prime of my age for invention, and minded mathematics and philosophy more than at any time since'. He made big discoveries until he was nearly forty (the 'elliptic orbit' at thirty-seven), but after that he did little but polish and perfect.

Galois died at twenty-one, Abel at twenty-seven, Ramanujan at thirty-three, Riemann at forty. There have been men who have done great work a good deal later; Gauss's great memoir on differential geometry was published when he was fifty (though he had had the fundamental ideas ten years before). I do not know an instance of a major mathematical advance initiated by a man past fifty. If a man of mature age loses interest in and abandons mathematics, the loss is not likely to be very serious either for mathematics or for himself.

On the other hand the gain is no more likely to be substantial; the later records of mathematicians who have left mathematics are not particularly encouraging. Newton made a quite competent Master of the Mint (when he was not quarrelling with anybody). Painlevé was a not very successful Premier of France. Laplace's political career was highly discreditable, but he is hardly a fair instance, since he was dishonest rather than incompetent, and never really 'gave up' mathematics. It is very hard to find an instance of a first-rate mathematician who has abandoned mathematics and attained first-rate distinction in any other field.1 There may have been young men who would have been first-rate mathematicians if they had stuck to mathematics, but I have never heard of a really plausible example. And all this is fully borne out by my own very limited experience. Every young mathematician of real talent whom I have known has been faithful to mathematics, and not from lack of ambition but from abundance of it; they have all recognized that there, if anywhere, lay the road to a life of any distinction.

1 Pascal seems the best."


First, a note: math is one of the specific areas where humans generally peak really quite young. Math (quantitative reasoning/logic) is not the only area of psychometric intelligence testing (e.g., IQ) and not even a majority of it. So, it may be that it really is a bit harder for older mathematicians to make breakthroughs? I don't know. At any rate, citing mathematics as an indicator is probably not ideal, because math ability does indeed generally peak early.

However, as of 2011[1] the mean age of physics Nobel winners at the time of their achievements across the entire period of the award was 37.2 and since 1985 the mean age was 50.3.

According to the same paper, by the year 2000, Nobel-level achievement in physics before age 40 was only 19% of cases. It also appears that awards in chemistry and medicine are similarly increasing in mean age.

Is this dispositive? Certainly not. Maybe the Nobel committee prefers to award old scientists because of some unknown bias?

However, it does indicate that high achievement is both possible and normal in middle age and beyond.

[1] https://www.pnas.org/content/108/47/18910.full


Specifically responding to the increasing average age of Nobel prize winners: this is in part due to the increasing complexity of problems to solve. With our current ways of solving problems, the new problems become harder and harder. The existing human knowledge is also becoming harder and harder to understand, requiring somebody working in a field to spend much longer studying and catching up to the state of the art before being able to make a significant contribution to the field.

This is one of the reasons that I'm personally so excited about (and working on) the potential of spatially immersive media like VR to understand complex concepts. Taking a step back, tools like a graph plot enabled humans to understand complex concepts like differentials and projectile motion at a much younger age. Could a breakthrough with new ways of understanding human knowledge effectively do the same with knowledge that is today considered complex (eg, quantum mechanics)? If such a breakthrough happens, could we bring the average age of significant contribution in subjects like physics back down?

I don't know, but I hope so. :)

Edit: I also remember reading some thoughts (by Michael Nielsen, or possibly someone else) on the increasing age of Nobel-worthy contributions in physics, but I can't find them in my current sleep-deprived state. I shall look tomorrow if somebody else hasn't pointed to that article by then.


Maybe, but I somewhat doubt it. Nobel-level work is usually derivative of only a few basic concepts, but is otherwise quite daring. The academic guild system has become nearly impossible to get through, and all corporate or government research depends on passing through that system first. Nobody can just get to work. First you have to get in. 99% of applicants are mostly concerned with prestige or career opportunities. 1% wants to do research. Then you need funding. This comes by helping professors on their ideas, not yours. Then you need a job. Better pick a popular field and find ‘business value’. Now it’s time to buy a house. Maybe you’ll get back to that big idea you had once you pay it off. Student loan availability turned the academic pipeline into a job requirement, and a job is then required to pay it off. We’re just entering the era of Nobel winners that started school during the Vietnam boom. It seems to me that the Nobel will cease to have any meaning in a decade or so. The trend is that contemporary prizes are given to politically valued choices from a huge field of contributors, or as an honorarium for ‘famous’ professors. The last 20 years of minted professors are far more focused on job security than great research, and it would be surprising to see a lot of individual breakthroughs at the Nobel level. And then there are corporate labs, which are run by professional managers handing out nebulous quarterly objectives with a side of panic. Forget about it. The biggest hope for ambitious research may be self-funded entrepreneurs. There must be, somewhere along that path to human colonization of space, a Nobel for Elon.


Good point. I call this problem the Giant's Shoulder Climbing Problem. Isaac Newton said he could only see farther because he was standing on the shoulders of giants. By giants he meant all the knowledge amassed by previous generations. The problem is, nowadays the giants have gotten so big that one can spend the better part of a life just climbing the damn giant, many failing to reach the ever-receding shoulders.

I've pondered this problem a bit before. To solve it, I reached the same conclusion as you: we need some breakthrough in new ways of understanding human knowledge, simplifying knowledge that is today considered complex. In other words, we need to at least try to build some sort of elevator.

IMHO, the most promising possible breakthroughs I could find were:

(I) a reform in math education, with early introduction of schoolchildren to computer algebra system (CAS) software, shifting the curriculum away from tedious manual computations and trick learning. In university, for example, I learned lots of integration tricks, and forgot most of them a few years later. Would my time have been better invested just learning SymPy instead of all those tricks? (There's a small SymPy sketch after point II below.) This idea is pushed by Conrad Wolfram; see, for example, his talk at https://youtu.be/jE9lU4E52Vg

(II) a reform of physics education to replace vector algebra with proper geometric algebra, as advocated by David Hestenes. Vector algebra as taught in physics today is actually a hack pushed by Gibbs that only works well in 3D and demands a lot of shoehorning to work in problems with higher dimensionality. Geometric algebra scales well in any number of dimensions, and many problems become easier. The four Maxwell equations, for example, become one. See the discussion in https://physics.stackexchange.com/a/62822 :

"Now, the contention is that Clifford algebra is under-utilized in basic physics. Every problem in rigid-body dynamics is at least as easy when using Clifford algebra as anything else — and most are far easier — which is why you see quaternions being used so frequently. Orbital dynamics (especially eccentricity!) is practically trivial. Relativistic dynamics is simple. Moreover, once you've gotten practice with Clifford algebra in the basics, extending to electrodynamics and the Dirac equation are really simple steps. So I think there's a strong case to be made that this would be a useful approach for undergrads. This could all be done using different tools, of course — that's how most of us learned them. But maybe we could do it better, and more consistently. No one is claiming that Clifford algebra is fundamentally new; just that it could be bundled into a neater package, making for easier learning. Try teaching a kid who is struggling with the direction of the curl vector that s/he should really be thinking in terms of the algebra generated by the (recently introduced) vector space, subject only to the condition that the product of a vector with itself is equal to the quadratic form. Or a kid who can't understand Euler angles that this rotation is better understood as a transformation generated (under a two-fold covering) by the even subalgebra of Cl3,0(R). No one here is arguing that that should happen. GA is just a name for a pedagogical approach that makes these lessons a whole lot easier than they would be if you sent the student off to read Bourbaki. Starting off with GA may be slightly harder at the beginning, but pays enormous dividends once you get to harder problems. And once teachers and textbooks get good at explaining GA, even the introduction will be easier."


This is a phenomenal answer to something I've wondered about for so long. Thank you for presenting data!


>There was recently some headline news about an older mathematician that made a significant breakthrough. Other than that one outlier, have there been many important contributions made by people after the age of, say, 45?

There is one other effect beyond ability: people over 45 rarely take up new interests. There are plenty of cases of people who continue working in the same field through their 40s and 50s and continue to make advances, but there are many fewer cases of an individual beginning to work in a field after 40 and going on to make a major discovery. In pure math, possibly the most aesthetic-driven technical field (it is impractical by definition), this effect might be especially strong.

Richard Hamilton (not to be confused with William Rowan Hamilton of action-principle fame) initiated the application of the Ricci flow to the geometrization conjecture in 1982 at 39 and continued his work through the '90s, being credited by Perelman as making crucial contributions to the final solution of the Poincare conjecture in three dimensions. He's probably the most prominent example.

>It is very hard to find an instance of a first-rate mathematician who has abandoned mathematics and attained first-rate distinction in any other field.

Chomsky? Szilard? Maybe even Wolfram?


For me, it was "Erasure Coding in Windows Azure Storage" from Microsoft Research (2016) [0]

The idea that you can achieve the same practical effect as a 3x replication factor in a distributed system, while only increasing the cost of data storage by 1.6x, by leveraging some clever information-theory tricks is mind-bending to me.

If you're operating a large Ceph cluster, or you're Google/Amazon/Microsoft and you're running GCS/S3/ABS, if you needed 50PB HDDs before, you only need 27PB now (if implementing this).

The cost savings, and environmental impact reduction that this allows for are truly enormous, I'm surprised how little attention this paper has gotten in the wild.

[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...
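If the idea sounds abstract, here's a toy sketch of the simplest possible erasure code (a single XOR parity, RAID-5 style; far cruder than the LRC construction in the paper, but it shows how k data blocks plus one parity block give (k+1)/k storage overhead while still surviving the loss of any one block):

    from functools import reduce

    def xor_blocks(blocks):
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    data_blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
    parity = xor_blocks(data_blocks)   # 5 blocks stored for 4 blocks of data: 1.25x

    # Lose any single block: XOR of the parity with the survivors recovers it.
    lost = data_blocks[2]
    survivors = data_blocks[:2] + data_blocks[3:]
    assert xor_blocks(survivors + [parity]) == lost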


The primary reason why you should be using 3x or higher replication is read throughput (which makes it really only relevant for magnetic storage). If the data is replicated 1.6x then there are only 1.6 magnetic disk heads for each file byte. If you replicate it 6x then there are 6 magnetic disk heads for each byte. At ~15x it becomes cheaper to store in SSD with ~1.5x Reed-Solomon/erasure code overhead since SSD has ~10x the per-byte cost of HDD.

(there are also effects on the tail latency of both read and write, because in a replicated encoding you are less likely to be affected by a single slow drive).

(also, for insane performance which is sometimes needed you can mlock() things into RAM; the per-byte cost of RAM is ~100x the cost of HDD and ~10x the cost of SSD).


Everything you just said is on point, but I think that's an orthogonal thing to what the paper is going for. Hot data should absolutely have a fully-materialized copy at the node where operations are made, and an arbitrary number of readable copies can be materialized for added performance in systems that don't rely on strong consistency as much.

However for cold-data, there really hasn't been (or at least I am unaware of) any system that can achieve the combined durability of 1.5x Reed-Solomon codes + 3x replication, with such a small penalty to storage costs.

Like you said though, it's definitely not the thing you'd be doing for things that prioritize performance as aggressively as the use-cases you've suggested.


~1.5x reed solomon is the default these days, again, unless you need read throughput performance. It is awesome :)

Also, these days the storage of the data doesn't have to be at the same machine that processes the data. A lot of datacenter setups have basically zero transfer cost (or, alternatively, all the within-DC transfer cost is in the CAPEX required to build the DC in the first place), ultra low latency, and essentially unlimited bandwidth for any within-datacenter communication. This doesn't hold for dc1->dc2 communication, in particular it is very very far from the truth in long distance lines.

One way to think about the above is that datacenters have become the new supercomputers of the IBM era - it's free and really fast to exchange data within a single DC.

Also2, this is completely independent of consistency guarantees. At best it relates to durability guarantees, but that's something I want from all storage solutions. And yes, properly done Reed-Solomon has the same durability guarantees as plain old replicated setup.

Also to the above also2, single-DC solutions are never really durable as the DC can simply burn down or meet some other tragic end; you need geographic replication if your data cannot be accidentally lost without serious consequences (a lot of data actually can be lost, in particular if it is some kind of intermediate data that can be regenerated from the "source" with some engineering effort). This is not just a theoretical concern, I've seen "acts of God" destroy a single-DC setup's data, at least partially. It is pretty rare, though.


I'm confused, as you don't seem to be replying to any point I've made...

> ~1.5x reed solomon is the default these days, again, unless you need read throughput performance

I'm not surprised that Reed-Solomon is the "default these days" given that it has existed since the 1960s, and that the most widely available and deployed open-source distributed filesystem is HDFS (which uses Reed-Solomon). However I don't see how that is to be taken as a blind endorsement for it, especially given that the paper in reference explicitly compares itself to Reed-Solomon based systems, including concerns regarding reconstruction costs, performance, and reliability.

> Also, these days the storage of the data doesn't have to be at the same machine that processes the data

Even though what you said here is correct, I don't see how that's relevant to the referenced paper, nor do I think I implied that I hold a contrary belief in any way from what I said.

> Also2, this is completely independent of consistency guarantees

My comment about consistency referred only to the fact that you cannot "simply" spin up more replicas to increase read throughput, because consistent reads often have to acquire a lock on systems that enforce stronger consistency, so your comments regarding throughput are not universally true, given that there are many systems where reads cannot be made faster this way, as they are bottle-necked by design.

> Properly done Reed-Solomon has the same durability guarantees as plain old replicated setup

This is not true unless the fragments themselves are being replicated across failure domains, which you seem to address with your next comment with "you need geographic replication if your data cannot be accidentally lost without serious consequences". All of this, however, is directly addressed in the paper as well:

> The advantage of erasure coding over simple replication is that it can achieve much higher reliability with the same storage, or it requires much lower storage for the same reliability. The existing systems, however, do not explore alternative erasure coding designs other than Reed-Solomon codes. In this work, we show that, under the same reliability requirement, LRC allows a much more efficient cost and performance tradeoff than Reed-Solomon.


It's not even the reduction in storage costs in this paper that is groundbreaking. They talk about a way to not only reduce storage costs, but also optimize for repairs. Repairs are costly at scale, and reducing resource usage where possible (network, CPU, disk reads, etc.) is ideal.


Indeed: erasure coding is easy (they have been doing it since the 60s). Your real problem is the repair problem.


On this same note I would also suggest some papers which show you can do so much better than simple erasure coding -

[1] Clay Codes - https://www.usenix.org/conference/fast18/presentation/vajha . The scheme was also implemented on Ceph and the results are shown in the paper.

and, [2] HeART: improving storage efficiency by exploiting disk-reliability heterogeneity - https://www.usenix.org/conference/fast19/presentation/kadeko... . This paper talks about how just one erasure code is not enough and how, by employing code conversions based on disk-reliability heterogeneity, we can get up to 30% savings!


The Google File System (GFS) paper from 2003 mentions erasure codes. Which isn't to say they did it then, but rather that the technique of using erasure coding was known back then. (And surely before GFS too, I just picked it as an example of a large data storage system that used replication and a direct predecessor to the systems you mentioned.)

https://static.googleusercontent.com/media/research.google.c...


CDs (remember those? lol) also implemented Reed-Solomon erasure codes for the stored data; erasure codes in storage systems aren't new at all, and that's not what this paper is about.

I actually found out about this paper because it was referenced in a slide presentation from Google about Colossus (which is the successor to GFS). GFS indeed uses erasure coding with a 1.5x factor, but erasure coding alone does not guarantee durability, and thus needs to be combined with replication to satisfy that requirement, and erasure coding is not the same thing as replication.

The innovation here is explicitly the combination of a new erasure coding algorithm (LRC) AND replication, with a combined storage amplification that is much lower than the previous SOTA.

The paper explicitly compares the new algorithm (LRC) with GFS and other alternatives, and explains why it's better, so this is really not something that is comparable to the 2003 GFS paper in any way (or to any other prior art really), as this is not just a trivial application of erasure coding in a storage system.

There's also this paper [0] from 2001 which digs a bit deeper into the Erasure Codes vs Replication idea that I can recommend if you're interested

[0] http://fireless.cs.cornell.edu/publications/erasure_iptps.pd...


The paper is from 2012, not 2016 (see https://dl.acm.org/doi/10.5555/2342821.2342823)


I think for the major players you mentioned the 2016 paper was retrospective. Everyone was already doing it. Even mid-tier players like Dropbox Magic Pocket were using erasure coding by 2016, and their scheme was mostly written by ex-Google engineers influenced by Colossus.


Oh I am absolutely aware that erasure codes are an old thing, Reed-Solomon codes have existed since the 1960s, but this is not simply a trivial application of erasure coding to a storage system: erasure codes alone don't provide the same durability guarantees that replication does. [0]

This is a combination of erasure coding AND replication, whose combined storage amplification is dramatically lower than previous SOTA.

I gave a longer explanation in a sibling comment to yours [1]

[0] http://fireless.cs.cornell.edu/publications/erasure_iptps.pd...

[1] https://news.ycombinator.com/item?id=25351678


Thanks for the clarification. I still think these techniques were somewhat widespread already ... see for example this figure from US Patent 9292389 describing nested, aka layered coding that to my thinking is isomorphic with the "LRC" or "pyramid code" described by the Microsoft researchers.

By the way, not at all trying to say this paper isn't interesting. I keep it in my filing cabinet to show my colleagues when I need to describe this technique, since Google hasn't ever bothered to describe Colossus in a way I can reference.

https://imgur.com/a/gi2Xdl0


Erasure coding was already used in storage tapes in 1985.

> some clever information theory tricks is mind bending to me

It's a pretty trivial first-degree linear function, y = ax + b


And why all the downvotes?


A paper that profoundly influenced my language design: “Programming with Polymorphic Variants” https://caml.inria.fr/pub/papers/garrigue-polymorphic_varian...

And the earlier paper “A Polymorphic Type System for Extensible Records and Variants” https://web.cecs.pdx.edu/~mpj/pubs/96-3.pdf

Row types are magically good: they serve either records or variants (aka sum types aka enums) equally well and both polymorphically. They’re duals. Here’s a diagram.

              Construction                Inspection

    Records   {x:1} : {x:Int}             r.x — r : {x:Int|r}
              [closed]                    [open; note the row variable r]
    
    Variants  ‘Just 1 : <Just Int|v>      case v of ‘Just 0 -> ...
              [open; note the row var v]  v : <Just Int>
                                          [closed]
Neither have to be declared ahead of time, making them a perfect fit in the balance between play and serious work on my programming language.


I love polymorphic records/variants. Variants particularly are amazing for error propagation. Records of course are useful in many of the same places tuples and structs are. The main reluctance I have is whether to allow duplicate entries in records. If you allow them, many things become much easier, but they make records inherently ordered when they weren’t previously


> (...) my programming language.

https://www.unisonweb.org/ ?


Not that, but Unison is one direct inspiration. I wrote a small overview here https://community.inflex.io/t/the-inflex-language/20


Attention Is All You Need

https://arxiv.org/abs/1706.03762

It's from 2017 but I first read it this year. This is the paper that defined the "transformer" architecture for deep neural nets. Over the past few years, transformers have become a more and more common architecture, most notably with GPT-3 but also in other domains besides text generation. The fundamental principle behind the transformer is that it can detect patterns among an O(n) input size without requiring an O(n^2) size neural net.

If you are interested in GPT-3 and want to read something beyond the GPT-3 paper itself, I think this is the best paper to read to get an understanding of this transformer architecture.
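For anyone who wants the core operation in front of them, here's a minimal numpy sketch of scaled dot-product attention (single head, no masking or learned projections; a toy illustration, not the full architecture from the paper):

    import numpy as np

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                    # (n_q, n_k) similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
        return weights @ V                                 # weighted sum of values

    n, d = 5, 8                       # toy sequence length and model dimension
    x = np.random.randn(n, d)
    print(attention(x, x, x).shape)   # (5, 8); the n x n score matrix is where
                                      # the quadratic compute cost comes from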


“it can detect patterns among an O(n) input size without requiring an O(n^2) size neural net”

This might be misleading, the amount of computation for processing a sequence size N with a vanilla transformer is still N^2. There has been recent work however which has tried to make them scale better.


You raise an important point. The proposed solutions are too many to enumerate, but if I had to pick just one currently I would go for "Rethinking Attention with Performers" [1]. The research into making transformer better for higher dimensional inputs is also moving fast and is worth following.

[1] https://arxiv.org/abs/2009.14794


It's clearly important but I found that paper hard to follow. The discussion in AIMA 4th edition was clearer. (Is there an even better explanation somewhere?)


I found it difficult to read too. Here's an annotated version, with code, which helps:

https://nlp.seas.harvard.edu/2018/04/03/attention.html


It's crazy to me to see what still feel like new developments (come on, it was just 2017!) making their way into mainstream general-purpose undergraduate textbooks like AIMA. Is this what getting old feels like? :-\

I start to understand what you always hear from older ICs about having to work to keep up, or else every undergrad coming out will know things you don't.


I would argue that input scaling is not fundamental to Transformers.

Recurrent neural network size is also independent of input sequence length.

The successful removal of inductive bias is really what differentiates this from previous sequence-to-sequence neural networks.


Which inductive bias?


Presumably that the output at step (n) is conditioned only on the output of step (n-1).


Three papers stick out for me in the IML / participatory machine learning space this year:

1) Michael, C. J., Acklin, D., & Scheuerman, J. (2020). On interactive machine learning and the potential of cognitive feedback. ArXiv:2003.10365 [Cs]. http://arxiv.org/abs/2003.10365

2) Denton, E., Hanna, A., Amironesei, R., Smart, A., Nicole, H., & Scheuerman, M. K. (2020). Bringing the people back in: Contesting benchmark machine learning datasets. ArXiv:2007.07399 [Cs]. http://arxiv.org/abs/2007.07399

3) Jo, E. S., & Gebru, T. (2020). Lessons from archives: Strategies for collecting sociocultural data in machine learning. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 306–316. https://doi.org/10.1145/3351095.3372829

Also a great read related to IML tooling for audio recognition:

1) Ishibashi, T., Nakao, Y., & Sugano, Y. (2020). Investigating audio data visualization for interactive sound recognition. Proceedings of the 25th International Conference on Intelligent User Interfaces, 67–77. https://doi.org/10.1145/3377325.3377483


These do seem interesting, thanks for sharing.

Also, what do you mean by "participatory" in the context of machine learning? Is there a seminal paper that defines it?

I ask as in HCI and other fields, participatory had a VERY defined meaning that, in short, is about equal power, democracy, and inclusivity. I can't understand how that applies to ML and would like to learn more, hence asking you.


I think "participatory" means something similar here within an ML context. It favors building community-based algorithmic systems and focuses on lowering the barrier to participation, so that non-expert users can be involved during the machine learning development cycle.

I'm not aware of any seminal papers per se, although here are a few that I've read recently... first one is something I maintain at $DAYJOB:

1) Halfaker, A., & Geiger, R. S. (2020). Ores: Lowering barriers with participatory machine learning in Wikipedia. ArXiv:1909.05189 [Cs]. http://arxiv.org/abs/1909.05189

2) Martin Jr. , D., Prabhakaran, V., Kuhlberg, J., Smart, A., & Isaac, W. S. (2020). Participatory problem formulation for fairer machine learning through community based system dynamics. ArXiv:2005.07572 [Cs, Stat]. http://arxiv.org/abs/2005.07572

Also checkout PAIR: https://research.google/teams/brain/pair/


Thank you! We have very similar interests! Especially the first one and the interactive sound recognition one. Any other work in IML you'd recommend?


A lot of IML work seems to focus on building interfaces, so this one was pretty good:

1) Dudley, J. J., & Kristensson, P. O. (2018). A review of user interface design for interactive machine learning. ACM Transactions on Interactive Intelligent Systems, 8(2), 1–37. https://doi.org/10.1145/3185517


Measuring the predictability of life outcomes with a scientific mass collaboration.

http://www.pnas.org/lookup/doi/10.1073/pnas.1915006117

You might think that it's possible to use machine learning to predict whether people will be successful using established socio-demographic, psychological, and educational metrics. It turns out that it's very hard and simple regression models outperform the fanciest machine learning ideas for this problem.

The way this study was done is also interesting and paves the way for new kinds of collaborative scientific projects that take on big questions. It draws on communities like Kaggle, but applies it to scientific questions not just pure prediction problems.


> simple regression models outperform the fanciest machine learning ideas for this problem

This reminds me of a classic paper: "Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors."

Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American psychologist, 34(7), 571.

https://core.ac.uk/download/pdf/190386677.pdf
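A small sketch of what Dawes means by an "improper" linear model (toy numbers, purely illustrative): standardize each predictor and fix every weight to +1 in the direction you believe helps, rather than fitting the weights.

    import numpy as np

    # Toy candidates scored on three predictors, e.g. GPA, test score, interview rating.
    X = np.array([[3.5, 620, 2.0],
                  [3.9, 700, 4.0],
                  [2.8, 550, 3.0],
                  [3.2, 640, 1.0]])
    signs = np.array([1.0, 1.0, 1.0])   # chosen by intuition, not estimated from data

    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each predictor
    print(Z @ signs)                           # equal-weight composite used to rank candidates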


Luck/Fortune is so critical.

Genetics and net worth can be blown away by a good or bad group of friends. And unfortunately you start having friends before you are conscious enough to realize the impact.


One of my favorites is definitely A Unified Framework for Dopamine Signals across Timescales (https://doi.org/10.1016/j.cell.2020.11.013), simply because of its experimental design. They 'teleported' rats in VR to see how their dopamine neurons responded, to determine whether TD learning explains dopamine signals on both short and long timescales. Short answer: it does.


fyi, TD = Temporal Difference

"Temporal difference (TD) error is a powerful teaching signal in machine learning"



Interesting I'll have to take a look at this paper.


On the Measure of Intelligence, François Chollet [1]

Fellow HNers seem to have liked a lot of ML papers; this one does not break the trend. This is a great meta paper questioning the goal of the field itself, and proposing ways to formally evaluate intelligence in a computational sense. Chollet is even ambitious enough to propose a proof-of-concept benchmark! [2] I also like some of the out-of-the-box methods people tried to get closer to a solution, like this one combining cellular automata and ML [3]

[1] https://arxiv.org/abs/1911.01547 [2] https://github.com/fchollet/ARC [3] https://www.kaggle.com/arsenynerinovsky/cellular-automata-as...


Big fan of Chollet. Really enjoyed the paper.

Also a big fan of Hutter prize. Good AGI is lossless compression.


Meaningful Availability, Hauer et al.: https://www.usenix.org/system/files/nsdi20spring_hauer_prepu...

A good incremental improvement in service level indicator measurements for large-scale cloud services.

Obligatory The Morning Paper post: https://blog.acolyer.org/2020/02/26/meaningful-availability/


Even if not implemented in such a sophisticated manner, "meaningful availability" is a better metric than pure uptime/downtime for most websites.

At one startup we worked at, we had availability problems for some time, with the service going down in a semi-predictable manner ~2 times a day (and the proper bugfix a few weeks away). Because one of the daily outages happened in the middle of the night with no one on call, pure availability was 80-90%. Given that it was a single-country app with no one trying to do any business during the night, meaningful availability was ~99%. Knowing that gave us peace of mind and made tackling the problem a much more relaxed ordeal than the weeks of crunch time I've seen at other companies in similar situations.
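A back-of-the-envelope sketch of the difference (numbers made up, and much cruder than the paper's user-uptime metric, but it captures why a recurring night outage barely dents the "meaningful" number):

    hours = range(24)
    requests_per_hour = [5 if h < 7 else 1000 for h in hours]   # quiet nights, busy days
    down_hours = {2, 3, 4}                                      # the recurring night outage

    time_based = 1 - len(down_hours) / len(hours)
    request_weighted = 1 - sum(requests_per_hour[h] for h in down_hours) / sum(requests_per_hour)

    print(f"uptime-based:     {time_based:.1%}")        # 87.5%
    print(f"request-weighted: {request_weighted:.1%}")  # ~99.9%, since almost nobody was affected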


Keeping CALM: When Distributed Consistency Is Easy

In computing theory, when do you actually need coordination to get consistency? They partition the space into two kinds of algorithm, and show that only one kind needs coordination.

CACM, 9/2020. https://cacm.acm.org/magazines/2020/9/246941-keeping-calm/fu...


This is excellent stuff, as is all of Peter's (and others) prior work in this area. It's a great step from "you can't do X" (like https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-TM-394..., also a great read) to "ok what can I do then?"

Highly recommended reading.


Not a paper, but a fantastic talk by Samy Bengio "Towards better understanding generalisation in deep learning" at ASRU2019.

Some pretty mind blowing insights - ex: if you replace one layer's weights in a trained classification network with the initialisation weights for the layer (or some intermediate checkpoint as well), many networks show relatively unaffected performance for certain layers ... which is seen as a generalisation since it amounts to parameter reduction. However, if you replace with fresh random weights (although initialisation state is itself another set of random weights), the loss is high! Some layers are more sensitive to this than others in different network architectures.

I recently summarised this to a friend who asked "what's the most important insight in deep learning?" - to which I said - "in a sufficiently high dimensional parameter space, there is always a direction in which you can move to reduce loss". I'm eager to hear other answers to that question here.


Nice. Love the Bengio Bros. Yoshua especially was right there with Geoffrey Hinton, Yann LeCun, Andrew Ng as the earliest pioneers of successful deep learning, for over 20 years. (While most technologists were crazy about this thing called the World Wide Web in the late 1990's, these guys were shaping brain-inspired AI algorithms and representations.)

Anyways, one of the papers by Yoshua that was really influential on my master's thesis was published in 2009 and has received 8,956 citations to date on Google Scholar: "Learning Deep Architectures for AI". For many young researchers, even though this paper pre-dates many of the hyped architectures of the current era, I would still recommend it for its timeless views on deep representations as equivalent to architectural units of learning and knowledge, including its breakdown of deep networks as compositions of functions.

"Learning Deep Architectures for AI" by Yoshua Bengio (2009) - https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf


That's not surprising to me: if we view the weights in the whole network as "individual cells" in a population, and if we pretend that at each update of network weights, before the update each weight undergoes cell division such that one daughter cell / weight is an increase in weight, and the other a decrease, then each component of the gradient descent vector can be viewed as the fitness function for that specific cell or weight: an increase or a decrease. From this perspective each cell forms its own niche in the ecosystem, and it's no surprise that replacing a cell with its ancestor is roughly compatible with the final network cells: the symbiosis goes both ways.

The reason for Bengio demonstrating this on a complete layer is obviously to demonstrate that this is NOT due to redundantly routing information to the next layer (think holographically, for robustness). And using non-ancestor random weights illustrates that the ecosystem fitness suffers if redundant / holographic routing is prevented while also using non-ancestral cells / weights...


Do you know if a video of the talk is available anywhere? I was able to find the slides here: https://www.dropbox.com/sh/4sat5w5exw288zf/AABTC_j9GkRVEChpn...


Unfortunately no .. and my searches haven't been productive. I was referred to this talk by my professor who attended the conference and I got to see only the slide deck as well. .. but the slide deck is very good and easy to follow.


1) The original MapReduce paper https://static.googleusercontent.com/media/research.google.c...

2) Snowflake and its tiered storage, among other things http://pages.cs.wisc.edu/~yxy/cs839-s20/papers/snowflake.pdf


A Conceptual Introduction to Hamiltonian Monte Carlo (2017) https://arxiv.org/abs/1701.02434


Good paper indeed. And here's a nice blog post with nice visualizations for MCMC and Hamiltonian MC.

https://arogozhnikov.github.io/2016/12/19/markov_chain_monte...


Especially the figures do such a great job of illustrating and clarifying the subject. Wonderful paper!


I'm a big fan of the various "gradual" approaches so this paper really caught my eye.

Gradualizing the Calculus of Inductive Constructions (https://hal.archives-ouvertes.fr/hal-02896776/)

I'm not sure if this is precisely the direction things should go in order to improve the utilisation of specification within software development, but it's a very important contribution. As yet my favourite development style has been with F-star, but F-star also leaves me a bit in the lurch when the automatic system isn't able to find the answer. Too much hinting in the case of hard proofs.

Eventually there will be a system that lets you turn the crank up on specification late in the game, allows lots of the assertions to be discharged automatically, and then finally saddles you with the remaining proof obligations in a powerful proof assistant.


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7062204/

Chen, Y.W.; Yiu, C.B.; Wong, K.Y. Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL (pro)) structure: Virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Research 2020, 9, 129.

This paper (based on a machine learning-driven open source drug docking tool from Scripps Institute) from Feb/Mar formed the basis for the agriceutical venture I started for supporting pandemic management in Africa. We’re in late stage trialing talks with research institutes here in East Africa.

https://www.emske-phytochem.com


I'm a former biophysicist bumbling my way into distributed systems; was learning rust and bumped into Frank McSherry's blog posts.

Thought the Naiad project is really cool!

https://cs.stanford.edu/~matei/courses/2015/6.S897/readings/...


But At What COST?!


Murray S. Davis: "That's Interesting!: Towards a Phenomenology of Sociology and a Sociology of Phenomenology" https://proseminarcrossnationalstudies.files.wordpress.com/2...

An interesting (no pun intended) paper on what makes papers (or anything in general) interesting.


I recently made public a personal project https://42papers.com to surface the top trending papers to read.


"Wait, There's Torture in Zootopia?: Examining the Prevalence of Torture in Popular Movies", 2019. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3342908


One that came across my desk this year was the Archaic Ghost Introgression paper (https://advances.sciencemag.org/content/6/7/eaax5097), which established genetic contribution from an unknown archaic species in modern West African populations. It's notable not only because of the cool findings, but also because the paper is a culmination of a whole number of broader advances.


Fresh today: Chromosome-scale, haplotype-resolved assembly of human genomes ;)

https://www.nature.com/articles/s41587-020-0711-0


Yes, this is a big step forward! It's nice to not need to sequence genomes of both parents in order to resolve haplotypes.


Scott Aaronson's paper "The Busy Beaver Frontier". https://www.scottaaronson.com/papers/bb.pdf

It’s fairly accessible to anyone who vaguely remembers their CS theory, and quite fun!


I have to admit to skim-reading, but: Finding and Understanding Bugs in C Compilers, by Yang, Chen, Eide, and Regehr, 2011. (Yes it's from 9 years ago.) It's an interesting and approachable read if you're into programming languages and compilers.

https://www.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf


Kleppmann, et al.'s paper on OpSets [1], a specification for building CRDTs with support for an atomic tree move operation, was the best one for me.

Automerge [2] implements a variant of this.

[1] https://arxiv.org/abs/1805.04263

[2] https://github.com/automerge/automerge


As a meta observation, it's fascinating how few of these are computer-related.

Which is actually great because it gives me something to read on subjects I'm not familiar with.


Abstraction has made Programming a hybrid of Tradition/Authority/Science/Art.

It's nearly impossible to have a scientific paper on anything with abstraction. At best you can create some "after the fact" optimizations using time studies and statistics.


I don't mean programming literally (although there are plenty of papers on programming abstractions, especially in the functional world) but just computer subjects in general whether that is cryptography, complexity theory, hardware or something else.


Discovering Symbolic Models from Deep Learning with Inductive Biases [1] trains graph neural nets on astrophysical phenomena and then performs symbolic regression to generate algebraic formulae to elegantly model the phenomena in a classical physics framework. It's largely gone under the radar but has pretty interesting implications for NLP and language theory in my opinion.

Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures [2] applies DFA, an approach to training neural nets without backprop, to modern architectures like the Transformer. It does surprisingly well and is a step in the right direction for biologically plausible neural nets as well as potentially significant efficiency gains.

Hopfield Networks is All You Need [3] analyzes the Transformer architecture as the classical Hopfield Network. This one got a lot of buzz on HN so I won't talk about it too much, but it's part of a slew of other analyses of the Transformer that basically show how generalizable the attention mechanism is. It also sorta confirms many researchers' inkling that Transformers are likely just memorizing patterns in their training corpus.

Edit: Adding a few interesting older NLP papers that I came across this year.

StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding [4]

Do Syntax Trees Help Pre-trained Transformers Extract Information? [5]

Learning to Compose Neural Networks for Question Answering [6]

Parsing with Compositional Vector Grammars [7]

[1] https://arxiv.org/abs/2006.11287

[2] https://arxiv.org/abs/2006.12878

[3] https://arxiv.org/abs/2008.02217

[4] https://arxiv.org/abs/1908.04577

[5] https://arxiv.org/abs/2008.09084

[6] https://arxiv.org/abs/1601.01705

[7] https://www.aclweb.org/anthology/P13-1045/


I yeaaaarn for a future where we will have a general parallelizable training method for neural networks (or even better, a principled way to initialize trained weights, like the work being done on wavelet scattering). Long training times with backpropagation are a serious obstacle when doing experiments. I had hoped DFA would be it, but it doesn't work for image tasks sadly.


I only started reading papers this year and have only read two: an empirical study of Rust (found this on 4chan of all places), and a study of local-first software (found right here on HN). The latter truly got me thinking about the cloud services I was using and how your work really isn't yours if it's stored far away on some cloud servers and not on your local machine. The introduction to Conflict-free Replicated Data Types (CRDTs) was excellent as well.


Almost 2020. MuZero from DeepMind was a pretty amazing breakthrough. Single algorithm that can play Atari games, chess, go (and a variety of other board games) with super human ability.

https://deepmind.com/research/publications/Mastering-Atari-G...

It felt like a baby step towards general intelligence.


Snap: a Microkernel Approach to Host Networking

https://research.google/pubs/pub48630/


This year's discovery of Spinosaurus' tail significantly changed the picture of that species' appearance and behaviour.

https://www.nationalgeographic.com/science/2020/04/first-spi...


What's Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities https://web.eecs.utk.edu/~azh/pubs/Chattopadhyay2020CHI_Note...


Aren't you one of the authors of that paper? It was a good read. Have a look at https://iko.ai; it solves not only many of the pain points you wrote about but also the "critical" ones (difficult and important).

- No-setup collaborative notebooks: near real-time editing on the same notebook. Large Docker images.

- Long-running notebook scheduling: you can schedule a notebook right from the JupyterLab interface and continue to work on your notebook without a context switch. The notebook will run and you can view its state even if you close your browser or get disconnected, and you can view it without opening the JupyterLab interface, even on your mobile phone. https://iko.ai/docs/notebook/#long-running-notebooks

- Automatic experiment tracking: iko.ai automatically detects parameters, metrics, and models and saves them. Users don't have to remember or know how to write boilerplate code or experiment tracking code.

- One click parametrization: you can publish an AppBook to enable other people to run your automatically parametrized notebook without being overwhelmed by the interface. You don't have to use cell tags or metadata to specify parameters. You click a button and an application is created from your notebook. The runs of this application are also logged as experiments in case they generate a better model. https://iko.ai/docs/appbook/

- Easy deployment: people can look at the leaderboard, and click on a button to deploy the model they choose into a "REST API" endpoint. They'll be able to invoke it with a POST request or from a form where they simply upload or enter data and get predictions.

We haven't focused on the stylesheets given that in our previous projects with actual paying enterprise customers, it wasn't CSS that held us back.

Here's an invite link that's valid multiple times: https://iko.ai/invite/lEMzE_hKwJ2SUbfLdnK7SbZb1c3zUCOAQexakL...


I haven't exactly read many this year, but I really liked "An Answer to the Bose-Nelson Sorting Problem for 11 and 12 Channels" [1]. It describes many interesting algorithmic tricks to establish a lower bound for an easy to understand problem. Not exactly immediately practical, but still very interesting.

Note that it was published on arXiv just yesterday; I helped review an earlier draft.

[1]: https://arxiv.org/abs/2012.04400


https://t.co/EuRuVazuKk

In this paper Erik Hoel offers an audacious hypothesis: our brain, over the course of its evolution, developed dreams as a way to combat overfitting.

Since we're learning from a limited sample of data in the real world, the chance of overfitting (I call it judgement) goes up. In ML we inject randomness and noise to avoid overfitting. Hoel's theory can explain why our dreams are so sparse & hallucinatory.
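
As an aside on the "inject randomness and noise" point: dropout is the canonical example. A tiny numpy sketch of the idea (my own illustration, unrelated to Hoel's paper):

    import numpy as np

    def dropout(activations, p=0.5, rng=np.random.default_rng()):
        """Randomly zero a fraction p of units at train time ("noise injection"),
        scaling the survivors so the expected activation is unchanged."""
        mask = rng.random(activations.shape) >= p
        return activations * mask / (1.0 - p)

    h = np.ones(8)
    print(dropout(h, p=0.5))  # roughly half the units zeroed, the rest scaled to 2.0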


35 EGGS PER DAY IN THE TREATMENT OF SEVERE BURNS (1975)

https://www.jprasurg.com/article/0007-1226(75)90127-7/pdf

Great read. If you're not going to read it, note that you yourself should not eat 35 eggs per day: these patients had calorie requirements of a little under 7,000.


Difficult to choose just one this year (plenty of things happened, plus plenty of time to read things due to lockdown).

"Equality of Opportunity in Supervised Learning" (https://arxiv.org/abs/1610.02413)

It explains the basic concepts around fairness in ML. It has a very practical example from my domain that shows the trade-off between the fairness of an algorithm and overall performance (money). It really makes you see what may go wrong with bias in ML. It shows, in my opinion, why we will have to regulate ML, as corporations aren't really incentivized to deal with fairness. It also shows that there are different notions of fairness, so there will always be something that feels unfair, and doing something can always be interpreted as positive discrimination.


https://web.stanford.edu/group/dlab/media/papers/chenNBT2020...

Deep brain optogenetics without intracranial surgery

"Achieving temporally precise, noninvasive control over specific neural cell types in the deep brain would advance the study of nervous system function. Here we use the potent channelrhodopsin ChRmine to achieve transcranial photoactivation of defined neural circuits, including midbrain and brainstem structures, at unprecedented depths of up to 7 mm with millisecond precision. Using systemic viral delivery of ChRmine, we demonstrate behavioral modulation without surgery, enabling implant-free deep brain optogenetics."


Very interesting, but only on/off stimulation is possible; there is no spatial resolution.


7mm is crazy deep.


Propagation Networks: A Flexible and Expressive Substrate for Computation

https://groups.csail.mit.edu/genesis/papers/radul%202009.pdf


After all the discussion over masks it was nice to see an actual study done on them.

https://www.acpjournals.org/doi/10.7326/M20-6817


“Brain over body”–A study on the willful regulation of autonomic function during cold exposure.

https://booksc.xyz/book/68207697/475742


DBOS: A Proposal for a Data-Centric Operating System https://arxiv.org/abs/2007.11112

Even if it's a bit impractical in some regards, I think an operating system/cloud that you interact with like a database is something we should aspirationally strive for. We're spending too much time gluing things together and not enough time being productive. Databases are great at tracking and describing resources (much better than YAML), and stored procedures that act like Lambdas would be neat.
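
To make that concrete, here's a toy sketch (mine, not the DBOS design) of what describing and scheduling resources in SQL instead of YAML could feel like, using Python's built-in sqlite3:

    import sqlite3

    # A toy "cluster state" as tables instead of YAML manifests.
    # Scheduling and monitoring become plain queries.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes(name TEXT PRIMARY KEY, cpus INT, free_cpus INT);
        CREATE TABLE tasks(id INTEGER PRIMARY KEY, image TEXT, cpus INT,
                           node TEXT REFERENCES nodes(name));
        INSERT INTO nodes VALUES ('a', 16, 16), ('b', 8, 8);
    """)

    def schedule(image, cpus):
        """A 'stored-procedure-like' placement: pick the node with most free CPUs."""
        node = db.execute(
            "SELECT name FROM nodes WHERE free_cpus >= ? "
            "ORDER BY free_cpus DESC LIMIT 1", (cpus,)).fetchone()
        if node is None:
            raise RuntimeError("no capacity")
        db.execute("INSERT INTO tasks(image, cpus, node) VALUES (?, ?, ?)",
                   (image, cpus, node[0]))
        db.execute("UPDATE nodes SET free_cpus = free_cpus - ? WHERE name = ?",
                   (cpus, node[0]))
        return node[0]

    print(schedule("web:v2", 4))  # -> 'a'
    print(db.execute("SELECT node, SUM(cpus) FROM tasks GROUP BY node").fetchall())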


DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning https://arxiv.org/abs/2006.08381

A killer paper presenting an algorithm capable of inductive learning. ("DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws.")


This year is the first year I actually started reading papers about computer graphics. This old paper by Ken Perlin from 1985, "An Image Synthesizer", inspired me a lot. It really showed me that if you have a deep understanding of some basic principles, like the sin function, you can create beautiful things.

http://www.heathershrewsbury.com/dreu2010/wp-content/uploads...
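
In that spirit, here's a tiny sketch (mine, and not Perlin's noise function, which is gradient noise) of how far a few nested sin calls get you toward a "plasma"-style texture:

    import numpy as np

    # A sin-only "plasma" texture in the spirit of early procedural imagery.
    w = h = 64
    y, x = np.mgrid[0:h, 0:w] / 16.0
    v = (np.sin(x)
         + np.sin(y / 2.0)
         + np.sin((x + y) / 2.0)
         + np.sin(np.sqrt(x**2 + y**2) + 1.0))
    v = (v - v.min()) / (v.max() - v.min())   # normalize to [0, 1]

    # Coarse ASCII preview so the script needs nothing beyond numpy.
    chars = " .:-=+*#%@"
    for row in v[::4, ::2]:
        print("".join(chars[int(c * (len(chars) - 1))] for c in row))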


I only properly read the Lottery Ticket Hypothesis paper [1] at the start of 2020.

I think it's going to be years before we understand this properly, but in 2020 we are beginning to see practical uses.

At the moment I think it's a toss-up: it's either going to be a curiosity that people read, have their minds blown by, but can't do anything with, or it has a good chance of being the most influential deep learning paper of the decade.

[1] https://arxiv.org/abs/1803.03635


This one is from 2008. There's this method from statistics called PCA that lets you reduce high-dimensional data to a few (usually meaningless) newly-fabricated dimensions. It's useful for visualizing complex data in 2D.

In this paper, they did that with genes. And the 2D space that came out wasn't meaningless at all: it accurately recreated a map of Europe.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/
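
For anyone who hasn't played with PCA, a minimal sketch of that kind of analysis on synthetic data (nothing to do with the paper's actual genotypes) using scikit-learn:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    # Synthetic stand-in for the paper's setup: individuals live at hidden 2-D
    # "geographic" coordinates, and each of 500 "SNPs" drifts with geography plus noise.
    n_people, n_snps = 300, 500
    geo = rng.uniform(0, 1, (n_people, 2))             # hidden lat/long
    loadings = rng.normal(0, 1, (2, n_snps))           # how each SNP tracks geography
    genotypes = geo @ loadings + rng.normal(0, 0.5, (n_people, n_snps))

    # Project the high-dimensional genotype matrix down to 2 components.
    pcs = PCA(n_components=2).fit_transform(genotypes)

    # The top two PCs recover the hidden geography up to rotation/scale,
    # i.e. each PC is close to a linear combination of the two coordinates.
    corr = np.corrcoef(pcs.T, geo.T)[:2, 2:]
    print("abs correlation between PCs and hidden coordinates:\n", np.abs(corr).round(2))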


A Mathematical Model For Meat Cooking (2019): https://arxiv.org/pdf/1908.10787.pdf



Resynthesizing behavior through phylogenetic refinement (https://link.springer.com/article/10.3758/s13414-019-01760-1)

No one knows what attention is (https://link.springer.com/article/10.3758/s13414-019-01846-w)


Off topic, but could you suggest a good resource for finding papers? I'm mostly interested in software.


Here are a couple:

- Papers We Love: https://paperswelove.org/

- The Morning Paper: https://blog.acolyer.org/

[edit: formatting]


"Papers We Love" has a list of other places to find papers:

https://github.com/papers-we-love/papers-we-love#other-good-...


I'm not an expert and only started reading papers this year, but this is where I got a few of the papers I read:

- Two Minute Papers on YouTube - mostly ML and CG

- Google Scholar for finding specific things

- The Shadertoy Discord channel - people reference CG papers a lot there

The last thing is that I save every paper I see mentioned as a PDF in a separate folder on my PC. I use a combination of ripgrep-all (rga) and Recoll to search them.


I found this scientific paper interesting, even though it took me a long time to understand the contents.

Molecular repertoire of Deinococcus radiodurans after 1 year of exposure outside the International Space Station within the Tanpopo mission: https://microbiomejournal.biomedcentral.com/articles/10.1186...


A Security Kernel Based on the Lambda Calculus from 1996 (http://mumble.net/~jar/pubs/secureos/secureos.html)

I've been reading up on the object capability security model a lot recently, and was pointed to this paper... I was hooked. A really compelling security model almost from first principles.


From this year I really liked the paper "Recovering Purity with Comonads and Capabilities" by Vikraman Choudhury and Neel Krishnaswami:

https://arxiv.org/abs/1907.07283


MDMA Increases Cooperation and Recruitment of Social Brain Areas When Playing Trustworthy Players in an Iterated Prisoner's Dilemma

https://www.jneurosci.org/content/39/2/307

Abstract ran through a text optimizer:

We administered 100 mg MDMA or placebo to 20 male participants in a double-blind, placebo-controlled, crossover study.

Cooperation with trustworthy, but not untrustworthy, opponents was enhanced following MDMA but not placebo.

Specifically, MDMA enhanced recovery from, but not the impact of, breaches in cooperation.

During trial outcome, MDMA increased activation of four clusters incorporating the precentral and supramarginal gyri, superior temporal cortex, central operculum/posterior insula, and supplementary motor area.

MDMA increased cooperative behavior when playing trustworthy opponents.

Our findings highlight the context-specific nature of MDMA's effect on social decision-making.

While breaches of trustworthy behavior have a similar impact following administration of MDMA compared with placebo, MDMA facilitates a greater recovery from these breaches of trust.


What text optimizer do you use? I find it somewhat easier to read than the original abstract.


"Hypoxic radiosensitization: Adored and Ignored"

https://pdfs.semanticscholar.org/c26b/4d3156b0c526d16c891ce7...

>"three of the four most cited papers in the journals deal with hypoxia [...] yet its routine clinical use is very limited."


Network topology design at 27,000 km/hour (2019) by Debopam Bhattacherjee and Ankit Singla

https://people.inf.ethz.ch/asingla/papers/conext19.pdf


Making Kin with the Machines, https://jods.mitpress.mit.edu/pub/lewis-arista-pechawis-kite...

'What if we treated AI as equals, like other human beings, not as tools or, worse, slaves to their creators?' That's the premise of this paper, which is a wonderful provocation. It's a really important consideration too, given how many of our decisions we're asking machine sentience to make for us. If algorithmic bias were a human judge, they'd be thrown out of court (you'd hope).


This study is extraordinary in terms of the extent to which it reveals just how little we fully understand about what is actually taking place in the genetic process. Quote from the end of the paper's introduction section: "In particular, the demonstration of highly efficient splicing in mammals in the absence of transcriptional pausing causes us to rethink key features of splicing regulation." https://www.biorxiv.org/content/10.1101/2020.02.11.944595v1....


No second thoughts about it: StyleGAN2 [1] takes the cake.

[1] https://arxiv.org/abs/1912.04958


Probably few people here are interested in the subject matter, but as a piece of gentle snark I found this wonderful:

https://www.researchgate.net/publication/342317256_A_systema...


Not really a football fan... care to explain?


Tactical Periodization is a training program credited with the success of at least one star. This paper says, in short, that there are no scientific studies proving it works. We'd give it the benefit of the doubt and wait for proof, but it's been 20 years and still nothing.


Learning Representations by Back-propagating Errors http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

-- because it was relatively straightforward to understand and convert to code. So it helped me understand backprop.
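
If anyone wants to do the same exercise, here's a minimal numpy version of the idea (a two-layer net trained on XOR with plain gradient descent; my own sketch, not the paper's notation):

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR: the classic task from that era that a single layer cannot solve.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
    W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    for step in range(5000):
        # forward pass
        H = sigmoid(X @ W1 + b1)          # hidden layer
        Y = sigmoid(H @ W2 + b2)          # output layer
        # backward pass: propagate the error derivative layer by layer
        dY = (Y - T) * Y * (1 - Y)        # dLoss/d(pre-activation) at the output
        dH = (dY @ W2.T) * H * (1 - H)    # chain rule back through W2
        # gradient descent updates
        W2 -= 0.5 * H.T @ dY;  b2 -= 0.5 * dY.sum(axis=0)
        W1 -= 0.5 * X.T @ dH;  b1 -= 0.5 * dH.sum(axis=0)

    # Should end up close to [0, 1, 1, 0]; an unlucky init may need more steps.
    print(Y.round(2).ravel())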


Monte Carlo Geometry Processing: A Grid-Free Approach to PDE-Based Methods on Volumetric Domains https://www.cs.cmu.edu/~kmcrane/Projects/MonteCarloGeometryP...


I was fascinated by this one: "The Case for a Learned Sorting Algorithm" i.e. invest in a little ML to sort faster afterwards ... https://dl.acm.org/doi/10.1145/3318464.3389752



Awesome robotics paper from Deepmind.

Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion: https://arxiv.org/abs/2008.12228


It's an old paper by now and probably niche but was very useful to me this year.

http://winsh.me/papers/erlang_workshop_2013.pdf



https://arxiv.org/pdf/2006.08381.pdf A take on Josh Tenenbaum's hard problem of learning


Implicit Neural Representations with Periodic Activation Functions

aka SIREN: https://vsitzmann.github.io/siren/



Tree Notation: an antifragile program notation (2017)


Hey, let's get some interesting humanities papers into the mix, since thanks to COVID I had a lot of extra reading time and "best" is purely subjective:

"Erotic Modesty: (Ad)dressing Female Sexuality and Propriety in Open and Closed Drawers, USA, 1800–1930" https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0424.00...

When James C. Scott wrote about infrapolitics in his 1990 work "Domination and the Arts of Resistance: Hidden Transcripts" (https://www.jstor.org/stable/j.ctt1np6zz) and described it as a sort of political resistance that never declares itself and remains beneath what the dominant group can properly perceive until the power shift actually starts to happen, he probably didn't think of a case where the undeclared politics is so literally something not meant to be seen. The theme is very much a progression of how slowly women were able to establish even which parts of their body could or could not be sexualized, culminating in a sudden burst, or power shift, in the 1910s-1930s after centuries of aggregating individual choices and entirely unseen acts. This particular revolution managed to happen almost entirely outside of organization and public view, and while it's by no means over, the progress made in the last twenty years covered by the paper really shows how much change the aggregation of individual acts of resistance, done without any open plans, can still bring about. It also showed the limits of such movements, particularly when the dominant group has an active interest in preserving the status quo.

"How Qualified Immunity Fails" https://scholarship.law.nd.edu/cgi/viewcontent.cgi?article=4...

and "The Case Against Qualified Immunity" https://scholarship.law.nd.edu/cgi/viewcontent.cgi?article=4...

These two were both written by UCLA Law professor Joanna Schwartz over the course of about a year and a half from 2017-2018, and they really got a lot of attention this year when a lot of people asked for the first time, "why does it seem impossible to actually hold abusive police to some degree of personal responsibility?" Having worked at a public defender's office and then on federal CJA cases (essentially federal defense work when there is more than one codefendant and the federal defenders would have a conflict of interest defending both), the abusive nature of policing was something I saw constantly for years, but it's difficult to quantify just how little potential consequence a police officer may actually face, because nobody had done the shoeleather work to collect the data, and police departments tend to have opacity written into their contracts. The data Schwartz collected demonstrates how the multiple layers of shielding negotiated into police contracts, and how much indemnification (which is actually illegal in many jurisdictions but universally ignored), push any potential liability onto taxpayers directly, creating a situation where victims' taxes are just getting looped back into the settlements they receive. There are a lot of problems in the criminal justice system, and really in any carceral system this country runs, and most of them are poorly documented at a systemic level and difficult to quantify. It's nice to see that someone put in the work to make the picture a little clearer, as practitioners tend to be too focused on their clients to do research like this, and this is a particularly unglamorous field of research.


"The Unix Command Language" by Ken Thompson


Wow, what a read! I hadn't heard of this paper before and it's a treat.

https://susam.github.io/tucl/the-unix-command-language.html


Not so much a classic but relevant to today:

MMR vaccine could protect against COVID-19

https://mbio.asm.org/content/11/6/e02628-20?_ga=2.139230451....



Note: requires a login. Using Google will request access to your contacts. Ugh.


They make it seem like it requires a login, but you can just scroll down to see the paper.


Some CogSci & Neuro papers I found interesting in 2020:

Constantinescu, Alexandra O., Jill X. O’Reilly, and Timothy EJ Behrens. "Organizing conceptual knowledge in humans with a gridlike code." Science 352.6292 (2016): 1464-1468.

Kriegeskorte, Nikolaus, and Katherine R. Storrs. "Grid cells for conceptual spaces?." Neuron 92.2 (2016): 280-284.

Klukas, Mirko, Marcus Lewis, and Ila Fiete. "Efficient and flexible representation of higher-dimensional cognitive variables with grid cells." PLOS Computational Biology 16.4 (2020): e1007796.

Moser, May-Britt, David C. Rowland, and Edvard I. Moser. "Place cells, grid cells, and memory." Cold Spring Harbor perspectives in biology 7.2 (2015): a021808.

Quiroga, Rodrigo Quian. "Concept cells: the building blocks of declarative memory functions." Nature Reviews Neuroscience 13.8 (2012): 587-597.

Stachenfeld, Kimberly L., Matthew M. Botvinick, and Samuel J. Gershman. "The hippocampus as a predictive map." Nature neuroscience 20.11 (2017): 1643.

Buzsáki, György, and David Tingley. "Space and time: The hippocampus as a sequence generator." Trends in cognitive sciences 22.10 (2018): 853-869.

Umbach, Gray, et al. "Time cells in the human hippocampus and entorhinal cortex support episodic memory." bioRxiv (2020).

Eichenbaum, Howard. "On the integration of space, time, and memory." Neuron 95.5 (2017): 1007-1018.

Schiller, Daniela, et al. "Memory and space: towards an understanding of the cognitive map." Journal of Neuroscience 35.41 (2015): 13904-13911.

Rolls, Edmund T., and Alessandro Treves. "The neuronal encoding of information in the brain." Progress in neurobiology 95.3 (2011): 448-490.

Fischer, Lukas F., et al. "Representation of visual landmarks in retrosplenial cortex." Elife 9 (2020): e51458.

Hebart, Martin, et al. "Revealing the multidimensional mental representations of natural objects underlying human similarity judgments." (2020).

Ezzyat, Youssef, and Lila Davachi. "Similarity breeds proximity: pattern similarity within and across contexts is related to later mnemonic judgments of temporal proximity." Neuron 81.5 (2014): 1179-1189.

Seger, Carol A., and Earl K. Miller. "Category learning in the brain." Annual review of neuroscience 33 (2010): 203-219.

Neurolinguistics:

Marcus, Gary F. "Evolution, memory, and the nature of syntactic representation." Birdsong, speech, and language: Exploring the evolution of mind and brain 27 (2013).

Dehaene, Stanislas, et al. "The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic trees." Neuron 88.1 (2015): 2-19.

Fujita, Koji. "On the parallel evolution of syntax and lexicon: A Merge-only view." Journal of Neurolinguistics 43 (2017): 178-192.


Nice to see someone found the Birdsong book interesting.

Here's a cool paper on bird song during lockdown.

Derryberry, Elizabeth P., et al. Singing in a silent spring: Birds respond to a half-century soundscape reversion during the COVID-19 shutdown (2020) (DOI: 10.1126/science.abd5777)


Thanks for putting this list together! I took some cog sci courses in college and have been meaning to dive more into the research around it and these papers seem like a good place to start. I expect to run into lots of jargon and concepts I don't understand. Would it be possible for me to reach out to you for questions when I'm unable to make sense of the content after having researched the unknown concepts online?


I could help if you are having trouble with animal cognition concepts. Let me know.


Awesome, how would I reach you?


my email is prateek6289 at gmail dot com


sure, my email is in my profile.

I'm not an expert though, just curious about "mind computations".


Thanks :). I saved your email


also on neurolinguistics:

Pulvermüller, Friedemann. "Words in the brain's language." Behavioral and brain sciences 22.2 (1999): 253-279.

Pulvermüller, Friedemann. "Brain embodiment of syntax and grammar: Discrete combinatorial mechanisms spelt out in neuronal circuits." Brain and language 112.3 (2010): 167-179.

Buzsáki, György. "Neural syntax: cell assemblies, synapsembles, and readers." Neuron 68.3 (2010): 362-385.

Lau, Ellen F., Colin Phillips, and David Poeppel. "A cortical network for semantics: (de)constructing the N400." Nature Reviews Neuroscience 9.12 (2008): 920-933.

--> On cognitive maps, spatial & abstract navigation:

Bellmund, Jacob LS, et al. "Navigating cognition: Spatial codes for human thinking." Science 362.6415 (2018).

Peer, Michael, et al. "Processing of different spatial scales in the human brain." ELife 8 (2019): e47492.

Kriegeskorte, Nikolaus, and Rogier A. Kievit. "Representational geometry: integrating cognition, computation, and the brain." Trends in cognitive sciences 17.8 (2013): 401-412.

Mok, Robert M., and Bradley C. Love. "A non-spatial account of place and grid cells based on clustering models of concept learning." Nature communications 10.1 (2019): 1-9.

Chrastil, Elizabeth R., and William H. Warren. "From cognitive maps to cognitive graphs." PloS one 9.11 (2014): e112544.

--> On graph navigation and 'rich club' networks:

Watts, Duncan J., and Steven H. Strogatz. "Collective dynamics of ‘small-world’ networks." Nature 393.6684 (1998): 440-442.

Kleinberg, Jon M. "Navigation in a small world." Nature 406.6798 (2000): 845-845.

Ball, Gareth, et al. "Rich-club organization of the newborn human brain." Proceedings of the National Academy of Sciences 111.20 (2014): 7456-7461.

Malkov, Yury A., and Alexander Ponomarenko. "Growing homophilic networks are natural navigable small worlds." PloS one 11.6 (2016): e0158162.

Givoni, Inmar, Clement Chung, and Brendan J. Frey. "Hierarchical affinity propagation." arXiv preprint arXiv:1202.3722 (2012).

--> On concept formation, memory & generalization

Bowman, Caitlin R., and Dagmar Zeithamova. "Abstract memory representations in the ventromedial prefrontal cortex and hippocampus support concept generalization." Journal of Neuroscience 38.10 (2018): 2605-2614.

Garvert, Mona M., Raymond J. Dolan, and Timothy EJ Behrens. "A map of abstract relational knowledge in the human hippocampal–entorhinal cortex." Elife 6 (2017): e17086.

Collin, Silvy HP, Branka Milivojevic, and Christian F. Doeller. "Hippocampal hierarchical networks for space, time, and memory." Current opinion in behavioral sciences 17 (2017): 71-76.

Kumaran, Dharshan, et al. "Tracking the emergence of conceptual knowledge during human decision making." Neuron 63.6 (2009): 889-901.

DeVito, Loren M., et al. "Prefrontal cortex: role in acquisition of overlapping associations and transitive inference." Learning & Memory 17.3 (2010): 161-167.

Gallistel, Charles Randy, and Louis D. Matzel. "The neuroscience of learning: beyond the Hebbian synapse." Annual review of psychology 64 (2013): 169-200.

Martin, A., and W. K. Simmons. "Structural Basis of semantic memory." Learning and Memory: A Comprehensive Reference. Elsevier, 2007. 113-130.

Zeithamova, Dagmar, Margaret L. Schlichting, and Alison R. Preston. "The hippocampus and inferential reasoning: building memories to navigate future decisions." Frontiers in human neuroscience 6 (2012): 70.

Tenenbaum, Joshua B., and Thomas L. Griffiths. "Generalization, similarity, and Bayesian inference." Behavioral and brain sciences 24.4 (2001): 629.



