An automatic theorem proving project (gowers.wordpress.com)
163 points by ColinWright on April 28, 2022 | 76 comments


For context for those outside of mathematics, Gowers is a preeminent mathematician who won a Fields Medal for his work in functional analysis. He has advocated and worked towards computer-generated mathematics for a long time, mostly along approaches roughly in line with this one. In particular, his interest in this topic predates the deep learning revolution (which I'll say began with AlexNet, but wherever you date it the claim holds).


This is the opposite of the direction I expected new theorem-proving efforts to take: "Good old fashioned AI" rather than leveraging the advances of deep learning for natural language.

I think the direction that is ripe for advance is a two-part process: (1) translating human-language mathematical proofs into a computer-verifiable formal proof language, combined with (2) GPT3-style automated generation of plausible next steps in human-language proofs.

Gowers emphasizes the success of humans at "pruning" to only consider potentially useful next steps in a proof. I think GPT3 demonstrates such pruning in generating plausible natural language text continuations.

And the success of translation between natural languages suggests natural-to-formal language translation may also be ripe.

Combined with the success of formal-language automated theorem provers, it seems plausible that a stack built on current technologies could produce formally verified natural-language proofs.


I don't think Gowers is trying to build a better theorem prover.

An AI researcher would be trying to build a better prover.

Gowers is a mathematician looking for mathematical insight.

This project's success would be based on how well it answers those fundamental questions he posed (i.e., how does a human generate proofs, given no algorithm exists to solve the halting problem?).

The output might be more like metamathematics, with insights provided by software, rather than ML software optimised for a goal.


> How does a human generate proofs, given no algorithm exists to solve the halting problem?

To be fair, this has some trivial answers -- despite being a fascinating question.

For example, given a formal proof verification system (e.g. Lean), you can search for proofs (by brute force or, ideally, using heuristics), and if a proof exists, you will find it after some finite time. The problem is that sometimes no proof exists. You might think of getting "smart" and also trying, in parallel, to prove that no proof exists. But that is itself just another proof, so the recursion tower (proof that (proof that (... proof doesn't exist) doesn't exist) ...) never resolves. I think what this mostly means is that you have to accept non-existence, or arbitrarily long proving times, for some proofs (maybe also a sociological problem for mathematicians!). That's of course how human mathematics works: no one knows whether any given famous conjecture (e.g. the Riemann hypothesis) will be proven within some time frame, or ever.
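To make the "enumerate and verify" idea concrete, here is a minimal, purely illustrative Python sketch. The verify function is a toy stand-in for a real checker such as Lean's kernel (a real system would call out to such a checker), and the step alphabet is made up:

    from itertools import count, product

    # Toy "alphabet" of proof steps; a real system would use the inference
    # rules of the underlying formal calculus.
    STEPS = ["step_a", "step_b", "step_c"]

    def verify(candidate, theorem):
        # Toy stand-in for a proof checker (e.g. Lean's kernel). Here a
        # "theorem" is just a number and a "proof" is any step sequence of
        # that length -- purely to make the enumeration below runnable.
        return len(candidate) == theorem

    def search_for_proof(theorem):
        # Enumerate candidate proofs in order of increasing length.
        # If a proof exists, this finds one in finite time; if none exists,
        # the loop never terminates -- which is exactly the issue above.
        for length in count(1):
            for candidate in product(STEPS, repeat=length):
                if verify(candidate, theorem):
                    return candidate

    print(search_for_proof(3))  # finds a length-3 "proof" of the toy "theorem" 3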


They can still use ML to prune the search space, I suppose?


For a mathematician, it's not the output of pruning the search space that matters, but understanding how the search space is pruned.

Understanding how ML models work is a notorious problem.

A classical algorithm can be understood formally far better - I think it's this mathematical understanding Gowers is looking for.


But isn't that how humans do it too? They have a hunch, pursue it, and if it doesn't work out, they backtrack. Of course, some people have better hunches than others, but that doesn't change the global idea.


> They have a hunch, pursue it

There's a lot to unpack in just this.

Can we define an algorithm to generate hunches (i.e., mathematical intuition)?

Given infinitely many hunches, can we define an algorithm to order them by priority for the search?

That's the kind of question they'll be thinking about using the symbolic logic/classical algorithms in GOFAI approaches.


...and they largely don't know where those hunches came from.


...well, quite. But that's not really the "move along, nothing to see here" it first sounds like.

"Hunch" is a label we use for "I don't know explicitly how I came up with this".

Generally, when we have a label for a concept whose description starts with "I don't know...", some people try to follow up with questions: is that something it is possible, in principle, to know? How could we find out for certain? If it's knowable, how do we get to know what it is?

Sometimes they succeed, and generations of such successes are how we end up with the modern world.


GPT3-style automated generation of plausible next steps in human-language proofs is a disaster waiting to happen. GPT3 generates vaguely plausible text sequences without understanding the material. It relies heavily on the imprecision of language and the fact that there are many plausible next words in a sentence. It doesn't even perfectly capture the rules of grammar: it sometimes makes grammatical mistakes.

Consider that mathematics requires greater precision: the language has more exacting meanings and fewer plausible alternatives. Also consider that the bar to doing something useful in mathematics is extremely high. We're not trying to GPT3 a plausible sentence now; we're trying to guide GPT3 to producing the complete works of Shakespeare.

GPT3 demonstrates a kind of pruning for generating viable text in natural language continuations, but I'd argue it is nothing like pruning useful next steps of a proof. The pruning in GPT3 works as a probability model and is derived from a good data set of human utterances.

Generating a good dataset of plausible and implausible next steps in mathematical proofs is a much harder problem. The cost per instance is extremely high, as all of the proofs have to be translated into a specific precise formal language (otherwise you explode the search space to be any plausible utterance in some form of English+Math symbols, making the problem much harder). Even worse, different theorem provers want to use different formal languages, making the reusability of the data set less than typical in ML problems. The dataset is also far smaller. How many interesting proofs in mathematics are at a suitable depth from the initialization of a theorem prover with just some basic axioms?

Even if you solve the dataset problem, there are further problems. GPT3 isn't designed to evaluate the interestingness of a sentence, only the plausibility, with hopes that the context in which the sentence is generated provides enough relevance.

In short, I'm highly skeptical that benefits in natural language translation will translate to formal languages. I'd also argue the problems you classify into "formal language translation" aren't even translation problems.

I also think very few people see the technologies you've mentioned as related (for good reason) and I think a program that attempts to build on them is likely to fail.


Fair enough to be skeptical. Some responses to your points:

> the bar to doing something useful in mathematics is extremely high

Ah but the bar to do something interesting in automated theorem proving is much lower. Solving exercises from an advanced undergraduate class involving proofs would already be of interest.

> Generating a good dataset of plausible and implausible next steps in mathematical proofs is a much harder problem.

There are thousands of textbooks, monographs, and research mathematical journals. There really is a gigantic corpus of natural language mathematical proofs to study.

In graduate school there were a bunch of homework proofs which the professors would describe as "follow your nose": after you make an initial step in the right direction, the remaining steps follow the kind of pattern that quickly becomes familiar. I think it is very plausible that a GPT3-style system trained on mathematical writing could learn these "follow your nose" patterns.

> problems you classify into "formal language translation" aren't even translation problems

Fair. Going from natural-language proofs, like those in a textbook, to a formal language, like the ones automatic theorem provers use, has similarities to a natural-language translation problem, but it would be fair to say that this is its own category of problem.


I agree there might be some sort of translation system that would partially automate converting all these examples in textbooks, monographs, and research journals from English+Mathematics pseudocode into correct formal-logic statements. I think this is an interesting and complex problem that could make the cost of building a dataset manageable. It still comes with a problem: most of these sources start very far past the axioms, so in order to use them you need formal-language proofs for each of the things they assert without proof.

I question whether you'd get high enough accuracy out of a pattern-matching model like GPT3 that occasionally chooses an unusual or unexpected word. Given how frequently translating A->B->A yields A* instead of A with GPT3, I wonder whether we would actually capture the precise mathematical statements.


I'm also skeptical of the approach here, but for the opposite reason. ML will conquer proofs by an AlphaZero approach of ignoring all human training data. You can generate arbitrary problems easily enough, and doing some GAN like system of both generating problems that yet can't be solved and a solver to solve them seems to obviate the need for human training data. I'd be really hesitant to be a PhD student in the GOFAI or natural-language data-mining approaches, since I think this could easily be solved by ML in the next five or ten years (specifically, at least one well-known unsolved research problem being solved by AI). I hope that I'm wrong... I like math as a human endeavor.


> You can generate arbitrary problems easily enough

The problem with generating arbitrary math problems is that you can generate an infinite number of boring, pointless problems. For example: prove that there is a solution to the system of inequalities x + 2y + 135798230 > 2x + y + 123532 and x - y < 234. Training an AI system to solve these problems doesn't accomplish anything.
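As a side note (a sketch, assuming the z3 Python bindings are installed), an off-the-shelf SMT solver dispatches that example instantly, which is part of why such auto-generated problems carry almost no training signal:

    # The "boring" system above, handed to z3. It is solved immediately,
    # so examples like this teach a learner essentially nothing.
    from z3 import Ints, Solver, sat

    x, y = Ints('x y')
    s = Solver()
    s.add(x + 2*y + 135798230 > 2*x + y + 123532)
    s.add(x - y < 234)
    assert s.check() == sat
    print(s.model())  # any satisfying assignment witnesses the existence claim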

I think we are in a stage for mathematics similar to where computer Go was in the 80s. For Go back then we didn't even have the right logical structure for the algorithm; we hadn't invented Monte Carlo tree search yet. Once a good underlying structure was found, and AI heuristics were added on top, Go became tractable for AIs. For mathematics I think we also need multiple innovations to get there from here.


Such an "AlphaZero approach" will only knock Gowers's GOFAI approach out of business if the AI can also "justify" its proofs, in the sense that Gowers explains in his blogpost. Do you think that will happen in 5 years?


Google's recently introduced 540-billion-parameter language model PaLM can already explain complex jokes and reason logically by explicitly describing its reasoning steps. The results are quite impressive. https://ai.googleblog.com/2022/04/pathways-language-model-pa...

To be sure, this happened just two years after GPT-3. So five years for something like what Gowers wants, but based on a large language model, in fact seems quite realistic.


I find myself doubting that this goal makes any sense.

One mathematician often cannot "justify" their reasoning process to another if their thinking is even slightly different. For example, what is intuitive to someone on one side of the algebra/analysis divide is often not intuitive to people on the other side.

It is therefore unlikely that a human-level computer program will think enough like humans that its reasoning can be justified to any human. And vice versa. For it to do so, it has to be better than the human, also have a model of how the human thinks, and then be able to break down thinking it arrived at one way into something that makes sense to the human.

If a computer succeeded in this endeavor, it would almost certainly come up with new principles of doing mathematics that would then have to be taught to humans for them to appreciate what it is doing. Something like this has already happened with Go; see https://escholarship.org/uc/item/6q05n7pz for example. Even so, humans are still unable to learn to play the game at the same level as the superhuman program. And the same would be true in mathematics.


I've found that, for most substantial proofs I've been exposed to, I understood the author's justification of how they found the proof. That doesn't mean I would have been able to come up with that step on my own, but once it was done I understood why they took it. If you get sufficiently far into the esoteric weeds of either analysis or algebra, the justifications won't be understood outside that field. But they don't have to be.

Theorem provers starting from basic axioms are often looking for things that aren't in the esoteric weeds of a specific subfield as it often takes too long for them to generate a suitable set of theorems that establish those fields.


> And vice versa. For it to do so, it has to be better than the human, also have a model of how the human thinks, and then be able to break down thinking it arrived at one way into something that makes sense to the human.

This is more or less what Gowers is after. And that's exactly why he wants GOFAI instead of ML. He wants to understand how "doing mathematics" works.


I think this approach hinges on how you build and define the network that flags the "most promising search state". I'd argue this is a far harder problem in theorem proving than it is in various games. Before AlphaZero all kinds of position evaluation techniques existed in each of these games and were in widespread use for over 25 years.

By comparison, it's hard even to define anything resembling a board, i.e. a structure fixed enough to apply something like positional evaluation to. What you want is a network that flags promising next steps in proofs and allows you to effectively ignore most of the search space. If you have a good way to do this, I think AlphaZero would be a promising approach.

But even if you get AlphaZero to work, it doesn't solve the problem of understanding. AlphaZero made Go players better by giving them more content to study, but it didn't explicitly teach knowledge of the game. Strategically in Go it's actually rather unsatisfying, because it seems to win most of its games by entirely abandoning influence for territory and then relying on its superior reading skills to pick fights inside the influence, where it often seems to win from what pro players would previously have described as a disadvantageous fight. This means the lessons are limited for players whose reading skills are weaker than AlphaZero's, as they can't fight nearly as well, which suggests abandoning the entire dominant strategy it uses.


I think that for short proofs that fit in a GPT-style NLP predictor's input, it could produce an initial list of likely "next steps", which could then be scored by an adversarially trained scoring model that has some intuition for which proof-search strategies are likely to work; the combination allows each model to have its own area of expertise (the NLP model being technically competent locally and the scorer providing global judgement). The entire proof search would correspond to a single play in AlphaGo, with the proof search happening entirely within the Monte Carlo exploration of likely next steps (as opposed to a sequence of plays, as in Go), and a play only happening when an automatic proof checker validates the final proof. Automatic proof checking could also prune syntactically invalid continuations from the Monte Carlo search tree, and probably provide some similar optimizations.

My guess is that such a model would learn to compose a collection of lemmas that score as useful, produce sub-proofs for those, and then combine them into the final proof. This still mimics a sequence of plays closely enough that the scoring model could recognize "progress" toward the goal and predict complementary lemmas until a final solution is visible.

It may even work for very long proofs by letting the model "play" a portion of its early lemmas: these remain visible to the scorer, while the proof sequence is chunked into as many pieces as the NLP model needs to see it all, and backtracking behind what was already output is no longer possible. Once a lemma and its proof are complete, they can be used or ignored in the future proof, but modifying them sounds less useful.
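For what it's worth, here is a rough Python sketch of the loop being described. The three components -- language_model_propose, score_state and proof_checker -- are hypothetical stand-ins for the NLP proposer, the learned scorer and an automatic checker; the toy implementations exist only so the skeleton runs:

    import math
    import random

    STEPS = ["intro", "rewrite", "apply_lemma", "induction", "qed"]  # made-up step vocabulary

    def language_model_propose(state, k=3):
        # Stand-in for a GPT-style model proposing plausible next proof steps.
        return random.sample(STEPS, k)

    def score_state(state):
        # Stand-in for a learned scorer estimating how promising a partial proof is.
        return random.random()

    def proof_checker(state):
        # Stand-in for a formal checker. Toy rule: a sequence ending in "qed"
        # counts as proved; anything longer than 10 steps is rejected.
        if state and state[-1] == "qed":
            return "proved"
        return "open" if len(state) < 10 else "invalid"

    def ucb(parent, child, c=1.4):
        # Standard UCB1-style selection score.
        return (child["value"] / (child["visits"] + 1e-9)
                + c * math.sqrt(math.log(parent["visits"] + 1) / (child["visits"] + 1e-9)))

    def mcts_proof_search(iterations=500):
        root = {"state": [], "children": {}, "visits": 0, "value": 0.0}
        for _ in range(iterations):
            # Selection: walk down the tree by UCB until we hit a leaf.
            node, path = root, [root]
            while node["children"]:
                _, node = max(node["children"].items(), key=lambda kv: ucb(path[-1], kv[1]))
                path.append(node)
            status = proof_checker(node["state"])
            if status == "proved":
                return node["state"]
            # Expansion: ask the proposer for next steps, pruning invalid ones
            # with the checker (the "syntactically invalid continuations" above).
            if status == "open":
                for step in language_model_propose(node["state"]):
                    child = node["state"] + [step]
                    if proof_checker(child) != "invalid":
                        node["children"][step] = {"state": child, "children": {},
                                                  "visits": 0, "value": 0.0}
            # Evaluation + backpropagation of the scorer's estimate.
            reward = score_state(node["state"])
            for n in path:
                n["visits"] += 1
                n["value"] += reward
        return None

    print(mcts_proof_search())

The point of the sketch is only the division of labour: the proposer limits the branching factor, the checker prunes and decides terminal states, and the scorer provides the value signal that steers the tree.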


I think the probability of a GPT-style approach learning to generate mathematically interesting next steps is close to zero. The structure GPT captures is a sense of plausibility of a next step, and while you could argue that it's possible, given enough data, to train the network to rate only mathematically interesting steps highly, that training would be extremely slow. GPT-style training is most effective for language, where there is a certain degree of flexibility. It tends to learn plausibility and provides relevance mostly from context. This approach seems akin to searching the language space of English+Mathematics over all proofs and hoping to learn mathematically relevant continuations. I think the search space is way too large and positive feedback far too rare for the model to converge to anything useful.

I highly doubt you could get such a model to produce a useful collection of lemmas. Even if you could, I don't think a useful collection of lemmas progressing towards a goal has anywhere near the same structure as scoring a position in a game.

Ultimately, for any AlphaZero technique to work you need a powerful position-evaluation network. Almost all the magic happens in that network's pruning. It's quite remarkable that you can get a search of 80,000 nodes to do better than a search of 35,000,000 nodes. All of that is due to efficient pruning from the network. You haven't convinced me of any plausible approach to getting such a network here.


There's a very big difference between solving one problem (if you throw enough compute at it you obviously will succeed) and general proving being solved. The latter essentially means being able to replace all programmers; that would effectively mean AGI in 5 to 10 years.

>>doing some GAN like system of both generating problems that yet can’t be solved and a solver to solve them seems to obviate the need for human training data.

I don't see how this is supposed to work: what if the generator just outputs unsolvable problems?


Human-language proofs for anything non-trivial are really just proof sketches; that's why you can't really meaningfully separate this into a two-step process.


My hunch is that building "justified" proofs that avoid computer-driven brute force search ought to be the same as building proofs in a logic where finding a proof is computationally easy. We know that such logics exist (e.g. the multi-modal logics that have been proposed for use in the semantic web have been specifically chosen for computational tractability; in practice, multi-modal logic can also be viewed as a subset of FOL.). If so the problem is roughly equivalent to reducing useful subsets of math to such logics.

(The problem of tractability also comes up wrt. type systems in computer languages, and of course "modality" in logic has been usefully applied to both human language/cognition and commonly used PL constructs such as monads.)


You can also convert your logic model into something that can be viewed as an image-completion problem, e.g. by translating categorical diagrams into an adjacency matrix and "completing" a partial one. (Hopefully, but preliminary results are encouraging.)

So far, we’ve started publishing about shape types encoded that way and are hoping to get to Zn group models by the end of the year. (Work is done, but I’m a slow writer.)

https://www.zmgsabstract.com/whitepapers/shapes-have-operati...


Very interesting work. Visual and diagrammatic reasoning are part of this problem, yes. It might turn out that some proofs that might be generally thought of as hard are actually "visually" justified, in a way that could even perhaps increase our power to justify mathematical statements. The extended paper Gowers links in his blogpost discusses one such case in depth, namely the well-known problem of whether a chessboard with two opposite corners removed can be tiled by two-square dominoes.


I've worked with logics where finding proofs is computationally easy, and even logics where expensive steps can be explicitly bounded and controlled for. I'm always very skeptical about claims that assert "the same".

Generally speaking, some proofs will be easy in those specific logics and other proofs will be hard or impossible. The problem won't be equivalent to reducing useful subsets of math to such logics, however, as you will often want to prove a lemma in one logic and a theorem in another. The fact that you don't have a good vehicle for dealing with the union of statements from each distinct fragmented logic makes the entire exercise fall apart.

Instead, most theorem provers need to operate in a single logic so that they can easily build on previous results.


> Generally speaking, some proofs will be easy in those specific logics and other proofs will be hard or impossible ... you will often want to prove a lemma in one logic and a theorem in another. ... The fact that you don't have a good vehicle for dealing with the union of statements from each distinct fragmented logic makes the entire exercise fall apart.

Not sure why this would inherently be an issue, since one can use shallow embeddings to translate statements across logics, and even a "union" of statements might be easily expressed in a common logical framework. But perhaps I'm missing something about what you're claiming here.


In my experience, unions of logics that are computationally bounded and nice don't end up computationally bounded and nice. There are lots of cases where step A only permits things that are computationally nice and step B only permits things that are computationally nice, but being able to do both A and B permits things that are computationally bad for just about any definition of badness you want to use.

I'm claiming the common logical framework won't have all the nice properties that come from the careful restriction in each of the individual logics.


> I'm claiming the common logical framework won't have all the nice properties that come from the careful restriction in each of the individual logics.

Sure, but this kinda goes without saying. It nonetheless seems to be true that if you want to come up with "justified" proofs, you'll want to do that proving work in logics that are more restricted. You'll still be able to use statement A for a proof of B; what the restriction ultimately hinders is conflating elements of the proofs of A and B together, especially in a way that might be hard to "justify".


I'd argue that what you ultimately get is a handful of near-axiom As and Bs that you can prove in the smaller logics, and for any sufficiently interesting statement you end up in combined logics that lose all the nice properties you were hoping to have. The resulting proofs won't be justifiable, because they lose the properties that made proofs in the smaller, non-union logics justifiable.

It's not a promising approach if the only statements of the prover you can justify are the simplest and most basic ones.


I'd argue that you ought to be able to avoid working in powerful logics; instead you'd end up using A (or an easily-derived consequence of A) as an axiom in the proof of non-trivial statement B and such, while still keeping to some restricted logic for each individual proof. This is quite close to how humans do math in a practical sense.


But avoiding working in the powerful logics amounts to working in a single logic as much as possible without merging any. So you've lost the benefit of multiple logics that you were originally claiming, and you're back in my "use one logic" case.


There are real difficulties here, and you're right to point them out. But I'd nonetheless posit that staying within a simple logic as much as possible, and only rarely resorting to "powerful" proof steps such as a switch to a different reasoning approach, is very different from what most current ATP systems do. (Though it's closer to how custom "tactics" might be used in ITP. Which intersects in interesting ways with the question of whether current ITP sketches are "intuitive" enough to humans.)


But when you want to combine A and B and start to work in a logic that is the union of their logics, you end up doing a proof in a logic that often loses all the nice properties that made the theorem proving reasonable. This works if combining A and B is the entire proof, but how do you handle searching for good next steps in the new logic, which lacks those properties, if there are other pieces of the proof still to be done?


It sounds as if the author is not well informed about recent advances and findings from the deep learning research community and literature.

> However, while machine learning has made huge strides in many domains, it still has several areas of weakness that are very important when one is doing mathematics. Here are a few of them.

Basically all of these claims are wrong. I'm not saying we have solved them perfectly, but we have solved them all to a great extent, and it currently looks as if just scaling up further will probably solve them entirely.

> In general, tasks that involve reasoning in an essential way.

See Google PaLM. Or most of the other recent big language models.

> Learning to do one task and then using that ability to do another.

This has been done many times before. But you could again count the big language models, as they can do almost any task. Or the big multi-modal models. Or the whole area of multi-task learning and transfer learning. This all works very well.

> Learning based on just a small number of examples.

Few-shot learning, zero-shot learning, and meta-learning allow exactly that.

> Common sense reasoning.

Again, see PaLM or the other big language models.

> Anything that involves genuine understanding (even if it may be hard to give a precise definition of what understanding is) as opposed to sophisticated mimicry.

And again, see the big LMs.

Sure, you can argue this is not genuine understanding, or that there is no real common-sense reasoning in certain areas, or point to whatever other shortcomings current approaches have.

I would argue that even human intelligence falls short by many of those measures. Or that humans also just do "sophisticated mimicry".

Maybe you say the current models lack long-term memory. But we already have solutions for that as well; see also fast weights.

The argument that humans learn from just a small number of examples ignores the constant stream of data we passively receive throughout our lives through our eyes, ears, etc., as well as the evolutionary selection that sets the hyperparameters and much of the wiring of our brains.


Language models can only parrot back their training input. There's no generality to them at all; the best they can do is some crude, approximate interpolation of their training examples that may or may not be "correct" in any given instance. There are things that the typical AI/ML approach might be genuinely useful for (e.g. generating "probable" conjectures by leveraging the "logical uncertainty" of very weak and thus tractable logics) but mainstream language learning is a complete non-starter for this stuff.


I suggest looking into the examples of the PaLM blog post and the paper. PaLM is extremely impressive.


So, are you saying I can feed 5 images of a particular type of green cat to an ML model and it'll learn to recognize those green cats? I doubt it.

Reasoning is the same. Where can I find a language model that reads a phrase and explains what grammar rules it violates?


> (I myself am not a fluent programmer — I have some experience of Haskell and Python and I think a pretty good feel for how to specify an algorithm in a way that makes it implementable by somebody who is a quick coder, and in my collaborations so far have relied on my collaborators to do the coding.)

Aw, this sets off the alarm bells for me. My immediate reaction is that this guy has no clue what he is getting into, apart from his "domain experience" (i.e. being a top-level mathematician). Run for the hills, I say.


This is not his first rodeo. https://arxiv.org/pdf/1309.4501.pdf


Are we really sure that humans aren't brute force searching for proofs, through a combination of subconscious processing and network effects (i.e. many people working on the same problem and only some of them happen on to the correct path)?


Gowers seems to be arguing that brute-force-searching for proofs at the highest levels of abstraction might be okay, so long as the proofs themselves don't use brute force, even indirectly. (This means that a proof step such as: "pick n=42 and the proof goes through. QED." would somehow have to be disallowed.)


I think it is interesting to compare this to the previous attempt from 2013 by Gowers (and Mohan Ganesalingam) described in a blog series ending with https://gowers.wordpress.com/2013/04/14/answers-results-of-p... .



There was a ton of work on this in the 20th century, but the manifesto lacks a bibliography. How will this succeed where all the other brilliant folks from the past fell short?


To be harsh, one could probably argue that no one as brilliant as Gowers, and with his expertise in the domain, has worked on this in the past.


I have not heard of Gowers, but I have heard of Gödel, Turing, and Church. In fact, Gödel showed that for any non-trivial axiomatic system there are true statements which cannot be proved in the system. So there is that.


Well, none of those have worked on automated theorem proving. So there is that.

As for logic, I am sure Gowers knows about Gödel, Church and Turing, so that should not be a problem ...

Furthermore, your statement of Gödel's incompleteness result is wrong, as it directly contradicts Gödel's completeness result. It's not that you cannot prove all true statements for a non-trivial axiomatic formal system. But it is rather that your non-trivial axiomatic formal system does not mean what you might want it to mean as you will always have non-standard models alongside your intended standard model. So that's actually an argument FOR Gowers' approach, because it means that an automated mathematician needs to reach beyond formal logic and capture this elusive notion of intuition.



I wonder if this will be constructivist.


Here's my guess.

Gowers wants to understand how the typical mathematician comes up with a proof. How is the proof found? Where do the ideas come from?

To some extent, this is orthogonal to whether or not you do maths constructively. But since 99.9% of mathematicians have never heard of constructive mathematics, and just use AoC or LEM all over the place, I am quite certain that Gowers is very much interested in how to find proofs in the "ordinary" sense: including the use of AoC and LEM.


No, it won't be.


Rewriting Coq in Rust?


I apologise ahead for posting something in response to this which is technically off topic, and unfortunately also mentions blockchains. The comment by 'yaseer' reminded me of a vaguely related idea, so I figured some people reading this might be interested...

What if instead of a chain-of-proofs-of work, there would be a chain-of-proofs?

And, what if instead of a chain, it was a directed-acyclic-graph-of-proofs?

(From now on I will use the term blockgraph instead of blockchain to disambiguate.)

Lastly: instead of a giant Ponzi scheme of useless deflationary pretend money, what if the blockgraph was simply backed by real dollars used to reward useful proofs?

That last bit is the most important: What's a useful proof? Why would anyone pay money for such a thing? Why would anyone care about a not-a-blockchain that won't make anyone super rich?

I like to imagine a "superoptimising compiler" that has a $1K/annum license. That's peanuts to a FAANG or even a medium-to-large org with a development team. This money would go "into the blockgraph"[1], compensating the authors of the "most useful proofs of more optimal code sequences".

Obviously a lot of effort would have to go into designing the blockgraph to avoid it being gamed or abused, because it is a lot more complex than comparatively trivial proof-of-work linear chains like Bitcoin.

The idea is that each proof published to the blockgraph would be identified by its hash, and would build upon existing proofs in the chain by referencing their hash codes in turn. So each proof would be a series of terms, most of which would be cross-references, as well as some novel steps that are difficult to generate but trivial to verify.

To disincentivise users spamming true but useless proofs such as "1+1=2" and "2+2=4", the system would have to penalise proofs based on the data volume of their novel component plus some smaller penalty of the total number of references, plus an even smaller penalty for indirect (transitive) references.

Existing proofs that are referenced get a "cut" of the real money income granted to proofs that reference them. This makes single-use proofs worthless to publish, and widely usable proofs potentially hugely lucrative.

Publishing shortcuts in the proof graph is rewarded, because then other proofs can be made shorter in turn, making them more profitable than longer proofs.
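To make the data model concrete, here is a rough Python sketch of what one entry in such a blockgraph might look like. All names are hypothetical; only the hashing (standard hashlib) and the penalty idea (novel data volume, plus smaller charges for direct and transitive references) follow the description above:

    import hashlib
    import json
    from dataclasses import dataclass, field

    @dataclass
    class ProofNode:
        statement: str                  # the thing being proved
        novel_steps: list               # new steps: hard to find, trivial to verify
        references: list = field(default_factory=list)  # hashes of reused proofs

        def node_hash(self):
            payload = json.dumps({"statement": self.statement,
                                  "novel_steps": self.novel_steps,
                                  "references": self.references}, sort_keys=True)
            return hashlib.sha256(payload.encode()).hexdigest()

        def cost(self, graph, ref_penalty=1.0, transitive_penalty=0.1):
            # Toy version of the penalty rule: novel data volume dominates,
            # direct references cost a little, transitive references even less.
            seen, frontier = set(self.references), list(self.references)
            while frontier:
                node = graph.get(frontier.pop())
                if node:
                    for r in node.references:
                        if r not in seen:
                            seen.add(r)
                            frontier.append(r)
            direct = len(self.references)
            transitive = len(seen) - direct
            return (sum(len(s) for s in self.novel_steps)
                    + ref_penalty * direct
                    + transitive_penalty * transitive)

    # Tiny usage example: a lemma and a theorem that reuses it.
    lemma = ProofNode("lemma: ...", ["some long novel derivation"])
    graph = {lemma.node_hash(): lemma}
    main = ProofNode("theorem: ...", ["short novel step"], [lemma.node_hash()])
    print(main.node_hash(), main.cost(graph))

Paying the "cut" back to referenced proofs would then just be a walk over the references field, which is what makes widely reused proofs lucrative and single-use proofs worthless.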

Etc...

In general, the design goal of the whole system is to have a strong incentive to publish short, efficient proofs that reuse the existing blockgraph as much as possible and can in turn be reused by as many other users as possible. This makes the data volume small, and proofs efficient to verify.

The people putting up real money can get their "problems solved" automatically via a huge marketplace of various proofs systems[2] interacting and enhancing each other.

Want a short bit of code optimised to death? Put the "problem" up on the blockgraph for $5 and some system will try to automatically find the optimal sequence of assembly instructions that provably results in the same outcome but in less CPU time.

A compiler could even do this submission for you automatically. Feed in $1K of "optimisation coins" and have it put up $50 every time there's a major release planned.

Want to optimise factory production scheduling with fiendishly complex constraints? Put up $10K on the blockgraph! If it gets solved, it's solved. If not... you don't lose your money.

Etc...

[1] The obvious way to do this would be to link the blockgraph to an existing chain like Bitcoin or Ethereum, but it could also be simply a private company that pays out the rewards to ordinary bank accounts based on the current blockgraph status.

[2] Don't assume computers! This isn't about rooms full of GPUs burning coal to generate useless hash codes. A flesh and blood human could meaningfully and profitably contribute to the blockgraph. A particularly clever shortcut through the proof space could be worth millions, but would never be found by automated systems.


Related, from Vitalik [0]:

> Proof of excellence

> One interesting, and largely unexplored, solution to the problem of [token] distribution specifically (there are reasons why it cannot be so easily used for mining) is using tasks that are socially useful but require original human-driven creative effort and talent. For example, one can come up with a "proof of proof" currency that rewards players for coming up with mathematical proofs of certain theorems

[0]: https://vitalik.ca/general/2019/11/22/progress.html


My concept is not to try and make a "currency". That would be a distraction from the main purpose of the thing, and wouldn't fit the desired purpose.

You'd just get paid, almost like freelance work. But it would be a public graph, the payments could be millions of micropayments tracked by the graph, etc...


I actually thought about this for a while, but I ended up not thinking that it'd work, because (among some other things which you also touched on)

> To disincentivise users spamming true but useless proofs such as "1+1=2" and "2+2=4", the system would have to penalise proofs based on the data volume of their novel component

I believe that this stated goal of defining a metric that decides which theorems are "interesting" is a lot more difficult than finding proofs for theorems.

I think at some point proving things will be largely automatic, and mathematicians will mostly concern themselves with finding interesting theorems, definitions, and axioms, rather than spending time on proving labor.

But what do I know.


"Interesting" cannot be defined in an automatically verifiable way.

"Useful & short" however can be. Useful proofs are ones that solve the problems that are placed on the chain with real money rewards. The length/size of the proof is trivially verifiable. Useful AND short is the combination that's required. "Useful" alone would result in duplicate spam. Merely "short" would result in bulk generation of useless proofs. "Useful and short" means that the core of the graph would be optimised much like ants looking for food. Shorter AND useful paths to many goals are automatically rewarded. Longer paths can exist, but as they get more used, the incentive to try and optimise them rises dramatically to the point where even humans might want to contribute clever shortcuts manually.


Yeah man, but I'm not talking about the proof. I'm talking about the statement that has to be proved.


I don't know about theorem provers, but computer algebra systems work a bit like a blockchain. At least they both use a Merkle tree to store their graph/tree/list, but somewhere in your description there is a reformulation of the halting problem. It is impossible to know what rule is going to be useful next; CAS systems will often just use the length of the expression to decide this order. Like Knuth said once, maybe P=NP but P is just impractically big.



Interesting. Most theorem provers, like Lean [0], Coq [1], and Isabelle [2], are concerned with formalization and interactive theorem proving.

I initially thought "why do we need another one of these", like rolling your eyes at another programming language or JS framework. There's even another large-scale research project already underway at Cambridge [3].

I thought this extract from Gowers's longer manifesto [4] captured the problem they're trying to solve:

"(just consider statements of the form “this Turing machine halts”), and therefore that it cannot be solved by any algorithm. And yet human mathematicians have considerable success with solving pretty complicated looking instances of this problem. How can this be?"

The heart of this project is understanding this problem - not creating another proof formalisation language. To understand algorithms that can prune the search space to generate proofs, using a "Good Old Fashioned AI" (GOFAI) approach, rather than machine learning.

Gowers makes a point to contrast their GOFAI approach with ML - they're interested in producing mathematical insights, not black-box software.

[0] https://leanprover.github.io/

[1] https://coq.inria.fr/

[2] https://isabelle.in.tum.de/

[3] https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/

[4] https://drive.google.com/file/d/1-FFa6nMVg18m1zPtoAQrFalwpx2...


>I initially thought "why do we need another one of these", like rolling your eyes at another programming language or JS framework. There's even another large-scale research project already underway at Cambridge [3].

I had a complete opposite reaction to yours. I feel like automatic proving languages are very quirky and stem from the creators not really knowing much about programming. At my university a bunch of professors who really liked Prolog made a formal proving language and guess what syntax it had? Yeah... terrible stuff.

Personally, it's been a few years since I started thinking about this matter, but one of my personal objectives in life is to create a decently simple and intuitive (for both mathematicians and programmers) environment for formally proving theorems.


Are you implying that giving Prolog-like syntax to a language suggests ignorance about programming? Why would that be terrible?

IMO, syntax is really not the main attraction, nor is it the main problem to solve when writing a theorem prover. Prolog has the advantage of a regular syntax, just like Lisp.

Instead of wanting to make your miracle-own-thing, it would be far better to contribute to something like Idris. It's brilliant and in dire need of libraries.


Who are automatic theorem provers aimed at? Computer scientists with an interest in maths, or mathematicians with an interest in CS? If you are a mathematician, starting off with something like Coq is a nightmare. Nobody learns OCaml in college; math majors usually learn R, Python, and maybe Java or C.

Making formal proving simple and intuitive is the first step to have it heavily adopted. It should look as close as possible to writing a proof in pure first order logic.


IMO, you overstate the issue of syntax. As a hobbyist in both programming and math, Coq's syntax has never been the reason I failed to complete a proof. But perhaps I'm just too dumb and my difficulties lie elsewhere, so that's just my 2c.

I think there's room for a spectrum of theorem provers made for academic pure mathematicians, industry programmers and everything in between. Those should perhaps not have identical syntax, neither should they have the same goals.

To support my point, here's an example of an exotic theorem prover: https://github.com/webyrd/mediKanren It is aimed at medical researchers, and computes proofs about the medical literature, no less! This is a very different system and audience than which you are thinking about, but it's still a theorem prover.


It's not really the syntax but rather that until very recently theorem provers have been quite niche even in formal disciplines, so the tooling isn't quite there.

In an analogy with traditional programming, I'd say we have goto but we have yet to get structured loops. Enormously powerful, but still hard to apply to either real mathematics or real problems.


We share this objective! I am sharing my thoughts on that here: https://practal.com


> Gowers makes a point to contrast their GOFAI approach with ML - they're interested in insights, not black-boxes

But I have the impression Gowers dismisses the insights of what he calls machine-oriented ATP. These systems were --- perhaps are --- optimized on all levels of abstraction, from advances in the theory of which parts of the search space could be eliminated without loss of generality to optimizing the layout of C structs to improve cache locality.


I agree he seems quite disinterested in 'machine-oriented ATP'.

But he's a mathematician, not an AI researcher - they have different goals.

Many AI researchers are interested in creating tools that solve problems (or proofs in this case).

Most mathematicians find proofs interesting for the insights they provide, not just the solution output.

You could say Gowers is looking for meta-proofs that provide insights on proof generation. It makes sense for him to emphasise the symbolic logic approach of GOFAI, here.


But machine-oriented ATP is based on a symbolic representation of formulas and terms. And the output is a proof formulated in the underlying logical calculus, not just yes or no.


That doesn't mean it is a proof that humans have the slightest chance of understanding. It gets out of hand quickly.


Exactly - machine-oriented ATP often ends up like machine code, very similar to the proofs of Russell's Principia Mathematica: extremely low-level formalisms, great for procedural reasoning but poor for human understanding.

I would speculate Gowers is looking at higher-level abstractions that capture the essential semantics mathematicians are interested in - very much like a higher level programming language that humans understand.



