SAT solving SHA256 is a dead end - I was researching this for the purpose of Bitcoin mining (before this article came out), along with many others, for profit.
SAT solving typically relies on reducing a function down to boolean expressions where OR-clauses are joined by AND operations (a.k.a. Conjunctive Normal Form).
The math basis for this is that in GF2 (Galois Field 2 - a number system with only 2 values):
AND is equivalent to Multiply (1 * 1 = 1; 1 * 0 = 0, etc.)
XOR is equivalent to Addition (1 + 1 = 0; 0 + 1 = 1, etc.)
This is useful because we can re-express a boolean function as multiplication and addition for complexity analysis.
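As a quick sanity check (a trivial Python sketch of my own, not from any article):

    # Check the GF2 correspondence: AND behaves like multiplication
    # mod 2, and XOR behaves like addition mod 2, for all inputs.
    for a in (0, 1):
        for b in (0, 1):
            assert (a & b) == (a * b) % 2
            assert (a ^ b) == (a + b) % 2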
The difficulty comes when you mix ANDs and XORs; take the Ch function within SHA256 -
Ch(E, F, G) == (E & F) | (~E & G)
Which can be rewritten as:
Ch(E, F, G) == (((F ^ G) & E) ^ G)
Rewritten in GF2 it is equivalent to:
Ch(E, F, G) == ((F + G) * E) + G
Expanding with the distributive law gives E*F + E*G + G, and there is no way to simplify this further.
It's trivial to see that to work backwards, you have to "consider" far more input combinations for every known output. Even if the output is 0 (which you want when Bitcoin mining), there are 4 possible input combinations.
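You can verify the 4-preimage claim by brute force (a minimal Python enumeration, using the spec's definition of Ch):

    from itertools import product

    def ch(e, f, g):
        # Ch(E, F, G) = (E & F) | (~E & G), per the SHA-256 spec
        return (e & f) | ((1 - e) & g)

    # All inputs mapping to output 0: exactly 4 of the 8 combinations.
    print([(e, f, g) for e, f, g in product((0, 1), repeat=3)
           if ch(e, f, g) == 0])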
Now unroll the entire SHA256d function Bitcoin uses and you realize the effort to work backwards is far more than the effort to brute-force the input keyspace. This explosion in complexity comes from the fact that the whole function feeds forward, so there is little that can be done to "resolve out" intermediate values as there are no cyclic value dependencies.
SHA256d is effectively a giant rat's nest of multiplies and adds, all stacked on top of each other, that cannot be reduced, due to distributive laws.
I would like to respectfully disagree. I think you're simplifying the reason too much.
I don't think it's the Ch function itself (and the rest of the compression function) that causes problems for the SAT solver. I think it's the combination of the compression function and the message schedule that does it. If the message schedule had just been free variables (or simple repetitions of the message), I'm pretty sure the SAT solver would be able to unwind the compression function just fine.
The problem is the interaction between the message schedule and the compression function; once you commit to a value in some round of the compression function, you actually impose a set of constraints on the message schedule for all the other rounds simultaneously. So in a way, committing to a value in some round of the compression function means you're also indirectly influencing the possible values in every other round too. It creates this weird sort of dependency between the rounds which wouldn't exist if you only had a long series of Ch functions.
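To make the coupling concrete, here is a sketch of the SHA-256 message schedule from FIPS 180-4 (my own Python illustration, not code from the article). Every round word W[t] past the first 16 mixes four earlier words, so committing to a value in one round constrains words that feed many other rounds:

    MASK = 0xFFFFFFFF

    def rotr(x, n):
        return ((x >> n) | (x << (32 - n))) & MASK

    def sigma0(x):  # lowercase sigma_0 from FIPS 180-4
        return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

    def sigma1(x):  # lowercase sigma_1 from FIPS 180-4
        return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

    def message_schedule(block16):
        # Expand 16 message words into the 64 round words W[0..63].
        w = list(block16)
        for t in range(16, 64):
            w.append((sigma1(w[t - 2]) + w[t - 7] +
                      sigma0(w[t - 15]) + w[t - 16]) & MASK)
        return w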
This only works when feeding the data forward. When SAT-solving (i.e. working backwards), each clause has multiple possible inputs, meaning there is nothing to resolve away.
When you say "clause", do you mean a disjunction of literals in propositional logic? Because if so, I don't understand what you mean. Clauses don't have inputs. And when you say "resolve away", do you mean applying the resolution rule?
I recently (naively) dug into it to try and understand where its magic comes from. I was under the impression that it also gets strength by flipping back and forth between F2 and Z+. Where AND and XOR are linear in F2, they're nonsense in Z+, while addition in Z+ is nonsense in F2 (not to forget the whole mod 2^32 part).
Yes, we don't have good ways to unify the flipping between GF32 and GF2 beyond down-converting to GF2 (which yields a massive increase in clause-complexity).
GF32 addition in GF2 basically looks like a giant ripple-carry-adder.
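To spell that out (a Python sketch over LSB-first bit lists, my own illustration): modular addition decomposes into nothing but XORs and ANDs, with a carry chain 32 full adders deep:

    def add32_ripple(a_bits, b_bits):
        # 32-bit addition written purely in GF2 operations (XOR/AND).
        # Each carry depends on the previous one: a 32-deep chain.
        out, carry = [], 0
        for a, b in zip(a_bits, b_bits):          # LSB first
            out.append(a ^ b ^ carry)             # sum bit
            carry = (a & b) ^ (carry & (a ^ b))   # carry-out
        return out                                # final carry dropped (mod 2^32)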
> GF32 addition in GF2 basically looks like a giant ripple-carry-adder.
I tried to implement addition in GF32 using a Kogge-Stone adder, because its dependency chain has length 5 instead of 32. No success. The intermediate variables got so complex that the BDD library I used at the time couldn't even synthesize the expression for a 32-bit adder.
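For reference, a minimal sketch of the construction I was attempting (LSB-first bit lists; a reconstruction under my own naming, not the original code):

    def add32_kogge_stone(a_bits, b_bits):
        # Kogge-Stone parallel-prefix adder: the carry network is only
        # log2(32) = 5 levels deep instead of a 32-deep ripple chain.
        g = [a & b for a, b in zip(a_bits, b_bits)]   # generate bits
        p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate bits
        G, P = g[:], p[:]
        for d in (1, 2, 4, 8, 16):                    # 5 prefix levels
            G = [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(32)]
            P = [P[i] & P[i - d] if i >= d else P[i] for i in range(32)]
        carries = [0] + G[:31]                        # carry into each bit
        return [p[i] ^ carries[i] for i in range(32)]

Even though the circuit is shallower, each intermediate G/P node is a much wider function of the inputs, which is presumably why the BDD blew up.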
I bought a couple of the cheaper USB miners earlier this year just to play around with...have spent quite a bit of time since then mining alts. Bitcoin mining is absolutely a fool's game at this point. The hardware vendors have a terrible track record (cf. the recent FTC raid on BFL), and there is a slim margin wherein a few people somehow manage to earn more than they spent on the devices -- the rest of the time a purchase results in a substantial loss due to hardware obsolescence and exponentially increasing difficulty. At some point I figure the technology will stabilize and the prices will make mining a reasonable option again, but for the moment you are far better off just buying some coins.
Things were much nicer when GPU mining was the market leader. Bitcoin mining was usually profitable for a few weeks during/after a major price jump, and your main cost was power. Once the profitability window ended, just turn off your mining hardware.
Now that we have ASICs and scammy companies with months of lead time making them, things are way more volatile. Power is cheap, but the hardware is thousands of dollars, and the preorder/crowdfunding scheme is pretty reprehensible.
Would scrypt with harder parameters make a better ASIC-resistant hash?
>Initial tests showed that block 218430 with considerably higher difficulty is solved more efficiently than the genesis block 0 for a given nonce range.
This is very misleading. 99.99999999% of the work of finding a valid block is finding the right block header to mine on. You have 2^32 possible nonces, and your odds of finding a block are 1 in 2^67.
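Back-of-the-envelope, using those figures (quick Python; the 2^67 odds are the number quoted above):

    # Even exhausting the entire 32-bit nonce range of one header,
    # you expect to try ~2^35 distinct headers before finding a block.
    nonces_per_header = 2**32
    odds_per_hash = 2**67          # figure quoted above
    print(odds_per_hash // nonces_per_header)   # 2**35, ~34 billion headers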
By giving his program the block in advance, he's taking a shortcut.
An algorithm that could quickly decide whether a block header contained a valid solution would be very valuable indeed.
The part I find most interesting is that (if I understand it correctly), bruteforce gets harder as SAT solving gets easier. That is - the more zeros are required, the fewer variables are in the SAT problem. The fewer zeros are required, the higher the chance of bruteforcing a correct result.
(edit: actually not fewer variables, but... I'm not sure what the correct term is; the tree of operations gets smaller, because more results have to be fixed)
Actually, this doesn't follow at all (although it is what the article author claims). The exact inverse is true.
In the extreme case, where we require no zeros, the problem is trivial (just hash any old data), and any good SAT solver will finish fairly quickly (perhaps with a little guidance). If we require all zeros, we are effectively asking the solver to break SHA2, which no SAT solver is going to do well at, now or in the foreseeable future.
While assigning variables does decrease the worst-case complexity, no number of fixed zeroes shrinks the problem enough for exhaustive search to finish before the heat death of the universe, and assigning zeroes makes finding solutions harder in a search problem that is still massive.
Actually I'm not sure that claim is accurate (they state it as an intuition). The fact that there are more constraints (that was the word you're looking for) does not necessarily mean that SAT will be easier. It could be that the constraints make finding a solution more difficult.
It is true in an absolute, mathematical sense that the search space gets smaller when you fix a variable to a particular value.
However, by fixing a variable (and effectively removing it), you are also removing potential solutions from that search space.
In a sense, you could view the difficulty of a SAT problem as the fraction of valid solutions in the whole search space. By fixing a variable, you are decreasing both the number of valid solutions and the size of the total search space. But the fraction could end up being either greater or smaller, depending on the variable and the value you set it to.
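A toy illustration (a hypothetical one-clause formula, nothing to do with SHA256): for phi = (x OR y), fixing x can push the solution fraction in either direction:

    from itertools import product

    def phi(x, y):
        return x or y   # toy formula: (x OR y)

    def solution_fraction(fix_x=None):
        assigns = [(x, y) for x, y in product((0, 1), repeat=2)
                   if fix_x is None or x == fix_x]
        return sum(phi(*a) for a in assigns) / len(assigns)

    print(solution_fraction())          # 0.75 with x free
    print(solution_fraction(fix_x=1))   # 1.0  -- denser, "easier"
    print(solution_fraction(fix_x=0))   # 0.5  -- sparser, "harder"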
The constraints will definitely make finding a solution more difficult. If not, Bitcoin hash difficulty would be totally broken. The question is what the _rate_ of difficulty increase is like, compared to the brute force approach.
"A very intriguing, and perhaps unintuitive property of the algorithm proposed is that with increasing bitcoin difficulty, or equally lower target, the search could become more efficient, at least in theory. This is because we can assume more about the structure of a valid hash -- a lower target means more leading zeros which are assumed to be zero in the SAT-based algorithm."
This is actually pretty misleading. There are indeed fewer variables in the file you feed to the SAT solver, because you can assume the leading bits are 0 and propagate that through the other constraints involving those variables, but that's largely irrelevant. Think about it -- you're only going to get rid of a couple of hundred variables at most, in a problem with ~250,000 variables. So you're only reducing the stated search space by an incredibly small fraction.
In practice, the running time of the SAT solver increases exponentially with the number of zeros you assume in the hash, just like it does for a regular brute force trying all combinations of inputs.
The conclusion I draw from this is that not all variables contribute equally to the difficulty of a problem (for a SAT solver). This is actually why SAT solvers can be efficient for some problems in the first place, even for problems with hundreds of thousands of variables; the SAT solver (a smart brute force, as opposed to a naive brute force) is able to exploit the fact that many problems you feed to it are not intrinsically hard.
Exactly. And I do not see zeroes as fewer variables, but as more constraints. The two points of view are actually equivalent. If you have actual variables for the digits that need to be zero, you can either get rid of these variables, replacing them with zero everywhere else, or simply add constraints that require those digits to be zero.
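In DIMACS-style notation (positive/negative integers for literals; the clauses below are made up for illustration), the two views look like this:

    def substitute_false(cnf, v):
        # Fix variable v to false: clauses containing -v are satisfied
        # (dropped); occurrences of v are removed from the rest.
        return [[lit for lit in clause if lit != v]
                for clause in cnf if -v not in clause]

    cnf = [[1, 2, 3], [-1, 4], [2, -3]]         # hypothetical clauses
    as_constraint = cnf + [[-1]]                # zero as an extra unit clause
    as_elimination = substitute_false(cnf, 1)   # zero as a removed variable
    # Both have the same models over the remaining variables.
    print(as_constraint, as_elimination)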
While SAT solvers, and their SMT cousins, are useful in the analysis of symmetric cryptographic primitives, they quickly become useless once the complexity of the problem hits a certain threshold. They are best employed to analyze contained parts of the primitive.
It can be wise to use a SAT solver to find a preimage of, e.g., SHA-256 reduced to 10 rounds. But it will be hopeless on the full function, unless it is severely broken.
I think you misread the article. They are not claiming to break SHA, they are claiming they can get better than bruteforce performance on mining bitcoins.
I understand that. The author himself does not claim this approach is faster than bruteforce. Indeed, consider what that would require: for mining with a SAT solver to beat bruteforce, the solver would have to exploit some structure in the hash function that plain bruteforce has no access to. The point of a good hash function is to have no such structure.
If you additionally consider how SAT solving algorithms work, they are less amenable to modern CPUs than plain bruteforce, which can take full advantage of SIMD and instruction parallelism. If you look at password hash breakers, you will see them using extra tricks, like early abort, to speed things up even further.
In general, if one could find preimages using SAT, it would be done for cryptanalytic purposes. SAT is also not likely to admit a hardware (ASIC, etc.) solution, so it would be much more difficult to massively parallelize and to keep cost per hash low.