I'm curious: if we can solve the "trusting trust" problem - that is identifying ...

jancsika · on July 11, 2017

Ignoring for the moment all the indeterminism on a running laptop, the main reason is that we don't have the tooling to do that.

For example: compiler is to software as X is to hardware. What is X? And how does one go about creating their own X?

nickpsecurity · on July 12, 2017

Once you learn digital design, you learn how tools like this work:

http://opencircuitdesign.com/qflow/

Start with a simple CPU and memories you can hand-check, sent to a 0.35-0.5 micron fab whose output is visually inspectable. Then, after verifying a random sample of those chips, you use the others in boards that make the rest of your hardware and software. You can even try to use them in peripherals like your keyboard or networking. Make a whole cluster of crappy CPU boards running verified hardware, each handling part of the job, since it will take a while. You can use untrusted storage if the source and transport CPUs are trusted, since crypto approaches can ensure data wasn't tampered with in untrusted RAM or storage. Designs exist in CompSci for both.
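To make the "crypto approaches" point concrete, here is a minimal Python sketch of tamper detection for blocks kept in untrusted storage. The names `seal`, `unseal`, and `untrusted_store` are made up for illustration, and the scheme is an assumption rather than any specific published design: the trusted CPU holds the key, and each block carries an HMAC tag bound to its address. It detects modification and relocation but not replay of an old block; real designs add counters or hash trees for that.

```python
import hashlib
import hmac
import os

TAG_LEN = 32  # SHA-256 digest size in bytes


def seal(key: bytes, address: int, data: bytes) -> bytes:
    """Return data followed by an HMAC tag bound to its storage address."""
    msg = address.to_bytes(8, "big") + data
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    return data + tag


def unseal(key: bytes, address: int, blob: bytes) -> bytes:
    """Check the tag and return the data, or raise if the block was altered."""
    data, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    msg = address.to_bytes(8, "big") + data
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("block failed integrity check")
    return data


# The key never leaves the trusted CPU; the dict stands in for untrusted RAM/disk.
key = os.urandom(32)
untrusted_store = {7: seal(key, 7, b"netlist chunk")}
assert unseal(key, 7, untrusted_store[7]) == b"netlist chunk"
```

Binding the address into the tag means a valid block copied to a different location is rejected as well.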

So, you'll eventually be running synthesis and testing with open-source software, verification with ACL2 a la Jared Davis's work (maybe Milawa modified), visual inspection of final chips, and Beowulf-style clusters to deal with how slow they are. And then use that for each iteration of better tooling. I also considered using image recognition on the pictures from visual inspection, trained by all the people reviewing them across the world. More as an aid than as a replacement for people. It would be helpful when transistor count went up, though.

Other links:

https://www.cs.utexas.edu/users/moore/publications/acl2-pape...

https://www.cs.utexas.edu/users/jared/milawa/Web/

http://www.vlsitechnology.org/html/libraries.html

falcolas · on July 11, 2017

Ultimately a compiler is just a bit of software; one that takes inputs and produces outputs. The identification of compromise is the difference in outputs for the same inputs (simplified, of course).

So, given we can control most inputs to hardware, and most outputs, it seems possible to objectively identify when the HW is misbehaving (such as "A" produces network output that "B" does not). It wouldn't nail down which piece of hardware was compromised, but it would help identify that hardware is compromised.

It will never be _that_ easy, of course... but it seems possible.
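A rough Python sketch of that comparison, assuming you can feed identical inputs to several units and capture their outputs as bytes. The function name `find_outliers` and the sample data are hypothetical. It only flags units that disagree with the majority, so it fails if most units are compromised the same way, and nondeterministic outputs (timings, retransmissions) would need to be normalized before comparing.

```python
from collections import Counter


def find_outliers(outputs: dict[str, bytes]) -> list[str]:
    """Return the names of units whose output differs from the majority output."""
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    return sorted(name for name, out in outputs.items() if out != majority)


# Same input sent to three machines; "C" emits extra bytes the others do not.
observed = {"A": b"\x01\x02", "B": b"\x01\x02", "C": b"\x01\x02\xff"}
print(find_outliers(observed))  # ['C']
```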

jancsika · on July 12, 2017

> It wouldn't nail down which piece of hardware was compromised, but it would help identify that hardware is compromised.

Do TCP timings and retransmissions count as differences in outputs?

BenjiWiebe · on July 11, 2017

If X is "foundry" you probably can't create your own. And I think it is.

nickpsecurity · on July 11, 2017

It's a solved problem. Paul Karger, who invented the attack and concept in the 1970s, immediately worked with others to solve it with rigorous methods called high-assurance security. As far as this problem goes, it's mainly a matter of people you trust reviewing it, it getting distributed to you, and you verifying you got what they reviewed. With most distros, it boils down to that since you have to trust millions of lines of code (maybe privileged) in the first place. SCM security of a trusted repo becomes the solution. Wheeler covers SCM security here:

https://www.dwheeler.com/essays/scm-security.html
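The "verifying you got what they reviewed" step can be as small as a digest comparison. A minimal Python sketch, with `sha256_hex` and `matches_reviewed` as made-up helper names; it assumes the reviewed digest reached you over a channel you already trust (for example, a signed announcement), which is the hard part and is not shown here.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the bytes you actually received."""
    return hashlib.sha256(data).hexdigest()


def matches_reviewed(data: bytes, reviewed_digest: str) -> bool:
    """True if the received bytes hash to the digest the reviewers published."""
    return hmac.compare_digest(sha256_hex(data), reviewed_digest.lower())


# Stand-in for a source tarball; the reviewers would publish this digest.
received = b"source tarball bytes"
published = sha256_hex(b"source tarball bytes")
assert matches_reviewed(received, published)
assert not matches_reviewed(received + b"\x00", published)
```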

Now, let's say you want to know the compiler isn't a threat. That requires you to know that (a) it does its job correctly, (b) optimizations don't screw up programs, especially by removing safety checks, and (c) it doesn't add any backdoors. You essentially need a compiler whose implementation can be reviewed against stated requirements to ensure it does what it says, nothing more, nothing less. That's called a verified compiler. Here's what it takes assuming multiple, small passes for easier verification:

1. A precise specification of what each pass does. This might involve its inputs, intermediate states, and its outputs. This needs to be good enough to both spot errors in the code and drive testing.

2. An implementation of each pass done in as readable a way as possible in the safest, tooling-assisted language one can find.

3. Optionally, an intermediate representation of each pass side-by-side with the high-level one that talks in terms of expressions, basic control flow (e.g. a while construct), stacks, heaps, and so on. The high-level form is decomposed into low-level operations that still aren't quite assembly.

4. The high-level or intermediate forms side by side with assembly language for them. This will be simplified, well-structured assembly designed for readability instead of performance.

5. An assembler, linker, loader, and/or anything else I'm forgetting that the compiler depends on to produce the final executable. Each of these will be done as above with focus on simplicity. May not be feature complete so much as just enough features to build the compiler. Initial ones are done by hand optionally with helper programs that are easy to do by hand.

6. Combine the ASM of the compiler manually or by any trusted applications you have so far. The output must run through assembler, linker, etc. to get the initial executable. Test that and use it to compile the high-level compiler. Now, you're set. The rest of development can be done in the high-level language w/ compiler extensions or more optimizations.

7. Formal specification and verification of the above for best results. Already been done with CompCert for C and CakeML for SML. As far as trust goes, CakeML is verified in HOL4, whose proof-checking kernel is smaller than most programs. HOL Light would make it smaller still. This route puts trust mostly in the formal specs with one, small, trusted executable instead of a pile of specs and code. Vast increase in trustworthiness.
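To give a feel for steps 1-4, here is a toy Python sketch of a two-pass compiler for arithmetic expressions. Everything in it (the tuple-based source language, the `lower` and `emit` passes, the `run` reference machine) is invented for illustration and is not taken from CompCert, CakeML, or any real project. The `evaluate` function plays the role of the spec, and the final assertion tests the passes against it; a test like this is far weaker than the formal proofs in step 7.

```python
# Source language: nested tuples such as ("add", 2, ("mul", 3, 4)); ints are literals.
# Only "add" and "mul" are supported; any other operator name is treated as "mul".


def evaluate(expr):
    """Specification: the meaning of an expression, written as directly as possible."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "add" else a * b


def lower(expr):
    """Pass 1: expression tree -> list of stack-machine instructions."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    return lower(left) + lower(right) + [(op,)]


def emit(ir):
    """Pass 2: stack IR -> readable pseudo-assembly, one instruction per line."""
    return "\n".join(" ".join(str(part) for part in ins).upper() for ins in ir)


def run(asm):
    """Tiny reference machine for the pseudo-assembly, used only for testing."""
    stack = []
    for line in asm.splitlines():
        parts = line.split()
        if parts[0] == "PUSH":
            stack.append(int(parts[1]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if parts[0] == "ADD" else a * b)
    return stack.pop()


program = ("add", 2, ("mul", 3, 4))
asm = emit(lower(program))
print(asm)  # PUSH 2, PUSH 3, PUSH 4, MUL, ADD on separate lines
assert run(asm) == evaluate(program) == 14
```

Each pass is small enough to read next to its input and output, which is the property the list above is after.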

@rain1 has a site collecting as many worked examples as possible of small, verified, or otherwise bootstrapping-related work on compilers or interpreters. I contributed a bunch on there, too. Be warned that it looks rough, since it's a work in progress that's focused more on content than presentation. It already has many, many weekends' worth of reading for people interested in Trusting Trust solutions. Here it is for your enjoyment or any contributions you might have:

https://bootstrapping.miraheze.org/wiki/Main_Page