No matter what wizardry your tools provide, it seems exceedingly unlikely that you're going to burn through the code at, say, 100 bytes per second without running the risk of missing something.
The fact that you are focused on inspecting every byte of the code indicates that you are not yet familiar with how this process works.
By way of background, I have done security source code audits of systems on the order of 750,000 lines of code, in a 12-week effort. The approach taken in source code review is possibly different from what you think. One part of the approach is to look for patterns of code that match known vulnerability classes, such as SQL injection or opening a socket. You then trace backward along the code paths that lead there to determine how external input (that is, user-controlled or attacker-controlled) can be used to trigger those vulnerable pieces of code. Another part of the approach is to look at each of the inputs (or interfaces) to the code to determine how those inputs can influence the behavior of the program. One is likely to switch back and forth between these two approaches.
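As a minimal illustration of that first pattern-hunting pass, here is a toy scanner. The patterns, file extensions, and the "src/" path are illustrative assumptions, not a real rule set; an actual review uses purpose-built scanners and then human tracing from each hit:

```python
import os
import re

# Illustrative sink patterns only; a real review uses a much larger,
# language-aware rule set and dedicated scanning tools.
SINK_PATTERNS = {
    "possible SQL injection": re.compile(r'execute\s*\(\s*["\'].*%s|\+\s*\w+\s*\+.*(SELECT|INSERT)', re.I),
    "raw socket opened":      re.compile(r'\bsocket\s*\('),
    "command execution":      re.compile(r'\b(system|popen|exec[lv]p?)\s*\('),
}

def scan_tree(root):
    """Walk a source tree and report lines matching known sink patterns.

    Each hit is only a starting point: the reviewer then traces the call
    paths backward to see whether attacker-controlled input can reach it.
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".c", ".cpp", ".py", ".java")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    for label, pattern in SINK_PATTERNS.items():
                        if pattern.search(line):
                            print(f"{path}:{lineno}: {label}: {line.strip()}")

if __name__ == "__main__":
    scan_tree("src/")   # hypothetical source root
```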
One key approach in looking at the code is to ask "can user input cause the program to choose one path of a branch over another?" Another key approach is to ask "can the user's actions cause a change in one word of the program's memory?" From that, an exploit can be crafted.
So you might well now ask, "OK, that is source code. Object code is orders of magnitude more difficult." This is not really the case. The tools that 'tptacek mentioned take apparently impenetrable object code and transform it into assembly language (as well as into an intermediate language, ESIL), and answer many questions about the static and dynamic nature of the code under inspection. You can also diff the call graphs from one version to the next. This trick was used to detect a vulnerability resolved by a Windows patch in a common library; the tool revealed another use of the same library elsewhere in the system that still contained the vulnerability. All of this without having the source.
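For concreteness, here is roughly what that version-diffing trick looks like scripted against radare2's Python bindings (r2pipe). The binary names are hypothetical and this is a bare sketch; real patch diffing uses dedicated tools such as BinDiff or Diaphora, but the principle is the same: diff the function inventories of two versions and look at what changed.

```python
import r2pipe  # pip install r2pipe; requires radare2 on the PATH

def function_inventory(path):
    """Analyze a binary with radare2 and return {name: size} per function."""
    r2 = r2pipe.open(path)
    r2.cmd("aaa")                    # run radare2's full auto-analysis
    funcs = r2.cmdj("aflj") or []    # list discovered functions as JSON
    r2.quit()
    return {f["name"]: f["size"] for f in funcs}

# Hypothetical pre- and post-patch copies of the same library.
old = function_inventory("libfoo_v1.dll")
new = function_inventory("libfoo_v2.dll")

# Functions the patch added, removed, or resized are where to look first:
# the fix for the vulnerability is almost always in this small set.
for name in sorted(set(old) | set(new)):
    if name not in old:
        print(f"added:   {name}")
    elif name not in new:
        print(f"removed: {name}")
    elif old[name] != new[name]:
        print(f"changed: {name} ({old[name]} -> {new[name]} bytes)")
```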
In fact, for the most serious level of analysis, one should go directly for the binary, since who knows how well the source code corresponds to the binary that actually gets shipped.
And it turns out one can effectively audit code written in a language one is not expert in. The key elements are "where are the branches," "what are the call graphs," and "what are the inputs and outputs."
In another thread, you note that you are an expert and that you are involved in the production of a security product. I am as well, having been in the software development business for 52 years, the last 10 in the security field, focusing on software security. And I can testify that these are two different fields of expertise. An expert in software development, even of security products, does not automatically mean that one is an expert in finding security flaws in code.
I've trained many software engineers in software security, and a key part of that training is to note that software engineering builds up programs and solutions by using previously developed abstractions, and making new abstractions that use existing ones. A penetration tester will develop skills in penetrating abstractions. It is a different way of thinking, a different kind of expertise. It is clear from your work that you are excellent at building up abstractions.
There are a couple of ideas that you are missing, I think. One is the idea that the "rate of burn" through the bytes of a binary blob is a useful way to determine the difficulty of assessing its security. It is not. (Nor is it a useful way to evaluate software productivity.) It is not necessary to look at every byte. Think instead of looking at every basic block. I suspect you will get a number that differs by two orders of magnitude from what you are currently thinking.
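A hedged sketch of how one might check that claim, again with r2pipe against a hypothetical binary. The `nbbs` field is radare2's per-function basic-block count; the interesting number is the ratio of bytes on disk to basic blocks a reviewer must reason about:

```python
import os
import r2pipe  # pip install r2pipe; requires radare2 on the PATH

path = "target.bin"  # hypothetical binary under review
r2 = r2pipe.open(path)
r2.cmd("aaa")                     # full auto-analysis
funcs = r2.cmdj("aflj") or []     # per-function metadata, including nbbs
r2.quit()

total_bytes  = os.path.getsize(path)
total_blocks = sum(f.get("nbbs", 0) for f in funcs)

print(f"{total_bytes:,} bytes on disk")
print(f"{total_blocks:,} basic blocks to reason about")
print(f"ratio: ~{total_bytes // max(total_blocks, 1)} bytes per block")
```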
> you are not yet familiar with how this process works
That is true, I'm not, certainly not with the details. I hire people to do audits and pen-testing for me. But in order to distinguish people who are competent from snake-oil salesmen, I have to have a pretty good grip on some fundamentals, and I believe I do. So even though I have not used a modern decompiler, I certainly know that they exist, I know the fundamentals of how they work, and I know their limitations. I know, for example, that information is lost in the process of going from source code to object code to decompiled code, so auditing decompiled code is necessarily strictly more difficult than auditing source code. I also know that the relative sizes of the source, object, and decompiled code are more or less linearly related. (A few things deviate from this, like C++ templates and unrolled loops, but it's true to first order.) So using a linear approximation to estimate the amount of work required to audit a binary blob is not entirely unreasonable, particularly since the only question I'm trying to answer is whether it is even plausible that iOS could be effectively audited by an unauthorized third party, given its size. (Note that even if the answer turns out to be "yes", that still doesn't demonstrate that iOS actually is being effectively audited.)
So:
> It is not necessary to look at every byte.
I never said it was. Nonetheless, unless some auditing technique is effective enough to change the (first-order) linear relationship between source code, object code, and decompiled code, or effective enough to introduce a radical constant-factor reduction (i.e. eliminate 90% of the code from consideration), the fact that not every byte needs to be examined is irrelevant. My first-order estimate will still be good enough to provide a plausibly correct answer to the question I'm asking.
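For what it's worth, the back-of-envelope arithmetic behind that estimate, using only figures already floated in this thread (a roughly 1.6GB image, review rates between 1 and 100 bytes per second):

```python
# Back-of-envelope audit-time estimate from the numbers in this thread.
image_bytes = 1.6e9          # approximate size of the iOS image discussed

for rate in (1, 100):        # bytes reviewed per second, the two rates floated here
    seconds = image_bytes / rate
    years = seconds / (3600 * 24 * 365)
    print(f"{rate:>3} B/s -> {years:,.1f} years of continuous review")
# 1 B/s comes out to roughly 50 years; 100 B/s to roughly half a year.
```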
> In fact, for the most serious level of analysis, one should go directly for the binary, since who knows how well the source code corresponds to the binary that actually gets shipped.
Yes, exactly right. Nonetheless, this process is easier if you have the (purported) corresponding source. More information is always better.
There is one more important point that you seem to have missed (along with everyone else): there is a big difference between trying to find a vulnerability that is the result of an inadvertent bug, and one that has been introduced deliberately by a competent adversary. Finding the latter is vastly more difficult than the former. Decompilers operate on heuristics. Heuristics can be fooled. One of the things that a competent adversary would do if they were trying to hide a back door would be to put it in a form that causes it to be hidden by decompiler heuristics. So this entire discussion about reverse engineering techniques is actually totally moot. Writing a decompiler that did not have this problem would require solving the halting problem, so even though I don't know the details, I know they have this shortcoming. Not only that, but I know how I could identify the specifics of this shortcoming, and I know how I could write code to take advantage of this shortcoming to conceal a back door. And if I know these things, then there are certainly people at Apple who know these things.
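To make the concern concrete, here is the shape of the idea sketched in Python, purely as a conceptual illustration: real decompiler evasion operates at the machine-code level, with techniques like opaque predicates and indirect control flow, not in a scripting language. The function names and magic hash below are hypothetical. Deciding statically whether the hidden branch is ever taken amounts to inverting a hash, which no general-purpose analysis can do:

```python
import hashlib

def handle_request(data: bytes) -> str:
    # Looks like an ordinary integrity check. To the adversary, this
    # comparison is a trigger keyed to one secret input; to a static tool,
    # proving the branch unreachable means finding a hash preimage,
    # which is effectively intractable.
    digest = hashlib.sha256(data).hexdigest()
    if digest == "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824":
        return enable_hidden_mode()   # the "back door", reached only on the magic input
    return "ok"

def enable_hidden_mode() -> str:
    return "backdoor"

if __name__ == "__main__":
    print(handle_request(b"routine traffic"))  # ok
    print(handle_request(b"hello"))            # backdoor (that digest is sha256("hello"))
```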
This is the reason that I am quite confident in my position despite my ignorance of the details of modern reverse engineering techniques. The halting problem gives the advantage to the attacker, and no technological advance is going to change that. It's like a second law of thermodynamics for security. I don't need to know the details of contemporary technology to know that, all else being equal, the person trying to hide a back door in a 1.6GB binary is very likely to beat the person trying to find it. And the problem is not just technological. As I've pointed out elsewhere in this thread, even if someone does find it, that knowledge will be extremely valuable. It is far from clear that anyone who succeeds in finding a back door in iOS would use that knowledge for the public good, particularly when you consider the personality types and economic circumstances that lead people to pursue a career in reverse-engineering in the first place.
This situation is ridiculously complex, and that complexity extends far beyond the size of the iOS binary. Anyone who is quibbling over my 1-byte-per-second estimate, or my personal proficiency with radare, is missing the point rather badly.
Your first-order estimate is not in fact good enough to provide a plausibly correct answer; you're not even in the ballpark of correct. The challenges facing reversers differ in degree from those facing Apple itself, with its access to original source code; they are not entirely different kinds of challenges. And in neither case can the challenge be measured in "compiled image bytes reviewed per second".
Your arguments here pile non sequitur atop non sequitur. I'm left with the impression that this is a topic in which you've decided you're unwilling to concede anything. No doubt, if I challenged you about how difficult it is to reverse hardware, you'd pull some other weird rabbit out of your hat, like quantum states or interactions between circuits and cosmic rays.
Of course, if you'd led off your argument with "cosmic rays make all of information security in some ways unknowable", we'd all have simply said "sure, but that's beside the point". But that's not your argument. Your argument is that telcos should provide cryptographic security for telephone users, because Apple has insurmountable advantages against its users due to its access to its own source code. No, that doesn't make any sense.
I'm not taking it personally or anything. You've just gone on tilt. It happens to all of us. For what it's worth: I think you still share a Slack with several of us? You could ask this question there, and I'd be more comfortable responding in detail there.
I've already conceded that I am ignorant of many of the details of modern reverse-engineering techniques, so that is manifestly untrue.
> you're not even in the ballpark of correct
You keep saying that, and yet you don't back this up with any details or supporting arguments. In what way am I incorrect? Is my estimate too high? Too low? By how much? Am I wrong when I claim that a lower bound on the computational complexity of auditing is O(n)? If so, what is the correct result? Is it O(sqrt(n))? O(log(n))? O(1)?
BTW, I would actually love to be convinced that I'm wrong about this. That would be a huge two-fold win for me. It would mean that 1) I can stop worrying about security (as long as I use an iPhone) and 2) I would learn something new and almost certainly very interesting. But you (or someone) have to tell me how and why I am wrong, not just that I am wrong.
It would also help if you would stop advancing logical fallacies like this straw man:
> cosmic rays make all of information security in some ways unknowable
I don't understand why you're more comfortable discussing this on Slack, but OK, I've fired up my Slack client.
I don't want to pretend that there's a meaningful relationship between image size and reversing challenge, but to the extent there is, it's something more like O(log n).