In any case, Java's stack-based bytecode is really a register machine in disguise. It isn't a general stack machine -- for instance, at a given "PC" the shape of the expression stack (i.e. the number and "basic" types of elements in it) must be the same along every path that reaches it.
The right way to look at Java's bytecode stream is opcodes for a register machine with _implicit_ input and output registers. Mapping it to a compiler IR would not be easy in the general case if that weren't true.
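As a concrete illustration (my own sketch; the t0/t1 names are invented to show the register-machine reading), here is what javac typically emits for a trivial method, with each stack slot named like a register:

```java
class StackAsRegisters {
    // Because the stack shape at every PC is fixed, each slot can be
    // given a stable name (t0, t1 below), and the opcodes read like
    // three-address code over those names.
    static int add(int a, int b) {
        return a + b;
        // javac emits roughly:
        //   iload_0   // push local 0 (a)     ~  t0 = a
        //   iload_1   // push local 1 (b)     ~  t1 = b
        //   iadd      // pop two, push sum    ~  t0 = t0 + t1
        //   ireturn   // pop and return       ~  return t0
    }
}
```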
> But you said that the JMM guarantees that all reads in C_i - C_(i-1)
> will see writes in C_(i-1); this means that C_3 which reads
> `tuple.nonVolatileF` will see the write to that variable in C_2, no?
Talking about such things in English is ambiguous. The formal
statement is
"For any read r ∈ C_i −C_(i−1), we have W_i(r) ∈ C_(i−1) and W(r) ∈
C_(i−1)".
A clearer way to state the clause in English is "writes seen by
reads in C_i − C_(i−1) belong to C_(i−1)". I can justify an
execution as long as each read (ultimately) sees a write in the
commit previous to the one the read is in, subject to
happens-before consistency. To prevent causality loops, the JMM
only allows a read, in the commit that introduces it, to see a
write that happens-before it. Then, to allow certain interesting
optimizations, the JMM lets us bait-and-switch the write a read
saw -- "Each read r in C_i − C_(i−1) must see writes in C_(i−1)
in both E_i and E, but may see a different write in E_i from the
one it sees in E.".
The non-volatile read is _allowed_ to see the non-volatile write
(i.e. such an execution exists; this is the question I ask in the
exercise), but it doesn't have to. The transform in question is
illegal because the transformed program allows behavior that the
untransformed program didn't allow -- the transform breaks semantics.
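To make that concrete, here is a minimal sketch of the shape of the program under discussion (only `nonVolatileF` comes from the thread; the other names are made up, and the original exercise isn't reproduced here):

```java
class Tuple {
    int nonVolatileF;  // plain (non-volatile) field
}

class Sketch {
    static final Tuple tuple = new Tuple();

    static void writer() {
        tuple.nonVolatileF = 42;  // non-volatile write
    }

    static void reader() {
        int r = tuple.nonVolatileF;  // races with writer()
        // An execution in which r == 42 exists (the read is allowed
        // to see the write), but no execution is *required* to
        // produce it: r == 0 is equally legal.
    }
}
```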
> Since you're constructing an execution, why didn't you interleave
> the execution to make this trivially true?
I don't understand this, make /what/ trivially true? In any case, the
transformed program has a data race, and so observationally it doesn't
need to have a sequentially consistent execution. Specifically, it is
allowed to show behavior that cannot be described by any
interleaving of the instruction streams of the individual threads.
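The canonical litmus test for this (a standard example, not taken from the thread):

```java
// Store-buffering test: both fields are plain, so the program has a
// data race. The JMM permits the outcome r1 == 0 && r2 == 0, which
// no interleaving of the two threads' instructions can produce.
class StoreBuffering {
    static int x, y, r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("r1=" + r1 + ", r2=" + r2);  // "r1=0, r2=0" is legal
    }
}
```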
> Also, can't you wrap the writer in an atomic block during
> transformation?
> The non-volatile read is _allowed_ to see the non-volatile write (i.e. such an execution exists, this is the question I ask in the exercise), but it doesn't have to.
And this is why debugging this kind of issue in practice is fun: you have very few, and non-obvious, guarantees here.
If you are constructing adversarial proofs, or have decided that your program is a little gnome bound to make you insane, this can be rephrased as: in this example, the non-volatile read can choose the value of any write as long as it is in a happens-before relationship with the non-volatile read. It might pick every value, every second value, or just that one troublesome value and never change again.
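The "never change again" case is exactly what the classic loop-hoisting example produces (standard JMM folklore, not from this thread):

```java
// With `stop` non-volatile, the JIT may hoist the read out of the
// loop: the read then "picks" the initial write (false) once and
// never reconsiders, so the spinner can loop forever.
class StaleRead {
    static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> { while (!stop) { } });
        spinner.start();
        Thread.sleep(100);
        stop = true;     // racy write; the spinner may never observe it
        spinner.join();  // may block forever
    }
}
```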
> - It doesn't address how you decide which edges to speculatively
> consider executable.
>
> - Optimizers intentionally do not play the "what if" game and try
> optimizing things multiple different ways to see what wins or what
> gains might be had. It simply gets too expensive (in compile time)
> too quickly.
My use case was that you already have a set of edges that profiling
tells you are "rarely taken", and you wish to know if eliding any of
those edges improves optimization. Normally JIT compilers end up
installing traps on most of these edges, hoping that the occasional
side exit to the interpreter is worth it for the added performance
in the fast case. I wanted a way to push the decision on whether to
install a trap for a specific edge to later in the optimization
process.
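In Java-flavored pseudocode, the trap scheme looks roughly like this (a sketch of the general technique, not any particular JIT's code; `deoptimize` is a made-up stand-in for the runtime's side-exit mechanism):

```java
// Profiling says the x < 0 edge is rarely taken, so instead of
// compiling that path the JIT emits a trap that bails out to the
// interpreter, and optimizes the rest assuming x >= 0.
class CompiledMethod {
    static int body(int x) {
        if (x < 0) {
            return deoptimize(x);  // side exit: resume interpreting here
        }
        return x * 2;              // fast path
    }

    // Hypothetical stand-in for the runtime's deoptimization entry.
    static int deoptimize(int x) {
        throw new UnsupportedOperationException("resume in interpreter");
    }
}
```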
> - There are already other ways to achieve similar effect,
> e.g. superblock formation or simple tail duplication (which can be
> done for specific effect, e.g. see branch threading in LLVM).
I don't disagree here. I'll have to admit that this approach is
somewhat "out there", in its current form it doesn't seem very
practical.
> My use case was that you already have a set of edges that profiling tells you is "rarely taken", and you wish to know if eliding any of those edges improves optimization.
True, I somehow forgot that stipulation by the time I got to the end of the post.
(Disclaimer: I know nothing about Ruby, but I know some things about JIT compilers)
Another way to handle this is to assume that Fixnum#+ hasn't changed when compiling a method that uses it (perhaps with a check at method entry); but when it does get redefined, you "deoptimize" the methods that were compiled under that assumption.
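A sketch of that scheme (names invented; a production JIT would patch the compiled code itself rather than test a flag, but the shape of the idea is the same):

```java
// Assumption-based compilation: compiledSum is valid only while
// `fixnumPlusRedefined` stays false. Redefining Fixnum#+ flips the
// flag, sending callers back to the generic (interpreted) path.
class RedefinitionGuard {
    static volatile boolean fixnumPlusRedefined = false;

    // Compiled version of a method that uses Fixnum#+.
    static int compiledSum(int a, int b) {
        if (fixnumPlusRedefined) {
            return interpretSum(a, b);  // "deoptimize"
        }
        return a + b;  // fast path: built-in + semantics
    }

    // Stand-in for dispatching through the (possibly redefined) method.
    static int interpretSum(int a, int b) { return a + b; }
}
```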
A quick solution involves starting at a corner and placing each queen one square horizontally and two vertically (a constrained knight's move, basically) away from the previous one. This works for an 8x8 board, but does not generalize to an NxN board (it is easy to see that this won't work when N is divisible by 3). Determining exactly which N's it works for is a math puzzle, not a programming one, however. :)
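For the curious, a small checker for the attack conditions (my own sketch; the placement in `main` is one valid 8x8 arrangement in the knight's-move spirit described above):

```java
// rows[c] is the row of the queen in column c; columns are distinct
// by construction, so only rows and diagonals need checking.
class Queens {
    static boolean isValid(int[] rows) {
        for (int a = 0; a < rows.length; a++)
            for (int b = a + 1; b < rows.length; b++)
                if (rows[a] == rows[b]
                        || Math.abs(rows[a] - rows[b]) == b - a)
                    return false;  // shared row or shared diagonal
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new int[]{1, 3, 5, 7, 2, 0, 6, 4}));  // true
    }
}
```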
I'm from the same place as artagnon. Anti-weed laws are almost never enforced, and weed is incredibly cheap (you'll get enough weed to get 10 people stoned out of their minds for the price of a bottle of beer). And the ease of storage means you can easily maintain a huge stash.
A "false" register dependency is a Write-after-read (WAR) [1].
As far as three-address code making register renaming easier, I'm not sure what the author had in mind -- isn't `add $5, %rax` essentially a condensed form of `add $5, %rax, %rax` (which is 3AC)? In fact, 3AC is _more_ general and should make register renaming harder, if it changes anything at all.
A careful compiler could avoid WARs more easily in 3AC than in 2AC (by spacing out reuse of registers), so an architecture without renaming could still extract ILP.