In any case, Java's stack-based bytecode is really a register machine in disguise. It isn't a general stack machine -- for instance, at a given "PC" the shape of the expression stack (i.e. the number and "basic" types of elements in it) must be the same along every path that reaches it.
The right way to look at Java's bytecode stream is opcodes for a register machine with _implicit_ input and output registers. Mapping it to a compiler IR would not be easy in the general case if that weren't true.
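As a concrete illustration (my own sketch; the t0/t1 names are invented to show the register-machine reading), here is what javac typically emits for a trivial method, with each stack slot named like a register:

```java
class StackAsRegisters {
    // Because the stack shape at every PC is fixed, each slot can be
    // given a stable name (t0, t1 below), and the opcodes read like
    // three-address code over those names.
    static int add(int a, int b) {
        return a + b;
        // javac emits roughly:
        //   iload_0   // push local 0 (a)     ~  t0 = a
        //   iload_1   // push local 1 (b)     ~  t1 = b
        //   iadd      // pop two, push sum    ~  t0 = t0 + t1
        //   ireturn   // pop and return       ~  return t0
    }
}
```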
> But you said that the JMM guarantees that all reads in C_i - C_(i-1)
> will see writes in C_(i-1); this means that C_3 which reads
> `tuple.nonVolatileF` will see the write to that variable in C_2, no?
Talking about such things in English is ambiguous. The formal
statement is
"For any read r ∈ C_i −C_(i−1), we have W_i(r) ∈ C_(i−1) and W(r) ∈
C_(i−1)".
A clearer way to state the clause in English is "writes seen by
reads in C_i − C_(i−1) belong to C_(i−1)". I can justify an
execution as long as each read (ultimately) sees a write in the
commit previous to the one the read is in, subject to
happens-before consistency. To prevent causality loops, the JMM
only allows a read, in the commit that introduces it, to see a
write that happens-before it. Then, to allow certain interesting
optimizations, the JMM lets us bait-and-switch the write a read
saw -- "Each read r in C_i − C_(i−1) must see writes in C_(i−1)
in both E_i and E, but may see a different write in E_i from the
one it sees in E.".
The non-volatile read is _allowed_ to see the non-volatile write
(i.e. such an execution exists; this is the question I ask in the
exercise), but it doesn't have to. The transform in question is
illegal because the transformed program allows behavior that the
untransformed program didn't allow -- the transform breaks semantics.
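To make that concrete, here is a minimal sketch of the shape of the program under discussion (only `nonVolatileF` comes from the thread; the other names are made up, and the original exercise isn't reproduced here):

```java
class Tuple {
    int nonVolatileF;  // plain (non-volatile) field
}

class Sketch {
    static final Tuple tuple = new Tuple();

    static void writer() {
        tuple.nonVolatileF = 42;  // non-volatile write
    }

    static void reader() {
        int r = tuple.nonVolatileF;  // races with writer()
        // An execution in which r == 42 exists (the read is allowed
        // to see the write), but no execution is *required* to
        // produce it: r == 0 is equally legal.
    }
}
```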
> Since you're constructing an execution, why didn't you interleave
> the execution to make this trivially true?
I don't understand this, make /what/ trivially true? In any case, the
transformed program has a data race, and so observationally it doesn't
need to have a sequentially consistent execution. Specifically, it is
allowed to show behavior that cannot be described by any
interleaving of the instruction streams of the individual threads.
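The canonical litmus test for this (a standard example, not taken from the thread):

```java
// Store-buffering test: both fields are plain, so the program has a
// data race. The JMM permits the outcome r1 == 0 && r2 == 0, which
// no interleaving of the two threads' instructions can produce.
class StoreBuffering {
    static int x, y, r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("r1=" + r1 + ", r2=" + r2);  // "r1=0, r2=0" is legal
    }
}
```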
> Also, can't you wrap the writer in an atomic block during
> transformation?
> The non-volatile read is _allowed_ to see the non-volatile write (i.e. such an execution exists, this is the question I ask in the exercise), but it doesn't have to.
And this is why debugging this kind of issue in practice is fun: you have very few, and non-obvious, guarantees here.
If you are constructing adversarial proofs, or have decided that your program is a little gnome bound to make you insane, this can be rephrased as: in this example, the non-volatile read can choose the value of any write as long as it is in a happens-before relationship with the non-volatile read. It might pick every value, every second value, or just that one troublesome value and never change again.
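The "never change again" case is exactly what the classic loop-hoisting example produces (standard JMM folklore, not from this thread):

```java
// With `stop` non-volatile, the JIT may hoist the read out of the
// loop: the read then "picks" the initial write (false) once and
// never reconsiders, so the spinner can loop forever.
class StaleRead {
    static boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread spinner = new Thread(() -> { while (!stop) { } });
        spinner.start();
        Thread.sleep(100);
        stop = true;     // racy write; the spinner may never observe it
        spinner.join();  // may block forever
    }
}
```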
> - It doesn't address how you decide which edges to speculatively
> consider executable.
>
> - Optimizers intentionally do not play the "what if" game and try
> optimizing things multiple different ways to see what wins or what
> gains might be had. It simply gets too expensive (in compile time)
> too quickly.
My use case was that you already have a set of edges that profiling
tells you are "rarely taken", and you wish to know if eliding any of
those edges improves optimization. Normally JIT compilers end up
installing traps on most of these edges, hoping that the occasional
side exit to the interpreter is worth it for the added performance
in the fast case. I wanted a way to push the decision on whether to
install a trap for a specific edge to later in the optimization
process.
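In Java-flavored pseudocode, the trap scheme looks roughly like this (a sketch of the general technique, not any particular JIT's code; `deoptimize` is a made-up stand-in for the runtime's side-exit mechanism):

```java
// Profiling says the x < 0 edge is rarely taken, so instead of
// compiling that path the JIT emits a trap that bails out to the
// interpreter, and optimizes the rest assuming x >= 0.
class CompiledMethod {
    static int body(int x) {
        if (x < 0) {
            return deoptimize(x);  // side exit: resume interpreting here
        }
        return x * 2;              // fast path
    }

    // Hypothetical stand-in for the runtime's deoptimization entry.
    static int deoptimize(int x) {
        throw new UnsupportedOperationException("resume in interpreter");
    }
}
```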
> - There are already other ways to achieve similar effect,
> e.g. superblock formation or simple tail duplication (which can be
> done for specific effect, e.g. see branch threading in LLVM).
I don't disagree here. I'll have to admit that this approach is
somewhat "out there", in its current form it doesn't seem very
practical.
> My use case was that you already have a set of edges that profiling tells you is "rarely taken", and you wish to know if eliding any of those edges improves optimization.
True, I somehow forgot that stipulation by the time I got to the end of the post.
(Disclaimer: I know nothing about Ruby, but I know some things about JIT compilers)
Another way to handle this is to assume that Fixnum#+ hasn't changed when compiling a method that uses it (perhaps with a check at method entry); but when it does get redefined, you "deoptimize" the methods that were compiled under that assumption.
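A sketch of that scheme (names invented; a production JIT would patch the compiled code itself rather than test a flag, but the shape of the idea is the same):

```java
// Assumption-based compilation: compiledSum is valid only while
// `fixnumPlusRedefined` stays false. Redefining Fixnum#+ flips the
// flag, sending callers back to the generic (interpreted) path.
class RedefinitionGuard {
    static volatile boolean fixnumPlusRedefined = false;

    // Compiled version of a method that uses Fixnum#+.
    static int compiledSum(int a, int b) {
        if (fixnumPlusRedefined) {
            return interpretSum(a, b);  // "deoptimize"
        }
        return a + b;  // fast path: built-in + semantics
    }

    // Stand-in for dispatching through the (possibly redefined) method.
    static int interpretSum(int a, int b) { return a + b; }
}
```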
A quick solution involves starting at a corner and placing each queen one square horizontally and two vertically (a constrained knight's move, basically) away from the previous one. This works for an 8x8 board, but does not generalize to an NxN board (it is easy to see that this won't work when N is divisible by 3). Determining exactly which N's it works for is a math puzzle, not a programming one, however. :)
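For the curious, a small checker for the attack conditions (my own sketch; the placement in `main` is one valid 8x8 arrangement in the knight's-move spirit described above):

```java
// rows[c] is the row of the queen in column c; columns are distinct
// by construction, so only rows and diagonals need checking.
class Queens {
    static boolean isValid(int[] rows) {
        for (int a = 0; a < rows.length; a++)
            for (int b = a + 1; b < rows.length; b++)
                if (rows[a] == rows[b]
                        || Math.abs(rows[a] - rows[b]) == b - a)
                    return false;  // shared row or shared diagonal
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new int[]{1, 3, 5, 7, 2, 0, 6, 4}));  // true
    }
}
```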
I'm from the same place as artagnon. Anti-weed laws are almost never enforced, and weed is incredibly cheap (you'll get enough weed to get 10 people stoned out of their minds for the price of a bottle of beer). And the ease of storage means you can easily maintain a huge stash.
A "false" register dependency is a Write-after-read (WAR) [1].
As far as three-address code making register renaming easier, I'm not sure what the author had in mind -- isn't `add $5, %rax` essentially a condensed form of `add $5, %rax, %rax` (which is 3AC)? In fact, 3AC is _more_ general and should make register renaming harder, if it changes anything at all.
A careful compiler could avoid WARs more easily in 3AC than in 2AC (by spacing out reuse of registers), so an architecture without renaming could still extract ILP.