Note that Dalvik is actually not a very good trace JIT. There's a recent paper comparing Dalvik to an embedded version of HotSpot. The Dalvik interpreter was slightly faster (~5%), but Dalvik's JIT speedup was only 3-4x, whereas the method JIT achieved ~12x. The reason is that Dalvik's traces are way too short and cannot even span method boundaries. That means little scope for optimisation and a lot of code just to load the state from the stack into registers and then write it back. Weirdly enough, Dalvik also generates more code than the method JIT, even though low compilation overhead was their main goal.
Fortunately for Dalvik, most applications actually spend most of their time in native code.
I don't think interpreters written in assembly were that bad before. LuaJIT 2 uses direct threading (not new at all), register-based bytecode (relatively new), and manually optimised register assignment (perhaps new). AFAICT, one key innovation is that he did not reuse Lua 5.1's register-based bytecode format as-is, but simplified it even further so it can be decoded efficiently on x86. The second key component is that the interpreter pre-decodes the following instruction in order to take advantage of out-of-order execution. This technique also required fixing some registers' roles.
Don't get me wrong, I think LuaJIT2's interpreter is great, but interpreters before LuaJIT2 weren't complete crap, either. Many emulators, for example, have very good interpreters written in assembly (some aim to be cycle-accurate).
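To make the pre-decoding idea concrete, here is a toy sketch in Python (the bytecode format and opcodes are invented; the real thing is hand-written x86 assembly, and Python can only mimic the shape of the dispatch loop, not the branch-prediction and out-of-order benefits):

    # Register-based dispatch loop that fetches and decodes the *next*
    # instruction before the current one has finished executing.
    def run(code, regs):
        op, a, b = code[0]          # pre-decode the first instruction
        pc = 1
        while True:
            if op == "ret":
                return regs[a]
            op_next = code[pc]      # decode instruction pc early, over-
            pc += 1                 # lapping with the work below
            if op == "loadk":
                regs[a] = b         # load constant b into register a
            elif op == "add":
                regs[a] = regs[a] + regs[b]
            op, a, b = op_next

    prog = [("loadk", 0, 40), ("loadk", 1, 2), ("add", 0, 1), ("ret", 0, 0)]
    print(run(prog, [0, 0]))        # -> 42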
I was trying to describe how it looked from an academic standpoint. Direct threading and register-based bytecode were well known (the register stuff is actually very old, but the jury was out until about 2003), but everything else Pall did was basically new to programming language researchers and implementers.
Moore's law: transistor count doubles every 18 months. Assuming 40 years at the company (it's a bit less), that's a growth factor of 2^(40/1.5) ≈ 2^26.7 ≈ 1.1 × 10^8.
The 62-core Xeon Phi has 5 billion transistors. Divided by that growth factor, it implies a 1974 starting point of only ~47 transistors. That's a bit low: the Intel 8080, which came out in 1974 (when he joined the company), had 4,500 transistors. Looks like Moore's law has been slowing down a bit in recent years, probably due to the focus on reduced energy usage.
EDIT: It works out if you replace "18 months" with "24 months". Revised growth factor: 2^(38/2) = 2^19 = 524,288, which implies 5 × 10^9 / 524,288 ≈ 9,500 transistors in 1974, within a factor of two of the 8080's actual count. Impressive still.
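A quick check of the arithmetic above (the 5 billion and 4,500 transistor counts are the ones from the text):

    # Implied 1974 transistor count, working backwards from a
    # 5e9-transistor chip for two different doubling periods.
    xeon_phi, i8080 = 5e9, 4500

    for years, months in ((40, 18), (38, 24)):
        growth = 2 ** (years * 12 / months)
        print(f"{months}-month doubling over {years} years: "
              f"implied 1974 chip: {xeon_phi / growth:.0f} transistors "
              f"(8080 actually had {i8080})")
    # 18-month doubling implies a ~47-transistor chip in 1974 (far too low);
    # 24-month doubling implies ~9,500, the right ballpark for the 8080.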
As I understand it, this wouldn't automatically protect you from race conditions. If you have a shared variable x, then a statement like
x += 5
may be fine depending on whether it is implemented as a single bytecode instruction or not. However, more complicated updates are still subject to races:
x = somefunction(x)
It is only safe if you use:
with thread.atomic: x = somefunction(x)
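A minimal sketch of such a race under today's CPython (somefunction here is just a stand-in for any non-trivial update):

    import threading

    x = 0

    def somefunction(v):
        return v + 1              # stands in for any pure update

    def worker():
        global x
        for _ in range(100_000):
            x = somefunction(x)   # the load of x and the store to x are
                                  # separate bytecodes; a thread switch in
                                  # between makes one update overwrite another

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(x)                      # usually well below 400000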
Having a serialisation doesn't mean it's the right serialisation.
That said, the fundamental problem is shared mutable state, and I don't see an easy way around that in Python. In that sense, this is probably easiest to work with.
Right, this is no different from programming on current Python with the GIL. There are two different levels of locking in current Python implementations: locks within the implementation of the interpreter (the GIL for CPython and current PyPy, or all the micro-locks in Jython), and the locks accessible to Python programmers from the Python environment (e.g., the locks in the threading module: http://docs.python.org/library/threading.html#lock-objects ). The PyPy STM project attempts to replace the first set to allow simultaneous multi-core use while making it seem to the programmer like the GIL is still there; but if the programmer needs things like a specific execution order, it will need to be managed by hand, the same way it always has been.
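A minimal example of the second kind, serialising the same sort of read-modify-write by hand with a lock from the threading module:

    import threading

    x = 0
    lock = threading.Lock()       # programmer-visible lock

    def worker():
        global x
        for _ in range(100_000):
            with lock:            # pick one specific interleaving by hand
                x = x + 1

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(x)                      # always 400000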
I'm not trying to discourage you, but once you're considering a particular position, please try and talk to previous PhD students of your prospective supervisor. Germany has hardly any fully-funded PhD positions; instead you typically get employed as staff and have to combine that with your PhD work. The problem is that professors often take on lots of PhD students to get more funding and end up having hardly any time. I've heard stories of a professor with 8 or so PhD students who are still waiting on him to read their theses and give the go-ahead for their viva (PhD defense). I'm not saying this is the case everywhere, but it's something to be aware of. So try and find out beforehand.
Thanks a lot for the advice. I still have more than a year before I need to make a decision, so there's enough time to ask around once I narrow down my choices. I haven't even made up my mind whether I want to get into neuroscience or stay in molecular/cell biology.
Sadly, the situation you described is almost the same for PhD students in the lab where I'm currently working on my diploma thesis, and in some others around the institute. Having seen what I've seen (trouble with dissertations and getting the go-ahead for the PhD defense), that's definitely something I want to avoid.
all CloudFlare.com accounts use two-factor authentication. We are still working with Google to understand how the hacker was able to reset the password without providing a valid two-factor authentication token.
Actually, she just spent a few weeks practising with an Olympic competitor, i.e., nowhere near 6 months. I'm not an expert on these things, but it seemed to vary from scene to scene. Someone commented on the trailer and explained how accurate it is, but there were a few shots in the movie where what the expert described seemed to be missing (like "kissing" the string).
You can build a library for doing Erlang-style programming, but there are lots of features that you can't reproduce.
(1) In C++ you may get hard crashes, so to get reliability you need to use high-overhead OS processes. Erlang allows you to run hundreds of thousands (yes, really) of processes because they are really lightweight (76 words + heap/stack).
Also, copying within the same address space has much lower overhead (no system call/context switch) and serialisation is much simpler.
(2) Because of immutable data you can have nice fault tolerance, as follows (see the sketch after this list). Imagine you have a request handler function that takes a state and a request. Now suppose the request is malformed or triggers a bug in the handler. In a language that doesn't enforce immutability you have to restart the handler process, because the state may be in an inconsistent form. In Erlang you would just discard the request (or send an error message) and handle the next request with the unmodified state.
(3) Pattern matching makes receiving a message very convenient.
I probably forgot a few things, but this gives you an idea.
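A sketch of the pattern from (2), in Python for lack of Erlang here (the names are made up; the point is only that the handler returns a new state instead of mutating the old one, a discipline Erlang enforces and Python merely permits):

    def serve(state, requests, handle):
        for req in requests:
            try:
                state = handle(state, req)   # handler returns a fresh state
            except Exception as e:
                print("discarding bad request:", req, e)  # old state intact
        return state

    # Toy usage: a counter that chokes on non-integer requests.
    def handle(state, req):
        return state + int(req)

    print(serve(0, ["1", "2", "oops", "3"], handle))      # -> 6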
That depends on the kind of scientific computing you want to do.
Haskell has the repa library [1], which is very nice for working with (multi-dimensional) arrays at a high level. Performance is decent (I don't know if they have a BLAS/LAPACK binding). Overall, the main advantage of Haskell is its runtime system and its great support for concurrency. The downside is that it does not have OpenMP, and the MPI bindings don't look very nice to use (I don't know how OCaml or SML fare in this area). There are OpenCL bindings, but I've never used them. Data Parallel Haskell is still under heavy development, so that's probably going to take a few years to become production-ready.
OCaml's advantage is that C-like algorithms are easier to transcribe and use (no monads). OCaml's main disadvantage is that its runtime doesn't support multicore well (or even at all?). If you want that you can use F#, though.
I don't know anything about the current state of SML implementations.
Answering a sub-question of yours: indeed, the only way to use more than one core in OCaml is multiprocessing. If there is a lot of data that needs to be exchanged, it may not be very fast.
That said, there is this patched-up version (funded by a one-off summer of code by Jane Street, I think).
I am OK with OCaml not giving its users a threading API, but a runtime that executes many of its higher-order functions in parallel would be really nice. Well, higher-order functions and the other parallelism exposed by the functional semantics, with some helpful directives from the user, of course.
There have been a lot of projects, of which ocamlnet/netmulticore and Jane Street's Async are (I think, but not very confidently) the only current ones. Others are:
There's also hmatrix for a pretty nice Haskell API over BLAS. The one current caveat is that because certain vector code currently uses GSL under the covers, the core hmatrix lib has to be GPL in turn (as GSL is GPL). That said, there's some work underway (by me and some others) to replace the offending pieces of code with code under BSD or another permissive license, so that core hmatrix can be re-released as a BSD-licensed lib and thus see broader use.
The Dalvik/HotSpot paper mentioned above (behind ACM paywall): http://dl.acm.org/citation.cfm?doid=2388936.2388956