The current CLR version is definitely not at production quality like the JVM version, and as far as I know it never has been.
At a recent talk in the Netherlands, Martin Odersky also basically admitted that there's still a fairly long way to go before the CLR version reaches production quality, and even then it will probably keep lagging behind the JVM version.
So you're certainly right about the intention, but that's about all it is.
I am aware of one that is related to compatibility with Java more than with the JVM itself: no multiple dispatch (at least, that is what Martin Odersky told me when I asked him why it isn't in Scala).
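To make that concrete, here's a rough Scala sketch (the Shape/Circle names are made up purely for illustration): overloads are resolved against the static argument types at compile time, so dispatching on the runtime types of two arguments has to be simulated by hand, typically with pattern matching.

    // Hypothetical types, purely to illustrate single vs. multiple dispatch.
    trait Shape
    class Circle extends Shape
    class Square extends Shape

    object Collisions {
      def collide(a: Circle, b: Circle): String = "circle/circle"
      def collide(a: Shape, b: Shape): String = "shape/shape"
    }

    object SingleDispatchDemo {
      def main(args: Array[String]): Unit = {
        val a: Shape = new Circle
        val b: Shape = new Circle
        // Overload resolution uses the static types (Shape, Shape), so this
        // prints "shape/shape" even though both values are Circles at runtime.
        println(Collisions.collide(a, b))

        // The usual workaround: dispatch on runtime types yourself.
        val result = (a, b) match {
          case (_: Circle, _: Circle) => "circle/circle"
          case _                      => "shape/shape"
        }
        println(result) // "circle/circle"
      }
    }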
The compiler can only optimize direct self-calls in tail position, so other recursive algorithms (mutual recursion in particular) can still overflow the stack, because the JVM itself provides no support for tail-call optimization (see the sketch after this list).
No good IO/File API (because it needs support from the JVM for certain operations, which will only be added in Java 7).
No good Unicode support (it uses Java's java.lang.String).
Type erasure (although the consequences are not as bad as Java's).
Java's reflection API sometimes doesn't play well with Scala code.
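As a small illustration of the tail-call point above, here's a Scala sketch (method names made up): @tailrec verifies that a direct self-call is turned into a loop, while an equivalent mutually recursive pair has no such rewrite available and blows the stack for large inputs.

    import scala.annotation.tailrec

    object TailCallDemo {
      // Direct self-recursion in tail position: scalac rewrites it into a loop,
      // and @tailrec makes the compiler check that the rewrite really happened.
      @tailrec
      def countdown(n: Long): Long =
        if (n == 0) 0 else countdown(n - 1)

      // Mutual recursion has no such rewrite without JVM-level tail calls,
      // so for large n this pair throws StackOverflowError.
      def isEven(n: Long): Boolean = if (n == 0) true else isOdd(n - 1)
      def isOdd(n: Long): Boolean = if (n == 0) false else isEven(n - 1)

      def main(args: Array[String]): Unit = {
        println(countdown(10000000L)) // fine: runs as a loop
        println(isEven(10000000L))    // very likely StackOverflowError
      }
    }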
java.lang.String uses UTF-16 internally. It's a wrapper around an array of 16-bit chars, so it's possible to create a String which is NOT valid Unicode by abusing surrogate pairs.
That means the JVM has no native datatype that represents a "valid Unicode string". This is unfortunate, because if java.lang.String enforced validity it would let us make some helpful assumptions.
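For example (a small Scala sketch, since Scala strings are just java.lang.String): an unpaired surrogate is happily accepted, and nothing in the API ever validates it.

    object InvalidUtf16Demo {
      def main(args: Array[String]): Unit = {
        // U+D800 is a high-surrogate code unit with no matching low surrogate,
        // so this String is not well-formed UTF-16, yet it is accepted as-is.
        val broken = "ab" + '\uD800' + "cd"

        println(broken.length)                               // 5 code units
        println(Character.isHighSurrogate(broken.charAt(2))) // true
        // There is no String method that checks validity; if you care, you have
        // to walk the chars (or code points) and check surrogate pairing yourself.
      }
    }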
Well, every standard way to read or write that string as bytes goes through a Unicode encoding by default, unless you go out of your way to plug in a different encoder.
What are you trying to do, export a pointer and write the raw bytes to some destination while assuming it's correct Unicode? If you're doing something that low-level, it's always possible to corrupt your data and end up with invalid Unicode: just set an invalid byte somewhere in that byte string. Direct memory access always throws guarantees out the window.
It is annoying that getBytes() has to allocate and fill a byte array because of the char/byte mismatch, but you can work around it when necessary (see the sketch below), and that's not really a case of "not being Unicode enough"; if anything it's "too Unicode", given the insistence on the char type for the internal representation.
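By "work around it" I mean something like the following Scala sketch using the java.nio charset API, which encodes into a ByteBuffer you allocate (and can reuse) yourself instead of getting a fresh byte[] from every getBytes() call. The buffer size here is picked arbitrarily.

    import java.nio.{ByteBuffer, CharBuffer}
    import java.nio.charset.StandardCharsets

    object EncodeWithoutGetBytes {
      def main(args: Array[String]): Unit = {
        val s = "hello, wörld"

        // getBytes() allocates a fresh byte[] on every call.
        val copy: Array[Byte] = s.getBytes(StandardCharsets.UTF_8)

        // The java.nio alternative encodes into a buffer you manage (and can
        // reuse), at the cost of a more verbose API.
        val encoder = StandardCharsets.UTF_8.newEncoder()
        val out     = ByteBuffer.allocate(64) // reusable; size chosen arbitrarily
        encoder.encode(CharBuffer.wrap(s), out, true)
        encoder.flush(out)
        out.flip()

        println(s"getBytes: ${copy.length} bytes, encoder: ${out.remaining()} bytes")
      }
    }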
No, one of the String constructors takes just a char[] as a parameter. You can pass in an arbitrary array of chars, even invalid UTF-16.
You're correct that well-written code should never do this. However, there is no guarantee that some library you're using doesn't. You can never assume that 'new String(oldString.getBytes("UTF-8"), "UTF-8").equals(oldString)' holds, which has some unfortunate consequences if you're doing anything involving serialization and equality (see the sketch below).
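Here's a small Scala sketch of that failure mode (it uses the char[] constructor mentioned above; the exact replacement output depends on the JDK):

    object RoundTripDemo {
      def main(args: Array[String]): Unit = {
        // The String(char[]) constructor copies whatever chars it is given,
        // including an unpaired low surrogate, so the result is ill-formed UTF-16.
        val chars  = Array('f', 'o', 'o', '\uDC00')
        val broken = new String(chars)

        // Encoding has to replace the unpaired surrogate (with '?' or U+FFFD,
        // depending on the JDK), so the round trip is not an identity.
        val roundTripped = new String(broken.getBytes("UTF-8"), "UTF-8")
        println(roundTripped.equals(broken)) // false
      }
    }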
I agree that Java's String API is generally quite well-designed, but the ability to access the raw UTF-16 is a very big leak in the abstraction.
If that ability were lacking, other people would be complaining about it. Abstractions should not prevent you from accessing the bits underneath: they should make it unnecessary. They never completely succeed at that, because there are always fringe use cases you didn't foresee.
Java chars are 2 bytes, but the default encoding when serializing a String to/from byte[] arrays and streams is the platform default charset, which is typically UTF-8 on Unix and may vary on Windows.
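A quick Scala sketch of that distinction (the output depends on your platform's default charset; from JDK 18 on the default is UTF-8 everywhere):

    import java.nio.charset.{Charset, StandardCharsets}

    object DefaultCharsetDemo {
      def main(args: Array[String]): Unit = {
        val s = "héllo"

        // In memory: 16-bit UTF-16 code units.
        println(s.length)                                  // 5 chars

        // On the wire: the no-argument getBytes() uses the platform default charset...
        println(Charset.defaultCharset())
        println(s.getBytes().length)                       // 6 if the default is UTF-8

        // ...so passing the charset explicitly is the portable option.
        println(s.getBytes(StandardCharsets.UTF_8).length) // always 6
      }
    }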