The current CLR version is definitely not at production quality like the JVM version, and as far as I know it never has been.
At a recent talk in the Netherlands, Martin Odersky also basically admitted that there's still a fairly long way to go before the CLR version reaches production quality, and even then it will probably keep lagging behind the JVM version.
So you're certainly right about the intention, but that's about all it is.
I am aware of one that is related to compatibility with Java more than with the JVM itself: no multiple dispatch (at least, that is what Martin Odersky told me when I asked him why it isn't in Scala).
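To make that concrete, here's a rough Scala sketch (the Shape/Circle names are made up purely for illustration): overloads are resolved against the static argument types at compile time, so dispatching on the runtime types of two arguments has to be simulated by hand, typically with pattern matching.

    // Hypothetical types, purely to illustrate single vs. multiple dispatch.
    trait Shape
    class Circle extends Shape
    class Square extends Shape

    object Collisions {
      def collide(a: Circle, b: Circle): String = "circle/circle"
      def collide(a: Shape, b: Shape): String = "shape/shape"
    }

    object SingleDispatchDemo {
      def main(args: Array[String]): Unit = {
        val a: Shape = new Circle
        val b: Shape = new Circle
        // Overload resolution uses the static types (Shape, Shape), so this
        // prints "shape/shape" even though both values are Circles at runtime.
        println(Collisions.collide(a, b))

        // The usual workaround: dispatch on runtime types yourself.
        val result = (a, b) match {
          case (_: Circle, _: Circle) => "circle/circle"
          case _                      => "shape/shape"
        }
        println(result) // "circle/circle"
      }
    }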
The compiler can only optimize direct self-calls in tail position, so other recursive algorithms (mutual recursion in particular) can still overflow the stack, because the JVM itself provides no support for tail-call optimization (see the sketch after this list).
No good IO/File API (because it needs support from the JVM for certain operations, which will only be added in Java 7).
No good Unicode support (it uses Java's java.lang.String).
Type erasure (although the consequences are not as bad as Java's).
Java's reflection API sometimes doesn't play well with Scala code.
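As a small illustration of the tail-call point above, here's a Scala sketch (method names made up): @tailrec verifies that a direct self-call is turned into a loop, while an equivalent mutually recursive pair has no such rewrite available and blows the stack for large inputs.

    import scala.annotation.tailrec

    object TailCallDemo {
      // Direct self-recursion in tail position: scalac rewrites it into a loop,
      // and @tailrec makes the compiler check that the rewrite really happened.
      @tailrec
      def countdown(n: Long): Long =
        if (n == 0) 0 else countdown(n - 1)

      // Mutual recursion has no such rewrite without JVM-level tail calls,
      // so for large n this pair throws StackOverflowError.
      def isEven(n: Long): Boolean = if (n == 0) true else isOdd(n - 1)
      def isOdd(n: Long): Boolean = if (n == 0) false else isEven(n - 1)

      def main(args: Array[String]): Unit = {
        println(countdown(10000000L)) // fine: runs as a loop
        println(isEven(10000000L))    // very likely StackOverflowError
      }
    }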
java.lang.String uses UTF-16 internally. It's a wrapper around an array of 16-bit chars, so it's possible to create a String which is NOT valid Unicode by abusing surrogate pairs.
That means the JVM has no native datatype that represents a "valid Unicode string". This is unfortunate, because if java.lang.String enforced validity it would let us make some helpful assumptions.
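For example (a small Scala sketch, since Scala strings are just java.lang.String): an unpaired surrogate is happily accepted, and nothing in the API ever validates it.

    object InvalidUtf16Demo {
      def main(args: Array[String]): Unit = {
        // U+D800 is a high-surrogate code unit with no matching low surrogate,
        // so this String is not well-formed UTF-16, yet it is accepted as-is.
        val broken = "ab" + '\uD800' + "cd"

        println(broken.length)                               // 5 code units
        println(Character.isHighSurrogate(broken.charAt(2))) // true
        // There is no String method that checks validity; if you care, you have
        // to walk the chars (or code points) and check surrogate pairing yourself.
      }
    }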
Well, every standard way to read or write that string as bytes goes through a Unicode encoding by default, unless you go out of your way to plug in a different encoder.
What are you trying to do, export a pointer and write the raw bytes to some destination while assuming it's correct Unicode? If you're doing something that low-level, it's always possible to corrupt your data and end up with invalid Unicode: just set an invalid byte somewhere in that byte string. Direct memory access always throws guarantees out the window.
It is annoying that getBytes() has to allocate and fill a byte array because of the char/byte mismatch, but you can work around it when necessary (see the sketch below), and that's not really a case of "not being Unicode enough"; if anything it's "too Unicode", given the insistence on the char type for the internal representation.
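By "work around it" I mean something like the following Scala sketch using the java.nio charset API, which encodes into a ByteBuffer you allocate (and can reuse) yourself instead of getting a fresh byte[] from every getBytes() call. The buffer size here is picked arbitrarily.

    import java.nio.{ByteBuffer, CharBuffer}
    import java.nio.charset.StandardCharsets

    object EncodeWithoutGetBytes {
      def main(args: Array[String]): Unit = {
        val s = "hello, wörld"

        // getBytes() allocates a fresh byte[] on every call.
        val copy: Array[Byte] = s.getBytes(StandardCharsets.UTF_8)

        // The java.nio alternative encodes into a buffer you manage (and can
        // reuse), at the cost of a more verbose API.
        val encoder = StandardCharsets.UTF_8.newEncoder()
        val out     = ByteBuffer.allocate(64) // reusable; size chosen arbitrarily
        encoder.encode(CharBuffer.wrap(s), out, true)
        encoder.flush(out)
        out.flip()

        println(s"getBytes: ${copy.length} bytes, encoder: ${out.remaining()} bytes")
      }
    }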
No, one of the String constructors takes just a char[] as a parameter. You can pass in an arbitrary array of chars, even invalid UTF-16.
You're correct that well-written code should never do this. However, there is no guarantee that some library you're using doesn't. You can never assume that 'new String(oldString.getBytes("UTF-8"), "UTF-8").equals(oldString)' holds, which has some unfortunate consequences if you're doing anything involving serialization and equality (see the sketch below).
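Here's a small Scala sketch of that failure mode (it uses the char[] constructor mentioned above; the exact replacement output depends on the JDK):

    object RoundTripDemo {
      def main(args: Array[String]): Unit = {
        // The String(char[]) constructor copies whatever chars it is given,
        // including an unpaired low surrogate, so the result is ill-formed UTF-16.
        val chars  = Array('f', 'o', 'o', '\uDC00')
        val broken = new String(chars)

        // Encoding has to replace the unpaired surrogate (with '?' or U+FFFD,
        // depending on the JDK), so the round trip is not an identity.
        val roundTripped = new String(broken.getBytes("UTF-8"), "UTF-8")
        println(roundTripped.equals(broken)) // false
      }
    }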
I agree that Java's String API is generally quite well-designed, but the ability to access the raw UTF-16 is a very big leak in the abstraction.
If that ability were lacking, other people would be complaining about it. Abstractions should not prevent you from accessing the bits underneath: they should make it unnecessary. They never completely succeed at that, because there are always fringe use cases you didn't foresee.
Java chars are 2 bytes, but the default encoding when serializing a String to/from byte[] arrays and streams is the platform default charset, which is typically UTF-8 on Unix and may vary on Windows.
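A quick Scala sketch of that distinction (the output depends on your platform's default charset; from JDK 18 on the default is UTF-8 everywhere):

    import java.nio.charset.{Charset, StandardCharsets}

    object DefaultCharsetDemo {
      def main(args: Array[String]): Unit = {
        val s = "héllo"

        // In memory: 16-bit UTF-16 code units.
        println(s.length)                                  // 5 chars

        // On the wire: the no-argument getBytes() uses the platform default charset...
        println(Charset.defaultCharset())
        println(s.getBytes().length)                       // 6 if the default is UTF-8

        // ...so passing the charset explicitly is the portable option.
        println(s.getBytes(StandardCharsets.UTF_8).length) // always 6
      }
    }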