The thing I realized, which may not be obvious, is that this allows for a “real” Optional type in Java, where the Optional itself cannot be null. This would allow us to remove bare null usage from existing and future Java codebases, without the annoyance of checking whether the Optional itself is null.
I hope that there will be a compiler flag at some point to enable a warning or compiler failure on any bare null usage.
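For context, here is a minimal sketch of the annoyance being described, using today's reference-typed Optional (class and variable names are made up):

    import java.util.Optional;

    class BareNullDemo {
        public static void main(String[] args) {
            // Today the Optional reference itself can be null, defeating its purpose:
            Optional<String> maybeName = null;      // compiles fine
            System.out.println(maybeName == null);  // true: callers still have to null-check the Optional itself
            // A "real" value-typed Optional could reject the null assignment above outright.
        }
    }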
One thing in the current proposal that I haven't quite figured out: it seems that the decision of whether a certain type is an "object" or a "value" is pushed to the use site rather than the definition site, so the same type can be used as both object and value at different sites, in a way that doesn't quite match the intuition for boxed types in, say, C++.
I believe this is because of compatibility concerns when types migrate from object to value or vice-versa. The linked doc[0] says "operations can differ between value and reference types [...] This means that, if we are to support migration, we need a way for legacy classfiles to make a local decision to treat a given type as a reference type".
It is agreed in general that whether something is a value type is a definition-site attribute. However, if code was compiled expecting Optional to be a value (e.g. it does an ==) but it is now an Optional object, what happens? The identity check is different from the value-equals one. The doc goes into more detail on the ways the two differ.
I haven't studied the arguments or read the doc fully, but I'd say that if you only allow migration in one direction, from object types to value types, you can solve the two big issues of equality (equals() is intercepted by the JVM for value types to imply ==) and nullability (a value is never null, so == null becomes always false). There may be other issues with hashCode, use as a mutex in a synchronized block, etc.
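To make the equality concern concrete with today's boxed types (a sketch of the general identity-vs-value gap, not of the Valhalla semantics themselves):

    class IdentityVsValue {
        public static void main(String[] args) {
            Integer a = 1000;                 // autoboxed above the Integer cache range, so on
            Integer b = 1000;                 // standard JVMs these are two distinct objects
            System.out.println(a == b);       // false: identity comparison of two boxes
            System.out.println(a.equals(b));  // true:  value comparison
            // A genuine value type would make == behave like the value comparison,
            // which is exactly the behavioural change that makes migration tricky.
        }
    }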
Maybe another approach is to have the value type implement "ObjectLike" if it wants to exist in the L-world. There needs to be a separation between what an "object type" is and what an "object type" can do. The latter causes migration concerns, the former is just about state and copying/reference semantics.
Sure, classes are generally designed to be used either as value types or as OO objects; you can prevent objects from being copied/moved and, if you really go out of your way, even from being created outside the heap, but in general the value vs. reference semantics are left to the point of use in C++.
Ever since the JDK 1.0 days, I've never understood why there was this C/C++ carry-over inconsistency of manually-boxed types and non-Object primitive types separated from Objects. A type hierarchy patterned after Ruby’s, as an example, makes the most sense:
- Object contains a Class that it derives from (no BaseObject or Modules)
- Class is an Object
- String is an Object
- Boolean is a two-value singleton of true and false
- Number is an abstract subclass of Object
- Decimal and Integer are abstract subclasses of Number
- Float, Double, LongDouble, BigDecimal are concrete subclasses of Decimal
- SignedByte, Byte, Short, UnsignedShort, Int, Unsigned, Long, UnsignedLong, Char, BigInt are concrete subclasses of Integer (or U/I/F## types reminiscent of Rust instead of C type names)
and so on.
Then there is no boxing/unboxing of simple types or literals because they are one and the same, and there’s no confusion about how to interact with any truly generic type of value.
I've heard theorists say exactly this a number of times before.
The reason Java isn't a 'pure' object-oriented language is simply performance.
Suppose everything - even every int - is heap-allocated. You now need a very sophisticated JIT compiler (as in, better than any we have today), or it's going to run dog slow. Having a huge number of needless allocations happening at every step is going to:
1. Slow things down by doing vastly more heap allocations than you otherwise would (even with Java's ultra-fast allocations)
2. Slow things down by doing violence to your code's locality and cache behaviour, because your ints no longer live in the stack
3. Slow things down by doing violence to your code's locality and cache behaviour, because Java objects are bloated compared to raw ints
4. Slow things down by hugely increasing garbage-collection pressure
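Here is a minimal sketch of the cost the points above describe, using boxed Integer as a stand-in for "everything is an object" (the array size is arbitrary):

    class BoxingCost {
        public static void main(String[] args) {
            int[] primitives = new int[1_000_000];    // one contiguous block, ints stored inline
            Integer[] boxed = new Integer[1_000_000]; // one array of references...
            for (int i = 0; i < boxed.length; i++) {
                boxed[i] = i + 128;                   // ...plus roughly a million small heap objects
            }                                         //    (values above the Integer cache are allocated individually)
            // Summing 'primitives' walks one cache-friendly block; summing 'boxed' chases a
            // pointer per element, each with an object header, and all of it adds GC pressure.
            long sum = 0;
            for (int v : primitives) sum += v;
            for (Integer v : boxed) sum += v;
            System.out.println(sum);
        }
    }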
If Gosling had taken that route, we wouldn't be talking about Java today.
> Suppose everything - even every int - is heap-allocated. You now need a very sophisticated JIT compiler (as in, better than any we have today), or it's going to run dog slow.
Most dynamic language implementations (JavaScript, Lisp, Smalltalk, even Ruby) use a tagged pointer representation allowing integers (and sometimes floats) to be encoded directly in the reference, avoiding heap allocation in this common case.
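Roughly how the tagging trick works, as an illustrative sketch in Java rather than how any particular VM implements it: steal the low bit of a machine word as a tag and store small integers shifted into the remaining bits, instead of behind a pointer.

    final class TaggedWord {
        private static final long INT_TAG = 1L;       // low bit set = "this word is a small int, not a pointer"

        static long encodeInt(int value) {
            return ((long) value << 1) | INT_TAG;     // the value lives in the word itself, no heap allocation
        }

        static boolean isInt(long word) {
            return (word & 1L) == INT_TAG;
        }

        static int decodeInt(long word) {
            return (int) (word >> 1);                 // arithmetic shift restores the sign
        }

        public static void main(String[] args) {
            long w = encodeInt(-42);
            System.out.println(isInt(w) + " " + decodeInt(w)); // prints: true -42
        }
    }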
Another alternate model is to pass the type of a value separately from the value itself, and allow the value to be of variable size.
Java simply made the wrong tradeoff, and while it wasn't fully apparent at the time, there's no good defense of that decision today.
Most dynamic language implementations don't let me inline a bunch of 128-bit or 256-bit values in an array, or allocate them on the stack.
Code that deals with a lot of values which are small but larger than a machine word can be made a lot more efficient if there is a way to treat those values as not objects.
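For example, the workaround people reach for today with a 128-bit value like a complex double is to flatten it into parallel primitive arrays instead of an array of small objects, so the data stays contiguous (a sketch with made-up names):

    // Instead of Complex[] (an array of pointers to two-field objects), keep the fields flat:
    final class ComplexArray {
        private final double[] re;
        private final double[] im;

        ComplexArray(int n) { re = new double[n]; im = new double[n]; }

        void set(int i, double r, double x) { re[i] = r; im[i] = x; }
        double re(int i) { return re[i]; }
        double im(int i) { return im[i]; }
    }
    // Value types would give a plain Complex[] this layout without the manual contortion.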
> tagged pointer representation allowing integers (and sometimes floats) to be encoded directly in the reference
I don't see how that could work for Java. Every Java object can be used as a mutex. This strikes me as very silly, but it's part of Java. Incidentally .Net went the same way, and I'm not the only person who thinks it was a silly decision for both frameworks [0].
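Concretely, this is legal Java today, and it is hard to square with a tagged-word representation, since the monitor has to live somewhere in the object (a contrived sketch):

    class MonitorOnABox {
        public static void main(String[] args) {
            Object lock = 42;        // autoboxes to an Integer; any object can serve as a monitor
            synchronized (lock) {    // needs identity and header bits, which a bare tagged word lacks
                System.out.println("locked on a boxed int");
            }
        }
    }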
Java also tags objects with type information for runtime checking, but that would play ok with tagged pointers as you're describing, as far as I can see.
Seems to me .Net generally has the right idea on types. Primitives are not objects, but you can do List<int> without autoboxing.
> Another alternate model is to pass the type of a value separately from the value itself, and allow the value to be of variable size.
Wouldn't that bloat the stack considerably? Wouldn't it be better to have a type-system that eliminates the need for that sort of thing?
> Java simply made the wrong tradeoff, and while it wasn't fully apparent at the time, there's no good defense of that decision today.
Are there any modern frameworks at all similar to Java, that do as you describe? Dog-slow dynamic languages aren't really the same beast.
Agreed, but they could have given value types and AOT a bit more love in the 1.0 days, given the ongoing research on GC-enabled languages for systems programming going all the way back to CLU and Mesa/Cedar.
Oh well, at least in a couple of years we will have them.
Languages do not need a 1 to 1 relationship between the storage medium of a value and the interface it exposes to a programmer.
Java's situation is even worse because it's a compiled language that does not need a JIT or even much sophistication from a compiler to keep a single hierarchy in its type system.
I suspect the reason Java did it was to not surprise C++ programmers. Solely dictated by marketing, not by technical reasons.
> Languages do not need a 1 to 1 relationship between the storage medium of a value and the interface it exposes to a programmer.
Sure, but that doesn't excuse the well-documented 'sufficiently-smart-compiler fallacy'.
The performance improvements that can be had from the escape-analysis/object-inlining family of JIT compiler optimisations are considerable, but even today production JVMs don't do a very good job of them. It's not an easy problem to solve well.
> I suspect the reason Java did it was to not surprise C++ programmers. Solely dictated by marketing, not by technical reasons.
I sincerely doubt it. You're wrong to dismiss the performance question.
> I suspect the reason Java did it was to not surprise C++ programmers. Solely dictated by marketing, not by technical reasons.
C++ is actually more uniform than Java in this respect because it allows one to define new value types, and also allows heap-allocation of "primitive" types such as int.
> Languages do not need a 1 to 1 relationship between the storage medium of a value and the interface it exposes to a programmer. ... I suspect the reason Java did it was to not surprise C++ programmers. Solely dictated by marketing, not by technical reasons.
The situation is almost the opposite of how you describe it. If anything one of the major design points was to solve the surprises inherent to C/C++.
Take integers, for example. In Java they're defined to be two's complement; the processor architecture doesn't matter. C/C++ left the spec open to let the language be 1 to 1 with the medium; Java did not.
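For instance, overflow behaviour is pinned down by the spec on every platform:

    class Wraparound {
        public static void main(String[] args) {
            System.out.println(Integer.MAX_VALUE + 1); // always -2147483648: two's-complement wrap, guaranteed by the JLS
        }
    }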
This is similar to all the issues with how threads interact with memory. Java led the way in making the memory model part of the language specification rather than allowing it to be implementation (and architecture) defined.
> If anything one of the major design points was to solve the surprises inherent to C/C++.
Sure, but you're speaking past each other.
* Java was designed to be more predictable and have fewer dark corners than C++ (no undefined behaviour, precisely defined primitives and generally far less platform-specific behaviour, etc)
* Java was designed to feel familiar to C++ developers in order to aid adoption (specifically its syntax)
> Take integers, for example. In Java they're defined to be two's complement; the processor architecture doesn't matter. C/C++ left the spec open to let the language be 1 to 1 with the medium; Java did not.
There is a historical reason for this. In the early 1970s, when C was first designed, non-two's-complement machines (such as CDC and UNIVAC machines) were still an important part of the industry, so it made sense for C to be designed to support them. By the 1990s, when Java was designed, non-two's-complement machines were much less relevant, so it made sense to exclude support for them from Java's design. Now, in the late 2010s, when the relevance of those machines has shrunk even further (although one's-complement Unisys mainframes still exist today), it makes sense to remove that support from the C standard, even though it made a lot of sense when C was first designed.
Java has been around a long time now and machines were a lot slower in the JDK 1.0 days so they were probably considering the poor performance of requiring a minimum Object overhead.
And the distinction has worked elsewhere. Objective-C for instance has an NSObject in Cocoa, but the language is a proper superset of C and has no problem with plain types. (Granted, if you use low-level objc runtime functions, it will be more complex to pass in C data types that are not objects, but they still work.)
Despite not requiring Object everywhere, early Java had a frustrating tendency to use Object for many basic things, seemingly requiring boxing and casting more than it should. (At least with Objective-C++ you had the option of some std:: container of plain data, if you didn’t want to wrap in some NS<container> type.) Both Java and Objective-C ultimately changed the syntax to allow type specifications in containers to compensate.
> Java has been around a long time now and machines were a lot slower in the JDK 1.0 days so they were probably considering the poor performance of requiring a minimum Object overhead.
Never mind the machines back when Java was 1.0 — one of the reasons Project Valhalla exists is that "the poor performance of requiring a minimum object overhead" is very much still a significant cost!
The biggest reason is the pressure from HFT shops that want to migrate to Java and see this as a blocker, and from the new kids on the block that offer better support for such scenarios.
Almost correct, except that Sun ignored the prior work on languages like Eiffel, Sather, Modula-3, and the Oberon variants.
In Eiffel, for example, you can declare what the default behavior is (reference or value type), and then any user of the type can still override it at the declaration site.
Maybe, but the fact that they feel the market pressure to add them now, to stay competitive on modern hardware versus what .NET did a few years later, suggests the original decision wasn't entirely right either.
Java was released in Jan 1996, .NET in Feb 2002. A lot changed in those six years. .NET built on the shoulders of the giant that was Java, as well as the lessons from the globally accelerating development enabled by the internet.
Those early decisions may not have been technically right in some sense, but technical considerations are not the only factor.
Mesa/Cedar was developed at Xerox PARC in 1981 and is cited as one of Java's influences.
One of its descendants, Modula-3, designed around 1986, is likewise cited as another influence on Java.
Both of them had value types and compiled ahead of time into native code, although Mesa/Cedar was native in the Lisp Machine sense, given the Xerox PARC microcoded CPU architectures.
There are some old posts from Gosling where he discusses Eiffel/Sather features.
So the knowledge was there; it was just a bit disappointing not to get them because, as you say, technical considerations were not the only factor.
Note that the first version of Java was delivered under tremendous time pressure. I guess they would have done a few things differently if the Sun engineers had more time.
> this C/C++ carry-over inconsistency of manually-boxed types and non-Object primitive types separated from Objects
C++ allows user-defined value types that will behave like fundamental types.
And it has raw pointers/references to them.
Among others, one user-defined value type that was invented is the shared pointer (which works kind of like a GC, or exactly like a GC if you implement it and throw RAII out of the window).
Java took only the fundamental types and a shared pointer. No wonder that there are some parts missing.
Common Lisp has exactly this and it's a damn sight faster than Ruby and Java. Java is the way it is because it was designed to be marketed from the start. And that marketing is the only reason it is popular.
> Then there is no boxing/unboxing of simple types or literals because they are one and the same, and there’s no confusion about how to interact with any truly generic type of value.
It's simply performance/space. It's why value types still exist in Java, C#, etc.
> A type hierarchy patterned after Ruby’s, as an example, makes the most sense:
The type hierarchy doesn't matter. That's what C# does: it has one unified type system where even int, double, etc. all ultimately derive from Object, and it still has value types. It's a language/compiler design issue.
> Then there is no boxing/unboxing of simple types or literals because they are one and the same, and there’s no confusion about how to interact with any truly generic type of value.
In Ruby there is no unboxing, but you could argue that all simple types are "lightly boxed" because they are objects by default.
My personal ask for the language designers is somewhat similar, but not quite this (it seems). I'd like the language to support either builders or setters such that, after creation, an object becomes immutable.
The options we have right now are bad: long constructor argument lists are too verbose, setters break immutability, and builders are the worst due to lack of discoverability and standards.
That sounds like a need for named and default constructor parameters. With the two of those, you can mix-and-match whatever values you want, with non-defaulted ones being required. And since the parameters are named, they self-document at the usage site.
I know it's not quite what you're asking for, but Java's annotations allow for pretty reasonable and composable preprocessors to implement things like this. Check out Lombok's @Builder annotation: https://projectlombok.org/features/Builder
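For instance (ServerConfig is a hypothetical example type; Lombok's @Value makes the fields private final and the class immutable, and @Builder generates the fluent builder):

    import lombok.Builder;
    import lombok.Value;

    @Value
    @Builder
    public class ServerConfig {
        String host;
        int port;
        boolean tlsEnabled;
    }

    // Usage: discoverable through the generated builder, immutable once built.
    // ServerConfig cfg = ServerConfig.builder().host("localhost").port(8080).tlsEnabled(true).build();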