1. No object value types; and
2. Boxing and primitives.

On (1), all objects in Java are references, effectively a pointer to a pointer. This indirection has a cost but has been incredibly useful historically because it allows allocations to be moved around the address space without any of the references changing. They're effectively just pointers to a lookup table, and it's the lookup table that changes.
As for the example of packing an array of Points into a contiguous block of memory to avoid the dereferencing, I'll be curious to see what the proposal is here. Right now, for example, if you have a collection of Points (x, y), you can put in a Point3D (x, y, z) that extends Point. You can't do that if you've flattened storage, not without some overhead at least.
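Something like this sketch shows what reference storage buys you (Point3D is just an illustrative subclass):

    // With reference storage, each array slot is a pointer, so elements
    // can be any subclass, even ones with extra fields.
    class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    class Point3D extends Point {
        final int z;
        Point3D(int x, int y, int z) { super(x, y); this.z = z; }
    }

    class Mixed {
        public static void main(String[] args) {
            Point[] points = { new Point(1, 2), new Point3D(1, 2, 3) };
            // A flattened Point[] would give every element a fixed-size slot,
            // with no room for Point3D's extra z field.
            System.out.println(points[1] instanceof Point3D); // true
        }
    }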
For years it's been the dream in Java to have true value Object types. You can do this in C++, which also allows you to allocate on the stack. My position is that the complexity cost of this is massive: copy constructors, move constructors, assignment operators, implicit constructors, implicit casts, etc. I mean, you can have class A with instance a1 on the stack and a2 on the heap, pass &a1 and &a2 to an A* parameter, and the callee has absolutely no idea of the lifetime of those or where they're allocated.
But still, it would be nice to pass around an SHA-1 hash as a 20-byte value instead of as a reference to an object that contains a reference to an array of 20 bytes.
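For the curious, here's roughly what that looks like today (a sketch; the class name is made up):

    // Two heap allocations and two hops for 20 bytes of data:
    // reference -> Sha1 object (header + array reference) -> byte[] (header + 20 bytes)
    final class Sha1 {
        private final byte[] bytes;  // points at a separate heap-allocated array

        Sha1(byte[] bytes) {
            if (bytes.length != 20) throw new IllegalArgumentException("SHA-1 is 20 bytes");
            this.bytes = bytes.clone();
        }
    }
    // A true value type could store the 20 bytes inline, with no object
    // headers and no pointer chasing.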
As for (2), this was a deliberate design choice in Java 5 (1.5) when generics were added. The decision was made, for better or for ill, to retain backwards compatibility through type erasure, such that a List<T> is a List at runtime. This decreased the pain of upgrading legacy code but created this ugly corner around primitive types.
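You can see the erasure directly: both lists share one runtime class.

    import java.util.ArrayList;
    import java.util.List;

    class ErasureDemo {
        public static void main(String[] args) {
            List<String> strings = new ArrayList<>();
            List<Integer> ints = new ArrayList<>();
            // The type parameters are gone at runtime; both are just ArrayList.
            System.out.println(strings.getClass() == ints.getClass()); // true
            System.out.println(strings instanceof List);               // true
        }
    }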
C# 2.0 came along later and decided to go the other way, such that an IList<T> is not an IList (IIRC the types; I'm not a C# guy).
There are pros and cons.
The fact that certain boxed values are guaranteed to satisfy reference equality is kind of weird and (IMHO) confusing, i.e. Integer.valueOf(127) == Integer.valueOf(127) is true but Integer.valueOf(128) == Integer.valueOf(128) isn't necessarily true. (Note it's ==, not equals; equals always compares values.) I guarantee you a lot of people don't know that.
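A quick demonstration (the cache covers -128 through 127 per the JLS; the upper bound can be raised with -XX:AutoBoxCacheMax, hence "isn't necessarily"):

    class BoxingDemo {
        public static void main(String[] args) {
            Integer a = 127, b = 127;  // autoboxing uses Integer.valueOf, which caches
            Integer c = 128, d = 128;  // outside the cache: two distinct objects
            System.out.println(a == b);      // true: same cached instance
            System.out.println(c == d);      // false with default JVM settings
            System.out.println(c.equals(d)); // true: value equality always holds
        }
    }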
> On (1), all objects in Java are references, effectively a pointer to a pointer. This indirection has a cost but has been incredibly useful historically because it allows allocations to be moved around the address space without any of the references changing. They're effectively just pointers to a lookup table, and it's the lookup table that changes.
There's only one level of indirection. References are simply pointers. You can't see the raw bits of the pointer without Unsafe, but that's how the Sun JVM implements references. (The implementation calls them "ordinary object pointers".) GC rewrites these pointers when it moves objects; there is no separate lookup table of all objects.
You might have been misled by the existence of Object.identityHashCode(). The identity hash code is not a memory address and not guaranteed unique. It's an arbitrary value stored in some bits of the object's mark word and copied around when the object moves. That's how it remains stable across GCs.
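Easy to check, for what it's worth:

    class IdentityHashDemo {
        public static void main(String[] args) {
            Object o = new Object();
            int before = System.identityHashCode(o);
            System.gc(); // a moving collector may relocate o here
            int after = System.identityHashCode(o);
            // Stable even if the object moved: the hash lives in the
            // object header, not derived from the current address.
            System.out.println(before == after); // true
        }
    }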
Or you might be thinking of "compressed oops". Those are still pointers, but encoded as (object_address - start_of_heap) >> 3 to save space.
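The arithmetic, spelled out (the addresses here are made up for illustration):

    class CompressedOops {
        public static void main(String[] args) {
            long heapBase = 0x0000_0008_0000_0000L;      // hypothetical heap start
            long objectAddress = heapBase + 0x12345 * 8; // objects are 8-byte aligned
            // Encode: subtract the base and drop the three alignment zero bits,
            // so a 32-bit value can address up to 32 GB of heap.
            int oop = (int) ((objectAddress - heapBase) >> 3);
            // Decode: shift back and re-add the base.
            long decoded = heapBase + ((long) oop << 3);
            System.out.println(decoded == objectAddress); // true
        }
    }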
I think there is a global lookup table for interned strings and symbols loaded from class files, but slots in that table are not themselves pointed to by references.
IMO C# got this right. Admittedly I prefer C#, but working in Java right now, type erasure really confuses me, especially when working with streams. I can/will learn, but it seems like a bad model.
That's definitely interesting, but I feel like C#'s approach of just creating a new set of containers for generics aged way better. It feels silly that I'm paying a tax in 2021 on a decision made in 2004 to be backwards compatible with code written in like 1996.
> The language actually provides quite a strong safety guarantee for generics, as long as we follow the rules:
> If a program compiles with no unchecked or raw warnings, the synthetic casts inserted by the compiler will never fail.
Huh. That was written in 2020, four years after it was shown how to write a very small program that "compiles with no unchecked or raw warnings", and yet "the synthetic casts inserted by the compiler" will fail at run time [0]:
    class Unsound {
        static class Constrain<A, B extends A> {}

        static class Bind<A> {
            <B extends A>
            A upcast(Constrain<A, B> constrain, B b) {
                return b;
            }
        }

        static <T, U> U coerce(T t) {
            Constrain<U, ? super T> constrain = null;
            Bind<U> bind = new Bind<U>();
            return bind.upcast(constrain, t);
        }

        public static void main(String[] args) {
            String zero = Unsound.<Integer, String>coerce(0);
        }
    }
The sample program doesn't compile:

    Unsound.java:16: error: method upcast in class Bind<A> cannot be applied to given types;
            return bind.upcast(constrain, t);
As the article says, it was an unfortunate side effect of trying to maintain compatibility.
I imagine C# was able to learn from what was going on with Java at the time and not have to suffer the same fate. Or maybe it was just less popular and could force the issue (I’ve never dealt with it so I don’t know the history well).
The main use case of generics is collections, and if my memory serves me right, in .NET they simply created a new collections library (System.Collections.Generic), leaving the original (System.Collections) intact so old programs worked as before. It wasn't 100% compatible, because you couldn't freely interchange the classes (without writing adapters), but from what I gathered it was a small price to pay compared to type erasure (in my opinion), which prevented more aggressive runtime optimizations/evolution. Today you usually find the old collections only in ancient software that hasn't been updated for years.
As is usually brought up, not erasing types also comes with some potential cons: namely, it makes CLR languages dependent on C#'s chosen variance model.
In Java, a List<Cat> may or may not be usable as a List<Animal>; that's decided by wildcards at the use site, and the JVM itself has no opinion. This way Scala/Kotlin/another JVM language is free to define its own variance model independent of the host language. C# did limit its language ecosystem with this quite a bit. (AFAIK Scala for the CLR stopped in part due to this.)
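For example, Java's use-site variance looks like this, and none of it exists at the bytecode level:

    import java.util.ArrayList;
    import java.util.List;

    class Animal {}
    class Cat extends Animal {}

    class VarianceDemo {
        public static void main(String[] args) {
            // List<Animal> animals = new ArrayList<Cat>(); // does not compile: invariant
            // Use-site covariance via a wildcard gives a read-only view:
            List<? extends Animal> animals = new ArrayList<Cat>();
            // animals.add(new Cat()); // does not compile: can't write through ? extends
            Animal a = animals.isEmpty() ? null : animals.get(0); // reading is fine
        }
    }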
In C# I believe you can optionally mark generic classes/methods as covariant or contravariant. Is that not enough or does it not get exposed in the CLR or something?
How do two languages with two different variance models interoperate on the JVM if they share the same type but expect different behavior? Is it safe to create a list in one language and pass it into another if their variance models differ? Having an explicit variance model makes cross-language interoperability safer and easier (which was one of the main selling points of the Common Language Runtime), doesn't it?
Simply from an empirical point of view, the CLR language ecosystem is pretty much a desert compared to the flourishing JVM one, so while that can be attributed to many things, I am quite sure that explicit variance is not that attractive to language developers.
Yeah, Java can change the variance model used by generics, but my point is that it's a language-level feature, not something fundamental at the JVM level, which is IMO the correct decision.
Also, unfortunately arrays are covariant, so Cat[] is a subtype of Animal[] (both in Java and C#, actually), where your mentioned example indeed introduces a "poisoned" value; in Java the failure surfaces as an ArrayStoreException at the bad store.
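That hole is easy to reproduce (Animal/Cat/Dog are illustrative):

    class Animal {}
    class Cat extends Animal {}
    class Dog extends Animal {}

    class ArrayCovariance {
        public static void main(String[] args) {
            Animal[] animals = new Cat[1]; // legal: arrays are covariant
            animals[0] = new Dog();        // compiles, but every array store is
                                           // checked: throws ArrayStoreException
        }
    }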
> They're effectively just pointers to a lookup table, and it's the lookup table that changes.
What makes you think this is the case? Pointers to objects are direct, and garbage collectors freely change pointer values whenever they decide to relocate objects. Some collectors might use temporary forwarding pointers to support concurrent collection.