Perhaps that did not come across well in the article. When I set out to do Leaf, that was the goal of the work. It wasn't a diversion from some other goal. Over time my priorities changed, and I was trying to force Leaf to follow them.
Had I wanted to make a game from the start, I would most certainly not have made a language first! That would be crazy. And that's my comment about sunk costs -- just because I have Leaf, it'd still be crazy to do these other projects in it.
Okay, they are classifiers, but I don't see that anything in the setup, or pip package management, seems to actually use them. Are they somehow used in actual package management, or are they just search qualifiers?
You are correct, the classifiers are just search qualifiers.
However, since pip 9.0.0, it will respect the Requires-Python[0] metadata, which can be specified via your setup.py file.
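As a minimal sketch (the package name and version here are hypothetical), the `python_requires` keyword in setup.py is what produces that Requires-Python metadata:

```python
# setup.py -- minimal sketch; name and version are made up.
# `python_requires` becomes the Requires-Python metadata in the built
# distribution, which pip >= 9.0.0 checks before installing a release.
from setuptools import setup

setup(
    name="example-package",
    version="1.0.0",
    python_requires=">=3.6",  # pip on older interpreters skips this release
)
```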
Since PyPI allows only a single source distribution per release, though, this will still not get you separate modules for separate codebases as it seems you originally desired. As mentioned in a previous comment, it's generally preferred to support multiple versions in a single codebase via six.
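For a sense of what that single-codebase approach looks like, here's a stdlib-only sketch of roughly what six's compatibility shims boil down to (six itself just packages up checks like these):

```python
import sys

PY2 = sys.version_info[0] == 2

# Roughly what six.text_type provides: one name that means `unicode`
# on Python 2 and `str` on Python 3. The conditional only evaluates the
# branch that is taken, so `unicode` is never looked up on Python 3.
text_type = unicode if PY2 else str  # noqa: F821

def ensure_text(value):
    # Normalize bytes to text so the rest of the codebase sees one type.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return text_type(value)
```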
That is not an actual migration guide. It's merely a meta-doc saying what types of things you need to pay attention to. I was looking for a specific list of things that would have to be migrated. The docs' suggestion to read the release notes for each version of Python 3 isn't very helpful.
The `iter` form of the `readline` doesn't load the entire stream first, whereas the second form you posted does. This makes a significant difference when reading data from another program.
I saw that migration guide, but it has very few details about what is actually involved in migration. It doesn't include API change details.
Why do you think the `for` loop form loads the whole stream? That shouldn't be true, since `for` just calls `iter` on the object internally, and normal file-like objects by default are iterated line-by-line.
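A quick sketch (using `io.StringIO` to stand in for a pipe from another program) showing that both forms read lazily, line by line:

```python
import io

stream = io.StringIO("a\nb\nc\n")

# Sentinel form: call readline() repeatedly until it returns "" at EOF.
via_iter = [line.rstrip("\n") for line in iter(stream.readline, "")]

stream.seek(0)
# Plain for loop: file-like objects iterate line by line as well, also
# lazily -- neither form slurps the whole stream into memory first.
via_for = [line.rstrip("\n") for line in stream]
```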
If you saw the migration guide but found it incomplete, you should edit your post to say that rather than saying there isn't one.
The simple syntax changes I dealt with as they came up on the first run of 2to3 I tried. The semantic changes that didn't come up in 2to3 are what bothered me more.
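Integer division is the classic example of such a semantic change: the code parses fine on both versions, 2to3 leaves it alone, but it means something different.

```python
# Python 2: 7 / 2 == 3 (floor division on ints).
# Python 3: 7 / 2 == 3.5 (true division) -- 2to3 can't know which you meant.
q_true = 7 / 2
q_floor = 7 // 2  # explicit floor division behaves the same on both versions
```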
#1 has a big influence on my argument. I think it was Python and NodeJS library systems that made it super simple for people to publish libraries. There are a lot of trivial libraries out there, and many of questionable quality. Sometimes using a library ends up as more work than writing the bits of code myself.
It's not like I'm going to decide to write a new 20k line library for my project, but if something is only a few hundred lines of code it's definitely easier to just rewrite.
Those are some of the points I'm considering as well while doing the memory management for Leaf. My desire to have predictable destructors is tough to reconcile with a tracing GC.
I'm uncertain of what you mean with "copying the data over" in reference to large data. Surely it's still just the pointer being copied? Or are you speaking of things like `std::vector` in C++ that are value-copied by default?
In my quest for a memory management scheme for Leaf, it's your final point that is primarily forcing my decision: compatibility with other memory managers. In this article I just wish to provide a counterpoint to those people trying to convince me that tracing GC is a better approach. In the end, it's the need to write libraries and be compatible that prevents me from using a complex GC.
Just keep in mind the trade-offs that you are making.
For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs.
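To make that concrete, here's a hedged Python model of what naive RC expands a plain pointer assignment into -- the `incref`/`decref`/`assign` helpers are hypothetical, standing in for compiler-generated code, but the shape (two count updates plus a zero-check branch per move) is the point:

```python
class Obj:
    """Stands in for a heap object with an embedded reference count."""
    def __init__(self):
        self.refcount = 0   # no references yet; the first assignment adds one
        self.freed = False

def incref(obj):
    if obj is not None:
        obj.refcount += 1          # first memory update

def decref(obj):
    if obj is not None:
        obj.refcount -= 1          # second memory update
        if obj.refcount == 0:      # the extra branch on every assignment
            obj.freed = True       # stands in for calling the deallocator

def assign(slot, new):
    # What a bare `slot = new` pointer move becomes under naive RC.
    old = slot
    incref(new)
    decref(old)
    return new
```

For example, `p = assign(None, a)` followed by `p = assign(p, b)` drops `a`'s count back to zero and frees it -- the price is that every single assignment pays for both updates and the branch.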
Second, adding concurrency to the mix can complicate matters a lot (depending on your concurrency model [1]). You may need additional atomicity guarantees for the reference count updates – which, while cheaper than a full-blown lock, are generally still more expensive than the purely sequential case; you preclude compiler optimizations that could eliminate the overhead of pointer updates; and you're still left with the potential for race conditions (e.g. when one thread writes to a global variable or heap location while another thread is retrieving a pointer from the same location, and that pointer is the only remaining reference to an object).
Third, typical approaches to cycle detection and collection (such as trial deletion) reintroduce most of the challenges of tracing GC. While you may think that you can avoid cycles or handle them manually, they often crop up in unexpected places. A common case is when a closure is stored in an object that is also captured in the closure's environment [2].
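CPython illustrates exactly this case: it uses reference counting plus a backup tracing cycle collector, because a closure stored on the very object it captures forms a cycle the counts alone can never reclaim. A small sketch:

```python
import gc

class Button:
    def __init__(self):
        # The lambda captures `self`, and `self` stores the lambda:
        # self -> on_click -> closure cell -> self, a reference cycle.
        self.on_click = lambda: self

btn = Button()
del btn  # refcounts never reach zero because of the cycle...
collected = gc.collect()  # ...so only a tracing pass can reclaim it
```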
I don't mean to discourage you from your decision – I don't know what your ultimate goals are, and naive RC may very well be the best set of trade-offs you can make – I just want to alert you to the trade-offs involved.
[1] Some popular concurrency models (such as Erlang's, Dart's, or Nim's) avoid shared references, so these issues would not even crop up for such languages, of course.
[2] I have a hypothesis that naive RC as an approach to memory management is particularly common in languages that do not (or historically, did not) support closures, but that's a different story.
What's the difference between naive RC and non-naive RC?
Would the above languages (Erlang, Dart and Nim) count as naive implementations?
I was under the impression that Nim, at least, had a pretty efficient RC implementation that compared favourably with the Boehm GC in single-threaded performance. Is this accurate?
By naive I refer to RC implementations that simply instrument pointer assignments with the necessary reference counting instructions. Non-naive RC implementations try to minimize the overhead associated with it; deferred reference counting or having the compiler optimize away unnecessary reference count updates are typical approaches.
Erlang and Dart have tracing GCs; I was referring to their concurrency model (which uses thread-local heaps), not their method of garbage collection. Nim uses either deferred reference counting or a simple mark-and-sweep collector, depending on a compile time switch, but also uses thread-local heaps. The main attraction of Nim's deferred RC collector is that it can keep pause times very low; I haven't actually benchmarked its amortized performance, but I expect it should be competitive with or better than the Boehm GC, especially for large heaps.
Thanks for your answer. "Garbage collection" seems to cover a huge range of actual implementation approaches and sometimes it's hard to see the pros and cons of each approach when they're all lumped under the same thing.
In my mind, 'stop the world' pauses are the biggest disadvantage of some GC approaches, so it's nice to hear that deferred RC ameliorates that to some extent.