I recently found a small issue with `uv run`. If you run a script from outside the project folder, it looks for the pyproject.toml in the folder from which you are calling `uv run`, not in the folder where the python script is located (or its parents)! Because of that, scripts that store their dependencies in a pyproject.toml cannot be run successfully using a “bare” `uv run path/to/my/script.py` from outside the project folder.
You can work around this surprising behavior by always using inline dependencies, or by using the `--project` argument, but the latter requires that you type the script path twice, which is pretty inconvenient.
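For example, with a hypothetical layout where the script lives in path/to/my/project, the difference looks roughly like this:

    # from outside the project this won't pick up the project's pyproject.toml,
    # because uv looks in the current directory (and its parents), not next to the script
    uv run path/to/my/project/script.py

    # workaround: point uv at the project explicitly (note the path typed twice)
    uv run --project path/to/my/project path/to/my/project/script.py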
Other than that uv is awesome, but this small quirk is quite annoying.
I am one of the Arraymancer contributors. I believe that what mratsim (Arraymancer’s creator) has done is pretty amazing, but I agree that the scope is quite ambitious. There’s been some talk about separating the deep learning bits into their own library (which I expect would be done in a backwards compatible way). Recently we worked on adding FFT support, but instead of adding it to Arraymancer it was added to “impulse” (https://github.com/SciNim/impulse), a separate, signal processing focused library. There is also Vindaar’s datamancer (a pandas-like dataframe library) and ggplotnim (a plotting library inspired by R’s ggplot). The combination of all of these libraries makes nim a very compelling language for signal processing, data science and ML.
Personally I’d like Arraymancer to be a great tensor library (basically a very good and ideally faster alternative to numpy and base Matlab). Frankly I think that it’s nearly there already. I’ve been using Arraymancer to port a 5G physical layer simulator from Matlab to nim and it’s been a joy. It’s not perfect by any means but it’s already very good. And given how fast nim’s scientific ecosystem keeps improving it will only get much better.
I recently started using nim for some little side projects and I am having so much fun. Somehow they’ve managed to combine the strengths of high level languages such as python with those of low level, systems languages like C with few (if any) of the drawbacks. It’s pretty amazing to be honest.
Programming in nim feels a lot like programming in a statically typed Python, except that you get small, fast, single file executables out of the box.
Compared to C++ you don’t need to worry about memory allocations too much except when you really want to, and when you do it is much simpler than in C or C++. It is also much easier to control mutability and the type system is much better. Also you can use most of the language at compile time in a much more elegant way than in modern C++.
Another surprising thing is that, while the ecosystem is obviously much smaller than those of more popular languages, the fact that it has a built-in package manager gives you access to it much more easily than in C++. There are a lot of high quality packages (such as datamancer, arraymancer and ggplotnim), which make nim very productive in a lot of domains.
That’s not even counting advanced features such as nim’s excellent macro system, which I personally don’t use (but which enables some of the best nim libraries).
Oh, and I _love_ nim’s uniform function call syntax. Every other language should copy that feature.
I almost forgot to list the drawbacks. The main one of course is that it is not the most popular language (but the ecosystem is big enough that I didn’t find this to be a big problem for my use case in practice). Other than that, the editing and debugging experience could be improved. There is a decent VS Code plug-in (look for the one made by “saem”) but it is just OK, not great. There is some integration with gdb but it is a bit rough. I usually end up adding logs when I need to debug something.
The nim team is currently working on removing the garbage collector by means of a new reference counting based “garbage collector mode” called “arc” (for automatic reference counting). You can get more info in the following link, where it is described as “plain old reference counting with optimizations thanks to move semantics”:
The objective is to make nim suitable for embedded programming and other use cases for which garbage collection is a non-starter.
This new --gc:arc mode is already available in the nightly builds and the benchmarks are already impressive. I believe that the plan is to make arc the default “garbage collector mode” in nim 1.2.
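If you want to try it on a nightly build, enabling it is just a compiler switch (the file name below is only a placeholder):

    # compile a program with the ARC-based memory management mode
    nim c --gc:arc myprogram.nim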
One of the issues with Nim is that ARC is still a form of automatic memory management, which many domains do not want. This is why Odin has been designed to take advantage of custom allocators, so that programmers have a huge amount of control over how memory is allocated at all levels. And coupled with the `context` system, you can also control and track how third-party code allocates things.
Custom allocators are a pleasure to use, and allow for so much control. Along with the temporary allocator, you can make Odin feel like it's a dynamic language whilst being extremely fast. Custom allocators are an under-utilized thing in programming in general, and I hope more people realise what is possible with them that is not possible with automatic memory management schemes.
I write C++ for video games, and garbage collection is often shunned in performance-critical systems since it's often nondeterministic and GC pauses can become too long. I think garbage collection could work in games, but it's commonly implemented for use cases that are not video games (e.g. servers, desktop applications, etc.)
A programmer writing managed C# in an engine like Unity will often spend a lot of programming time ensuring that the code will not allocate every frame, as the additions to the heap will eventually trigger a pause.
That said, every game and its requirements are different, and some game development might not mind that as much. A C++ engine programmer on a Nintendo Switch is in a very different situation than a hobbyist in JavaScript or a server backend programmer on a mobile game.
Just like doing virtual calls or using the OS memory allocator used to be shunned by early C++ adopters in the game industry.
I still remember when doing games in Basic/Pascal/C was considered to be like using Unity nowadays: real games had to be written in Assembly.
As you say, every game and its requirements are different, and many 8/16 bit games were perfectly doable in C, Pascal or Basic, and eventually the community moved along, just as happened with C vs C++ a couple of years later.
I see the use of GC-enabled languages the same way, and C# belongs to those languages that also offer other means of memory allocation; not everything needs to live on the GC heap.
I think you’re absolutely right here. The reason people often dismiss garbage collection in game programming is the pauses, but if the pauses aren’t noticeable then the reason to dismiss it goes away. Computer performance gains over time can totally help with that, much as they did with virtual calls or default allocators. The writing on the wall was there after games like Minecraft became a huge hit.
Garbage collection is not so much considered a negative thing as a thing that's inappropriate for the embedded domain. The problem is that garbage collection entails some system code periodically scanning lists of memory allocations to identify stuff that's now garbage and can be recycled. Embedded devs worry about the scheduling of that code, how long it could take to run in the worst case, and whether it will spoil their real-time guarantees. There are various mitigation strategies, but for good or ill many individuals and organisations apply a simple "no, we're not going to use GC ever" policy.
Thank you guys for the response, super appreciate it.
I guess I can understand from an abstract perspective that you can manually tune performance and optimize to a higher degree if you can control memory allocation yourself.
And for a lot of purposes where performance is imperative, like games or embedded devices, it can make or break the ability of software to function properly.
But my question then is, if languages like Crystal, Nim, or D (or any other GC language with similar speed) can operate at or near the performance of C, why exactly do you need manual memory management?
And if you do need it, I assume many languages that cater to this audience provide some sort of symbolic annotation that allows you to manually control the GC where you feel you need it, aye?
I think you are correct in your basic assertion that no one wants manual memory management for its own sake. What they really want is sufficient performance for their use case. The benchmarks you usually see are throughput-oriented and run on small heaps. If you have tight latency budgets and/or huge heaps, the performance is not close.
Optional manual memory management sounds great, but I'm skeptical it would work well in practice. The reason is that if the language default is GC, libraries won't be designed for manual memory management, meaning it will be hard for your manual code to interact with data structures created by non-manual parts.
"Near C" performance is often not good enough, and usually misleading. You can write poorly performing applications in C, and certain benchmarks may favor or disfavor certain elements of a language. Generally they're created to be "similarly written" in all benchmarked languages, which may seem like the fairest comparison at face value. But what that means is that they are often naively written in one or more of the languages. Expertly written, hand-tailored-to-the-problem-domain C code is almost always going to outperform other languages by a significant margin, especially languages without manual memory management. You can do things in C like use arena allocators to significantly reduce memory performance overhead - things which require low-level control and a non-naive understanding of the problem domain. Garbage collectors can be quite performant, but they aren't capable of this kind of insight. Code that is written in C similarly to a garbage collected language will be similarly naive (another malloc call for each and every allocated thing, versus allocating out of an arena, for instance).
As I said, mitigation strategies exist, including manual control of GC etc. It's not true that using GC is universally impossible in embedded / real-time situations. It is true that it can cause performance and non-determinism issues (which are potentially solvable), and it's also true that some developers avoid GC so they don't have to deal with those potential issues. They would prefer to deal with the issues associated with manual memory management.
Who's to say who's right and who's wrong? Ultimately life (and the subset of life that is software development) is a massively complex strategy and tactics game with a myriad of possible playing strategies and no agreed perfect solution.
> if languages like ... can operate either at/near the performance of C
That depends entirely on how you define and measure performance. If total throughput is your metric, then it's no problem - for example, Go is perfectly acceptable for web services.
Predictability of latency, however, is absolutely _not_ on par with C code. For example, 3D rendering with a GC can easily result in perceptible stuttering if care isn't taken to minimize allocations and manually trigger the GC at appropriate times.
> some sort of symbolic annotation that allow you to manually control GC
It's not that simple. D tried to sell this at one point, but it just doesn't work for large multithreaded programs and things aren't single threaded these days. Manually controlling a global GC means manually balancing, for example, one block of threads that perform lots of allocations and deallocations (and will starve if the GC doesn't run regularly) with soft real time networking code and hard real time VR rendering code. And (for example) you certainly don't want your rendering loop pausing to scan the _entire heap_ (likely multiple gigabytes) on each frame! Alternatively, in the case of Go (and depending on your particular workload) you might not appreciate the concurrent GC constantly trashing the caches.
Custom allocators and non-atomic reference counting are fantastic though.
Several companies have been selling Java, Oberon and now Go runtimes targeted to bare metal deployment on embedded scenarios.
Some of them are more than 20 years old, so apparently they might have one or two customers keeping them alive.
The hate against GC feels like the hate against high level languages on 8 and 16 bit platforms back in the day, because anyone doing "serious" stuff naturally could only consider Assembly as a viable option.
Being able to use a GC in some embedded cases (those without very hard constraints on memory use or latency) doesn't mean that you're able to use a GC in every embedded case.
I work in telecoms, just above the FPGA/DSP level; even a 1ms pause would be a big issue.
Agreed, however there is a big difference between stating that it doesn't work at all, and accepting that there are plenty of use cases where having a soft real time GC tailored for embedded development is perfectly fine, and actually does improve productivity.
Since you mention telecommunications, I would consider network switches running Erlang a use case of embedded development.
Other examples would be the Gemalto M2M routers for messaging processing, or some of the NSN base station reporting platform.
So while it doesn't fit your scenario, it does fit other ones; this is what some in the anti-GC camp need to realise.
Because garbage collection, and in particular tracing garbage collection, adds significant overhead both in CPU cycles and memory. This overhead is also very unpredictable and depends heavily on memory allocation and object lifecycle patterns. Simple GCs can pause the program for a very long time, proportional to the size of the used memory, and this may be several tens of seconds for large heaps, so quite unacceptable. There are ways to mitigate these long pauses with incremental or concurrent GC, but they increase complexity of the runtime system and have even more average overhead, and although in the average case they may perform acceptably, they tend to have very complex failure modes. In addition to that, a tracing GC typically needs some additional memory "room" to operate, so programs using GC tend to use much more memory than really needed.
There is also a common misbelief that compacting GC helps make heap allocations faster than malloc. While technically true (the allocation itself is simple and fast, because it is only a pointer bump), a problem occurs immediately afterwards: this new heap memory hasn't been touched since the last GC, and it is very likely not cached. Therefore you get a cache miss immediately after the allocation (managed runtimes initialize memory on allocation for safety). Because of that, even allocating plenty of short-lived objects, which is the best case for GC, is not actually faster than a pair of malloc+free.
There are also other overheads:
* Managed runtimes typically use the heap for most allocations and make stack allocation harder, or not possible at all in some cases; e.g. it is much harder to write Java code with no heap allocations than C code.
* To facilitate GC, objects need an additional word or two of memory, e.g. for mark flags or reference counts. This makes cache locality worse and increases memory consumption.
* During heap scanning, a lot of memory bandwidth is utilized. Even if GC does that concurrently and doesn't pause the app, this process has significant impact on performance.
* Tracing GC prevents rarely used parts of the heap from being swapped out.
At least for my use scenario in embedded systems, performance is not necessarily worse with GC and nondeterminism is not a showstopper either. The problem is avoidable by proactively minimizing allocations in the hot paths or arranging 'critical sections' that disable GC temporarily. The deal-breaker is the memory footprint.
PowerShell is obviously a much better scripting language than the ancient DOS BAT "language" (if you can call it that). In theory it's also mostly ubiquitous on Windows, which means that you can rely on it being there when creating a script. Yet I've found that people often keep using BAT files (e.g. in build scripts, etc). I think it is because you cannot just execute a PowerShell script unless it is signed or the user has manually enabled unsigned script execution (by running a command on the PowerShell command line). This means you cannot rely on it just working, at which point it's often best to fall back to DOS or use another scripting language such as Python.
I understand that this is done for security reasons, but Windows already lets you execute any executable or BAT file that you might have downloaded from the Internet. So I'm not sure that disabling PowerShell script execution really gains you much (and there are probably other better solutions anyway).
So IMHO, as long as DOS is available and PowerShell is so limited by default, BAT files will not go away, which is unfortunate.
My heart sinks every time I see foo.bat next to foo.ps1, where foo.bat is just `powershell -ExecutionPolicy Unrestricted -Command foo.ps1` so that it's double-click-friendly.
I didn't downvote you, but consider that I answered the GP's question and solved a problem they are having, and you added some snark.
There is plenty of talk in this space about trading off machine resources for programmer effectiveness, so while your gripe is technically accurate that ship has sailed long long ago.
Well, in my experience the double-click-convenience of .bat files (due to their default cmd.exe association) is the reason people do this, not just the execution policy.
To sign a script you run:
Set-AuthenticodeSignature foo.ps1 $someCert
and make sure that $someCert's public key is available (and trusted) on every user's computer.
The alternative is of course to get every user to run `Set-ExecutionPolicy Unrestricted` to solve the "problem" permanently.
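For completeness, there are also less drastic options than a machine-wide Unrestricted policy; something along these lines (the script name is just an example):

    # run a single script without changing the persistent policy
    powershell -ExecutionPolicy Bypass -File .\foo.ps1

    # or relax the policy only for the current user, so local scripts run
    # while downloaded scripts still need to be signed
    Set-ExecutionPolicy RemoteSigned -Scope CurrentUser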
(Thanks to ezquerra for bringing me back to the conversation here via Twitter.)
Disclaimer: I'm a PM on PowerShell at Microsoft, and the following is based on my personal observations. There's also no current plan to do any of what I suggested (but it's certainly giving me a lot to think about at the start of my work week).
The default ExecutionPolicy was ingrained and controversial long before I joined the PowerShell Team (or even Microsoft), and I'll be the first to admit that, as a long time Linux user, I didn't GET why I couldn't just run arbitrary scripts.
The public Microsoft opinion on the matter has always been that ExecutionPolicy is there to try and prevent yourself from shooting yourself in the foot. In Linux, this is very similar to adding a +x to a script (although that's clearly much simpler than signing a script).
I'd say it's also akin to the red browser pages warning you about self signed certificates, or Microsoft/Apple "preventing" you from running certain app packages that are "untrusted" in one form or another. In one sense, you could actually argue that PowerShell was ahead of the curve.
Now, as a power user with much less at stake than, say, an IT administrator at a Fortune 500 company, the first thing I do with all these restrictions, warnings, and confirmation prompts is to turn them off.
But those warnings (often with no clear way to disable them) are there for a reason. PowerShell was built at a time when Microsoft admins were GUI heavy, and PowerShell did its best to herald them into a scripting world while fostering best practices. And if you're using a bunch of scripts sourced from SMB shares within your domain as a domain administrator, you don't want to accidentally run a script that hadn't gone through your internal validation process (hopefully culminating in the script getting signed by your domain CA).
So let me assume that you agree with everything so far. Why does this experience still stink? Any Microsoft administrator worth their salt uses PowerShell, and many of even the power-est of users find ExecutionPolicy annoying.
In my opinion, it's too hard to sign scripts. We should be following in the footsteps of the drive to get everything on HTTPS (props to Let's Encrypt and HTTPS Everywhere, along with many others). We should have guidance on signing if you have a CA, we should have guidance on getting scripts signed by third party CAs, and we should probably offer a cheap/free service for signing stuff that goes up on the PowerShell Gallery.
Oh, and we should make it easier to read the red error that gets thrown explaining how to lookup a help topic that tells you how to bypass/change ExecutionPolicy.
Unfortunately, that's all easier said than done. But the default being what it is puts the onus on us to do something to make security easier.
I understand your reasoning and I like some of your proposals. For example, I'm all for making it easier to sign scripts. Yet I feel that the possible solutions you mention ignore the fact that this "security" mechanism can be bypassed with a simple BAT file.
Why do PowerShell scripts require more security than a BAT file or an executable? Are users really safer thanks to the ExecutionPolicy check? Or are they simply worse off because people will either use less powerful BAT files or completely opaque executables? At least with a PowerShell script you are able to inspect the code if you are so inclined. By pushing people to use executables instead, they are less likely to know what changes will be made to their system.
If the problem is admins accidentally double clicking on unsigned scripts, by all means show a confirmation dialog the _first_ time an unsigned script is executed. Actually, do that for BAT files and perhaps even for executables as well. But don't do it (by default at least) when someone calls a script explicitly from a command line or from another script. IMHO that would really make us all safer and would make PowerShell a real replacement for BAT files.
It is actually pretty simple. Mercurial automatically assigns a _local_, sequential revision number to every commit in your repository clone. In fact those revision numbers almost match the numbers in your examples (Commit 1 would have revision number 0, Commit 2 would be revision number 1, etc).
So in your example you would do:
"hg update 2" to checkout Commit 3, and
"hg update 3" to checkout Commit 4
Note that these revision numbers are _local_ to your repository. That is, another clone of the same repository may have different revision numbers assigned to different revisions. The revision numbers are assigned in the order in which commits have been added to that particular clone.
You could of course also use the revision id (i.e. the SHA1 hash) to identify them. You do not need to use the whole revision id, just the part that makes it unique, usually the first few (e.g. 6-8) characters of the hash.
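For example (the hashes below are made up), you can see both the local revision numbers and the short hashes with a log template, and update using either:

    hg log -G -T "{rev}:{node|short}  {desc|firstline}\n"
    hg update 3         # by local revision number
    hg update 1a2b3c4d  # by a unique prefix of the hash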
In addition, Mercurial has a very rich mini-language to query and identify revisions in your repository. These queries are called "revsets". Most mercurial commands accept revsets wherever they would need to receive a revision identifier. With them you can identify revisions in many ways, such as by date, by belonging to a given branch, by being committed by a certain author, by containing a certain file, etc (and any combination of those and many others).
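A couple of examples of what revsets look like (the branch, author and date values are made up):

    # commits on the default branch by a given author since a given date
    hg log -r "branch(default) and author('alice') and date('>2015-01-01')"

    # commits that touch a particular file
    hg log -r "file('src/foo.py')"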
Finally, if you use a good mercurial GUI (such as TortoiseHg) the whole thing is moot because the GUI will show you the whole DAG and you can just click on the revision that you want and click the "Update" button.
I actually find the ability to create these "anonymous" branches really useful. Naming things is hard so I find that coming up with a name for each new short lived branch is annoying.
When you enable mercurial's evolve, history rewriting operations are no longer destructive. Mercurial's evolve creates a sort of repository "meta history" by saving every revision before you modify it (e.g. by using amend). It makes those saved, old versions "obsolete" and it "hides" them (so that they won't show on your DAG when you do "hg log", for example, unless you use the --hidden flag). Evolve also keeps track of the relationship between obsolete revisions and their "successors". That is, it will know whether a certain revision in your DAG is the result of amending one revision, or perhaps of folding several revisions into one, or splitting one revision in two or perhaps just removing a revision from the DAG (this is what I called repository meta history above).
Those hidden, obsolete revisions are not shown on your DAG and they are generally not pushed nor pulled. In most respects they behave as if they were not even there. It is only when you need them that you can show them or go back to them (by using the --hidden flag of some of mercurial's commands, such as hg log or hg update). This gives you a nice safety net (since rewriting history is no longer a destructive operation) that you can use _if you want_. It also makes it possible to rewrite revisions that you have already shared with other users (since when you push a successor revision you also push the list of revisions that it is the successor of).
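In practice that looks something like this (the revision identifier is hypothetical):

    # include obsolete/hidden revisions in the log
    hg log --hidden

    # go back to a hidden, obsolete revision if you ever need it
    hg update --hidden -r 1a2b3c4d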
I think evolve is a significant step forward on the DVCS paradigm as it enables safe, distributed, collaborative history rewriting. This is something that, AFAIK, was not possible up until now.
They definitely do. They migrated their repositories to mercurial a while ago. There are plenty of mentions of this fact on mercurial's development mailing list (mercurial-devel@selenic.com).
Facebook is a big user and backer of mercurial. I attended the last mercurial sprint in Facebook's London office (which was great, BTW). Facebook will also host the next mercurial sprint in New York. They have recently hired Matt Mackall, mercurial's creator. Several other mercurial core developers also work for Facebook and are paid to work on improving mercurial as their main job.
I believe at some point they considered both git and mercurial as their new VCS. They have some huge repositories (hundreds of thousands if not millions of commits) and a huge amount of people accessing those repositories. I think they found some scalability issues with git's performance with repos of that scale (http://comments.gmane.org/gmane.comp.version-control.git/189...). Apparently it was easier for them to improve mercurial's performance, perhaps because mercurial is written in python with some performance sensitive parts written in C. Over the last year they have made a lot of progress and mercurial's performance on huge repositories is now even better than it used to be.
Mercurial comes with a built-in "hg serve" command that you can use to serve repositories through http. It creates a web server that you can access through any web browser. Unless you need authentication or you have a lot of users you don't need to setup any external web server.
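For example, from inside the repository (the port is arbitrary):

    cd path/to/repo
    hg serve -p 8000
    # the repository can now be browsed (or cloned) at http://localhost:8000/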
Otherwise setting up Apache + mercurial on windows is not very hard. If you need help please drop me a line.
Thanks for the offer of help, but we're OK with the web server side of things. I was just suggesting that one possible reason for Git's popularity compared to Hg's is that it isn't a walk in the park to set up a common repo for a team with Hg. If you've got someone who's familiar with setting up a web server anyway, you'll be OK, but with Git you don't need to do anything like that at all.
I'm not sure I understand what the problem is with setting up a basic mercurial server. If you have TortoiseHg you just open your repository, click on "Repository / Start Web Server" and you are done. If you have bare mercurial, just cd to your repository and execute "hg serve".
Perhaps you have some other requirement (e.g. authentication) that I did not take into account?
I suspect hg serve is fine for temporary use, but it's not really designed as a stable, long-term solution. As you say, it lacks authentication, which isn't ideal (or allowed at all) in some circumstances. Also, it needs to be started manually, so it needs some sort of supervisor process or start-up script to be set up.
Obviously this isn't some horrific burden, but it's still more demanding than the basic server set-up for some other DVCSes. The original question was about the reasons for the relative popularity of different systems, and if we're talking about people who are making decisions about a DVCS for the first time, they're not experts already and this stuff probably does make a difference.