I'm glad Fossil works for them, but this line bothered me:
> In contrast, Fossil is a single standalone binary which is installed by putting it on $PATH. That one binary contains all the functionality of core Git and also GitHub and/or GitLab. It manages a community server with wiki, bug tracking, and forums, provides packaged downloads for consumers, login managements, and so forth, with no extra software required
git, for all its issues, is not bundling the kitchen sink. I do prefer the "do one thing and do it well" approach.
Git actually bundles a lot of stuff you probably don't realize. Run 'git instaweb', for example, and it will spin up a local web server and a Perl CGI script to give you a very simple web UI: https://git-scm.com/docs/git-instaweb Fossil isn't much different in core functionality here.
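For anyone who hasn't tried it, the flow is roughly this (which backend you pass to --httpd depends on what's installed locally):

    git instaweb --httpd=lighttpd --port=1234 --start   # serves gitweb for the current repo
    git instaweb --stop                                  # shut it down again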
There is a ton of email client integration and functionality in git too that most people who only use GitHub probably have absolutely no idea exists or even why it's there. Stuff like sending email patches right from git: https://git-scm.com/docs/git-send-email
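A rough sketch of what that looks like in practice (addresses and server are placeholders, and you need the sendemail.* config set up first):

    git config sendemail.smtpServer smtp.example.com      # one-time setup, value is made up
    git format-patch -1                                   # turn the last commit into a patch file
    git send-email --to=dev-list@example.org 0001-*.patch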
I tried that (primarily because I didn't know about it and was excited), but it didn't work. It needs me to install "lighttpd", and the steps to install that are not straightforward.
I like that a lot of functionality is not bundled into the git I already have installed on my computer, but at the same time, I agree that adding these separate binaries is not easy as a user.
Wow now that's a blast from the past. That's a nice relic from the times of people getting annoyed with Apache's one process per request model and Nginx not being popular just yet. Does anyone use that anymore for, well, anything?
Yes, which is why git is a pain in the butt to install and you have to rely on your OS packages (which include all the kitchen-sink stuff) or a GUI installer with all the necessary dependencies, vs. Fossil, which is just: download executable and done.
Securing the software supply chain for software updaters and VCS systems means PKI and/or key distribution, cryptographic hashes and signatures, file manifests with per-archive-file checksums, and DAC extended filesystem attributes for installed files. Ironically, there's more to it than just `curl`'ing a binary into place and not remembering to update it.
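By hand, that's roughly this, for every artifact and every update (filenames here are hypothetical):

    sha256sum -c fossil-linux-x64.tar.gz.sha256                          # per-file checksum
    gpg --verify fossil-linux-x64.tar.gz.asc fossil-linux-x64.tar.gz     # detached signature
    # ...and then remembering to do it all again next release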
I'm a Linux person, but at work, thanks to decades of Gates's effective brainwashing, I almost always have to use Windows on the desktop. IBM was the only exception. That said, on Windows, all I do is get this https://git-scm.com/download/win and run the installer. Yes, it installs stuff, but it's literally all automatic. So it's easy on Windows and on Linux; Mac I don't know, I don't do Apple products.
From that page: "Git for Windows provides a BASH emulation used to run Git from the command line. *NIX users should feel right at home, as the BASH emulation behaves just like the "git" command in LINUX and UNIX environments."
That's correct: it works automatically, and it's great for fixing Windows (when I have to work on Windows), so it has extra functionality. But even if all I used out of it was git, it's easy. Is it all one exe? No. Does it need to be? No.
CLI and GUI are different languages that are optimal for different use-cases. A dev who doesn't understand that and refuses to use CLI where appropriate is like a dev who would refuse to learn English. They are crawling when others can run.
The first Microsoft OS was a UNIX variant called Xenix: "The first operating system publicly released by the company was a variant of Unix announced on August 25, 1980. Acquired from AT&T through a distribution license, Microsoft dubbed it Xenix": https://en.wikipedia.org/wiki/History_of_Microsoft
Windows NT only got the bare minimum POSIX support needed for it to be allowed into DoD contracts. It was barely improved from there, and its SUA replacement was hardly any better.
Mostly ignored by Windows developers and finally put to sleep in 2003.
Xenix, which was actually my first UNIX experience, predates Windows 3.x success, and was largely ignored as Microsoft decided to focus on MS-DOS and OS/2.
WSL exists because Microsoft realised plenty of people don't care about Linux itself and would rather buy Apple hardware for a POSIX experience; and since the Linux kernel ABI matters more than POSIX in modern times, WSL it is.
One needs to sell that Linux desktop experience, which is taking decades to move by whole percentage points.
None of that is relevant for Windows developers targeting Win32 and .NET technologies, the crown jewels.
This is exactly what my first thought was! Installing OpenSSH and git is kind of my first activity on a new computer.
But I believe they are talking about setting up a git server. The thing that we rely on GitHub/GitLab/Bitbucket for. Richard counts them as an added dependency.
(I mean, he is not wrong in being worried about that. You are basically giving all your code to a company. Which does not matter for open-source projects like SQLite, but it does for many private projects with code as their primary IP)
What is a “git server”? There’s no distinction between client and server in git; you can “git pull” from any machine with the normal “git” program installed that you have SSH access to.
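E.g. a plain box with sshd and git on it is already a perfectly good "server" (host and path are placeholders):

    ssh me@somehost 'git init --bare ~/repos/project.git'
    git remote add origin me@somehost:repos/project.git
    git push origin main
    git pull origin main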
Who uses git-daemon these days? When's the last time you saw git:// in a README?
Practically all use of git is either the "smart HTTP" protocol over HTTPS, or over SSH. git-daemon is plaintext, you wouldn't want to push over that, and for public use HTTPS has won.
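Side by side (example.com is a placeholder):

    git clone https://example.com/project.git       # "smart HTTP" over HTTPS
    git clone ssh://git@example.com/project.git     # SSH
    git clone git://example.com/project.git         # git-daemon: unauthenticated plaintext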
I believe grandparent means what is these days called "forges", a web frontend with user accounts and such, with an extra heaping pile of features.
I’ve been trying to install git on the hypervisor of my smartOS install intermittently for a year. I run into system package conflicts or key signing errors and then I give up and switch to a zone that isn’t so borked with package issues.
Then I go lament in the corner that I really should get serious about migrating this 200TB home lab to anything else, and then realize that 200TB is not something easily backed up in case FreeBSD or TrueNAS doesn't understand my zpool versions, and all of those zones I created will need to be rebuilt as jails…
And my lamentations turn to tears. All because I wanted to just checkout some random project.
Come on, isn’t that kind of an argument from the '90s, when disk space was scarce and internet connections slow?
I mean, I have space for millions of kitchen sinks now. Considering that, I kind of refuse to accept that, of all things, bundle size is a valid distinguishing criterion for VCSes.
Git send-email is actually the best email client I've ever used. It's the only reason I managed to post to mailing lists. I couldn't figure out anything else.
The concept that you can isolate "one thing" and it's not in itself "a set of other things" is a very nasty myth in the world of programming. Everything is composite. Everything is a pipeline of commands. Made of things. The "job to do" is in the eye of the beholder, not absolute. The "single responsibility principle" is not applicable in reality. It's always a tradeoff of maintaining a balance of cohesion and modularity. Where we make a cut and call it a "thing" is up to us and our needs.
For many people, for example, the tools of GitHub are so integral to their workflow, they can't use Git alone at all. So to them GitHub is the "one thing". So SQLite has their own "thing".
They're not saying it doesn't encapsulate more than one thing. They're saying that, at the abstraction level of "stuff you need to install and run", it is one thing.
I think SQLite is fantastic and Richard is obviously a genius. But I always found his obsession with single binary monoliths odd.
As you mentioned it goes against the Unix philosophy of do one thing and do it well. To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
I have no idea why Richard focuses on such things but:
I'm old enough to remember when developers spent time making sure they could plow all of their build assets into a single binary distributable. It often had kind of a zest to it, and when you dealt with software that had directories full of stuff, it looked both "corporate" and "sloppy".
I've never quite gotten over the feeling that the piles of dynamically linked libraries haven't helped things. I know objectively that there's a notion that you can update libraries instead of applications, but it feels like it makes updating applications fragile, and you inevitably end up in some kind of dependency hell. "But breaking insecure applications is a feature!" I mean, okay, I guess. But I still need something to work and don't want to spend all day fixing broken layers of dependent stuff. If I have to do that, I may as well just update a single binary.
Go seems to have come back around to this style of thinking, and in a sense container images and JAR files are often trying to replicate what it's like to just download a binary, chmod +x it, and execute it.
Directories full of stuff is similar to websites with URL paths like "/site.php?page_id=18231238". Or even better when subdomains get involved and it looks like "secure3.action.domain.com/admin.php?page=123424". It technically works but is a bit ugly.
Also another web analogy might be dynamic linking being similar to microservices. People want to build and ship smaller components that can be swapped out independently. It works but does seem to make updating and testing fragile. You can test heavily at the boundaries but there's still kind of an "air gap" between the main app and the lib/microservice. If you want to be really sure there's no breakage, you have to test the whole thing, at which point you might as well just ship a monolith.
>Directories full of stuff is similar to websites with URL paths like "/site.php?page_id=18231238". Or even better when subdomains get involved and it looks like "secure3.action.domain.com/admin.php?page=123424". It technically works but is a bit ugly.
OOC, why does this stand out for you? Just to explain my curiosity, I've worked on Mac since I was a kid starting with System 6 and then going to OS X when it came out, so Apple's "your program is all in that file" just kind of made sense to me and it was really convenient to just drag a file to the trash and the app is _mostly_ gone, minus a few .plist and other config files in ~/Library.
But I _like_ the old forums and sites that still show stuff like page_id=N; for the boards and forums I go to, it's very useful for jumping around long topics, and you can play with it when shitposting.
Plus most modern browsers truncate or hide the full URL anyways; I dislike this feature personally, but at least Safari's concise tabs are a good balance for someone like me.
Fair enough, for message boards it's fine. I think I was mostly just thinking about old/sloppy WordPress sites where you might click on "about us" and it takes you to ?page_id=1234. Feels like a lack of attention to detail compared to /about-us. Similarly, a binary surrounded by a bunch of folders and dlls feels like a lack of attention to detail (and thus kind of "corporate" as the previous poster mentioned).
Dynamic linking is the bane of backwards compatibility.
Now everything is containers, appimages, flatpacks, docker images and so on, and all they do is pack all the libraries a binary may need, in a more wasteful and inefficient format than static linking.
In that sense, we truly have the worst of both worlds.
The situation on windows is fascinating: everyone links these libraries dynamically, and yet, there are about two hundred of them on my system, every application using its own uniquely outdated version.
In my practical experience the set of things that can go wrong if you link apps dynamically is much larger than the problems that arise when they are statically linked.
For one, it is more complicated to keep track of which of the many shared libraries on a typical system are used by which application. It is common that the same library occurs multiple times in different versions, built by different people/organizations and residing in different directories.
Quick, without looking: which TLS library do your network exposed subsystems use, which directories are they in and where did you install them from. When you do go to look: did you find what you expected?
Have a look at all the other shared libraries on your system. Do you know which binaries use them? Do you know which versions of which libraries work with which binaries? Do you trust the information your package manager has about version requirements? Does it even have that information?
Then there's the problem of what happens when you upgrade. The servers you run might have a rigorous battery of tests. But now you are running them with libraries they were not tested against. Sure, most of the time it'll work. But you don't know that. And you have no way of knowing that without downloading, building and running the tests. Or have someone else do that.
I've been in the situation where someone inadvertently updated a library in production and everything came crashing down. Not only did it take down the site, but it took a while to figure out what happened. Both because the person who did it wasn't aware of what they'd done. And the problem didn't manifest itself in a way that made the root cause obvious.
The clearest risk with statically linked binaries is if they are not updated when there is, for instance a security problem. But in practice I find that easier to deal with since I know what I'm running, and for anything important, I'm usually aware of what version it is or when I last checked for updates/problems.
> For one, it is more complicated to keep track of which of the many shared libraries on a typical system are used by which application. It is common that the same library occurs multiple times in different versions, built by different people/organizations and residing in different directories.
That's not common at all, man. I strongly recommend you don't do that.
> Quick, without looking: which TLS library do your network exposed subsystems use, which directories are they in and where did you install them from.
Openssl 3.x.y. It's /usr/lib64/openssl.so or similar. They are installed from my distro's repository.
> When you do go to look: did you find what you expected?
Yes. Openssl 3.1.1-r2. The OpenSSL binaries are actually named /usr/lib64/libssl.so and /usr/lib64/libcrypto.so. Upstream version is 3.1.2. There have been two low priority CVEs since 3.1.1 (never change openssl...) and my distro has backported the fixes for both of them into 3.1.1-r2.
> Do you know which versions of which libraries work with which binaries?
What do you mean "which versions of which libraries"? There's only one version of each library. If the package manager needs to keep an old version of a library around, it gives a loud warning about it so I can either fix the problem or ignore it at my own peril.
Those two .so files (libssl.so and libcrypto.so) are used by postfix, dovecot, and nginx. They are also linked by opendkim, spamassassin and cyrus-sasl, but those don't have open ports on the internet, so they don't really count. OpenSSH can optionally link to openssl; as it happens, my openssh does not link against a crypto library, openssl or otherwise. It just uses openssh's built-in crypto schemes.
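(For anyone wanting to reproduce that kind of check, it's a one-liner per service; paths vary by distro, and this assumes nginx was built with SSL, which distro packages are:)

    ldd "$(command -v nginx)" | grep -E 'libssl|libcrypto'          # what the binary is linked against
    sudo grep -E 'libssl|libcrypto' /proc/$(pidof -s nginx)/maps    # what a running instance actually has mapped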
> Do you trust the information your package manager has about version requirements?
Yes.
> Does it even have that information?
... wat..? Of course it does?
> I've been in the situation where someone inadvertently updated a library in production and everything came crashing down. Not only did it take down the site, but it took a while to figure out what happened. Both because the person who did it wasn't aware of what they'd done. And the problem didn't manifest itself in a way that made the root cause obvious.
I've been in the situation where a security guard at my last job inadvertently discharged his service revolver into a Windows machine, and it crashed. That doesn't mean I stopped using Windows. (I mean, I did stop using Windows...)
That's genuinely just not a problem that I've had. Not since 2004 and all the C++ programs on my computer broke because I force upgraded from GCC-3.3 to GCC-3.4 and the ABI changed. Or that time in 2009 where I installed a 0.x version of Pulseaudio on my gaming machine. Or that time I replaced OpenSSL with LibreSSL on my personal computer. If your server takes a shit because somebody was fucking around doing stupid shit on prod, and you do root cause analysis and come up with a reason that it broke other than, "employee was fucking around and doing stupid shit on prod" and the recommendation is something other than "don't fuck around and do stupid shit on prod" I don't know what to tell you. Dynamic linking isn't going to stop a sufficiently determined idiot from bringing down your server. Neither will static linking.
> What do you mean "which versions of which libraries"?
If you upgrade a shared library to fix a problem, how do you know that the application has been tested against the fixed version?
And no, your package manager won't know.
Congratulations on a) not having multiple installs of shared libraries on your system and b) knowing which version you have. Knowing this isn't very common.
> If you upgrade a shared library to fix a problem, how do you know that the application has been tested against the fixed version?
Distros like Debian solve that problem by not upgrading. The only things deemed worthy of "fixing" are security issues, and they are fixed by backporting the fix (only) to the existing shared library. Thus no APIs (of any sort - even unofficial ones like screen scraping) are upgraded or changed, so no testing is necessary.
And thus:
> And no, your package manager won't know.
It doesn't have to know, because the package manager can assume all releases for Debian stable are backward compatible with all the packages in that release.
A lot of the noise you see on HN comes from people using distros on their desktops. To them a distro is a collection of pre-packaged software with all the latest shinies, which they upgrade regularly. But Linux's desktop usage is 3%, whereas its server usage is claimed to be over 95% (which eclipses Windows desktop share). Consequently distros are largely shaped not by the noisy desktop users, but by the requirements of sysadmins. They need a platform that is guaranteed both stable and secure for years. To keep it stable, they must solve the problem you describe, and for the most part they have.
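You can see that policy directly in the packaging metadata (version strings below are illustrative):

    apt-cache policy openssl          # e.g. 3.0.x-1~deb12uN: same upstream version, only the Debian revision bumps
    apt changelog openssl | head -30  # the changelog entries list the individual CVE fixes that were backported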
If you're linking to OpenSSL, it's scary to have that upgraded from under you. Maybe it got better in the 3 series, but I seem to recall pretty much all the important 1.0.1? releases would be something you'd need to mitigate a big vulnerability, but would also have API changes that would break your application if you were trying to do newish things. Sometimes justified, but still a pita.
Somehow this makes me think of games Back In The Day where you could simply replace your crosshair by editing a bitmap file, versus now where everything's so much more locked-down behind proprietary container formats and baked-in checksums, etc.
Monolithic builds are great if you have no control over the deployed environment (i.e. desktop apps, sans OS-supplied libs). They're worse if you do control the environment and how the upgrade paths get followed.
Doesn't it seem that more and more people are just given access to some managed environment they have little control over anyway?
I feel like sometimes the dependency on dynamically linked stuff is akin to "well transistor radios are great if you don't care about soldering on fresh vacuum tubes like a real radio person would."
A dynamically linked library need only have one image of itself in memory.
If you are running a process that, for example, forks 128 of itself, do you want every library it uses to have a separate copy of that library in memory?
That's probably the biggest benefit. But it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up, but can link to an already in-memory image of its library(s).
The only real downside is exporting your executable into another environment where the various dynamic library versions might cause a problem. For that we have Docker these days. Just ship the entire package.
> If you are running a process that, for example, forks 128 of itself, do you want every library it uses to have a separate copy of that library in memory?
> it also speeds up load time if your executable doesn't have to load a huge memory image when it starts up
I'm not sure about Windows and Mac, but Linux uses "demand paging" and only loads the used pages of the executable as needed. It doesn't load the entire executable on startup.
You'd love NixOS. Gives you the flexibility of dynamic libraries with the isolation and full dependency bundling per app and less janky than snap or flatpak.
Slight tangent: it bugs me when people say "it goes against the Unix philosophy" as though The Unix Philosophy were some kind of religious text. Not everything should be a pluggable Unix executable, and Fossil making non-Unixy choices doesn't reflect poorly on it. They just chose a different philosophy.
I’m so thankful that Rust is helping popularize the solo exe that “just works”.
I don’t care if a program uses DLLs or not. But my rule is “ship your fucking dependencies”. Python is the worst offender at making it god damned impossible to build and run a fucking program. I swear Docker and friends only exist because merely executing a modern program is so complicated and fragile it requires a full system image.
> I’m so thankful that Rust is helping popularize the solo exe that “just works”.
Wasn't it Go that did that? I mean, not only was Go doing that before Rust, but even currently there's maybe 100 Go-employed developers churning out code for every 1 Rust-employed developer.
Either way “Rust is helping” is true. And given that Go is a managed language it never really factored into the shared library debate to begin with, whereas Rust forces the issue.
Maybe, but it's misleading. Using the assertion that "$FOO made $BAR popular" when $FOO contributed 1% of that effort and $BAZ contributed the other 99% is enough to make most people consider the original statement inaccurate.
> And given that Go is a managed language it never really factored into the shared library debate to begin with, whereas Rust forces the issue.
How so? Rust allows both shared and static compilation, so it's actually the opposite - Rust specifically doesn't force the use of single-binaries.
I'm struggling to interpret what it is you are saying: Go specifically forces the use of static linkage, whereas in Rust it's optional, is it not?
I am under the belief that in Rust you can opt-out of static linkage, while I know that in Go you cannot.
Are you saying that Rust doesn't allow opt-out of static linkage?
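For what it's worth, my understanding is that both have the knobs, just with different defaults; a rough sketch of the flags I'm aware of:

    # Rust: Rust code is statically linked by default, but you can opt out for std
    rustc -C prefer-dynamic main.rs
    # Go: dynamic linking against libc can creep in via cgo; this forces a fully static build
    CGO_ENABLED=0 go build ./...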
> Using the assertion that "$FOO made $BAR popular"
Thankfully that’s not what I said! This sub-thread is very silly.
FWIW Rust is exceptionally bad at dynamic/shared libraries. There’s a kajillion Rust CLI tools and approximately all of them are single file executables. It’s great.
I have lots of experience with Rust, the Rust community, and a smorgasbord of “rewrite it in Rust” tools. I personally have zero experience with Go, its community, and, afaik, Go tools. I’m sure I’ve used something written in Go without realizing it. YMMV.
Ehhh. You can compile a single exe with C or C++. I’ve personally come across far more Rust tools than Go. But I don’t really touch anything web related. YMMV.
The choice is actually between dealing with complexity and shifting responsibility for that to someone else. The tools themselves (e.g. virtual environments) can be used for both. Either people responsible for packaging (authors, distribution maintainers, etc.) have some vague or precise understanding of how their code is used, on which systems, what are its dependencies (not mere names and versions, but functional blocks and their relative importance), when they might not be available, and which releases break which compatibility options, or they say “it builds for me with default settings, everything else is not my problem”.
> Either people responsible for packaging have some vague or precise understanding of how their code is used, on which systems, what are its dependencies
But with python it’s a total mess. I’ve been using automatic1111 lately to generate stable diffusion images. The tool maintains multiple multi-hundred line script files for each OS which try to guess the correct version of all the dependencies to download and install. What a mess! And why is the job of figuring out the right version of pytorch the job of an end user program? I don’t know if PyTorch is uniquely bad at this, but all this work is the job of a package manager with well designed packages.
It should be as easy as “cargo run” to run the program, no matter how many or how few dependencies there are. No matter what operating system I’m using. Even npm does a better job of this than python.
A lot of the problems with Python packages come from the fact that a lot of Python programs are not just Python. You have a significant amount of C++, Cython, and binaries (like Intel MKL) when it comes to scientific Python and machine learning. All of these tools have different build processes than pip, so if you want to ship with them you end up bringing the whole barn with you. A lot of these problems were fixed with Python wheels, which pack the binary in the package.
Personally, I haven't run into a problem with Python packaging recently. I was running https://github.com/zyddnys/manga-image-translator (very cool project btw) and I didn't run into any issues getting it to work locally on a Windows machine with an Nvidia GPU.
Then the author of that script is the one who deals with said complexity in that specific manner, either because of upstream inability to provide releases for every combination of operating system and hardware, or because some people are strictly focused on hard problems in their part of implementation, or something else.
A package manager with “well designed” packages still can't define what they do, invent program logic and behavior. Someone has to choose just the same, and can make good or bad decisions. For example, nothing prohibits a calculator application that depends on a full compile and build system for certain language (in run-time), or on Electron framework. In fact, it's totally possible to have such example programs. However, we can't automatically deduce whether packaging that for a different system is going to be problematic, and which are better alternatives.
> A package manager with “well designed” packages still can't define what they do, invent program logic and behavior.
The solution to this is easy and widespread. Just ship scripts with the package which allow it to compile and configure itself for the host system. Apt, npm, homebrew and cargo all allow packages to do this when necessary.
A well designed PyTorch package (in a well designed package manager) could contain a stub that, when installed, looks at the host system and select and locally installs the correct version of the PyTorch binary based on its environment and configuration.
This should be the job of the PyTorch package. Not the job of every single downstream consumer of PyTorch to handle independently.
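Something along these lines, i.e. the kind of decision automatic1111's scripts currently make by hand. The index URLs below are the ones PyTorch's install docs use and may drift, so treat this as a hypothetical install-time stub, not the real package:

    # pick a CPU or CUDA build of torch for this machine
    if command -v nvidia-smi >/dev/null 2>&1; then
        pip install torch --index-url https://download.pytorch.org/whl/cu121
    else
        pip install torch --index-url https://download.pytorch.org/whl/cpu
    fi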
> Just ship scripts with the package which allow it to compile and configure itself for the host system.
Eek. That sounds awful to me. It is exceptionally complex, fragile, and error prone. The easy solution is to SHIP YOUR FUCKING DEPENDENCIES.
I’m a Windows man. Which means I don’t really use an OS level packages manager. What I expect is a zip file that I can extract and double-click an exe. To be clear I’m talking about running a program as an end user.
Compiling and packaging a program is a different and intrinsically more complex story. That said, I 1000% believe that build systems should exclusively use toolchains that are part of the monorepo. Build systems should never use any system installed tools. This is more complex to setup, but quite delightful and reliable once you have it.
I remember having to modify one of those dependency scripts to get it running at all on my laptop.
In the end I had more luck with Easy Diffusion. Not sure why, but it also generated better images with the same models out of the box.
The only way I know to manage python dependencies is Bazel as the build system, and implementing a custom set of rules that download and build all python dependencies. The download is done in a git repo. All magically missing libs must be added to the repo and Bazel. And finally you might have a way to... tar the output into a docker container... sigh
> it goes against the Unix philosophy of do one thing and do it well
For me, Perl shows just how restricted that viewpoint was.
After I learned Perl, I stopped caring about tr, and sed, and many of the other "one thing well" command-line tools. And I've no desire to swap out and modify the 's//' component of perl.
Perl does "one thing" - interpret Perl programs - even though it also replaces many things.
I know 'rmdir' exists. It does one thing, well - remove an empty directory. It's been around since the first release of Unix.
However, I use "rm -rf" because it's easier to use a more powerful tool which handles empty directory removal as a special case.
You can also change your viewpoint and say that Fossil does do one thing well: it's a distributed project control system. That's a new category I just made up, to highlight just how subjective "one thing" is.
I like `rmdir` because I don't have to check if a directory that I think is empty is actually empty with `ls -la` before removing it. This happens a lot with moving stuff out of directories (sometimes to different destinations).
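Right - the refusal to delete anything non-empty is the whole feature:

    rmdir old-stuff/      # errors out with "Directory not empty" if I missed something
    rm -rf old-stuff/     # removes it regardless of what was still inside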
% man cc
...
DESCRIPTION
clang is a C, C++, and Objective-C compiler which encompasses
preprocessing, parsing, optimization, code generation, assembly, and
linking.
% man gcc
...
NAME
gcc - GNU project C and C++ compiler
...
> To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
But where do you draw the line? What's "one thing"? - The size and complexity between cli tools doing one thing varies by orders of magnitude, some of those programs are much larger than this Fossil "monolith". Should a component have more functionality if separation means 10 times slower performance? What if it has hundreds of such features? What if separating those features means a hundredfold increase in complexity for setting up the software as it now has distributed dependencies? Should you have a separate audio player when a video player could already do the job out of necessity? Should a terminal support scrolling if you can already get that via tmux?
The Unix philosophy is bad for judging individual programs.
Unix’s philosophy is more of what you’d call ‘guidelines’, and is not universally applicable — not all problems can be decomposed nicely, and IPC just gives you a badly debuggable hodgepodge of added accidental complexity. It’s good for trivial tools like ls, cat, etc, but something more complex is likely better off as a monolith.
> As you mentioned it goes against the Unix philosophy of do one thing and do it well. To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
Anecdote: when i first met Richard in 2011, after having contributed to Fossil since 2008, i asked him why he chose to implement fossil as a monolithic app instead of as a library. His answer was, "because I wanted it working next week instead of next month." It was a matter of expedience and rewriting it now would be a major undertaking for little benefit.
Reimplementing fossil as a library is a years-long undertaking (literally) and is not something we're interested in doing directly within the Fossil project, but is something i maintain as a semi-third-party effort, along with a handful of other Fossil contributors, over at <https://fossil.wanderinghorse.net/r/libfossil>.
> Imagine editing a spreadsheet like `cat foo.xls | select-cell B3 | replace '=B2+1' > foo.xls`.
It would be even more cumbersome than that. After that command you'd have to restore foo.xls from a backup, and then do the edit again this time remembering that the "> foo.xls" executes before the pipe executes. :-)
I wonder if anyone has written something to make pipes like that work? E.g., write two programs, "replace" and "with" that could be used like this:
replace foo.xls | ... | with foo.xls
What "replace [file]" would do is set a write lock on the file, copy the file to stdout, then release the lock.
What "with [file]" would do is copy stdin to the file, after obtaining a write lock on the file. I think most shells would start the components of a pipe in order so "replace" should be able to set its lock before "with" tries to get a lock, but to be safe "with" could be written to buffer incoming data and only start checking the lock after it has received a significant amount or seen an EOF on stdin. Or "replace" and "with" could coordinate using some out-of-band method.
I think the "Unix philosophy" is best applied to problems that indeed can be de-composed into clear discrete steps. In fact, that's the metric I use when I write command line utilities: does this make sense in a pipe?
There are a lot of things where this isn't very practical. For instance, imagine building a web server that consists of a couple of dozen discrete utilities that are then cobbled together using pipes. Or even implementing the core feature set of Git in this manner. Would it be practical? Would it be better if Git was an enormous shellscript that connected all of these "things" into an application? What does that give you? And what would be the cost?
How would you do SQLite (the CLI application) as a bunch of discrete commands?
The UNIX philosophy of minimal cmdline tools that do one thing right is fine and the Go-style 'monolithic exe' without runtime dependencies except the OS is also fine (and both philosophies actually don't need to collide).
The problem is all the software that depends on tons of dependencies that are brought in dynamically. Just look at Jekyll vs Hugo. I have Jekyll break regularly when something (seemingly) unrelated changes on my machine, but Hugo is just rock solid.
Or another much more annoying example: Linux executables linking with glibc and then not running on systems that don't have that particular version of glibc installed.
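A handy way to see what you're actually demanding from the target system's glibc (binary name is a placeholder):

    objdump -T ./mytool | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -3
    # anything newer than what the target ships gets you the classic "version `GLIBC_2.xx' not found" error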
> To me it's obviously cleaner to divide a system into components that can later be swapped or modified independently.
Why is "cleaner" the only thing that matters? Why not "functional/featureful"? It's open source so it can be modified, but I'm not sure why ability to swap matters.
Exceptional things rarely happen without some outlier conviction involved. Most things happening due to outlier convictions are just that, follies that lead nowhere. But when the stars align and there's both great ability involved and a genuine gap to fill that would have remained undiscovered without the outlier conviction, something like SQLite happens.
> git, for all its issues, is not bundling the kitchen sink.
It doesn't bundle the kitchen sink in its native *nix environment, but for Windows it does. Git installer is > 50 MB, including (if I remember correctly) even a terminal.
While you can download Fossil as a 3.3 MB standalone binary for any supported platform.
Let's not forget how long it even took for there to be a reasonable Windows build of git. Git implicitly relied on significant amounts of Linux tooling which required bringing over an entire Mingw environment.
Actually I do, to save me from the deprived interfaces the OS ships with.
The Git install is one of the best Cygwin-likes I've encountered. It has the majority of tools needed and reasonable integration with the host OS. Very nice for getting something done quickly on any random Windows box.
It's fine if you don't have anything else installed already, but I have MSYS2 installed so it'd be nice if it were optional. Realistically an extra 50mb is not going to materially affect my life, but it's still aesthetically displeasing.
I agree - that's why I use the git package in MSYS2, which afaik is just as capable (and makes it much easier to use git with the full array of UNIX tools I might want to use that don't necessarily ship with Git for Windows).
I remember people installing git-bash and putty for Windows in computer science class to fill in all the gaps. Idk what the deal was with that cause, uh, why would I use Windows.
Actually it does, because the Windows git includes its own copy of ssh and bash, both of which will clash and fight with MSYS and/or other ssh installs - including the copy of ssh that Microsoft themselves tuck away in \windows\system32.
It's quite 'normal' for git-for-windows' ssh-agent to completely stop ssh-agent from working properly system-wide, because things end up pointing at the wrong ssh-agent.
Pretty much all game development still happens on Windows because that's where all (or at least most of) the gamedev tools and middleware libraries are.
That's a pretty ignorant view, there's tons of developers using Windows. If you take a look at the SO survey (or similar ones), Windows has 47% in the "Professional use" category.
And yet, because git does not bundle those things, which actually do belong as part of a modern SCMS, people lock themselves into GitHub or Gitlab, which provide those missing pieces.
Git includes a web interface written as a perl CGI script, so yeah it has a lot more dependencies than people realize. It is most certainly not just a simple single binary to plop on a machine and call it good.
> I do prefer the "do one thing and do it well" approach.
I prefer the "do what you do well" approach. If it's one thing, or many things, doesn't matter, as long as it's both efficient and correct.
I mean, if Fossil fails at some of these things, then we can criticise it for its failings, but it appears it actually does the things correctly.
The comparison with GitLab is quite enlightening. Fossil is a few MB, and Git + GitLab (to be feature equivalent) is a few GB and requires far more resources. Then which one does what it does well?
Now, there are a few things I don't like from Fossil, like the lack of a cherry-pick command. I have no intention of using Fossil over Git. I am just stating that criticism should be based on actual misfeatures and not on development "philosophies".
Earlier in its history, Fossil didn't have a separate cherry-pick command, but rather just a --cherrypick option to the "merge" command. See https://fossil-scm.org/home/help?cmd=merge. Perhaps that is where you got the idea that Fossil did not cherry-pick.
Fossil has always been able to cherry-pick. Furthermore, Fossil actually keeps track of cherry-picks. Git does not - there is no space in the Git file format to track cherry-pick merges. As a result, Fossil is able to show cherry-picks on the timeline graph. It shows cherry-pick merges as dashed lines, as opposed to solid lines for regular merges. For example the "branch-3.42" branch (https://sqlite.org/src/timeline?r=branch-3.42) consists of nothing but cherry-picks of bug fixes that have been checked into trunk since the 3.42.0 release.
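For reference, the invocation looks like this (the check-in hash is made up):

    fossil update branch-3.42
    fossil merge --cherrypick abcd1234ef
    fossil commit -m "Cherry-pick bug fix from trunk"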
I think the point is that there is the option to have only git, which reduces the attack surface of only that tool, if you don't want/need anything else.
Why do you think it is a strength that git is "not bundling the kitchen sink"? I'd like to understand better why other people prefer kitchen sink vs multiple binaries.
I am in two minds when it comes to this. A lot of (server) software I write bundles server, client and tools in the same binary. I also embed the API documentation, the OpenAPI files, the documentation and sometimes related documentation in the binary so that it can be served on an endpoint on the server or extracted.
So you have one binary, statically linked whenever possible, that contains everything you will need. And where everything is the same release as everything else, so there are no chances you'll end up with different versions of client, server, documentation, tooling etc.
The binary size doesn't concern me. I could easily increase the binary size tenfold, and it still wouldn't be a problem for the software I write. I don't think binary size is a big concern today since so much software "splats" files all over your system anyway (which I think is a horrific way to do things), so footprints tend to be somewhat large anyway.
What occasionally does concern me is that this might be confusing to the user. But it is possible to make things confusing for the user even if a piece of software is more focused (ie does "one thing"). One example is CLI utilities that have multiple screenfuls of `--help` output. To me, that's poor usability. Users end up resizing terminal windows, scrolling back and forth (when you deal with related options) and perhaps even resorting to grep or less to find what you are looking for.
I try to mitigate this by always structuring CLI applications with commands and subcommands. Where you can do --help at each level. Quite a few CLI applications do this and I think it helps.
This summer I wrote the first piece of software in perhaps half a decade that has separate binaries for the client and the server. Because I want to share the client code, but not the server code. While the split was unrelated to usability, I've started asking myself if this presents a better user experience. I haven't really been able to conclude.
Ok that’s not fair. Git is pretty okay for the Linux open source project. But it’s pretty mediocre-to-bad for everything else.
The D in DVCS is a waste of effort. Almost all projects are de facto centralized. In fact, the D is an anti-pattern that makes things like large binary files a still unsolved problem in Git. And no, Git LFS doesn't count.
Source control should be capable for petabyte scale storage and terabyte scale partial clones, imho.
Git feels like an inferior technology that succeeded and now we’re mostly stuck with it. Maybe someday someone will create a proper Perforce competitor and we can all migrate to something that’s actually good.
Counterpoint: Lots of people actually use and like the D aspect. It’s one of the best things about Git!
Local git repos can be shallow or partial clones if you insist on it. You say elsewhere that a VCS should support an offline mode instead, but if that’s not a copy of the repository, how is it in any way equivalent and what exactly is the difference?
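Concretely (URL is a placeholder):

    git clone --depth=1 https://example.com/big.git            # shallow: latest snapshot plus one commit of history
    git clone --filter=blob:none https://example.com/big.git   # partial: full history, file contents fetched on demand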
I’ve never understood why people like Perforce; I’ve assumed it’s some kind of weird nostalgia for mainframe user interfaces. I understand that it handles some kinds of large assets better than Git, but Perforce is so bad at everything else.
People don’t necessarily “like” Perforce. However it is functional for professional game projects, and other systems like Git and Mercurial are not. Perforce is ripe for disruption.
> if that’s not a copy of the repository, how is it in any way equivalent and what exactly is the difference?
“Distributed” implies the user machine is at least capable of downloading the full repo history. Git defaults to full clones and has various shallow clone features. It also implies support for extremely convoluted graphs and edge cases.
Offline implies a single central hub you connect to and sync with at some later point. I expect the only use case for downloading full history to be for backup purposes. Almost all operations can be performed without having access to every version of every file.
I like perforce. It scales significantly better than git, (running git status on a large project, or cloning one takes significantly longer via git than p4). Not having the entire history of a project locally on disk is a pro to me, not a con. P4merge is orders of magnitude better than git's. Atomically incrementing numbers for changelists are superior to sha's if the first thing you do is implement version numbers on top of git. The default workflow is sane and teachable in all of 10 minutes even to non technical people (designers, artists). I like that history is immutable - there are admin tools available if surgery is needed.
P4 isn't perfect, but it definitely has significant advantages over git in many areas.
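The day-to-day loop really is about this much (file path and description are made up):

    p4 sync                                   # get latest
    p4 edit art/hero_diffuse.png              # open the file for edit (and optionally lock it)
    p4 submit -d "Tweak hero texture"         # atomically submit as the next changelist number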
Or on a plane. It’s nice to have 100% of its functionality when you have 0% of your usual connectivity. Branches, commits, merges, all from the local file system? Yes, please.
There’s a reason we moved off centralized VCS en masse.
> Or on a plane. It’s nice to have 100% of its functionality when you have 0% of your usual connectivity. Branches, commits, merges, all from the local file system? Yes, please.
You don't need the VCS to be a distributed VCS to have offline capability.
> There’s a reason we moved off centralized VCS en masse.
No, there's a reason we moved to git specifically. We didn't move to the competing DVCS, did we?
Sorry, but you’re completely and objectively wrong. This is a huge misconception.
Distributed and “offline support” are fully orthogonal features.
Distributed means every user has a full copy of the entire repo including full history. This is a radical and unnecessary limitation on the scope and size of source control.
You can have full support for branching, commits, and merges without a fully distributed repo. There are numerous examples of this.
That's an interesting, and wrong, definition of "wrong".
Git, Mercurial, and the like make every full copy of the repo equal. By convention, we often use central repos like GitHub, and commands like "git fetch" and "git push" as fast ways to sync their contents. We don't have to, though.
I can clone a remote repo, then work independently for months. I do this all the time with the Mastodon server I run. I periodically pull in upstream changes, merge them into my working copy, then push those changes out to my worker instances. I might SSH into one of those servers, fix something specific to it, push the changes out to the other local servers so that they're all synced, and repeat for weeks and months on end.
And because each of those repos are equally "official", all branching and merging operations are equally easy on each one of the peer machines. Although it's conceptually possible that it could be just as easy on a VCS that doesn't work with full clones, I've never seen it.
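The mechanics of that, roughly (host names and paths are placeholders):

    git clone me@hub:/srv/repos/site.git
    git remote add worker1 me@worker1:/srv/repos/site.git   # any peer box is also a remote
    git fetch worker1                                        # pick up a fix made directly on worker1
    git merge worker1/main
    git push origin main                                     # publish the merged result back out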
It seems you're still largely talking about mostly text files. The parent you're replying to works in video games, where 99% of the data stored in VCS by volume (if it's even stored in VCS) is not text data. It is things like textures, audio files, processed geometry..etc. It is extremely easy to have one single texture take up more disk space than the entire source code for the game.
> I can clone a remote repo, then work independently for months.
If you're working with more than 0 other people in video games, good frakking luck doing this. If you're working on a large enough game, you're also going to need potentially multiple TB of storage just to have a full shallow copy of the repo (with just the latest copies.) I have worked at a studio where a full deep clone of the repo for just one game was well over 100TB. Let's see you load that up on your laptop.
Then don’t use Git for it. Emacs is a crappy Photoshop replacement, but it’s great at what it was actually built for. Git isn’t great for video game assets, but it’s great at what it was actually built for.
Use the right tool for the job. The parent poster is complaining that their employer picked the wrong tool for their job, then insisting that the tool sucks.
Git is optimized for a specific use case. This prohibits other use cases. I think Git could be slightly tweaked to optimize for and enable different use cases. I think far more users would benefit from the new use cases than the old.
I want to use the right tool for the job. I wish Git were changed slightly so that it could be the right tool.
Git is not capable of handling video game assets. It could be. I wish it were. You may may be happy as is. I think you would actually be happier if my use case were supported. Because I think Git is over optimized for an ultra niche case that doesn’t matter to most users. And yes I realize most users are not game developers.
Git is great at what it was built for. Almost all current Git users do not have the same requirements as the thing Git was built for. Therefore Git is a sub-optimal tool for almost all users. The world deserves an optimal tool. The hegemony of Git and GitHub makes it difficult for an optimal tool to be made, and therefore all users suffer.
Interesting usecase. The "git bad for everything else" commenter was probably unaware of it.
That said, it's not very representative of what git is mostly used for, and of where the comparison with fossil is least favorable.
> Distributed means every user has a full copy of the entire repo including full history.
Then git must not be a DVCS; I use shallow clones all the time.
In any event, centralized systems probably can do offline support, at least in theory, but DVCSs more or less include it for free by default, which is still worth something.
Local history doesn’t imply D. The D basically means that clients and servers are equivalent. You could have a centralized VCS with client-server asymmetry and still have local history.
I was once responsible for doing this with a Perforce server. Only time in my career a VCS has lost data. This was decades ago now and the data loss might have been due to hardware; I cannot definitively blame Perforce, but man was it shitty to deal with at the time. We migrated to Subversion and never had another issue.
Git is a “distributed version control system” (DVCS). You’re just adding the term “source code” for no particular reason.
I make video games. Video games contain far more data than mere source code. Almost all game devs use Perforce because Git is insufficient.
Artists and designers desperately need more version control in their lives. They mostly go without because their programmers use Git, which is insufficient. It's a goddamn shame how much of our businesses use “stuff_final_final_05.psd” for “version control”.
Amen. I also work[ed] in video games, and this rings very true to me. Git worked well enough for the programmers, but when you're dealing with artists, producers, designers, writers, audio engineers all needing to share work on assets that are also tied to specific versions of source, Git falls apart.
One studio I was at actually had a custom VCS that was a combination of SVN + homebrew asset versioning that worked decently well. By that I mean we all hated the shit out of having to use it, but it worked better than most anything we could get our hands on at the time for a reasonable price.
I think Git is a very clunky rock that results in people smashing their fingers and is custom designed for a very specific use case (Linux open source) that isn’t actually relevant to 99.99% of projects.
I think Git is a local minimum that people are ignorantly satisfied with. Everyone is using a mishappen rock and the popularity of this rock is inhibiting the creation of a proper hammer.
I could be wrong! But I am unfortunately forced to used Git due to its ubiquity even though it’s an extremely limited tool. Git does not spark my joy.
Mercurial is very similar to Git but incrementally better. In a different world we’d all use MercurialHub. I wouldn’t be fully happy with Mercurial either. But the point is that Git’s ubiquity has to do with ecosystem factors beyond its intrinsic capabilities and design.
Git is a mediocre hammer, at absolute best. I want a great hammer. Users incorrectly think Git is a great hammer which is making it difficult for an actually great hammer to be built.
Almost all HN users will disagree with me. They’re quite happy with Git and think it’s a great hammer. I think they’re all wrong. But maybe it is me who is wrong! :)
It's been quite a while since I used other version control systems (such as Mercurial) to any greater extent, so I'll have to rely on my recollection of the feelings and thoughts I had when I did. I also don't really know what the current state of Mercurial is.
My impression of Git in relation to e.g. Mercurial was that Git was harder to get into. Once I did become comfortable and began to kind of understand how git works, though, it seemed to make it possible to do just about anything I wanted to do with source code. In particular, branches, merging, cherry-picking etc. work well and are fast. Doing local experiments before settling for a solution, rebasing your work on latest (already shared) other developments before publishing it to others, etc. are well enabled by those features.
Limited is not how I would describe Git. In fact, it seems rather versatile, as far as managing source code or other similar line-oriented text-based formats go. Grokking it well enough to use that versatility isn't easy, and some of the terminology it uses is just confusing, but once I got past that, it seems to enable lots of workflows that are useful for actual work on source code.
What it of course probably doesn't do that well is handling large blobs. Most software projects don't actually have lots of those, and when there are some, they're often somewhat static. So for most developers, that's usually not a major limitation of the tool. But it almost certainly is to you.
Another thing, of course, is that it's basically just a tool for keeping track of and managing versions and contents of text files. It's not, by itself, a tool for managing higher-level workflows of an entire development project, or for really handling specific formats, or anything else higher level that's not somehow related to managing source code or something similar. That can also be a major limitation. But I don't think that makes it a poor tool; in lots of programming, keeping track of and managing versions and contents of source code in multiple branches is the central use case. Tools for code reviews and other higher level project workflows can then be built separately on top of that.
When you say it's a mediocre hammer, I think you and other people just have different ideas of what a hammer is and what they typically need out of it.
Mercurial probably also allows for all of those things that lots of people like Git for. I honestly don't quite remember what the differences were or how I felt about them when I used both, except that Mercurial seemed easier to handle at first but Git felt possibly more versatile once I got used to it. I can't quite remember what gave me the latter feeling but it's quite possible the choice really made little difference in the end.
For what it's worth, I think BitBucket used to support Mercurial, so we did kind of have something along the lines of GitHub (at the time) for Mercurial.
To continue along the "mediocre hammer" line, is there something specific about Git that makes it seem only mediocre to you if you considered a hammer to be a tool for handling source code alone? How is it limited in that regard?
The world has non-text files that require version control support. Version control systems are mostly bad-to-terrible for anything other than source code. Implying that the use case is wrong is borderline offensive.
It’d be nice if there were some artist/designer friendly version control systems. I can teach someone who has never even heard of version control how to use Perforce in 10 minutes. It’s darn near idiot proof and near impossible to shoot your foot off. Git is not easy to use. This is evidenced by the tens of thousands of blogs explaining how easy and simple to use it is. If that were the case there wouldn’t be need for all those posts!
> The world has non-text files that require version control support
Which is fine, but saying that git is rubbish because it doesn't handle binary files as you want it to is judging it against a bad use case.
How version control would handle diffs between PDFs is not the same as photos, which is not the same as video. They have to be content aware for these sorts of things.
Well, to be frank, git doesn't really handle binary files _at all_. I don't really consider treating binary files as opaque blobs of data as 'handling them'. It's more akin to throwing your hands up and saying 'fuck it, we'll treat it like the text stuff and whatever happens, happens.' Yes, over time git has gained some capability for handling binary data as delta patches, but it is so far away from anything even remotely resembling content-awareness.
How would we represent video diffs? I don't deal with video so I'm not an expert, but there appears to be people complaining without suggesting solutions.
Not all version controlled files can be merged. That’s fine. Git sucks at working with large binary files. Perforce does not. Therefore all game devs use Perforce.
A theoretical Git2 could provide Perforce-tier support for large binary files. But, imho, this would likely require dropping the D in DVCS. I would gladly trade distributed for large binary file support. Without even a moment’s hesitation. Other people wouldn’t. But I wouldn’t call their use case “bad”.
Git is rubbish for my needs. It’s rubbish for a lot of needs. It’s ok at some things. Mercurial is much better at those things though.
Text is a universally understood file format. That means it can be easily diffed, compressed, smartly stored (like storing only changes). That's not true of binaries, at least most of them. That's why I'm pessimistic about a good universal VCS ever existing.
I would also love some kind of good (even decent would be enough) VCS for binary/non-text files. I get a stress rash just watching our designers work with "final_final_2023_1_backup_new.psd".
I think Git is relatively simple in relation to all the complicated things you can do with it. If you don't need/use those complicated things, it of course feels like way too much work for too little pay off.
Ah, this is why you have a bad attitude about the “D” part of “DVCS”, right? The workflow for art is “I need to lock that graphics asset so that no one else edits it while I’m editing it,” which you can’t do with a DVCS. Disallowing exclusive edits is generally what most programming teams want; otherwise it’s a daily occurrence to try to get someone to unlock a file they locked for editing and forgot about.
Not really. Locking of binary assets is a separate topic I’ve ignored.
I don’t actually like or dislike the “distributed” part. I don’t care about it. At all. In the slightest. Almost all projects are defacto centralized. If decentralization was free then great, more features for free.
My experience is that distributed is not free. It appears to me to be a fatal flaw that prevents Git from ever being a good tool for large projects with large binary assets, such as games. So I would sacrifice it in a heartbeat IF it gave me the features I do care about.
Locking is just punting on the problem. If a version control system can't merge non-conflicting changes to the same file by different people, that file is unmaintainable and shouldn't have been checked in as if it were source code. Instead it should be generated from a set of files that can be edited normally.
There are features which make life a lot better when you have them: a local history to search and explore, a way to stash the working state to try something out, etc., and once you've got those you mostly have a distributed VCS. True, the flexibility of multiple remotes and tracking branches is something most users probably don't need, but it's quite nice for various things, if only as a simple backup path.
On an aside: I always chuckle at the "distributed" VCS label. Back in the day, some people referred to CVS as a distributed VCS, compared to RCS.
You might appreciate, then, how fossil improves on git. When that happens with fossil you can also still update the issue tracker and wiki, while many (most?) people depend on a centralized system like GitHub or GitLab for the latter.
One thing that I have observed about programmers is how obsessed they are with tooling. Whether it’s exactly which model of ergonomic keyboard or exactly which toolchain(s) they’re using, there seems to be an endless desire or eagerness to shoot the shit over tooling.