Tup – A file-based build system for Linux, OS X, and Windows (gittup.org)
169 points by mynameislegion on Oct 2, 2016 | 99 comments



I'd not heard of tup, thought I'd try it out on Windows. Unfortunately I hit a bug straight away: Tup is not directly compatible with MSVC 2015 (without disabling VCToolsTelemetry.dat generation in the registry) [0].

I don't fancy adopting a tool that forces me to opt out of being able to send compiler debug telemetry to Microsoft the next time I hit a compiler bug.

There is a nice (but a bit dated, 2010) review here [1] which discusses some other features and shortcomings.

[0] https://github.com/gittup/tup/issues/182

[1] 2010: https://chadaustin.me/2010/06/scalable-build-systems-an-anal...


I didn't see any examples of .phony type rules. Can tup do this?

I've recently found myself returning to make for multi-build system orchestration, e.g. Rust and C and PHP libraries.

Does anyone have examples of using tup for this type of thing?

Also, I've found myself really enjoying declarative build systems more, e.g. Cargo or Maven. It seems like for C there could be a simple set of standard tup files that are run by a tool like Cargo over a standard tree layout. I didn't notice this in there, but I could see a simple wrapper around tup giving this experience to almost any language. In fact, maybe use Cargo as a base and add tup as a supported src type or something through a Cargo extension. It would probably need to be a default for the entire project for sanity's sake.


I heard about tup a while back and finally attempted to look into it today and try replacing some of my Makefiles with Tupfiles. Unfortunately my googling and researching all seem to indicate that tup simply doesn't support any type of .PHONY targets. To that end, `tup` also doesn't seem to support even "basic" stuff (from my POV) like a `clean` and `install` target - which is fine, except that without .PHONY targets you can't add support for such things in your Tupfiles. Even the tup project's Tupfile doesn't provide any way of directly installing it; I had to just copy the files by hand.

IMO that's just a killer for me, which stinks because I don't really feel like such a feature should be that complicated, but the developers seem to take the stance that phony targets, and `clean` and `install` are unnecessary.


> ...phony targets, and `clean` and `install` are unnecessary.

For clean, there is a solution if you are using git; tup can generate a .gitignore for all the output files, and there is a git command to remove all ignored files. I'm with you on "clean" being something that is both easy to implement and useful in tup.
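A minimal sketch of that workaround, assuming I'm remembering the Tupfile directive and the git flags correctly:

  # In the Tupfile, a bare ".gitignore" line asks tup to generate a .gitignore
  # listing every generated output. Then cleaning is just:
  git clean -Xdf    # -X: only ignored files, -d: directories too, -f: force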

The author's stance on phony and install isn't that they are unnecessary, but rather that they are orthogonal to the problem Tup is trying to solve. install and phony targets are handled by an external script. I will have either a Makefile or a shell script that performs the tup invocation as part of the process.
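For example (a sketch only; the program name and paths are made up for illustration):

  #!/bin/sh
  # hypothetical wrapper: let tup do the build, then handle install ourselves
  set -e
  PREFIX="${PREFIX:-/usr/local}"
  tup
  install -D -m 0755 bin/myprog "${DESTDIR}${PREFIX}/bin/myprog"
  install -D -m 0644 doc/myprog.1 "${DESTDIR}${PREFIX}/share/man/man1/myprog.1"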

The thing that tup does well is prevent you from making certain mistakes in your build system. If you have a missing dependency, it will error out; I have actually found bugs in makefiles that I converted to tup.


I read about the solution using `git`, but it definitely seems like side-stepping the problem. If you're going to provide a way to get a list of all the generated files so you can remove them by other means, why not just allow you to do it directly? But I don't think we're in disagreement here.

Perhaps I just don't completely understand what problem tup is trying to solve, then. When I read about tup, I picture it being a complete replacement for something like make (and indeed, it says as much right on the website). Packaging a Tupfile along with a Makefile seems like an annoying solution to something that I really don't think should be a problem in the first place. It seems like taking a stance to an absurd degree over a feature that I really don't think is a big deal. The author is free to do what they want, but I think they're sacrificing usability for purity.

Tup is appealing to me, but a tup/make combo isn't nearly as appealing.


Not providing a .PHONY equivalent makes sense to me: why not just turn those into separate shell scripts?

Clean and install seem like special cases, though: the build system already knows what to clean up (or, at least, Tup seems to), and install should be handled by a proper package instead.


Generally speaking, a lot of .PHONY targets require information that the build system, and the build system alone, has. You would have to replicate that information in the scripts if you wanted them to work, which is at the very least a messy situation, and somewhat defeats part of the purpose of using tup in the first place. This is obvious for targets like `clean` - hence why you suggest it should be supported in the build system - but most phony targets require a certain amount of information from the build system.

I also disagree on `install`. `install` is useful for creating packages in the first place. The build system already knows (or is told) which files are supposed to go in which general locations (configured via environment variables for the paths) and can then install the correct files for your configuration. It is easy enough to use this to create a package by installing into a separate directory instead of /. The key is that the build system knows all the information needed to make `install` work as wanted (along with being provided some paths). Separate scripts would not.
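For instance, the usual staging trick with a make-style install (paths illustrative, and not specific to tup):

  ./configure --prefix=/usr
  make
  make DESTDIR="$PWD/pkgroot" install    # files land under ./pkgroot/usr/... ready to be packaged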

Providing nothing at all for an `install`, like the tup project does, is simply not user-friendly. I install packages directly to `/usr/local/` all the time, and even ignoring that use-case there's no reason to leave the user wondering "Where do I put what files?". The developer knows this information; the person attempting to use it or make a package generally doesn't (or doesn't know all of it). Allowing you to define such a 'command' in the build system makes it a lot easier to write, and providing such a command makes the entire project tons easier to use and make into a package.


We use Tup to build Flynn [1]; it's a pretty neat build system. The only real drawback is that it's hard to get working on some operating systems because of its dependency on FUSE. That said, it still beats the crap out of GNU Make or CMake, etc.

[1] https://github.com/flynn/flynn


> That said, it still beats the crap out of GNU Make

How? Their description was useless in describing why it's "obviously" so much better.


> See the difference? The arrows go up. This makes it very fast.

I don't think I've seen a more intelligence-insulting description of a product.


Oh, you didn't read far enough; the description is written with a sense of humor, essentially parodying the very "intelligence-insulting descriptions" that one sees frequently on the internet.


Well, that was a bit light-hearted, but it highlights the big difference: tup works bottom-up, make works top-down.

The paper linked there provides a more thorough explanation.


it's a joke...


The very last part of this page explains it quite well:

http://gittup.org/tup/ex_a_first_tupfile.html


GNU make starts off with targets, and then walks the build graph (target -> source) to find source files which will re-build the targets. And GNU Make has to scan the files.

Tup starts off with the list of sources which have changed, and then walks the build graph (source -> target) to find out which targets need to be rebuilt. And Tup gets notified of file changes.

So where Make scans all targets and all source files, Tup gets told which source file has changed, and then rebuilds only the necessary targets.

That being said, the following comparison is a bit odd:

http://gittup.org/tup/make_vs_tup.html

I don't think it's a requirement of GNU Make that building a project with N source files takes exponential time. I think it's just that the GNU Make code is terrible.


GNU Make doesn't take exponential time, it takes ~linear time. The Y value on the graphs is exponential, because the input is growing exponentially.

The graph is just rubbing in that a noop Tup build is O(1) instead of Make's O(n) by showing crazy large n. Which is certainly great on Tup's part![1]

The Tup author's description of Tup's algorithm in comparison has always rubbed me the wrong way, though. It isn't some entirely new generation of build system like they present in the Tup paper. It's just caching stat(2) between iterations[2]. The build naturally flows down like in the Make diagram on the main Tup page; but with Tup we cache the status of each node, and become informed when one of them becomes invalid; then the upward-flowing arrows of the Tup diagram are the cache invalidation of the lower nodes flowing up to the higher nodes. This is the totally natural way of implementing caching if we were to add that to GNU Make.

So why don't GNU Make and other systems cache the DAG of stat(2) calls? As the saying goes, cache invalidation is hard. Tup gets away with caching by receiving an a priori list of filesystem changes, so it knows which nodes to invalidate; other build systems have to scan the filesystem for changes. Tup gets this list of changes by having a FUSE filesystem sit between the user and the actual filesystem, logging all changes. While this is great for many users, it isn't quite general. It will break if we edit anything "offline", perhaps on a thumb drive on another computer. It will break in a number of different NFS setups. I don't mean to say that the caching doesn't have value, but it isn't the silver bullet that the Tup author makes it out to be.

[1]: In the Tup paper, the author gives some fancy O(...) expressions. These are... odd; they're kind of hand-wavey, as they are based on expected fan-out of directories, and don't really give a good apples-to-apples comparison. They certainly are valuable to analyze, but are pretty misleading if you aren't critical.

[2]: OK, that's not strictly true. The "Beta build system" (the next generation of build systems that Tup is supposedly part of) algorithm also has the feature that it will automatically remove a generated file that was part of the DAG, but isn't anymore. This is a nifty feature that is made possible by the caching; but IMO it doesn't constitute a core change.


> Tup gets this list of changes by having a FUSE filesystem sit between the user and the actual filesystem, logging all changes

This is what I was really looking for. Instead of querying stat on all files at build time, it uses a filesystem layer to know what changed as the changes happen.

Clever, so long as the FUSE layer never makes a mistake.


While I can believe the claim it's better than make, it would be way more interesting to see how it compares to Bazel. After using it (or rather Blaze) at Google and now Bazel at Improbable, I consider it the gold standard in build tools.

If anything, I wish Google would open source the rest of the build "ecosystem" that together with Blaze let you build the whole codebase in seconds. It was pretty amazing.


I've also been using Bazel for a couple of months now and it's fantastic. Simple, declarative syntax, blazingly fast. I can't recommend it highly enough.


I'm a Bazel person myself as well, but tup with the lua syntax might be getting somewhere.


This reminds me of DJB's ideas for a build system, redo [1]. However, it never seemed to gain any traction. (or did it? [2])

[1] http://cr.yp.to/redo.html

[2] http://apenwarr.ca/log/?m=201012#14


I can recommend apenwarr's redo implementation [0]. There's occasional activity on the mailing list [1], which leads me to believe people are using it, but not promoting it much.

The apenwarr implementation includes a full Python implementation as well as a minimal version in < 200 lines of sh. The minimal version doesn't support incremental rebuilds -- the out-of-date-ness tracking in the full Python version uses sqlite -- but it's good for understanding the redo concept. Also good for embedded contexts.

I used the full redo implementation for some data processing tasks once, with mixed results. It was a situation where I couldn't declare the dependency graph up front. With redo, each target declares its dependencies locally when it builds, and redo assembles and tracks the dynamic dependency graph. It's pretty neat, but it became difficult to reason about and debug. Could be that I never got comfortable with the new paradigm, or could be that essential tooling was missing, not sure. I still think redo is promising.
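To give a flavour, a default.o.do in apenwarr's redo is just a shell script that declares its dependencies as it builds (reconstructed from memory, so treat it as a sketch):

  # redo invokes this with $1=target, $2=target without extension, $3=temp output file
  redo-ifchange "$2.c"
  gcc -MD -MF "$2.d" -c -o "$3" "$2.c"
  # register the headers gcc actually read as dependencies too
  read DEPS <"$2.d"
  redo-ifchange ${DEPS#*:}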

Anyway after a decade of messing with shiny new build tools, I finally learned to stop worrying and love the bomb (make). It's weird and warty but surprisingly capable. Worth the learning investment. Oh and jgrahamc's "GNU Make Book" is great. [2]

[0] https://github.com/apenwarr/redo

[1] https://groups.google.com/forum/#!forum/redo-list

[2] https://www.nostarch.com/gnumake


Well, if you want to see redo in action, there are plenty of implementations available: [0]

[0] http://news.dieweltistgarnichtso.net/bin/redo-sh.html


My $0.02 on Tup:

First of all, I cannot express how much more I like it than make. If Tup is an option, I will use it.

What it does well:

1) It prevents you from making dependency mistakes: it hooks into the FS layer using fuse and tracks all input and output files that are inside your build directory. If you make any mistakes that could cause a future incremental build to be improper, it errors out rather than continuing.

2) It is opinionated about how your project should be structured. This has some negatives if you are trying to duplicate a particular structure from Make, but all in all does guide you in the right direction.

3) There isn't a lot of syntax to learn. This is good because the syntax is very different from anything else I've used.

#1 is really the killer feature for me; the amount of time I want to spend debugging makefiles is just slightly less than zero.

What it doesn't do, but I'm not bothered by:

Tup literally only manages commands that have 1 or more inputs and one or more outputs, and which must be run IFF the inputs have changed or the outputs do not exist; the outputs must be within the hierarchy of the project.

1) Configuration must be done before Tup is launched

2) Anything you might use a .PHONY for in make needs to be done outside of tup

3) Install commands must be done outside of tup. This means that configuring, installing &c. must be done outside of tup.

I find that having a make file that handles the above 3 steps works fine; others using tup tend to use a shell script.

What it doesn't do that I wish it did:

1) No clean command; I currently work around this by having it generate a .gitignore file and using git clean -X; still, it's annoying that this isn't possible.

2) It does not handle paths with spaces. This is actually safe as it enforces relative paths, so if it works on your system it should work everywhere even if the project is unpacked to a path with spaces.


The FUSE dependency is pretty unfortunate. Is there any way to get rid of it?

Other than that, it's pretty cool. And the creator clearly has a sense of humor, something which is far rarer than it should be.


Not sure whether it has a FUSE dependency any more.[0] Code says:

    TUP_MONITOR = inotify
[0] https://github.com/gittup/tup/blob/master/linux.tup


The same functionality should be available in various inotify / dnotify implementations.


Not really. inotify requires you to set up a watch on every single file or directory that you want to watch. To see how this escalates, install the inotify-tools and do

  inotifywatch -r /path/to/directory/with/a/lot/of/files


What about kqueue or eventports?


Kqueue and OS X's file watcher (FSEvents) are worse than inotify. They give inexact reports of changes, so you have to do a bunch of manual scanning afterwards.

I really don't understand the grandparent's gripe. Inotify scales well, supports race free "watch a whole directory tree", and has a nice API.


There is a per-user limit to the number of inotify handles available (max_user_watches) and the default value is 8192.

The limit exists because there is a ~1KiB kernel memory overhead per watch (though there should really be a way for them to take part in normal memory accounting per-process).

If one wants to watch a directory tree, one needs an inotify watch handle per subdirectory in that tree. On large trees (or if more than 1 process is using inotify), that number of watches can be exceeded.
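For reference, checking and raising the limit on a typical Linux box:

  cat /proc/sys/fs/inotify/max_user_watches            # often 8192 by default
  sudo sysctl fs.inotify.max_user_watches=524288       # raise it (not persistent across reboots)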

Lots of folks are looking for recursive watches, so they aren't happy about needing to allocate and manage a bunch of handles for what they see as a single item.

That said, I'm not sure the way the kernel thinks about fs notifications internally would allow a single handle recursive watch at the moment.

In any case, the amount of info one can obtain by using fuse (or any fs like nfs or 9p) to intercept filesystem accesses is a bit larger. At the very least, one can (in most cases) directly observe the ranges of the file that were modified (though that's not quite so important for tup, afaik). There also aren't any queue overruns (which can happen in inotify) because one will just slow the filesystem operations down instead (whether this is desirable or not depends on the application).


Okay.

What about eventports?


fanotify replaces inotify


Fanotify does not replace inotify; it is a super limited thing for hooking into the VFS. Its intended audience is "people porting Windows antivirus-style things to Linux". See the man page: http://man7.org/linux/man-pages/man7/fanotify.7.html


The related linux distribution, Gittup: http://gittup.org/gittup/


How does it handle building from LaTeX sources where you need to "rebuild" the document multiple times to get page numbers and references right?


I don't see how that could work in tup or any deterministic build system. If there is any possible way to parameterize explicit filenames unique to the stages, that will map better to tup. It requires you to be explicit about every file generated, even temp files. I've searched the mailing list for a few scenarios and the response is often to wrap tup in a makefile.


IIRC at least miktex came with a script that runs latex as often as needed until cross-references are resolved and the output is stable.

Instead of plugging latex directly into a Makefile or tup, use such a wrapper.

It seems that latexmk could be such a script: http://mirror.unl.edu/ctan/support/latexmk/README
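Something along these lines (file name made up):

  latexmk -pdf paper.tex    # reruns pdflatex/bibtex until cross-references stop changing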


Been using tup in production for about a year now and absolutely love it. The speed is nice, but compared to make projects where you need to 'make clean' to be sure everything gets properly rebuilt, the confidence that tup will do the right thing every time is fantastic.


The index page makes it sound like a parody of something; it took me a while to figure out that it's functional software.


I do love the quirky sense of humor though:

"In a typical build system, the dependency arrows go down. Although this is the way they would naturally go due to gravity, it is unfortunately also where the enemy's gate is. This makes it very inefficient and unfriendly. In tup, the arrows go up. This is obviously true because it rhymes."


Check out ninja-build. Generate ninja build files automatically using CMake. Ninja is really fast.


CMake-generated ninja builds aren't fast in an oranges-to-oranges comparison:

  http://www.kaizou.org/2016/09/build-benchmark-large-c-project.html
(TL;DR: scroll to bottom of page to see "The raw results".)

A non-recursive Make build is actually pretty darn peppy for all but the largest projects. The reason ninja was faster at scale in those tests was likely because GNU Make has some very inefficient code internally which theoretically could be improved substantially with some refactoring. ninja had the benefit of a fresh implementation, avoiding decades of feature creep.

I've been writing non-recursive-style Makefiles for years. But all I ever hear is whining by project contributors about the unfamiliar syntax (e.g. having to prefix all targets and sources with a path). Yet in the same breath I'll be told to use CMake or ninja or tup or whatever the build-system-du-jour, using vastly different syntax. Ah well....

I keep dreaming of a Makefile generator that will auto-generate a non-recursive build, or perhaps add the feature into GNU Make directly. The problem with the latter approach is that one major reason to use Make (including GNU Make) is portability out-of-the-box, but OS X is stuck at GNU Make 3.81.


Thanks for the info. I just looked into it. You are right about non-recursive Makefiles. In my own experience, CMake-generated makefiles are actually comparable with ninja build times. Why would you want to write Makefiles by hand anyway? With CMake you can generate VS 2015, MinGW, or ninja files, pretty much any project type. QtCreator deals with CMake projects really well. KDevelop too. I think CMake is the way to go. Then we won't be talking about ninja vs. make vs. tup.


Ninja is definitely the best and simplest build system I've used. A great example of the "do one thing and do it well" philosophy.

The drawback comes from the simplicity: all dependencies have to be explicitly listed, which you probably don't want to do by hand, so you need some other tool to make the dependency list.

I haven't seen how CMake works for this purpose, and I'd like to know more. What in particular do you do with CMake to use it with Ninja?


cmake -GNinja ..
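In full, an out-of-source build is roughly:

  mkdir build && cd build
  cmake -GNinja ..
  ninja    # subsequent rebuilds: ninja -C build from the source root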


I can understand why tup would be faster than make for certain projects - when you have many targets and intermediates, it's more efficient to probe only the source files. But with ninja, it's not clear why it would be faster than make. I can see how a simpler file format makes processing the build faster, but ultimately ninja seems to generate the same amount of IO as make.


It does seem like it should be the same as make, but Ninja really is faster -- it just has close to zero cruft, whereas make is absolutely full of cruft.

I use Ninja to build a medium-sized C/C++/ObjC project, around 1000 source files, and the dependency scan only takes a second or two. After that, it runs as fast as the compiler and linker are able.

Edit to clarify: there might not be much speed difference for a full build, but incremental builds are much faster in Ninja.


Yeah, dependencies. Incremental builds are much faster. Try a make clean vs ninja clean; Ninja is a whole lot faster. I just rewrote the build scripts for a QNX makefile-based system in CMake. The ninja build files brought the build time down from approximately 18 minutes to 5 minutes!

Incremental build time came down from 7 mins to about 30 seconds!

Admittedly this was probably an extreme case where consultants got comfortable charging by the minute. Send me an email if you are interested in moving to CMake. I'll be willing to consult.


To the authors: It would be really great if you could compare the SCons build system to Tup (with some numbers), so that I can convince my managers to switch :).


I may have missed how but..

CMake solves the problem of "locate the library FOO of version X.Y, add the compilation flags, link flags, include folder, link folder, static link options, dynamic link options" and all the other details needed to make use of another software component. Sometimes that component is found in my operating system's "default" spot and other times it's in an install directory that I explicitly input. How do I tell Tup to find these components/libraries and then have Tup also add in everything needed for all the commands related to building things that use that component?

Also, I often have very different components going into different build targets that my project makes. How do the rules chain and build? In other words, just because I link one of my libraries with libssl doesn't mean I want every single source file and library I create in my project to then be linked with ssl.


> How do I tell Tup to find these components/libraries and then have Tup also add in everything needed for all the commands related to building things that use that component?

You don't do that, because it's not really the job of Tup to do that. It's better to think of it as an alternative to Make as opposed to CMake, really. CMake is more like a generic build system (which is 'compiled' to a variety of other systems), with a billion built-in rules and utilities and libraries for making the common cases easy amongst all of them.

Tup is really just not in the same design space, although it is still a build tool. It'd probably be more appropriate to think of Tup as a thing that CMake would target, like Makefiles, MSVC Projects, or Ninja build files.


> just because I link one of my libraries with libssl doesn't mean I want every single source file and library I create in my project to then be linked with ssl

Is this because you want to guarantee that all libraries have the dependencies explicitly defined? If nothing in that library is used, I don't think it has an effect on the output.

Honestly just curious, because it sounds like a pain to declare different sets of libraries inside the same project.


I can't recall exactly what, but I hit expressiveness problems with tup (after a proper RTFM). Probably some self-referential issue. For the use case listed it's indeed very nice and very fast.


I first saw Tup a few years ago as it seemed to be the preferred way to filewatch/automagically-transpile MoonScript to Lua.


We've been using it for a few years. It's great. The only issue we ever had was when we tried it inside a docker container. It's related to fuse. https://github.com/docker/docker/issues/1916


Is tup capable of building out of tree? By this I mean having:

  project_1/src/main.c
  project_1/src/something/a.c
  project_1/src/something/a.h

Is there some way for tup to manage discovering the files to build and everything else needed or will I need to add every file path in manually like make?


regarding "out of tree": I'm not quite sure about your explanation here (just looks like a list of source files), but presuming you mean "creates output files in a seperate directory from source", it doesn't really have complete support for that. You can use "variants" to place output files in a subdirectory of the source tree, though.

> "some way for tup to manage discovering the files to build"

Well, no. It's not a "convention" build tool like rust's `cargo` where you just place things in the default locations and it figures it out.

You can use the `run ./script args` mechanism in tup to run your own script that emits tup rules, though.
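A rough sketch of such a generator (tup's rule syntax from memory, script name invented), referenced from a Tupfile via `run ./gen-rules.sh`:

  #!/bin/sh
  # print one tup rule per C file found under src/
  for f in src/*.c; do
      echo ": $f |> gcc -c %f -o %o |> ${f%.c}.o"
  done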

The manual has details: http://gittup.org/tup/manual.html


By out of tree I mean discovering all the source files from the file tree.


Previous HN discussion from Nov 1, 2014:

https://news.ycombinator.com/item?id=8539564


> tup, transitive verb: To have sex with.

https://en.wiktionary.org/wiki/tup


It's an archaic britishism. The only time I've ever seen it actually used is in the works of Morgan Howell.


Not all that archaic. Male sheep kept, or just added to a field, for breeding are known as tups (which are distinct to rams).


Shakespeare uses it in Othello; I learnt something in 9th grade English!


How does this compare in speed to cmake + ninja?


Damn this is quick.

Also, the URL made me think it was going to be on some GitHub competitor called gittup, heh.


Tup's main problem is that it's unusual, and it doesn't have a library of build rules. But it's fast!

On a related note, I've always wondered if it was possible to have a build system based on dynamic library injection / strace.

The idea would be that you just write your build rules in shell script. Then, you run it with a special shell that catches open(), etc. in child processes (via library injection, etc). These system calls get tracked, and stored in a special build table. One that you don't have to edit.

Then, when you want to run the build again, you just re-run the magic shell. It catches the various commands, and checks their inputs / outputs, and then skips running the command if the targets are up to date.

e.g.

  $(CC) -c foo.c -o foo.o
Hmm... "foo.o" is up to date with "foo.c", so I don't need to run the compiler. I just return "success!"

That would get rid of all magic build systems. All build syntax. All dependency ordering. The build system would just take care of it itself.

I've played with this before, enough to note that it's likely possible. But I haven't got far enough to publish it.
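For what it's worth, the tracing half is easy to prototype with stock strace (file names here are just examples):

  # record every file the compiler opens while producing foo.o
  strace -f -e trace=open,openat -o foo.trace cc -c foo.c -o foo.o
  # crude list of the paths it actually touched
  grep -o '"[^"]*"' foo.trace | sort -u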


Tup does this.

See here: http://gittup.org/tup/ex_a_first_tupfile.html

> The trick is that tup instruments all commands that it executes in order to determine what files were actually read from (the inputs) and written to (the outputs). When the C preprocessor opens the header file, tup will notice that and automatically add the dependency. In fact, we don't have to specify the C input file either, but you can leave that in there for now since we'll use it in the next section.


Not really, although it's closer than other build systems. You can omit some dependencies but not all. If I remember right (I tried out tup a few months back) you still have to specify outputs, and there are some fiddly restrictions around writing files to other directories (although they were in the process of improving that).

Writing a tup build script doesn't really feel like just writing a shell script that does the thing you want. It's very cool technology and fun to play with, but I didn't find it fundamentally easier to use than other build systems.


> Tup's main problem is that it's unusual, and it doesn't have a library of build rules. But it's fast!

To be honest, the only time an army of build rules hasn't felt like a weight I have to carry uphill is when I use Lein (which is more of a project build tool, as opposed to something as elemental as tup).

Systems like SBT and Ant end up with things that feel to me like a complete and total reliance on hand-tooled plugins as opposed to a reliance on executing pre-existing executables. Gulp is the same way; even though there are perfectly serviceable and stream-like executables on every OS ready for use, Gulp insists you write more javascript to glue things into its awkward model.

Maybe that's not the end of the world (as opposed to yet another make-like DSL), but with truly and almost comically incomprehensible codebases like SBT (oh SBT team, every time you "simplify" your codebase you totally miss the point: we don't care about your abstractions, we just need to write command extensions) or Gulp (oh Gulp team, please make Gulp more like Storm/Trident or Hakyll's builder arrows so it does more than just compile JavaScript and Sass), you're totally out of luck.

And when we consider GNU Automake and Autoconf that do have a ton of prebuilt rules, we see an even more dire situation: where the language describing the rules is more complicated than the underlying concepts they service.


The one thing I like about Gulp is that it makes it easy to make a project out of your build process, to basically enforce your own convention over configuration. I have a single, canonical build process for all of my projects and it gets installed through NPM like every other dependency. The individual gulpfiles for my projects end up being very simple and readable. It has the side effect of drastically cleaning up my package.json file, too.

The only other thing I like about Gulp is that it works without too much hassle on Windows and *Nix.

Other than that, it's dung. Though it's certainly better than Grunt, it's not objectively good. That seems to be a problem with a lot of things these days: "relatively better than the alternatives, not something you'd actually want on its own merits."


Recently I had a local Gulp expert who had written a plugin try and help me with a simple problem:

I produce a .json of load balancers from a CF stack. We need to run a command on every entry in the map to annotate it with extra information from another AWS call. It's a tricky problem not amenable to usual shell scripting (unless you use a shell with a native notion of maps and json).

For the life of us, we could not find a way to make Gulp do this without essentially writing a solid block of Javascript that just does exactly what we wanted (in which case, why use anything but the shell script approach?).

I understand that stream rejoining is a complex problem, but Gulp seems ideologically opposed to a more generic approach here. They get mad when people try to inject non-file based values into the system, and actively close off alternatives and shame them on their github issues.

Gulp is a complete mess, in my opinion.


Totally. Broccoli and Brunch are a bit better, but they all have the same fundamental problem: they require you to write plugins for things. The plugin model doesn't work here because writing plugins is so difficult, and you don't want to write this crap: you just want your project to build. In addition, all of the things you want to use in a build process expose a perfectly good shell interface, which probably streams, and doesn't require anybody to write hundred-line projects to integrate into the build system. Which just begs the question: Why aren't we using shell, and make, or npm run?


As an aside, I had a similar question when I learned PowerShell. It even made me question why I was writing so many script executables.

If we really are devoted to using small, reusable components, then a shell that can directly wrap our standard libraries (and has a type system rich enough to express maps and tree types) is indispensable to truly reusable software.

Then what a make system should be is a clear and smart way to express dependencies and workloads.


I disagree, for so many reasons. PS is clunky and verbose, and textual data is actually so common that the unix toolkit is incredibly useful. Ad-hoc parsing comes up a lot, and if I have to parse a format, I'd rather it be text. Maps and trees aren't as important as you make them out to be. You can do key:value stores in text at O(n) cost, which is often Good Enough, and rarely needed. And when you really need the extra power, take your pick: Python, Perl, Ruby, SCSH, and countless others.

A shell's primary job should be to support interactive use, and fast development of one-time-use scripts off the cuff, because those are what most of us do most commonly. PS doesn't support either all that well.


> Ad-hoc parsing comes up a lot, and if I have to parse a format, I'd rather it be text.

People often offer that "plain text is superior..." but neglect to recognize that PowerShell's model actually offers a superset of the current functionality of Bash, and that when object streams are required they are utterly indispensable.

It's also a uniquely Eurocentric conceit that text is a simple stream of bytes. At the very least, it is a stream of bytes with an encoding label. Unicode is everywhere, and for some people it is required to express their language. You can no more safely assume ISO-Latin-1 than you can UTF-8 or UTF-16.

Text isn't as simple as it used to be when we didn't give a damn about half the world's population.

>Maps and trees aren't as important as you make them out to be. You can do key:value stores in text at O(n) cost, which is often Good Enough, and rarely needed.

We live in a world where curling JSON objects or (even more problematically) protobuf objects is the de facto way we interact with the vast majority of remote services. Feel free to try and fit every workflow you have into jq. Feel free to keep working in an environment where even functional return values are not allowed and you keep repeating the same commands over and over.

> And when you really need the extra power, take your pick: Python, Perl, Ruby, SCSH, and countless others.

Quick question though: how is this any different from the plugin approach of Gulp? :)

> A shell's primary job should be to support interactive use, and fast development of one-time-use scripts off the cuff, because those are what most of us do most commonly. PS doesn't support either all that well.

Besides the awkward subquotation escapes for passing to commands, what exactly are you saying it can't do? Verbosity isn't exactly its problem: its editors are competent, its cmdlets are quite capable, and it's trivial to author more (in many ways faster, too; e.g., adding a cmdlet with arguments is MUCH easier than the hand-rolled way in bash). You can directly call into a massive stdlib and any code you've written in .NET in any language supported by that target (including C#, F#, Nemerle, VB, Clojure and more), with native transitioning for all object types, and a bunch of other good features.

In essence that approach gives the shell a lot more room to do its work without calling out to black-box out-of-conceptual-model plugins, without sacrificing the power of being able to appeal to them when required.


>It's also a uniquely Eurocentric conceit that text is a simple stream of bytes. At the very least, it is a stream of bytes with an encoding label. Unicode is everywhere, and for some people it is required to express their language. You can no more safely assume ISO-Latin-1 than you can UTF-8 or UTF-16.

True, but how many of your text-processing utilities need to account for that? Most config file parsers will just continue to work if you insert UTF-8 into the stream, so long as you keep the markers it's working with the same.

>We live in a world where curling JSON objects or (even more problematically) protobuf objects is the de facto way we interact with the vast majority of remote services. Feel free to try and fit every workflow you have into jq. Feel free to keep working in an environment where even functional return values are not allowed and you keep repeating the same commands over and over.

...I don't. Protobufs might have a market share in some places, but JSON has decent shell tooling, and you can often convert JSON to DSV in a pinch. I don't understand what you mean by "functional return values not allowed." Sure, EIAS is in full effect, but Shell pipelines are as functional as they come. And I don't repeat the same commands over and over. If I did, I'd give that set of commands a name.
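For example, flattening a JSON array into tab-separated lines for the rest of a pipeline is a one-liner with jq (URL and field names invented for illustration):

  curl -s https://example.com/loadbalancers.json |
      jq -r '.[] | [.name, .dnsName] | @tsv'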

>Quick question though: how is this any different from the plugin approach of Gulp? :)

Primary difference: the libraries are all already there, you don't have to go get apt-get integration for the components you need, and it's often pretty quick to write what you need.

Verbosity is a huge problem. Most of the time with a shell, you'll be writing one-off scripts, or doing something interactive. If you were going to write something more than about 50 lines, or more complex than a couple of pipelines or an execution wrapper, you wouldn't be writing shell. PS is trying to go in a different direction, to be a language even for those things that you'd go to another language for, but we've got perfectly adequate languages for that, tons of them, and it fails at things that shell is good at.


> True, but how many of your text-processing utilities need to account for that?

I'm not chinese, but I can have sympathy for people who are. Also who are you to tell me [UTF-8 character removed because of bad assumptions in server code] is not a valid bash variable?

> The libraries are all already there, you don't have to go get apt-get integration for the components you need, and it's often pretty quick to write what you need.

This is an assumption that's very brittle though. For example, which version of Python did you target with this helper script? Do you know?

And why is that different for PowerShell? It's just a shell with better affordances for automation than bash.

> Verbosity is a huge problem.

If you want a less verbose set of powershell commands, try the latest powershell. Nearly everything you type has a short analogue. Function syntax is shorter. It's easier to do things like automate opening of remote connections (e.g., no need for a shell script to open an ssh connection and toss many streaming outputs into it on an ongoing basis).

> and it fails at things that shell is good at.

You say this but my experience has been the opposite thus far. Making functions actually functions, for example. It's amazing to be able to make a complex jq query repeatable without being forced to pipe everything over an IO stream and collect into a file and then VAR=$(cat result.json).


Well, the verbosity drop is good.

>This is an assumption that's very brittle though. For example, which version of Python did you target with this helper script? Do you know?

And PS, and every other piece of software out there doesn't have that problem? Don't make me laugh.

>And why is that different for PowerShell? It's just a shell with better affordances for automation than bash.

But a shell doesn't necessarily need automation affordances: we have Python/Perl/Ruby for that. What it does need is effective interactive use, which is where I think bash has PS beat.

>I'm not chinese, but I can have sympathy for people who are. Also who are you to tell me [UTF-8 character removed because of bad assumptions in server code] is not a valid bash variable?

That's a problem with Bash itself, which wasn't what I was talking about. My point was that text processing scripts (sed scripts, awk scripts, scripts using cut, etc.) won't choke on unicode, and frequently work pretty much as expected, so long as the tool versions involved can handle unicode, and sometimes even if they can't. If bash has broken unicode support right now, that's not a problem with the paradigm, that's a problem with bash itself, which can be fixed.


Cut is not encoding-aware... not all Linux distributions have an encoding-aware grep, and even then you'd need to tell them which encoding to use in many cases.

And I don't mean to say any one tool is free of version conflicts, but relying on many tools means relying on many versions. Reducing your total dependency graph is a lesson many language environments have taken from the mess of Java and C++.


Well, considering that the primary function of a shell is to run programs, reducing your depgraph isn't what you want.


> considering that the primary function of a shell is to run programs

The instant we start talking about automation then the repeatability of your automation has to be considered.

The shell is arguably an environment for interactive programming as much as it is a shorthand for program invocation.


See fabricate.py: https://github.com/SimonAlfie/fabricate

Here's an earlier comment I wrote about it, which is also in the middle of a discussion about Tup, funnily enough: https://news.ycombinator.com/item?id=4190804

I'm still using Fabricate quite happily today for personal projects in a variety of languages, though I've come to realise that the main downside of it is that programs run slower under strace - a typical gcc invocation to create a single object file might take, say, 40% longer. Fabricate.py is also then quite slow to parse the textual output from strace.

Inspired by https://blog.nelhage.com/2010/08/write-yourself-an-strace-in... I wrote a mini-strace that's hard coded to catch the open() and other relevant calls as fast as possible. That helped a fair bit. I've done a lot of hacking on my own copy of Fabricate too which helped a bit, but ultimately I ought to make it do the dependency parsing and saving in a seperate thread, which should help a lot - I'm still not doing parallel builds so I do have spare cores sitting there doing nothing.

I tried a FUSE backend (https://github.com/SimonAlfie/fabricate/pull/60) whereby the filesystem tells you what's being accessed, and that's practically the same as full speed, but the downside is that the path jiggery-pokery involved then causes some subsequent problems with e.g. gdb not being able to find source files. I've gone back to the strace method for now.

Despite all these niggles, I still use this system wherever possible because it's so, so good having a 'proper' programming language to run my builds, and being able to work forwards in the manner of a shell script. And yeah the automatic detection of changes ultimately saves a lot of time. I can share some of my hacked-up scripts if anyone's interested.


I love fabricate---even helped with the OSX port. I had to give up on it when El Cap came out; I just didn't have time to work around SIP. The other problem with fabricate/tup/not-Make is they store the structure of the dependency graph. This leads to ugly build ... worries when you have subtle dependency changes caused by, e.g., updates to system headers, or interactions in large rule-bases. I just never had confidence the builds were correct.

However, using Make + ccache + a custom ar wrapper (port the deterministic flags from Linux to BSD), I got all the speed up of tup/fabricate, using off-the-shelf open source tooling. I've never looked back.


Among other options, Electric Cloud, now Electric Accelerator, is a product I tried about, oh, ten years ago. Looks like they're still around.


I don't think this really gets rid of all dependencies; it just reduces the build graph a bit. Even a shell script implicitly provides a simple dependency graph: you run the commands from top to bottom, so that's a linear order.

If you have two build targets, do you have two shell scripts? Do you create a library so they share code? That's a dependency graph too.

What's really going on is that the build system doesn't know the true dependency graph until it's run each command once. The first time around, it uses a conservative approximation that you specify.

In tup's case, you don't need to specify any dependencies that aren't themselves generated. That's generally true of header files. If you have a generated header file, you still need to explicitly add the dependency to the tup file [1].

[1] http://gittup.org/tup/ex_generated_header.html


There is such a build system, but I can't remember the name right now.

It tracks system calls to see every file opened by the compiler to produce exact dependency graphs (assuming the compiler is deterministic).

The downside is that it's Linux only.

If anyone remembers the name, please do share.


I wrote something called auto-buildrequires which lets you figure out build dependencies automatically, and works roughly as you describe. https://people.redhat.com/~rjones/auto-buildrequires/


Well.. that's what tup does so it's probably what you're thinking of!


Yeah, tup does something similar with FUSE (afaik) but that's not what I'm thinking about.


Maybe you are thinking of ClearCase [1].

[1] https://en.wikipedia.org/wiki/Rational_ClearCase


No, definitely not that.




I don't know if it's the one you're thinking of, but fabricate.py does basically that.


You may be thinking of Kenton Varda's Ekam.


I've hacked on audited objects before to accomplish something similar (https://github.com/boyski/audited-objects). It uses LD_PRELOAD to do it rather than strace, but it's the same idea. I was checking existing makefile rules, but was considering using audited objects to help transition to tup or Bazel.

IIRC strace wouldn't work if you used any setuid binaries (horrible but our makefiles used sudo for some things).

There are also some tricky corner cases like sed -i which creates a new file then moves it.


The switch to statically compiled binaries means that approach is increasingly impractical. AFAIU Go, for example, completely bypasses libc for those system calls.

The same might be true for Java, too. Anybody know if modern JVMs bother calling into libc for basic system calls like open, or otherwise obey dynamic interposition of standard library routines?


I had the same idea. I would absolutely love this!

I haven't tried building it because it seems like it needs FUSE or something similar and FUSE is flaky on OS X. Every time I upgraded OS X I'd have a hard time getting tup to work again.



