Tar and Curl Come to Windows (technet.microsoft.com)
77 points by tdurden on Dec 22, 2017 | 57 comments


I'm looking for a Windows tar implementation that can save/restore Windows ACLs/DACLs and/or other extended attributes. Was thinking of taking the Red Hat GNU tar patches that handle Linux xattrs and SElinux attributes, and adapting the same technique for Windows, but haven't learned enough of the Windows API yet to make it happen. As for getting that info stored into the tar file, I believe this can be done by using PAX tar format (which can take any arbitrary name/value pairs without losing tar compatibility).
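The PAX record format itself is simple, "<length> <key>=<value>\n", with one wrinkle: the decimal length prefix counts itself, so it has to be computed to a fixed point. A minimal sketch in C (the helper name is mine):

    #include <stdio.h>
    #include <string.h>

    /* Build one PAX extended-header record: "<len> <key>=<value>\n".
       <len> is the decimal byte count of the WHOLE record, including
       the length digits themselves, so iterate until it stabilizes. */
    static int pax_record(char *out, size_t outsz,
                          const char *key, const char *value)
    {
        size_t body = strlen(key) + strlen(value) + 3; /* ' ', '=', '\n' */
        size_t len = body + 1;                         /* assume 1 digit */
        char digits[32];
        for (;;) {
            int d = snprintf(digits, sizeof digits, "%zu", len);
            if (body + (size_t)d == len) break;        /* length stable */
            len = body + (size_t)d;
        }
        if (len >= outsz) return -1;
        return snprintf(out, outsz, "%zu %s=%s\n", len, key, value);
    }

For example, pax_record(buf, sizeof buf, "mtime", "1513942800.123456789") produces "30 mtime=1513942800.123456789\n", which is 30 bytes including the two length digits.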

Specifically, I need this functionality to make my backup program (Snebu) more useful to Windows users, as it relies on an installed tar implementation on a client to gather files.

If anyone here is skilled in the Windows API and is interested in helping with this, let me know.


You want NtQuerySecurityObject to get a binary SD blob, convert it to SDDL (text) format using the corresponding API, and store the result.


That thing’s unsupported for user-mode apps. The correct API is GetSecurityInfo().
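For the curious, that round trip is only a couple of calls; a minimal sketch using the path-based GetNamedSecurityInfo() plus the SDDL conversion API (the helper name is mine; add SACL_SECURITY_INFORMATION only if you also hold SeSecurityPrivilege):

    #include <windows.h>
    #include <aclapi.h>
    #include <sddl.h>

    /* Read a file's security descriptor and render it as an SDDL string
       suitable for storing as text. Caller frees the result with LocalFree. */
    static LPSTR sd_to_sddl(LPCSTR path)
    {
        PSECURITY_DESCRIPTOR sd = NULL;
        LPSTR sddl = NULL;
        SECURITY_INFORMATION info = OWNER_SECURITY_INFORMATION |
            GROUP_SECURITY_INFORMATION | DACL_SECURITY_INFORMATION;

        if (GetNamedSecurityInfoA(path, SE_FILE_OBJECT, info,
                                  NULL, NULL, NULL, NULL, &sd) != ERROR_SUCCESS)
            return NULL;

        if (!ConvertSecurityDescriptorToStringSecurityDescriptorA(
                sd, SDDL_REVISION_1, info, &sddl, NULL))
            sddl = NULL;

        LocalFree(sd);
        return sddl;
    }

Restoring goes the other way, through ConvertStringSecurityDescriptorToSecurityDescriptor() and SetNamedSecurityInfo().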

The main problem with backing up/restoring these things is SIDs. Windows ACLs are much more sophisticated than the nine rwx permission bits in Linux.

It will work in some cases, e.g. when the owner is, and all the permissions are granted/denied to, a well-known (1) group such as "Everyone". Or when all these SIDs are domain user SIDs and you’re restoring to a PC that’s on the same domain.

It won’t work in many other cases, e.g. when these permissions were granted to some local user and you’ve reinstalled Windows between backup & restore.

That’s why in Windows it is typically considered OK to not bother backing up/restoring these ACLs.

(1) https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...


Thanks, I will read up on that. The problem I'm trying to solve: I tried restoring c:\Users from a tar backup, and it took quite a while of clicking to get all the permissions sorted out.

As for "permissions were granted to some local user": the tar file format stores a UID/GID number as well as a username/groupname, so when restoring to a different system the username takes precedence over the UID number. Could the same idea work on Windows? (Granted, I'd probably have to store this in the PAX header's name/value pairs, due to the potentially longer length of the IDs.)
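The name-to-principal lookup does exist on Windows, so that rule could translate. Something like this on the restore side (a sketch; the function name is mine):

    #include <windows.h>
    #include <sddl.h>
    #include <stdio.h>

    /* Resolve an account name stored in the archive back to a SID on
       the machine being restored to -- the analogue of tar preferring
       the stored username over the numeric UID. */
    static BOOL resolve_stored_account(const char *name)
    {
        BYTE sidbuf[SECURITY_MAX_SID_SIZE];
        DWORD sidlen = sizeof sidbuf;
        char domain[256];
        DWORD domlen = sizeof domain;
        SID_NAME_USE use;
        LPSTR sidstr;

        if (!LookupAccountNameA(NULL, name, (PSID)sidbuf, &sidlen,
                                domain, &domlen, &use))
            return FALSE;   /* no such account here: warn, or fall back */

        if (ConvertSidToStringSidA((PSID)sidbuf, &sidstr)) {
            printf("%s -> %s\\%s (%s)\n", name, domain, name, sidstr);
            LocalFree(sidstr);
        }
        return TRUE;
    }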

The other item I've read is that Windows file permissions can be inherited from parent objects, so it may be useful to store those permissions, but not restore them (or give a warning if the effective permissions end up being different due to restoring to a different location).


GetSecurityInfo() will return you the effective set of permissions/owners. Good enough for a single file, inefficient for the complete hierarchy.

To back up the complete C:\Users directory to be restored later on the same PC, I’d use the APIs specially designed for that: BackupRead/BackupWrite.

See this article for an overview: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

That article was written on the assumption that you’ll be using real hardware tape for backups, but you can back up to any other destination.

This will handle file data (including alternate NTFS streams) and file security, but you need to handle file names, file attributes, and timestamps somewhere outside of that. And the TAR format can’t quite fit all that info: file names in TAR are limited to 100 characters; there’s only 1 timestamp in TAR but NTFS has 3; the resolution of the TAR timestamp is 1 second while NTFS keeps 100-nanosecond precision; finally, there’s no place to keep file attributes. I wouldn’t pick the TAR format for that kind of backup.
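The read side of BackupRead is just a loop. A minimal sketch, assuming the process holds SeBackupPrivilege (the sink callback is my invention):

    #include <windows.h>

    /* Drain BackupRead's serialized stream (file data, alternate data
       streams, security descriptor) for one file into a callback. The
       same byte stream, replayed through BackupWrite, recreates the
       file with its security intact. */
    static BOOL backup_one_file(LPCSTR path,
                                void (*sink)(const BYTE *, DWORD))
    {
        HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS,
                               NULL);
        if (h == INVALID_HANDLE_VALUE) return FALSE;

        BYTE buf[64 * 1024];
        DWORD got;
        LPVOID ctx = NULL;   /* opaque context owned by BackupRead */
        BOOL ok;

        while ((ok = BackupRead(h, buf, sizeof buf, &got,
                                FALSE /* not aborting */,
                                TRUE  /* include security */,
                                &ctx)) && got > 0)
            sink(buf, got);

        /* A final call with bAbort=TRUE releases the context. */
        BackupRead(h, NULL, 0, &got, TRUE, TRUE, &ctx);
        CloseHandle(h);
        return ok;
    }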

Also note that NTFS hard links and EFS encrypted files both need special care to backup.


Thanks! I'll check that one out too. Note, I'm planning on using PAX extensions for tar, so I can store an unlimited number of name/value pairs per file and also handle unlimited-length filenames.


Yeah, PAX will probably work. But IMO it brings questionable value here.

You’ll surely be able to create a PAX-based format that your own tools will be able to back up and restore correctly. But it won’t be compatible, i.e. no other tool will be able to read or restore the data.

The BackupRead API returns zero or more streams, each prefixed with a variable-length WIN32_STREAM_ID structure. For normal files without security descriptors, you could just unwrap the content of the BACKUP_DATA stream into the file content inside the TAR and you’re good.

But even if you find a way to store the BACKUP_SECURITY_DATA stream in these PAX extended name/values, there’s other stuff to keep somewhere. Alternate data streams: usually (when written by Windows Explorer) they’re quite small, dozens of bytes, but nothing prevents them from being gigabytes, and I’m not sure PAX will be happy about values that large. And there are also sparse files: BackupRead won’t return you those zeroes like ReadFile; it’ll return a BACKUP_SPARSE_BLOCK stream instead, with just the non-zero portions of the file.
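For reference, unwrapping looks roughly like this: each stream is a WIN32_STREAM_ID header, an optional UTF-16 name (alternate data streams only), then Size bytes of payload, and BackupSeek() can skip payload without copying it. A sketch, assuming the handle was opened with FILE_FLAG_BACKUP_SEMANTICS:

    #include <windows.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Walk the stream headers BackupRead emits for one open file. */
    static void list_streams(HANDLE h)
    {
        const DWORD hdr = (DWORD)offsetof(WIN32_STREAM_ID, cStreamName);
        WIN32_STREAM_ID sid;
        DWORD got, lo, hi;
        LPVOID ctx = NULL;

        /* Read just the fixed 20-byte header part each iteration. */
        while (BackupRead(h, (LPBYTE)&sid, hdr, &got, FALSE, TRUE, &ctx)
               && got == hdr) {
            WCHAR name[512] = L"";
            if (sid.dwStreamNameSize &&
                sid.dwStreamNameSize < sizeof name - sizeof(WCHAR)) {
                /* ADS names arrive as ":name:$DATA"; trimming omitted. */
                BackupRead(h, (LPBYTE)name, sid.dwStreamNameSize,
                           &got, FALSE, TRUE, &ctx);
                name[sid.dwStreamNameSize / sizeof(WCHAR)] = 0;
            }
            printf("stream id=%lu size=%lld name=%ls\n",
                   (unsigned long)sid.dwStreamId,
                   (long long)sid.Size.QuadPart, name);

            /* Skip the payload; a real tool would copy it out here. */
            BackupSeek(h, sid.Size.LowPart, sid.Size.HighPart,
                       &lo, &hi, &ctx);
        }
        BackupRead(h, NULL, 0, &got, TRUE, TRUE, &ctx); /* free context */
    }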

If you don’t bother parsing the backup data and unwrapping the streams, it’ll work for your tool, but your TARs will be incompatible with standard TAR tools, which won’t see the content of your backup.


What about storing each stream as a separate logical file in the tar file? So you have the main file, with a property of "nt_stream_count=2", and the file name for each stream would be the main filename followed by ":stream_id". That way each stream would show up as a separate file upon restore (assuming a standard tar is used for restoring), but the custom tar would know what to do.

Also, unrecognized pax data would be ignored by a regular tar.

And in my use case, when data is submitted by the client to the Snebu backend, it has its own tar implementation built in for separating file data from metadata, and it re-synthesizes a tar file upon restore. (The reason Snebu uses the tar format for transferring is that it was a good way to make it agentless for normal Unix/Linux servers, yet extensible enough to support other use cases. Also, I figured that most tar implementations were fairly well optimized, at least more so than I could do in a short amount of time.)

Thanks for your input, really appreciate it. It's been a little tough learning the Windows APIs after a couple decades lost in Unix land.


> What about storing each stream as a separate logical file in the tar file?

I think that’ll work. Just be sure to name these separate streams in a way that never conflicts with other files that might be in the same directory, or with other streams that may be in the same file. E.g. in Windows there are characters forbidden in file names that work fine in Linux: https://stackoverflow.com/a/31976060/126995

> with a property of "nt_stream_count=2"

Read this: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

I don’t think just a count is enough. Alternate data streams have names. Other data streams (extended attributes, security descriptor, etc.) are essentially unnamed.

You can implement a naming scheme for them, e.g. “file” for the main data, “file:xxx” for the “xxx” alternate stream of the file, “file?sd” for the security descriptor of the file, “file?sparse” for sparse file data. This can result in TAR backups that are more or less readable by the standard tools, but still allow your own tool to restore the complete thing.
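A sketch of that convention (the scheme is made up, as said; only a tool that knows it can reassemble the original file):

    #include <windows.h>
    #include <stdio.h>

    /* Map (file, stream id, stream name) to a tar entry name using the
       "file", "file:xxx", "file?sd", "file?sparse" convention above.
       Note: BackupRead reports ADS names as ":name:$DATA"; trimming
       that decoration is omitted here. */
    static void tar_entry_name(char *out, size_t n, const char *file,
                               DWORD stream_id, const WCHAR *stream_name)
    {
        switch (stream_id) {
        case BACKUP_DATA:            /* main contents: plain name */
            snprintf(out, n, "%s", file); break;
        case BACKUP_ALTERNATE_DATA:  /* "file:xxx" for a named ADS */
            snprintf(out, n, "%s:%ls", file, stream_name); break;
        case BACKUP_SECURITY_DATA:   /* "file?sd" */
            snprintf(out, n, "%s?sd", file); break;
        case BACKUP_SPARSE_BLOCK:    /* "file?sparse" */
            snprintf(out, n, "%s?sparse", file); break;
        default:                     /* EAs, reparse data, etc. */
            snprintf(out, n, "%s?stream%lu", file,
                     (unsigned long)stream_id); break;
        }
    }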

You have two problems to solve.

1. Directories. Quite often, they contain at least security descriptors. But AFAIK directories just aren’t stored in these TARs.

2. File names and their encoding. AFAIK modern Linux is UTF-8 all the way down, and TAR is probably OK with that. But if the UTF-8 representation of the UCS-2 name (WinNT isn’t quite UTF-16; UCS-2 is a subset of that) is longer than 100 bytes, you’ll have multiple files in your TAR with the same truncated name, differing only in your extended PAX attributes that contain the complete name. If you combine that with the above fake names for security info & alternate streams, it becomes even more complex.
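The encoding half is one API call each way; here’s the backup direction (a sketch, the helper name is mine). The WC_ERR_INVALID_CHARS flag is what surfaces the “not quite UTF-16” problem: names with unpaired surrogates fail the strict conversion instead of being silently mangled.

    #include <windows.h>
    #include <stdlib.h>

    /* Convert a native UTF-16/UCS-2 file name to UTF-8 for a PAX
       "path" record. Returns a malloc'd string, or NULL on failure
       (including names that aren't valid UTF-16). Caller frees. */
    static char *name_to_utf8(const WCHAR *wname)
    {
        int n = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                    wname, -1, NULL, 0, NULL, NULL);
        if (n <= 0) return NULL;
        char *out = malloc((size_t)n);
        if (out && WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                       wname, -1, out, n, NULL, NULL) <= 0) {
            free(out);
            out = NULL;
        }
        return out;
    }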

P.S. I’ve been programming for windows for decades, only occasionally for linux or other platforms.


Excellent. But we could have had this years ago had Microsoft maintained an up-to-date and standards-compliant C library instead of promoting their insular and proprietary "visual" C++ ecosystem. In fact, there could probably be a whole host of GNU/Linux software running on Windows. But no, we can't even get a complete C99 library.


Microsoft moved to C++ a decade ago; the C compiler exists only for backwards compatibility and generates substandard code vs. the same file compiled through the C++ compiler. C features are only supported as required by the C++ standard, although I personally miss VLAs.


C and C++ are not alternatives, so a platform "moving to" one doesn't mean it has to forego the other. Linux, the BSDs, macOS all have both C and C++ toolchains.


I'm just explaining what they did. They obviously could have continued to support C if they wanted. But if you're waiting for C99 support, your wait is in vain.

These days you can use clang so a nice C compiler has returned.


> Developers! Developers! Developers!

A laughing stock turned into a respectable trend. Well done.

I think with this, PowerShell needs to stop aliasing curl to Invoke-WebRequest by default (or at least try to deprecate that alias).


The comments on the article address the PowerShell alias: it sounds like the PowerShell team is hesitant to change that alias and break the users who do expect curl to alias Invoke-WebRequest, so the suggestion was to get into the habit of typing `curl.exe` in PowerShell when you want real curl.

A deprecation warning would be a good idea, though, to suggest to developers that are using the curl alias to move to iwr.


Wondering how long it'll be before we have `curl ... | cmd` as standard installation instructions on Windows.

(Probably a while: command line familiarity is far from a given in Windows)


Using PowerShell to download and execute a scriptblock is a common malware technique right now, but it's not something any sane software vendor would recommend for installing an application.


If I'm not mistaken, executing a remote unsigned PowerShell script is the recommended installation method for Chocolatey.


That's what the Linux folks thought, and here we are.


Just like open-source dev tools on Linux, I see plenty on Windows, like the Chocolatey package manager and others, that recommend just that.


:D Sounds like there will be no Windows 11 but instead

"GNU Windows" ;-)


Though bsdtar and curl don't have a GPL license.


I'd rather say: bsdtar and curl are not GNU software.

(Not all GNU software is under GPL.)


I wouldn't be surprised if Windows was open-sourced within 20 years.


If they would add SSH drive mapping...


What is this ad campaign Microsoft is pulling?

Porting simple UNIX tools from the '70s and '80s to Windows and making a big fuss about it? Tar? unzip.exe or some equivalent with a GUI has been around for ages.

I just can't see how any self-respecting developer would be lured to Windows by this...


That's nice but why is the example with zip files?


The ancients knew this technology. Around 1990, tar.exe came bundled with some OEM edition(s?) of MS-DOS; I don't specifically remember which.


Who is this new Microsoft?!



Still no X, correct?


I don't think it would give them a strategic advantage.


It would allow me to function normally on a windows box. I can apt-get install everything I need except anything that depends on X.

I'd still rather run GNU/Linux from the metal up, but my work depends on Windows for legacy data-acquisition systems. Right now, I have an experiment that's running a linux box and windows box side by side in order to get the best of both. Working out hardware access in a VM sounds unpleasant, so I'd love to see the native-windows-GNU implementation extended to X.


For the past five or six years my daily driver has run Windows (gaming) while actual work gets done by forwarding X from a headless Linux box. In my experience:

* MobaXTerm was usable (on Win7-8, a couple years ago), but IIRC it's surrounded by a faint stench of upsell and I recall being annoyed by its attempt to be a full desktop environment (tabs, file manager, integrated editor, etc). It feels very... windows-y, the X server equivalent of PuTTY.

* VcXsrv had (on Win10, about two weeks ago) a show-stopper bug - windows would fail to redraw outside their original bounds when resized. It also had trouble using modern UI toolkits or themes.

* Cygwin/X is the best, no issues at all if you're in cygwin, but I was unable to get it to cooperate smoothly with bash on windows and ended up deciding that the environment was worth more than the tiny missing features. If you don't need WSL I'd recommend this.

* Xming is what I've settled on. I use the free version, which is a major version behind (6.9 instead of 7.7), but I haven't noticed it lacking anything compared to Cygwin/X. Seems to be just as reliable as Cygwin/X too. Getting it configured was somewhat annoying (had to create xauthority manually, etc), but now that it's set up it's working smoothly.

I am mostly annoyed by two things, both of which are common to the setup and not the chosen tools:

* I have to put up with Windows for window management. I would much prefer something like i3, but I haven't gotten the fullscreen mode to work to my satisfaction.

* Network issues. You must be wired and you must have spare bandwidth. If you're running gigabit ethernet this isn't really an issue, but if you try to do it over wifi you're going to have a bad time.

Compared to running a virtual machine... it really comes down to picking a set of annoyances. With a VM you have to deal with configuring another machine, it devours resources, integration with the host isn't as good, and it can't tolerate monitor switching at all. With native X you don't get a usable WM. shrug


An issue I've had with Xming and Cygwin is that if I RDP to my Windows desktop from my laptop (potentially caused by using a different screen resolution / color depth), sometimes the X server will crash when I go back to my full desktop.


Is it the look and feel (I mean which gestures do a thing, location of widgets, etc.) that's the blocker, or is it the availability of certain X applications that you're used to?


VcXsrv implements an X server in Win32. Might be worth a try.


There are several implementations of X for win32/64 (there's a good one in Cygwin)-- I'm looking for something integrated with Microsoft's bare-metal implementation of Ubuntu.

TL;DR: I want to 'apt-get install lyx octave gnuplot chromium darktable' with full functionality.


I've used VcXsrv to run X applications from the Linux subsystem's Ubuntu. It worked quite well for the Linux apps I needed to run.


I've used MobaXterm with WSL before and it seemed to work pretty well; I didn't push it, but it was functional for my needs.


Cygwin.


cool. I want wget now


I’ve been using wget on Windows for years, though it’s not built in.


It's possible to download a static executable of the latest Wget releases for Windows.

They tend to work without any issues.


Invoke-WebRequest is good enough for a lot of use cases. Curl should handle the rest.


Got to love their naming scheme, Invoke-WebRequest url just flows off your fingers.


I think it can be shortened to “iwr” (it’s aliased to “curl”)


It's also aliased to wget.

The Verb-Noun verbosity of PowerShell is great for A) discovering new verb and noun combinations that you hadn't considered before (and Get-Verb gives you a list of common verbs), and B) for keeping scripts readable in the long term. Most every Verb-Noun has at least one shortcut alias, most based on common cmd/bash-isms (Get-ChildItem has gci, ls, and dir), and Get-Help on any Verb-Noun will list the aliases for you.


With respect to (B): I do try to write out the full verbose names (Verb-Noun) when writing scripts, as a maintenance goal. At that point when you are writing a script the verbosity is often less of a problem because you can write in an IDE of your choice that provides strong auto-completion. (You can also get basic tab completion in the command prompt and use an IDE as your command prompt, if for some reason you are entirely allergic to the short aliases and want to spell out the full things even for random REPL work.)


So, you have curl and tar, but no pipe (|) on windows.


Pipes have existed in DOS since before Windows 95 was around!


Since DOS 2.0, IIRC.


But not really. Using a pipe meant DOS dumped the entire output to a temporary file, then fed that temp file to the next program.

I'm not sure if Windows still does this. Probably does.


Pipe works fine. I just ran dir | clip to make sure.


[flagged]


I may not be a console expert, but I do know not to run things unless I understand what they do.

This is a bad joke. You might say it bombed.


Well you can and apparently did google it; it's just a fork bomb. It'll spawn new processes until the system figures out what is going on or chugs to a halt. It's not particularly harmful - modern systems tend to have process limits. But I guess it is Microsoft...ctrl+s first.

Serious question, though; I'm sort of curious how the whole Windows/shell thing is coming along.


install it on Linux and see for yourself

https://github.com/PowerShell/PowerShell



