"This is a strange program of obscure provenance that somehow, still manages to survive in the 21st century."
-> links to wikipedia page with direct description of lineage back to 5th ed research unix
"That weird bs=4M argument in the dd version isn’t actually doing anything special—all it’s doing is instructing the dd command to use a 4 MB buffer size while copying. But who cares? Why not just let the command figure out the right buffer size automatically?"
Um -
a) it is 'doing the special thing' of changing the block size (not buffer size)
b) Because the command probably doesn't figure out the right size automatically, much like your 'cat' example above which also doesn't
c) And this can mean massive performance differences between invocations
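The gap in c) is easy to demonstrate even on throwaway files; a sketch (paths are mine, not from the article): moving the same 1 MiB with a tiny block size costs thousands of read/write syscall pairs, while a large block size costs one.

```shell
# same 1 MiB copied two ways; the syscall counts differ by ~2000x,
# which is where the "massive performance differences" come from
dd if=/dev/zero of=/tmp/bs_demo bs=512 count=2048 2>/dev/null  # 2048 write(2) calls
dd if=/dev/zero of=/tmp/bs_demo bs=1M  count=1    2>/dev/null  # a single write(2) call
```

On a real device, timing the two invocations (e.g. with `time`) makes the difference obvious.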
> Another reason to prefer the cat variant is that it lets you actually string together a normal shell pipeline. For instance, if you want progress information with cat you can combine it with the pv command
>If you want to create a file of a certain size, you can do so using other standard programs like head. For instance, here are two ways to create a 100 MB file containing all zeroes:
$ uname -sr
OpenBSD 6.0
$ head -c 10MB /dev/zero
head: unknown option -- c
usage: head [-count | -n count] [file ...]
well.. guess that wasn't so 'standard' after all..
I must be using some nonstandard version...
$ man head |sed -ne 47,51p
HISTORY
The head utility first appeared in 1BSD.
AUTHORS
Bill Joy, August 24, 1977.
$ sed -ne 4p /usr/src/usr.bin/head/head.c
* Copyright (c) 1980, 1987 Regents of the University of California.
Hmm..
> So if you find yourself doing that a lot, I won’t blame you for reaching for dd. But otherwise, try to stick to more standard Unix tools.
Like 'pv'?
edit: added formatting, sector size note, head manpage/head.c stuffs.. apologies.
The other reason for dd's existence is that back in the day, if you wrote to a block device with a write(2) size that was anything other than a multiple of the device's actual block size, you'd get EIO or EINVAL. So the program you used to make sure the write(2) size was always correct was dd. In the old days some truly weird block devices would accept a write(2) that used only the exact block size, i.e. you could only read and/or write one block at once (if you tried to read more than one block of data you'd get back exactly one block regardless of how big your read buffer was). Old raw CD and WORM drives come to mind. Audio CDs for example had a 2,352 byte block size (after removing the CRC).
> back in the day if you wrote to a block device with a write(2) size of anything other than a multiple of the devices actual block size you'd get a EIO or a EINVAL
It's not just back in the day; it's still true. On Linux, block devices are kernel-cached by default unless opened with the O_DIRECT flag.
So, on FreeBSD "dd bs=1" will fail if it involves any disk device: the disk driver will return EINVAL from read(2) or write(2) because the I/O size is not divisible by the physical sector size. "cat" with buffer size X (which depends on the implementation) will work or not depending on the divisibility of X by the physical sector size, and on other random factors, like short I/O caused by signal delivery.
Summary: dd(1) still has its place, and the author of the original article is getting it wrong.
Yes you are right, there are some UNIXes that ship today without a buffer (block) cache and/or don't default to using the block cache when open(2)ing a block device.
For those wondering Linux uses a unified buffer / page cache so there isn't a coherency issue. The buffer cache entries typically point to the corresponding entry in the page cache if it exists. The biggest reason the two are separate but correlated is that the block size isn't always the same as the page size.
Or, before that, just send the dd process SIGINFO to get the same output printed once. `while :; do kill -INFO %1; sleep 1; done` if you want a "progress bar" of sorts.
As an aside: On BSDs (incl. macOS), SIGINFO can also be sent interactively by the line driver when you type ^T (like ^C sends SIGINT.) Kind of lame that Linux doesn't follow suit [or even have SIGINFO], or we'd see a lot more programs that build in useful "prod me for an update" hooks, the way they already have "prod me to reload my config" SIGHUP hooks.
OK, I take it I didn't know of the BSD-specific signal. But please don't claim it's
on Unix, because it's not. Unless you can point me where in SUS (or any
other specification) it is defined, as I couldn't find it here:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sign...
(You probably know this as you use OpenBSD, but) something I really like about BSDs is that most of the core commands respond to ^T with progress info of some kind, dd included.
There's one good (?) reason to use dd with devices: it specifies target in the same command. For devices, writing to them usually requires root privileges, so it's easy to:
sudo dd .... of=/dev/...
But there's no trivial cat equivalent:
sudo cat ... > target
Will open target as your current user anyway. You can play around with tee and redirection of course. But that's getting more complicated than the original.
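A sketch of the tee workaround (the device name is a placeholder; the runnable line uses an ordinary file as the target so it works without root):

```shell
# sudo applies to cat, not to the shell's redirection, so
#   sudo cat image.iso > /dev/sdX      # fails: target opened as you
# tee opens the target itself inside the privileged process:
#   cat image.iso | sudo tee /dev/sdX > /dev/null
# same shape without sudo, a regular file standing in for the device:
printf 'payload' | tee /tmp/tee_target > /dev/null
```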
This is admittedly somewhat esoteric, but it seems like a stretch to say `dd` does not have some place, especially when transferring binary data in very specific ways.
Since we're sharing shell tricks: The "sudo tee > /dev/null" may be baroque, but I find it useful whenever I start editing stuff in /etc in vim, only to find that I cannot write my changes because I'm not root. In that case,
:w !sudo tee %
does the trick. (What `:w !cmd` does is send the buffer to the given shell command as stdin.)
Basically, you can use cp wherever you use dd, as long as you're not changing any low-level parameters (e.g. starting 500 bytes into the file or something).
Yes. Yes it does. Reminds me of a Sunday evening in the late '90s when I stopped working as root all the time:
cp backup.tar.bz /dev/sda
Nowadays I would know enough to at least get the contents of the backup.tar.bz back. Back then, this was the end of both my / partition (or any other partition) and the backup of my music collection.
Still, that didn't end my love affair with Unix. It did make me a whole lot more careful though.
Well that's strictly a bug caused by mistaken use, as strings are not expanded lazily and heredocs are just another string syntax. How can one use the unix shell without string interpolation? Also, a similar programme would give the wrong result in, say, Ruby or Perl too.
Use noerror, but forget sync? Corrupt output file if there is an error. Use a bigger bs so it's not slow as treacle? A single faulty sector blows away a whole bs of data, and your output image may get unwanted padding appended to the end. Recoverable error? dd's not going to retry.
Use ddrescue or FreeBSD's recoverdisk(1). They're faster, they're safer, they're more effective, and they're easier to use.
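For contrast, the fragile dd pattern described above looks like this (device name hypothetical; the runnable line reads /dev/zero so it's harmless):

```shell
# the classic but risky rescue incantation:
#   dd if=/dev/sdX of=disk.img bs=64K conv=noerror,sync
# noerror continues past read errors; sync pads every short read
# up to bs - omit it and the image silently loses alignment
dd if=/dev/zero of=/tmp/rescue_demo bs=4K count=4 conv=noerror,sync 2>/dev/null
```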
GNU ddrescue[1] and not dd_rescue[2]. I'm adding this clarification because, depending on the Linux distro, the dd_rescue package name may be ddrescue and the GNU ddrescue package name may be gddrescue.
>One thing I'll often use dd for is recovering data from a failing drive.
Funnily enough, I ended up accidentally naming the wrong drive in the argument and lost years of photos, music, video, etc. Though I suppose I can't blame dd for that :)
I think there's a rule that you're not really qualified to discuss command line tools in public until you've used dd to inadvertently eradicate an entire partition.
Personally, I'd add the use of a trailing full-stop in an rsync command in the wrong directory (e.g. in /home/user) as an alternative qualification to your rule.
I now use full paths for destination as well as source.
They could! They could be super annoying to read, too. Pipelines make more sense to humans when they always read in the same direction.
In the rare case where the volume of data is large enough to make the efficiency hit noticeable on large machines, rewriting a pipeline to eliminate a leading cat makes sense. In all other cases, it is a premature and unnecessary optimization.
would actually need to buffer the complete file, before commencing to write to the device (and showing the progress), which would defeat the whole idea of showing progress.
If `pv` doesn't know the input size, it doesn't show it. In your case, it can determine that its stdin is a file by looking at the `/proc/self/fd/0` symlink:
$ ls -l /proc/self/fd/0 < /tmp/x
lr-x------ 1 user users 64 <date> /proc/self/fd/0 -> /tmp/x
But who cares? Why not just let the command figure out the right buffer size automatically?
Because it can be a lot slower. dd is low level, hence powerful and dangerous.
And, if we are going down that rabbit hole, you don't need cat[1]
“The purpose of cat is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.”
Yep, my thought was that the UUOC critique doesn't apply to most attempts to substitute cat for dd, because typically those are copying from one (regular or special) to another, and you can't simply use redirection to accomplish this in the absence of a reader.
dd is a tool. dd can do a lot more than cat. dd can count, seek, skip (seek/drop input), and do basic-ish data conversion. dd is standard, even more standard than cat (the GNU breed). I even used it to flip a byte in a binary, a couple of times.
New-ish GNU dd even adds a nice progress display option (the standard way is sending it SIGUSR1, since dd is made to be scripted where only the exit code matters).
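The byte-flip mentioned above can be sketched like this (file name and offset are made up for the demo): seek counts in units of bs, and conv=notrunc keeps dd from truncating the rest of the file.

```shell
printf 'ABCDEF' > /tmp/patchme
# overwrite the single byte at offset 2 ('C' -> 'x'), leave the rest intact
printf 'x' | dd of=/tmp/patchme bs=1 seek=2 conv=notrunc 2>/dev/null
cat /tmp/patchme   # ABxDEF
```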
> Actually, using dd is almost never necessary, and due to its highly nonstandard syntax is usually just an easy way to mess things up.
Personally I never messed it up, nor was confused about it. This sentence also sets the tone of the whole article, a rather subjective tone that is.
Don't cat a file and pipe it into pv. Use "pv file" as a replacement for "cat file" and it will show you the progress as a percentage. When it's in the middle of a pipeline, it doesn't know the total size (unless you tell it with -s), so it can only show the throughput.
It's an orthogonal issue, yes, but calling stat or fstat on any block device whether from stdin or argv will return .st_size == 0, so your progress bar won't display the correct answers (or could display better answers if it used the ioctl).
It makes a decent interview question tho', "explain the difference between cat file|./prog and ./prog <file". It doesn't even matter if they get it wrong, that they even know there is a difference is a very good sign.
People who have come into SA work via being C programmers usually figure it out, they make the best SAs because they are mentally equipped to reason about a system from first principles.
A counterpoint: dd survives not because it's good or makes sense, but explicitly because it doesn't.
You wanna format a usb key? Google this, copy/paste these dd instructions, it works, move on with your life.
You wanna format a usb key using something related to cat you once saw and didn't fully understand? Have fun.
Both approaches have their weak points, but in any OS the answer to "How do I format a usb key" should not start with "Oh boy, let's have a Socratic dialog over 10 years on how to do that."
Definitely this. I have found many times that I'm an offender of these "bad practices", and usually that's because a certain pattern I learned way back in the beginnings of my Linux days still hangs around.
Embarrassingly, it took me a long time before I started reaching for man pages instead of Google. That has probably had the biggest effect on tightening up my command line fu.
find is another tool that seems to get only one specific use case that ignores its rather large and useful toolset.
I learned Linux this way a decade and a half ago when it was far (and still is imho!) more convenient to quickly search a man page than google something. (with slow internet start times, browser startup times, etc)
Now, sometimes when people watch me work in a shared session they comment on my "peculiar" (to them) usage of flipping between -h --help and man $command, because there's a whole lot of switches I have memorized over time, but even more that I just have good reference points for.
But, bar none, what I've noticed among my peers is that the people who have always bowed to quick Google solutions have never really taken the time to learn what they're doing. They almost always seem to be the 'quick fix', 'get it working now, sort it out later' types.
What about the `seek` argument which skips over some blocks at the beginning but still allocates them (unix "holes")?
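A quick sketch of that (GNU dd assumed; path is made up): with count=0, dd writes nothing but still truncates/extends the output file to the seek offset, leaving a hole.

```shell
# 1 MiB apparent size, with (almost) no blocks actually allocated
dd if=/dev/null of=/tmp/sparse.img bs=1 count=0 seek=1048576 2>/dev/null
wc -c < /tmp/sparse.img   # apparent size in bytes
du -k /tmp/sparse.img     # far fewer blocks actually on disk
```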
Also note that there are still unix systems out there which do not support byte-level granularity of access to block devices. On those devices you must actually use a buffer of exactly the size of the blocks on the device. Heck, linux was like this until at least v2.
Also keep in mind that specifying the block size can be important, especially for reading data efficiently. Standard shell tools don't just "figure it out" automatically. They guess, and sometimes those guesses are wrong, resulting in performance that is orders of magnitude lower.
I think dd is primarily so popular because it is used in mostly dangerous operations. Sure, using cat makes logical sense, but if we are talking about writing directly to disk devices here, I'll trust the command I read from the manual and not explore commands I think would work.
dd's "highly nonstandard syntax" comes from the JCL programming language, but it's really just another tool to read and write files. At the end of the day it's not more complex or incompatible than other unix tools. For example, you can also use tools like `pv` with dd no problem to get progress statements.
There is some truth to the fact that (if you basically already know dd like I do) then reserving it for dangerous operations is a good way to "signal" to yourself "slow down here and pay attention"
I always thought dd stood for disk destroyer, only ever used it for making low level copies of whole disks or shredding them with if=/dev/random. This thread has been informative and terrifying as I learn cat and cp are every bit as dangerous as dd! I never would expect something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?
> I never would expect something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?
That's the beauty of Unix.
Everything is a file. Thus every program that can work with files, can in fact work with everything.
That's the Unix way: The customer, eh, user is always right.
I once wanted to clean up backup files created by emacs (they end in the tilde character) by typing `rm *~` - except what I did type was `rm * ~`.
(On the upside, I learned a valuable lesson that day.)
This is a good point as well. On BSD, we don't have `pv`, but we do have ^T. This will print some sort of status for just about any long running process. It prints very specialized status for certain programs aware of it.
$ dd if=/dev/urandom of=foo count=1000 bs=1000000
load: 1.76 cmd: dd 80097 running 0.00u 0.89s
11+0 records in
11+0 records out
11000000 bytes transferred in 0.947316 secs (11611752 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 1.68s
22+0 records in
22+0 records out
22000000 bytes transferred in 1.746013 secs (12600134 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 2.28s
31+0 records in
31+0 records out
31000000 bytes transferred in 2.392392 secs (12957742 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 2.83s
38+0 records in
38+0 records out
....
pv is available on at least FreeBSD as a package, like it is on Linux. I'd be quite shocked if the other BSDs didn't have a port of it as well. There is also mbuffer.
This is a great example of why downvoting submissions should be a thing. Or at least showing the up/down tuple. I would say every upvote represents someone misled and likely to further propagate this nonsense.
I kinda thought that was supposed to be reserved for submissions that are non permissible, e.g., hate speech, sites hosting malware, or very off-topic & uninformative.
OP, your alternatives to DD are more complicated, not less complicated. I shouldn't need to pipeline two commands together just to cut off the first 100MB of a file.
The biggest counterexample to this that some people have experienced is accidentally swapping if= and of=, thus backing up their target onto their source rather than vice versa.
And I state again: if you're a future doctor and your biggest regret is that you could be $50k richer right now, I'm not inclined to do much weeping. Basically everyone has been a broke student.
True, but it feels worse to have had the opportunity in one's hands and thrown it away than never to have touched it at all. That's just how humans work - go figure.
I'll point out that dd also allows you to control lots of other filesystem and OS-related things that other tools do not. See: fsync/fdatasync. I'm not aware of any shell tools that allow you to write data like that.
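With GNU dd those hooks are the conv=fsync and conv=fdatasync flags (plus oflag=direct/dsync for per-write behavior); a harmless sketch on a temp file:

```shell
# conv=fsync: flush data and metadata to stable storage before dd exits;
# conv=fdatasync would flush data only
dd if=/dev/zero of=/tmp/durable bs=4K count=1 conv=fsync 2>/dev/null
```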
An even easier solution: don't make people fall into the command line to format a USB reliably.
The command line should be reserved for times where you need the fine-grained control to do something that dd is meant to do. A GUI should implement everything else in a reliable way that doesn't break half the time or crash on unexpected input.
"a paintbrush should be reserved for times when you need fine grain control to paint your bedroom. A hired painter could do it all reliably in a way that doesn't risk you painting over the crown moulding or falling off a ladder."
I've set up/configured bind a few times, and every time I wish there was a nice gui for it.. even thinking to myself I should make one.. but shortly after I've refreshed my memory of how to do something, and by then, I'm done and leave it alone for some while.
Also, I only need to remember one progress command for my entire operating system: control+t. I also get a kernel wait channel from that which is phenomenally pertinent to rapidly understanding and diagnosing what the heck a command is doing or why it is stuck.
I hate what Linux has done to systems software culture.
I think this article is full of "alternative computer science" and reminds me other article, published here as well, about the obsolescence of Unix. The only good thing is this discussion thread.
Interesting assertion. Can you show me a shell invocation without using dd that cuts off the first 16 bytes of a binary file, for example? This is a common reason I use dd.
This nearly tells you all you need to know. The other bit of info you'll want to note is that head -c +N produces as many bytes as you ask. So if you try to get the prefix using "head -c +N" and the suffix using "tail -c +N" then you'll have 1 byte of overlap.
(dd's corresponding options do not suffer from this problem.)
To expand, `-c` tells tail to start on the nth (starts counting at 1) byte. So +1 starts at the beginning, +17 starts after the first 16 bytes. `-n` is lines, `b` is 512-byte blocks.
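Putting the two spellings side by side on a throwaway file (file name and contents are mine): both drop the first 16 bytes.

```shell
printf '0123456789abcdefPAYLOAD' > /tmp/blob
tail -c +17 /tmp/blob                      # start at byte 17: PAYLOAD
dd if=/tmp/blob bs=16 skip=1 2>/dev/null   # skip one 16-byte block: PAYLOAD
```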
The fact that it was added relatively recently is exactly why it's so obscure. Unlike if, of, bs and count, I haven't had status=progress drilled into my head by every single dd command I've read out of a manual or tutorial, so even now I still forget whether it's "status=progress" or "progress=status" or something else.
Also it's a victim of dd's bizarre non-Unix syntax - an option like "--status" or "--progress" would be more in keeping with expectations.
dd precisely controls the sizes of read, write and lseek system calls. This doesn't matter on buffered block devices; there is no "reblocking" benefit.
Some kinds of devices are structured such that each write produces a discrete block, with a maximum size (such that any bytes in excess are discarded) and each read reads only from one block, advancing to the next one (such that any unread bytes in the current block due to the buffer being too small are discarded). This is very reminiscent of datagram sockets in the IPC/networking arena. dd was developed as an invaluable tool for "reblocking" data for these kinds of devices.
One point that the blog author doesn't realize (or neglects to comment upon) is that "head -c 100MB" relies on an extension, whereas "dd if=/dev/zero of=image.iso bs=4MB count=25" is ... almost POSIX: there is no MB suffix documented by POSIX, only "b" and "k" (lower case). The operator "x" is in POSIX: bs=4x1024x1024.
Here is a non-useless use of dd to request exactly one byte of input from a TTY in raw mode:
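A sketch of that trick, assuming the usual stty raw-mode bracket (the stty lines need a real terminal, so the runnable line demonstrates the one-byte read on a pipe instead):

```shell
# on a TTY you'd bracket the read with raw mode:
#   saved=$(stty -g); stty raw -echo
#   key=$(dd bs=1 count=1 2>/dev/null)
#   stty "$saved"
# the one-byte read itself, fed from a pipe here:
printf 'q' | dd bs=1 count=1 2>/dev/null   # emits exactly 'q'
```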
dd is for handling blocked data, while cat, redirection and pipelines are completely useless for that, since they are not meant to manipulate blocks of data, but streams. They do not compare (apart from really simple cases where either will do, like copying a file into some other file); this blog post mainly highlights that neither the author nor many tutorial writers know the difference.
I mean, are they bad instructions? They work for most people, and they're what most people are familiar with. If I ask a random *ix user what's wrong with my shell command, they're more likely to know about dd.