The Cult of DD (eklitzke.org)
269 points by eklitzke on March 17, 2017 | 171 comments



"This is a strange program of obscure provenance that somehow, still manages to survive in the 21st century."

-> links to a Wikipedia page with a direct description of its lineage back to 5th edition Research Unix

"That weird bs=4M argument in the dd version isn’t actually doing anything special—all it’s doing is instructing the dd command to use a 4 MB buffer size while copying. But who cares? Why not just let the command figure out the right buffer size automatically?"

Um -

a) it is 'doing the special thing' of changing the block size (not buffer size)

b) Because the command probably doesn't figure out the right size automatically, much like your 'cat' example above, which also doesn't

c) And this can mean massive performance differences between invocations, as in the rough comparison below
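For a rough illustration (the device name and sizes here are placeholders), compare something like:

  dd if=/dev/sdX of=/dev/null bs=512 count=200000   # ~100 MB in many small reads
  dd if=/dev/sdX of=/dev/null bs=4M count=25        # ~100 MB in large reads, usually much faster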

> Another reason to prefer the cat variant is that it lets you actually string together a normal shell pipeline. For instance, if you want progress information with cat you can combine it with the pv command

Umm:

  dd if=file bs=some-optimal-block-size | rest-of-pipeline
that was hard.

>If you want to create a file of a certain size, you can do so using other standard programs like head. For instance, here are two ways to create a 100 MB file containing all zeroes:

  $ uname -sr
  OpenBSD 6.0
  $ head -c 10MB /dev/zero 
  head: unknown option -- c
  usage: head [-count | -n count] [file ...]
well.. guess that wasn't so 'standard' after all.. I must be using some nonstandard version...

  $ man head |sed -ne 47,51p
  HISTORY
     The head utility first appeared in 1BSD.

  AUTHORS
     Bill Joy, August 24, 1977.
  $ sed -ne 4p /usr/src/usr.bin/head/head.c
   * Copyright (c) 1980, 1987 Regents of the University of California.
Hmm..

> So if you find yourself doing that a lot, I won’t blame you for reaching for dd. But otherwise, try to stick to more standard Unix tools.

Like 'pv'?

edit: added formatting, sector size note, head manpage/head.c stuffs.. apologies.


The other reason for dd's existence is that back in the day, if you wrote to a block device with a write(2) size that was anything other than a multiple of the device's actual block size, you'd get an EIO or an EINVAL. So the program you used to make sure the write(2) size was always correct was dd. In the old days some truly weird block devices would accept a write(2) that used only the exact block size, i.e. you could only read and/or write one block at a time (if you tried to read more than one block of data you'd get back exactly one block regardless of how big your read buffer was). Old raw CD and WORM drives come to mind. Audio CDs for example had a 2,352-byte block size (after removing the CRC).
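For instance (the device name is purely illustrative), reading an audio CD track on such hardware meant matching bs to the medium's raw sector size:

    dd if=/dev/rcd0c of=track.cdda bs=2352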

EDIT: fixed man section references


> back in the day if you wrote to a block device with a write(2) size of anything other than a multiple of the devices actual block size you'd get a EIO or a EINVAL

It's not back in the day; it's still true. In Linux, block devices are kernel-cached by default, unless opened with the O_DIRECT flag.

In the general UNIX case (for example, FreeBSD), they aren't: https://www.freebsd.org/doc/en/books/arch-handbook/driverbas...

So, in FreeBSD "dd bs=1" will fail if it involves any disk device: the disk driver will return EINVAL from read(2) or write(2) because the I/O size is not divisible by the physical sector size. "cat" with buffer size X (which depends on the implementation) will work or not depending on whether X is divisible by the physical sector size, and on other random factors, like short I/O caused by delivery of signals.
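A concrete illustration (the device name is just an example):

    dd if=/dev/ada0 of=/dev/null bs=1 count=512     # fails: EINVAL, I/O size not sector-aligned
    dd if=/dev/ada0 of=/dev/null bs=512 count=1     # works: exactly one sector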

Summary: dd(1) still has its place, and the author of the original article is getting it wrong.


Yes, you are right: there are some UNIXes that ship today without a buffer (block) cache and/or don't default to using the block cache when open(2)ing a block device.

For those wondering Linux uses a unified buffer / page cache so there isn't a coherency issue. The buffer cache entries typically point to the corresponding entry in the page cache if it exists. The biggest reason the two are separate but correlated is that the block size isn't always the same as the page size.


cheers both -

The raw/cooked device thing crossed my mind, but I thought it would distract from the point-by-point here..


>> if you want progress information with cat you can combine it with the pv command

Since coreutils-8.24[1], dd "accepts a new status=progress level to print data transfer statistics on stderr approximately every second."
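For example (the paths here are placeholders):

    dd if=image.iso of=/dev/sdb bs=4M status=progress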

> Because the command probably doesn't figure out the right size automatically . . . this can mean massive performance differences between invocations

For anyone who's wondering, here are two good threads on determining optimal block size: https://superuser.com/questions/234199/good-block-size-for-d... http://stackoverflow.com/questions/6161823/dd-how-to-calcula...

[1] http://savannah.gnu.org/forum/forum.php?forum_id=8309


Or, before that, just send the dd process SIGINFO to get the same output printed once. `while :; do kill -INFO %1; sleep 1; done` if you want a "progress bar" of sorts.

As an aside: On BSDs (incl. macOS), SIGINFO is also able to be sent interactively by the line driver when you type ^T (like ^C sends SIGINT.) Kind of lame that Linux doesn't follow suit [or even have SIGINFO], or we'd see a lot more programs that build in useful "prod me for an update" hooks, the way they already have "prod me to reload my config" SIGHUP hooks.


On Linux at any rate dd responds to SIGUSR1 in a similar fashion.


Erm... this is the first time I've seen the SIGINFO signal. I think you meant SIGUSR1.


No, I think GP means SIGINFO, which the BSD family uses. On GNU/Linux SIGUSR1 is a passable substitute. See http://www.unix.com/man-page/FreeBSD/3/siginfo/ and https://unix.stackexchange.com/questions/179481/siginfo-on-g... for more about it.


On Unix, there's SIGINFO, which doesn't exist on Linux systems. That's why coreutils' dd uses SIGUSR1 instead.


OK, I take it I didn't know of the BSD-specific signal. But please don't claim it's on Unix, because it's not. Unless you can point me to where in SUS (or any other specification) it is defined, as I couldn't find it here: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sign...


BSD is Unix. Don't believe the lies.


> pv

(You probably know this as you use OpenBSD, but) something I really like about the BSDs is that most of the core commands respond to ^T with progress info of some kind, dd included.


Woah. That's neat. Thank you!


cool - by default, ^T generates the non-standard SIGINFO. That seems to be true of other BSDs as well, including OS X.


Also, Linux dd responds to SIGUSR1, writing progress info on stderr. Something to be wary of is that the same signal kills BSD dd.


Thank you for saving me the effort. I cringe upon seeing people throwing the word "standard" around like that.


There's one good (?) reason to use dd with devices: it specifies the target in the same command. Writing to devices usually requires root privileges, so it's easy to:

    sudo dd .... of=/dev/...
But there's no trivial cat equivalent:

    sudo cat ... > target
Will open target as your current user anyway. You can play around with tee and redirection of course. But that's getting more complicated than the original.


This. The alternative:

    sudo sh -c 'cat some.img > /dev/sdb'
or even more baroque:

    cat some.img | sudo tee /dev/sdb > /dev/null
is a pain by comparison, and the `sudo sh -c` variant has env implications when spawning a sub-shell.

I have an ARM/linux installer script that writes the u-boot image to a specific offset before the first partition:

    dd if=${UBOOT_DIR}/MLO of=$LO_DEVICE count=1 seek=1 bs=128k
    dd if=${UBOOT_DIR}/u-boot.img of=$LO_DEVICE count=2 seek=1 bs=384k
This is admittedly somewhat esoteric, but it seems like a stretch to say `dd` does not have some place, especially when transferring binary data in very specific ways.


Since we're sharing shell tricks: The "sudo tee > /dev/null" may be baroque, but I find it useful whenever I start editing stuff in /etc in vim, only to find that I cannot write my changes because I'm not root. In that case,

  :w !sudo tee %
does the trick. (What :w !cmd does is send the buffer to the given shell command as stdin.)


Even simpler:

    sudo cp image.iso /dev/sdb


Would that seriously work?


Yep! See this: http://askubuntu.com/questions/751193/what-is-the-difference...

Basically, you can use cp wherever you use dd, as long as you're not changing any low-level parameters (e.g. starting 500 bytes into the file or something).


Yes. Yes it does. Reminds me of a Sunday evening in the late '90s when I stopped working as root all the time:

    cp backup.tar.bz /dev/sda 
Nowadays I would know enough to at least get the contents of the backup.tar.bz back. Back then, this was the end of both my / partition (or any other partition) and the backup of my music collection.

Still, that didn't end my love affair with Unix. It did make me a whole lot more careful though.


> cp backup.tar.bz /dev/sda

Ouch. That just hurts seeing that line.


I think the trick is to do sudo sh, or some such.


There is a trivial alternative. Just use a subshell.

    sudo (cat ... > target)


I don't know for sure, but I believe it doesn't work.


It doesn’t. In fact, I would be very careful using subshells in general, because it can lead to bugs like this one: ​http://danwalsh.livejournal.com/74642.html​ .


That link 404s due to some url-encoded garbage at the end. This should work: http://danwalsh.livejournal.com/74642.html


Well, that's strictly a bug caused by mistaken use, as strings are not expanded lazily and heredocs are just another string syntax. How can one use the unix shell without string interpolation? Also, a similar program would give the wrong result in, say, Ruby or Perl too.


    sudo bash


But what about a (boring) su and working as root?


One thing I'll often use dd for is recovering data from a failing drive. Can head ignore read errors? dd can.

As far as I'm concerned, dd is lower-level than most of the other utilities and provides more control over what's happening.

The author does have a point that the syntax is strange though.


dd is awful and error-prone for this sort of use.

Use noerror, but forget sync? Corrupt output file if there is an error. Use a bigger bs so it's not slow as treacle? A single faulty sector blows away a whole bs of data, and your output image may get unwanted padding appended to the end. Recoverable error? dd's not going to retry.

Use ddrescue or FreeBSD's recoverdisk(1). They're faster, they're safer, they're more effective, and they're easier to use.
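A typical GNU ddrescue run looks roughly like this (paths are examples); the mapfile is what lets it resume and retry sanely:

    ddrescue /dev/sdb disk.img disk.map        # first pass: copy everything readable
    ddrescue -r3 /dev/sdb disk.img disk.map    # follow-up pass: retry bad areas up to 3 times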


GNU ddrescue[1] and not dd_rescue[2]. I'm adding this clarification because, depending on the Linux distro, the dd_rescue package may be named ddrescue and the GNU ddrescue package may be named gddrescue.

[1]: http://www.gnu.org/software/ddrescue/ddrescue.html [2]: http://www.garloff.de/kurt/linux/ddrescue/


ddrescue is excellent.


ddrescue is so good that that particular example feels a little strawman-ish


I use GNU ddrescue for this for its ability to log and retry failed areas. It can make recovering or verifying data easier.


>One thing I'll often use dd for is recovering data from a failing drive.

Funnily enough, I once accidentally named the wrong drive in the argument and lost years of photos, music, video, etc., though I suppose I can't blame dd for that :)


I think there's a rule that you're not really qualified to discuss command line tools in public until you've used dd to inadvertently eradicate an entire partition.


Personally, I'd add the use of a trailing full-stop in an rsync command in the wrong directory (e.g. in /home/user) as an alternative qualification to your rule.

I now use full paths for destination as well as source.


Don't drink and dd. Spoken from experience.


I thought dd could be used for this purpose and ended up with a dead drive and an unusable partial image. Now I know better and use GNU ddrescue.


Personally I prefer safecopy.


This article is full of Useless Uses of Cat[1] that could just use redirection operators. For instance,

    cat image.iso | pv >/dev/sdb
could be rewritten as

    pv < image.iso > /dev/sdb
A related mistake is the Useless Use of Echo, since any command of the form

    echo "foo" | bar
can be written using here strings as

    bar <<< "foo"
or even

    bar <<WORD
    foo
    WORD
[1] http://porkmail.org/era/unix/award.html


They could! They could be super annoying to read, too. Pipelines make more sense to humans when they always read in the same direction.

In the rare case where the volume of data is large enough to make the efficiency hit noticeable on large machines, rewriting a pipeline to eliminate a leading cat makes sense. In all other cases, it is a premature and unnecessary optimization.


Yeah, in BASH at least it's nicer to use

        <image.iso pv > /dev/sdb


pv can read files directly and give more detailed progress. No stdin redirection is needed.

    pv image.iso >/dev/sdb


>pv < image.iso > /dev/sdb

Huh? pv can cat stuff on its own, and it will be able to make a progress bar based on the file size

  pv image.img > /dev/sdb


Okay. I don't actually use pv, so I was just extrapolating from the original post. In any case, cat is definitely the wrong choice.


So I suppose that the command

    pv < image.iso > /dev/sdb
would actually need to buffer the complete file, before commencing to write to the device (and showing the progress), which would defeat the whole idea of showing progress.


If `pv` doesn't know the input size, it doesn't show it. In your case, it can determine that its stdin is a file by looking at the `/proc/self/fd/0` symlink:

   $ ls -l /proc/self/fd/0 < /tmp/x
   lr-x------ 1 user users 64 <date> /proc/self/fd/0 -> /tmp/x


It wouldn't. It just won't be able to predict a file size.


From what I can tell, bash uses 4K buffers for pipelines.


Useless or not, I personally prefer left to right flow.


Which has the added benefit of not overwriting the source if you mistype the operator...


You can just use

    <image.iso pv >/dev/sdb


For those of you that are blissfully unaware of what the JCL DD command looks like, here's an example (with only the DD section of the JCL shown):

  //SYSPRINT DD SYSOUT=*                                                          
  //SYSLIN   DD DSN=&&OBJAPBND,                                                   
  //            DISP=(NEW,PASS),SPACE=(TRK,(3,3)),                                
  //            DCB=(RECFM=FB,LRECL=80,BLKSIZE=3200),                             
  //            UNIT=&SAMPUNIT                                                    
  //SYSLIB   DD DSN=SYS1.MACLIB,DISP=SHR                                          
  //SYSIN    DD DSN=&SAMPLIB(IEWAPBND),DISP=SHR


It links a file name (as referenced within a program) to the proper physical file[0], conceptually like an environment variable in UNIX and Windows.

Ah, I miss elements of the mainframe days.

[0] https://www.ibm.com/support/knowledgecenter/zosbasics/com.ib...


But who cares? Why not just let the command figure out the right buffer size automatically?

Because it can be a lot slower. dd is low level, hence powerful and dangerous.

And, if we are going down that rabbit hole, you don't need cat[1]:

“The purpose of cat is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.”

[1]http://porkmail.org/era/unix/award.html#cat


But if you don't use any process, what process is doing the reading?

(This sounds like a zen koan somehow.)


I like the koan. But the point should be about an additional process.

In a UUOC avoidance case, it's the current process which reads, generally via stdin. Say, the shell, or dd itself with an 'if=' parameter.

Which I strongly suspect you know.


Yep, my thought was that the UUOC critique doesn't apply to most attempts to substitute cat for dd, because typically those are copying from one file (regular or special) to another, and you can't simply use redirection to accomplish this in the absence of a reader.


When the file is ready, the process appears?

:-D


The Ignorance Of, Err, Ignorant People

dd is a tool. dd can do a lot more than cat. dd can count, seek, skip (seek/drop input), and do basic-ish data conversion. dd is standard, even more standard than cat (the GNU breed). I've even used it to flip a byte in a binary, a couple of times.

New-ish GNU dd even adds a nice progress display option (the standard way is sending it SIGUSR1, since dd is made to be scripted where only the exit code matters).
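Something like this (assuming a single running dd) pokes it for a one-shot status line on its stderr:

    kill -USR1 "$(pgrep -x dd)"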

> Actually, using dd is almost never necessary, and due to its highly nonstandard syntax is usually just an easy way to mess things up.

Personally, I have never messed it up, nor been confused by it. This sentence also sets the tone of the whole article: a rather subjective tone, that is.

edit: Some dd usage examples: http://www.linuxquestions.org/questions/linux-newbie-8/learn...



Don't cat a file and pipe it into pv. Use "pv file" as a replacement for "cat file" and it will show you the progress as a percentage. When it's in the middle of a pipeline, it doesn't know the total size (unless you tell it with -s), so it can only show the throughput.


Actually, that's not completely true. pv will detect the file size if you use the shell to read a file into it, like pv < file.


At first I thought, no, that's not possible. Then I thought, no, they wouldn't do THAT, would they?

But I guess they do...

http://stackoverflow.com/questions/1734243/in-c-how-do-i-pri...

I've seen that kind of brokenness from programs trying to find their binary image on disk. Don't do it, it's bad.


It doesn't need to hunt for the directory entry; it just needs to call `fstat()` on the stdin file descriptor.


Unless the input file is a device, and then you need to call:

    ioctl(STDIN_FILENO, BLKGETSIZE64, &size)


But I guess this is orthogonal to where the file descriptor comes from (i.e. stdin or opening a file whose name is passed in the args)


It's an orthogonal issue, yes, but calling stat or fstat on any block device, whether from stdin or argv, will return .st_size == 0, so your progress bar won't display the correct answers (or could display better answers if it used the ioctl).
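On Linux, the shell-level counterpart of that ioctl (assuming util-linux is installed) is:

    blockdev --getsize64 /dev/sdb    # prints the device size in bytes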


Oops, you're right: the size is in the inode.


It makes a decent interview question tho': "explain the difference between cat file|./prog and ./prog <file". It doesn't even matter if they get it wrong; that they even know there is a difference is a very good sign.


I think most people wouldn't know the difference (I had no idea!) and the knowledge might fall into the realm of obscure trivia.


People who have come into SA work via being C programmers usually figure it out; they make the best SAs because they are mentally equipped to reason about a system from first principles.


If all they want is the size of the file, then what is wrong with using fstat?

It's not the same thing as trying to walk the FS to look for the filename, which is what's silly.


    pv file.bin | dd of=/dev/sdb
is nice too. Honestly, while the OP sparks a nice conversation, it is actually complaining about the use of dd out of a lack of knowledge.


A counterpoint: dd survives not because it's good or makes sense, but explicitly because it doesn't.

You wanna format a usb key? Google this, copy/paste these dd instructions, it works, move on with your life.

You wanna format a usb key using something related to cat you once saw and didn't fully understand? Have fun.

Both approaches have their weak points, but in any OS the answer to "How do I format a usb key" should not start with "Oh boy, let's have a Socratic dialog over 10 years on how to do that."


This probably has more truth to it than most unix admins would like to admit.

"Why do we do it like that? I dunno, that's how I learned how, how do you do it?"


Definitely this. I have found many times that I'm an offender of these "bad practices", and usually that's because a certain pattern I learned way back in the beginnings of my Linux days still hangs around.

Embarrassingly, it took me a long time before I started reaching for man pages instead of Google. That has probably had the biggest effect on tightening up my command line fu.

find is another tool that seems to get used for only one specific use case, ignoring its rather large and useful toolset.


I learned Linux this way a decade and a half ago when it was far (and still is imho!) more convenient to quickly search a man page than google something. (with slow internet start times, browser startup times, etc)

Now, sometimes when people watch me work in a shared session they comment on my "peculiar" (to them) usage of flipping between -h --help and man $command, because there's a whole lot of switches I have memorized over time, but even more that I just have good reference points for.

But, bar none, what I've noticed among my peers is that the people who have always bowed to quick Google solutions have never really taken the time to learn what they're doing. They almost always seem to be the 'quick fix', 'get it working now, sort it out later' types.


I often use google to find the answer, then go read the man page for it.


Isn't that what mkfs.* is for?


What about the `seek` argument, which skips over some blocks at the beginning while leaving them as unix "holes"?

Also note that there are still unix systems out there which do not support byte-level granularity of access to block devices. On those devices you must actually use a buffer of exactly the size of the blocks on the device. Heck, Linux was like this until at least v2.
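For example (the filenames are made up), this leaves a 1 MiB hole at the start of a freshly created output file:

    dd if=payload.bin of=out.img bs=1M seek=1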


Also keep in mind that specifying the block size can be important, especially for efficiently reading data. Standard shell tools don't just "figure it out" automatically. They guess, and sometimes those assumptions can be incorrect, resulting in performance that is orders of magnitude lower.


Very useful when dealing with RAID (make the blocks stripe-sized), tape (512-byte blocks), or esoteric devices.

An essential tool for low level repair, like when you can guess the partition table values but there is no partition table anymore.
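For instance, to peek at the sector where the partition table used to live (the device name is an example):

    dd if=/dev/sdb bs=512 count=1 | hexdump -C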


I think dd is primarily so popular because it is used in mostly dangerous operations. Sure, using cat makes logical sense, but if we are talking about writing directly to disk devices here, I'll trust the command I read from the manual and not explore commands I think would work.

dd's "highly nonstandard syntax" comes from the JCL programming language, but it's really just another tool to read and write files. At the end of the day it's not more complex or incompatible than other unix tools. For example, you can also use tools like `pv` with dd no problem to get progress statements.


There is some truth to this: if you basically already know dd (like I do), then reserving it for dangerous operations is a good way to "signal" to yourself to slow down and pay attention.


I always thought dd stood for "disk destroyer" and only ever used it for making low-level copies of whole disks or shredding them with if=/dev/random. This thread has been informative and terrifying as I learn cat and cp are every bit as dangerous as dd! I never would have expected something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files, but I'll also DESTROY YOUR SHIT if you say so?


> I never would expect something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?

That's the beauty of Unix.

Everything is a file. Thus every program that can work with files, can in fact work with everything.

It's actually very liberating.


> I'll also DESTROY YOUR SHIT if you say so

That's the Unix way: The customer, eh, user is always right.

I once wanted to clean up backup files created by emacs (they end in the tilde character) by typing "rm <asterisk>~" - except what I did type was "rm <asterisk> ~".

(On the upside, I learned a valuable lesson that day.)


It isn't the same though, because not all files are non-seekable streams of bytes. In fact, most are seekable and possibly sparse.

What happens when you dd a sparse file? And cp?

https://wiki.archlinux.org/index.php/sparse_file

C.f. fallocate(1,2)


I'll also DESTROY YOUR SHIT if you say so?

It better :-)

But it all comes from the unix idea that everything is a file.


1. Everything is a file; why would cp care that one file happens to be your hard drive?

2. "UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things."

- Doug Gwyn


Cult of pv. It looks to have more command-line complexity than dd. https://linux.die.net/man/1/pv


This is a good point as well. On BSD, we don't have `pv`, but we do have ^T. This will print some sort of status for just about any long running process. It prints very specialized status for certain programs aware of it.


pv is probably more useful:

  $ dd if=/dev/urandom count=1000 bs=1000000 | pv -s 1000000000 > foo
   214MiB 0:00:16 [13.1MiB/s] [=========================>                              ] 22% ETA 0:00:55
Compare to ^T:

  $ dd if=/dev/urandom of=foo count=1000 bs=1000000
  load: 1.76  cmd: dd 80097 running 0.00u 0.89s
  11+0 records in
  11+0 records out
  11000000 bytes transferred in 0.947316 secs (11611752 bytes/sec)
  load: 1.76  cmd: dd 80097 running 0.00u 1.68s
  22+0 records in
  22+0 records out
  22000000 bytes transferred in 1.746013 secs (12600134 bytes/sec)
  load: 1.76  cmd: dd 80097 running 0.00u 2.28s
  31+0 records in
  31+0 records out
  31000000 bytes transferred in 2.392392 secs (12957742 bytes/sec)
  load: 1.76  cmd: dd 80097 running 0.00u 2.83s
  38+0 records in
  38+0 records out
  ....


Why not use dd's native status=progress parameter?


Nice! MacOS has it too (unsurprisingly)

Linux doesn't. I wonder why.


You're not the first to wonder: https://unix.stackexchange.com/questions/179481/siginfo-on-g...

But it looks like the answer is just "it was complicated to implement so Linux didn't add it."


Huh, I never knew about this. It looks like Linux doesn't define SIGINFO, which is what the terminal driver would generate in response to the ^T.


pv is available on at least FreeBSD as a package, like it is in Linux. I'd be quite shocked if the other BSDs didn't have a port of it as well. There is also mbuffer.

SIGINFO works on gnu dd last I tried it.


Yeah, it's just that Linux doesn't even define SIGINFO


You made my day sir


This is a great example of why downvoting submissions should be a thing. Or at least showing the up/down tuple. I would say every upvote represents someone misled and likely to further propagate this nonsense.


You can flag submissions.


I kinda thought that was supposed to be reserved for submissions that are non-permissible, e.g., hate speech, sites hosting malware, or very off-topic & uninformative.


The author doesn't even give correct invocations of dd (on BSD, at least, for their last example with head).

I certainly agree the syntax of the arguments is strange, due to its age, but I don't agree that learning it is difficult or a waste of time.

All I've learned is that the author doesn't like dd well enough to learn it.


The author is wrong: bs IS useful. Try dd'ing one hard drive to another with and without a reasonable bs (1-8M) and you will see a difference.


It also doesn't care about file type. If you want to copy a malformed file, dd will do it - cat won't.


cat doesn't care about file type either.


Try it on a tape device with the wrong value for bs and you might corrupt the output as a result.


OP, your alternatives to DD are more complicated, not less complicated. I shouldn't need to pipeline two commands together just to cut off the first 100MB of a file.


Honestly, this is a problem with all of the examples.

* Using "cat source > target" instead of "cp source target"

* Using "cat source | pv > target" instead of "pv source > target"

* Using "head -c 100MB /dev/zero > target" instead of "truncate -s 100MB target"


Dude's missing an important point:

If you mess up the syntax on a dd invocation, a nice thing happens: nothing.

Use a shell command and pipes, and your command better be perfect before you hit return.


Though I'm not a fan of this article by any means, I've definitely dd'ed the wrong disk before and slapped myself for doing so..

Usually it's not past the easily reproducible system partition yet, or it's on a data disk that is backed up regularly, so I can recover in an hour or so..


If you dd'ed the wrong disk, the syntax of your dd invocation was correct.


The biggest counterexample to this that some people have experienced is accidentally swapping if= and of=, thus backing up their target onto their source rather than vice versa.


Well that's not syntax


Somewhat related short story: Earlier this week my friend said that he dd'd away just over 50 bitcoins, back when they were worth ~$3 each.

"One of the biggest regrets of my life."


If one of the biggest lifetime regrets is a $50k financial loss, your friend is doing pretty well.


$50k is a massive sum for a lot of people...


Well yeah, he's in med school. Maybe one day it won't be a massive sum, but, for now, he's broke.


And I state again: if you're a future doctor and your biggest regret is that you could be $50k richer right now, I'm not inclined to do much weeping. Basically everyone has been a broke student.


Actually it's a $150 loss.


In this case, opportunity cost was real and even quantifiable.


Anyone who didn't invest $150 back then also missed the opportunity


True, but it feels worse to have had the opportunity in one's hands and thrown it away than never to have touched it at all. That's just how humans work - go figure.


I know. I sold 40 BTC at $100 each a while back.


I'll point out that dd also allows you to control lots of other filesystem and OS-related things that other tools do not. See: fsync/fdatasync. I'm not aware of any shell tools that allow you to write data like that.
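With GNU dd, for instance (the device name is an example), you can force the data out to the device before dd reports success:

    dd if=image.iso of=/dev/sdb bs=4M conv=fsync status=progress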


An even easier solution: don't make people fall into the command line to format a USB reliably.

The command line should be reserved for times when you need the fine-grained control to do something that dd is meant to do. A GUI should implement everything else in a reliable way that doesn't break half the time or crash on unexpected input.


Some of us don't "fall" into the command line; we reluctantly get out of it, and only for specific purposes.


"a paintbrush should be reserved for times when you need fine grain control to paint your bedroom. A hired painter could do it all reliably in a way that doesn't risk you painting over the crown moulding or falling off a ladder."


I'm a Linux sysadmin/developer and I've literally never used a Linux GUI.


I want to do this so bad, but the two things that keep me running X are my browser and mpv. Oh, and viewing PDFs.



Not entirely sure, but I think mpv works without X, like mplayer does.


I've tried the gui tools in the past, but dd just seems far easier.


I've set up/configured bind a few times, and every time I wish there were a nice GUI for it.. even thinking to myself I should make one.. but shortly after, I've refreshed my memory of how to do something, and by then I'm done and leave it alone for a while.


What do you mean, "fall into the command line"?

The better half of my computer use happens in the command line interface; it's a way more efficient use of my time.


Ignorance on the blocksize arg.

Also, I only need to remember one progress command for my entire operating system: control+t. I also get a kernel wait channel from that which is phenomenally pertinent to rapidly understanding and diagnosing what the heck a command is doing or why it is stuck.

I hate what Linux has done to systems software culture.


Specifying a large block size used to help a LOT with performance. From memory, shell redirection used a tiny block size, on Solaris at least.

And if you use dd then you probably should specify a bigger block size than the default of 512 bytes.

But yeah, most usage is obsolete.


plain cat (no options) uses max(128 * 1024, st_blksize) aligned to page_size for reads and writes.

so you get the best block size for reads and writes. I can't speak to what the shell does, though.


I think this article is full of "alternative computer science" and reminds me of another article, published here as well, about the obsolescence of Unix. The only good thing is this discussion thread.


To be fair, dd was mostly a tongue-in-cheek reference to the overly baroque JCL command for IBM mainframes.


Interesting assertion. Can you show me a shell invocation without using dd that cuts off the first 16 bytes of a binary file, for example? This is a common reason I use dd.


tail -c +17


This nearly tells you all you need to know. The other bit of info you'll want to note is that head -c +N produces as many bytes as you ask. So if you try to get the prefix using "head -c +N" and the suffix using "tail -c +N" then you'll have 1 byte of overlap.

(dd's corresponding options do not suffer from this problem.)


That seems pretty intuitive to me:

Grab the first N bytes vs. grab everything starting from the Nth byte.


To expand, `-c +n` tells tail to start at the nth byte (counting from 1). So +1 starts at the beginning, and +17 starts after the first 16 bytes. `-n` is lines, `-b` is 512-byte blocks.
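So, to split a file at byte 16 (filenames are placeholders):

    head -c 16 file.bin > first16.bin     # bytes 1-16 (GNU/FreeBSD head)
    tail -c +17 file.bin > rest.bin       # byte 17 onward
    dd if=file.bin of=first16.bin bs=1 count=16    # dd equivalent of the first line
    dd if=file.bin of=rest.bin bs=1 skip=16        # dd equivalent of the second line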


One of the charms of dd is its hilarious syntax. And, used properly, it's a bit of a swiss army knife for a few different disk operations.


Not sure status=progress is that obscure an option; it was added relatively recently as well (in dd terms).


The fact that it was added relatively recently is exactly why it's so obscure. Unlike if, of, bs and count, I haven't had status=progress drilled into my head by every single dd command I've read out of a manual or tutorial, so even now I still forget whether it's "status=progress" or "progress=status" or something else.

Also it's a victim of dd's bizarre non-Unix syntax - an option like "--status" or "--progress" would be more in keeping with expectations.


dd precisely controls the sizes of read, write and lseek system calls. This doesn't matter on buffered block devices; there is no "reblocking" benefit.

Some kinds of devices are structured such that each write produces a discrete block, with a maximum size (such that any bytes in excess are discarded) and each read reads only from one block, advancing to the next one (such that any unread bytes in the current block due to the buffer being too small are discarded). This is very reminiscent of datagram sockets in the IPC/networking arena. dd was developed as an invaluable tool for "reblocking" data for these kinds of devices.

One point that the blog author doesn't realize (or neglects to comment upon) is that "head -c 100MB" relies on an extension, whereas "dd if=/dev/zero of=image.iso bs=4MB count=25" is ... almost POSIX: there is no MB suffix documented by POSIX, only "b" and "k" (lower case). The operator "x" is in POSIX: bs=4x1024x1024.

Here is a non-useless use of dd to request exactly one byte of input from a TTY in raw mode:

file:///usr/share/doc/bash-doc/examples/scripts/line-input.bash

Wrote that myself, back in 1996; was surprised years later to find it in the Bash distribution.
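The core of the trick, sketched from memory (this is the shape of it, not the actual script):

    saved=$(stty -g)                      # remember current terminal settings
    stty raw -echo                        # raw mode: bytes arrive immediately, no echo
    key=$(dd bs=1 count=1 2>/dev/null)    # read exactly one byte from the TTY
    stty "$saved"                         # restore the terminal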


My most common use of dd is warming up AWS EBS volumes. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initi...

Though fio is better because it can work in parallel.


Question: Will cat do a bit-to-bit copy between disks?


dd is for handling blocked data, while cat, redirection, and pipelines are completely useless for that, since they are not meant to manipulate blocks of data, but streams. They do not compare (apart from really simple cases where either will do, like copying a file into some other file); this blog post mainly highlights that neither the author nor many tutorial writers know the difference.


Someone should write a wiki bot to crawl through the wikis for Arch, Debian, and so forth to help rewrite all these bad instructions.


I mean, are they bad instructions? They work for most people, and they're what most people are familiar with. If I ask a random *ix user what's wrong with my shell command, they're more likely to know about dd.


They're good instructions, Brent. 13/10 would follow again.


In light of this article anything seems possible, so put a pleaserobots.txt in /, and the Google bot will modify all the wikis for you.


Instead of

    cat image.iso | pv >/dev/sdb
just do

    pv image.iso >/dev/sdb


A self-submitted opinion blog post that is pretty much entirely wrong ends up on the HN front page. What gives?


Instead of `cat file | pv > dev`, why not `pv file > dev`?


What about writing a block into the middle of a file?


This is cat abuse


TLDR: This has nothing to do with Dunkin' Donuts.



