"This is a strange program of obscure provenance that somehow, still manages to survive in the 21st century."
-> links to wikipedia page with direct description of lineage back to 5th ed research unix
"That weird bs=4M argument in the dd version isn’t actually doing anything special—all it’s doing is instructing the dd command to use a 4 MB buffer size while copying. But who cares? Why not just let the command figure out the right buffer size automatically?"
Um -
a) it is 'doing the special thing' of changing the block size (not buffer size)
b) Because the command probably doesn't figure out the right size automatically, much like your 'cat' example above which also doesn't
c) And this can mean massive performance differences between invocations
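The gap in c) is easy to demonstrate even on throwaway files; a sketch (paths are mine, not from the article): moving the same 1 MiB with a tiny block size costs thousands of read/write syscall pairs, while a large block size costs one.

```shell
# same 1 MiB copied two ways; the syscall counts differ by ~2000x,
# which is where the "massive performance differences" come from
dd if=/dev/zero of=/tmp/bs_demo bs=512 count=2048 2>/dev/null  # 2048 write(2) calls
dd if=/dev/zero of=/tmp/bs_demo bs=1M  count=1    2>/dev/null  # a single write(2) call
```

On a real device, timing the two invocations (e.g. with `time`) makes the difference obvious.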
> Another reason to prefer the cat variant is that it lets you actually string together a normal shell pipeline. For instance, if you want progress information with cat you can combine it with the pv command
>If you want to create a file of a certain size, you can do so using other standard programs like head. For instance, here are two ways to create a 100 MB file containing all zeroes:
$ uname -sr
OpenBSD 6.0
$ head -c 10MB /dev/zero
head: unknown option -- c
usage: head [-count | -n count] [file ...]
well.. guess that wasn't so 'standard' after all..
I must be using some nonstandard version...
$ man head |sed -ne 47,51p
HISTORY
The head utility first appeared in 1BSD.
AUTHORS
Bill Joy, August 24, 1977.
$ sed -ne 4p /usr/src/usr.bin/head/head.c
* Copyright (c) 1980, 1987 Regents of the University of California.
Hmm..
> So if you find yourself doing that a lot, I won’t blame you for reaching for dd. But otherwise, try to stick to more standard Unix tools.
Like 'pv'?
edit: added formatting, sector size note, head manpage/head.c stuffs.. apologies.
The other reason for dd's existence is that back in the day, if you wrote to a block device with a write(2) size that was anything other than a multiple of the device's actual block size, you'd get EIO or EINVAL. So the program you used to make sure the write(2) size was always correct was dd. In the old days some truly weird block devices would accept a write(2) that used only the exact block size, i.e. you could only read and/or write one block at once (if you tried to read more than one block of data you'd get back exactly one block regardless of how big your read buffer was). Old raw CD and WORM drives come to mind. Audio CDs for example had a 2,352 byte block size (after removing the CRC).
> back in the day if you wrote to a block device with a write(2) size of anything other than a multiple of the devices actual block size you'd get a EIO or a EINVAL
It's not just back in the day; it's still true. On Linux, block devices are kernel-cached by default unless opened with the O_DIRECT flag.
So, on FreeBSD "dd bs=1" will fail if it involves any disk device: the disk driver will return EINVAL from read(2) or write(2) because the I/O size is not divisible by the physical sector size. "cat" with buffer size X (which depends on the implementation) will work or not depending on the divisibility of X by the physical sector size, and on other random factors, like short I/O caused by signal delivery.
Summary: dd(1) still has its place, and the author of the original article is getting it wrong.
Yes you are right, there are some UNIXes that ship today without a buffer (block) cache and/or don't default to using the block cache when open(2)ing a block device.
For those wondering Linux uses a unified buffer / page cache so there isn't a coherency issue. The buffer cache entries typically point to the corresponding entry in the page cache if it exists. The biggest reason the two are separate but correlated is that the block size isn't always the same as the page size.
Or, before that, just send the dd process SIGINFO to get the same output printed once. `while :; do kill -INFO %1; sleep 1; done` if you want a "progress bar" of sorts.
As an aside: On BSDs (incl. macOS), SIGINFO can also be sent interactively by the line driver when you type ^T (like ^C sends SIGINT.) Kind of lame that Linux doesn't follow suit [or even have SIGINFO], or we'd see a lot more programs that build in useful "prod me for an update" hooks, the way they already have "prod me to reload my config" SIGHUP hooks.
OK, I take it I didn't know of the BSD-specific signal. But please don't claim it's
on Unix, because it's not. Unless you can point me where in SUS (or any
other specification) it is defined, as I couldn't find it here:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sign...
(You probably know this as you use OpenBSD, but) something I really like about BSDs is that most of the core commands respond to ^T with progress info of some kind, dd included.
There's one good (?) reason to use dd with devices: it specifies target in the same command. For devices, writing to them usually requires root privileges, so it's easy to:
sudo dd .... of=/dev/...
But there's no trivial cat equivalent:
sudo cat ... > target
Will open target as your current user anyway. You can play around with tee and redirection of course. But that's getting more complicated than the original.
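A sketch of the tee workaround (the device name is a placeholder; the runnable line uses an ordinary file as the target so it works without root):

```shell
# sudo applies to cat, not to the shell's redirection, so
#   sudo cat image.iso > /dev/sdX      # fails: target opened as you
# tee opens the target itself inside the privileged process:
#   cat image.iso | sudo tee /dev/sdX > /dev/null
# same shape without sudo, a regular file standing in for the device:
printf 'payload' | tee /tmp/tee_target > /dev/null
```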
This is admittedly somewhat esoteric, but it seems like a stretch to say `dd` does not have some place, especially when transferring binary data in very specific ways.
Since we're sharing shell tricks: The "sudo tee > /dev/null" may be baroque, but I find it useful whenever I start editing stuff in /etc in vim, only to find that I cannot write my changes because I'm not root. In that case,
:w !sudo tee %
does the trick. (What `:w !cmd` does is send the buffer to the given shell command as stdin.)
Basically, you can use cp wherever you use dd, as long as you're not changing any low-level parameters (e.g. starting 500 bytes into the file or something).
Yes. Yes it does. Reminds me of a Sunday evening in the late '90s when I stopped working as root all the time:
cp backup.tar.bz /dev/sda
Nowadays I would know enough to at least get the contents of the backup.tar.bz back. Back then, this was the end of both my / partition (or any other partition) and the backup of my music collection.
Still, that didn't end my love affair with Unix. It did make me a whole lot more careful though.
Well that's strictly a bug caused by mistaken use, as strings are not expanded lazily and heredocs are just another string syntax. How can one use the unix shell without string interpolation? Also, a similar programme would give the wrong result in, say, Ruby or Perl too.
Use noerror, but forget sync? Corrupt output file if there is an error. Use a bigger bs so it's not slow as treacle? A single faulty sector blows away a whole bs of data, and your output image may get unwanted padding appended to the end. Recoverable error? dd's not going to retry.
Use ddrescue or FreeBSD's recoverdisk(1). They're faster, they're safer, they're more effective, and they're easier to use.
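For contrast, the fragile dd pattern described above looks like this (device name hypothetical; the runnable line reads /dev/zero so it's harmless):

```shell
# the classic but risky rescue incantation:
#   dd if=/dev/sdX of=disk.img bs=64K conv=noerror,sync
# noerror continues past read errors; sync pads every short read
# up to bs - omit it and the image silently loses alignment
dd if=/dev/zero of=/tmp/rescue_demo bs=4K count=4 conv=noerror,sync 2>/dev/null
```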
GNU ddrescue[1] and not dd_rescue[2]. I'm adding this clarification because, depending on the Linux distro, the dd_rescue package name may be ddrescue and the GNU ddrescue package name may be gddrescue.
>One thing I'll often use dd for is recovering data from a failing drive.
Funnily enough, I ended up accidentally naming the wrong drive in the argument and lost years of photos, music, video, etc. Though I suppose I can't blame dd for that :)
I think there's a rule that you're not really qualified to discuss command line tools in public until you've used dd to inadvertently eradicate an entire partition.
Personally, I'd add the use of a trailing full-stop in an rsync command in the wrong directory (e.g. in /home/user) as an alternative qualification to your rule.
I now use full paths for destination as well as source.
They could! They could be super annoying to read, too. Pipelines make more sense to humans when they always read in the same direction.
In the rare case where the volume of data is large enough to make the efficiency hit noticeable on large machines, rewriting a pipeline to eliminate a leading cat makes sense. In all other cases, it is a premature and unnecessary optimization.
would actually need to buffer the complete file, before commencing to write to the device (and showing the progress), which would defeat the whole idea of showing progress.
If `pv` doesn't know the input size, it doesn't show it. In your case, it can determine that its stdin is a file by looking at the `/proc/self/fd/0` symlink:
$ ls -l /proc/self/fd/0 < /tmp/x
lr-x------ 1 user users 64 <date> /proc/self/fd/0 -> /tmp/x
But who cares? Why not just let the command figure out the right buffer size automatically?
Because it can be a lot slower. dd is low level, hence powerful and dangerous.
And, if we are going down that rabbit hole, you don't need cat[1]
“The purpose of cat is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.”
Yep, my thought was that the UUOC critique doesn't apply to most attempts to substitute cat for dd, because typically those are copying from one (regular or special) to another, and you can't simply use redirection to accomplish this in the absence of a reader.
dd is a tool. dd can do a lot more than cat. dd can count, seek, skip (seek/drop input), and do basic-ish data conversion. dd is standard, even more standard than cat (the GNU breed). I even used it to flip a byte in a binary, a couple of times.
New-ish GNU dd even adds a nice progress display option (the standard way is sending it SIGUSR1, since dd is made to be scripted where only the exit code matters).
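The byte-flip mentioned above can be sketched like this (file name and offset are made up for the demo): seek counts in units of bs, and conv=notrunc keeps dd from truncating the rest of the file.

```shell
printf 'ABCDEF' > /tmp/patchme
# overwrite the single byte at offset 2 ('C' -> 'x'), leave the rest intact
printf 'x' | dd of=/tmp/patchme bs=1 seek=2 conv=notrunc 2>/dev/null
cat /tmp/patchme   # ABxDEF
```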
> Actually, using dd is almost never necessary, and due to its highly nonstandard syntax is usually just an easy way to mess things up.
Personally I never messed it up, nor was confused about it. This sentence also sets the tone of the whole article, a rather subjective tone that is.
Don't cat a file and pipe it into pv. Use "pv file" as a replacement for "cat file" and it will show you the progress as a percentage. When it's in the middle of a pipeline, it doesn't know the total size (unless you tell it with -s), so it can only show the throughput.
It's an orthogonal issue, yes, but calling stat or fstat on any block device whether from stdin or argv will return .st_size == 0, so your progress bar won't display the correct answers (or could display better answers if it used the ioctl).
It makes a decent interview question tho', "explain the difference between cat file|./prog and ./prog <file". It doesn't even matter if they get it wrong, that they even know there is a difference is a very good sign.
People who have come into SA work via being C programmers usually figure it out, they make the best SAs because they are mentally equipped to reason about a system from first principles.
A counterpoint: dd survives not because it's good or makes sense, but explicitly because it doesn't.
You wanna format a usb key? Google this, copy/paste these dd instructions, it works, move on with your life.
You wanna format a usb key using something related to cat you once saw and didn't fully understand? Have fun.
Both approaches have their weak points, but in any OS the answer to "How do I format a usb key" should not start with "Oh boy, let's have a Socratic dialog over 10 years on how to do that."
Definitely this. I have found many times that I'm an offender of these "bad practices", and usually that's because a certain pattern I learned way back in the beginnings of my Linux days still hangs around.
Embarrassingly, it took me a long time before I started reaching for man pages instead of Google. That has probably had the biggest effect on tightening up my command line fu.
find is another tool that seems to get only one specific use case that ignores its rather large and useful toolset.
I learned Linux this way a decade and a half ago when it was far (and still is imho!) more convenient to quickly search a man page than google something. (with slow internet start times, browser startup times, etc)
Now, sometimes when people watch me work in a shared session they comment on my "peculiar" (to them) usage of flipping between -h --help and man $command, because there's a whole lot of switches I have memorized over time, but even more that I just have good reference points for.
But, bar none, what I've noticed among my peers is that the people who have always bowed to quick Google solutions have never really taken the time to learn what they're doing. They almost always seem to be the 'quick fix', 'get it working now, sort it out later' types.
What about the `seek` argument which skips over some blocks at the beginning but still allocates them (unix "holes")?
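A quick sketch of that (GNU dd assumed; path is made up): with count=0, dd writes nothing but still truncates/extends the output file to the seek offset, leaving a hole.

```shell
# 1 MiB apparent size, with (almost) no blocks actually allocated
dd if=/dev/null of=/tmp/sparse.img bs=1 count=0 seek=1048576 2>/dev/null
wc -c < /tmp/sparse.img   # apparent size in bytes
du -k /tmp/sparse.img     # far fewer blocks actually on disk
```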
Also note that there are still unix systems out there which do not support byte-level granularity of access to block devices. On those devices you must actually use a buffer of exactly the size of the blocks on the device. Heck, linux was like this until at least v2.
Also keep in mind that specifying the block size can be important, especially for reading data efficiently. Standard shell tools don't just "figure it out" automatically. They guess, and sometimes those guesses are wrong, resulting in performance that is orders of magnitude lower.
I think dd is primarily so popular because it is used in mostly dangerous operations. Sure, using cat makes logical sense, but if we are talking about writing directly to disk devices here, I'll trust the command I read from the manual and not explore commands I think would work.
dd's "highly nonstandard syntax" comes from the JCL programming language, but it's really just another tool to read and write files. At the end of the day it's not more complex or incompatible than other unix tools. For example, you can also use tools like `pv` with dd no problem to get progress statements.
There is some truth to the fact that (if you basically already know dd like I do) then reserving it for dangerous operations is a good way to "signal" to yourself "slow down here and pay attention"
I always thought dd stood for disk destroyer, only ever used it for making low level copies of whole disks or shredding them with if=/dev/random. This thread has been informative and terrifying as I learn cat and cp are every bit as dangerous as dd! I never would expect something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?
> I never would expect something like cp xxx /dev/sda to actually work. Thinking about it, why should cp even support something like that? I'll copy files but I'll also DESTROY YOUR SHIT if you say so?
That's the beauty of Unix.
Everything is a file. Thus every program that can work with files, can in fact work with everything.
That's the Unix way: The customer, eh, user is always right.
I once wanted to clean up backup files created by emacs (they end in the tilde character) by typing `rm *~` - except what I did type was `rm * ~`.
(On the upside, I learned a valuable lesson that day.)
This is a good point as well. On BSD, we don't have `pv`, but we do have ^T. This will print some sort of status for just about any long running process. It prints very specialized status for certain programs aware of it.
$ dd if=/dev/urandom of=foo count=1000 bs=1000000
load: 1.76 cmd: dd 80097 running 0.00u 0.89s
11+0 records in
11+0 records out
11000000 bytes transferred in 0.947316 secs (11611752 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 1.68s
22+0 records in
22+0 records out
22000000 bytes transferred in 1.746013 secs (12600134 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 2.28s
31+0 records in
31+0 records out
31000000 bytes transferred in 2.392392 secs (12957742 bytes/sec)
load: 1.76 cmd: dd 80097 running 0.00u 2.83s
38+0 records in
38+0 records out
....
pv is available on at least FreeBSD as a package, like it is on Linux. I'd be quite shocked if the other BSDs didn't have a port of it as well. There is also mbuffer.
This is a great example of why downvoting submissions should be a thing. Or at least showing the up/down tuple. I would say every upvote represents someone misled and likely to further propagate this nonsense.
I kinda thought that was supposed to be reserved for submissions that are non permissible, e.g., hate speech, sites hosting malware, or very off-topic & uninformative.
OP, your alternatives to DD are more complicated, not less complicated. I shouldn't need to pipeline two commands together just to cut off the first 100MB of a file.
The biggest counterexample to this that some people have experienced is accidentally swapping if= and of=, thus backing up their target onto their source rather than vice versa.
And I state again: if you're a future doctor and your biggest regret is that you could be $50k richer right now, I'm not inclined to do much weeping. Basically everyone has been a broke student.
True, but it feels worse to have had the opportunity in one's hands and thrown it away than never to have touched it at all. That's just how humans work - go figure.
I'll point out that dd also allows you to control lots of other filesystem and OS-related things that other tools do not. See: fsync/fdatasync. I'm not aware of any shell tools that allow you to write data like that.
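With GNU dd those hooks are the conv=fsync and conv=fdatasync flags (plus oflag=direct/dsync for per-write behavior); a harmless sketch on a temp file:

```shell
# conv=fsync: flush data and metadata to stable storage before dd exits;
# conv=fdatasync would flush data only
dd if=/dev/zero of=/tmp/durable bs=4K count=1 conv=fsync 2>/dev/null
```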
An even easier solution: don't make people fall into the command line to format a USB reliably.
The command line should be reserved for times where you need the fine-grained control to do something that dd is meant to do. A GUI should implement everything else in a reliable way that doesn't break half the time or crash on unexpected input.
"a paintbrush should be reserved for times when you need fine grain control to paint your bedroom. A hired painter could do it all reliably in a way that doesn't risk you painting over the crown moulding or falling off a ladder."
I've set up/configured bind a few times, and every time I wish there was a nice gui for it.. even thinking to myself I should make one.. but shortly after I've refreshed my memory of how to do something, and by then, I'm done and leave it alone for some while.
Also, I only need to remember one progress command for my entire operating system: control+t. I also get a kernel wait channel from that which is phenomenally pertinent to rapidly understanding and diagnosing what the heck a command is doing or why it is stuck.
I hate what Linux has done to systems software culture.
I think this article is full of "alternative computer science" and reminds me other article, published here as well, about the obsolescence of Unix. The only good thing is this discussion thread.
Interesting assertion. Can you show me a shell invocation without using dd that cuts off the first 16 bytes of a binary file, for example? This is a common reason I use dd.
This nearly tells you all you need to know. The other bit of info you'll want to note is that head -c +N produces as many bytes as you ask. So if you try to get the prefix using "head -c +N" and the suffix using "tail -c +N" then you'll have 1 byte of overlap.
(dd's corresponding options do not suffer from this problem.)
To expand, `-c` tells tail to start on the nth (starts counting at 1) byte. So +1 starts at the beginning, +17 starts after the first 16 bytes. `-n` is lines, `b` is 512-byte blocks.
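Putting the two spellings side by side on a throwaway file (file name and contents are mine): both drop the first 16 bytes.

```shell
printf '0123456789abcdefPAYLOAD' > /tmp/blob
tail -c +17 /tmp/blob                      # start at byte 17: PAYLOAD
dd if=/tmp/blob bs=16 skip=1 2>/dev/null   # skip one 16-byte block: PAYLOAD
```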
The fact that it was added relatively recently is exactly why it's so obscure. Unlike if, of, bs and count, I haven't had status=progress drilled into my head by every single dd command I've read out of a manual or tutorial, so even now I still forget whether it's "status=progress" or "progress=status" or something else.
Also it's a victim of dd's bizarre non-Unix syntax - an option like "--status" or "--progress" would be more in keeping with expectations.
dd precisely controls the sizes of read, write and lseek system calls. This doesn't matter on buffered block devices; there is no "reblocking" benefit.
Some kinds of devices are structured such that each write produces a discrete block, with a maximum size (such that any bytes in excess are discarded) and each read reads only from one block, advancing to the next one (such that any unread bytes in the current block due to the buffer being too small are discarded). This is very reminiscent of datagram sockets in the IPC/networking arena. dd was developed as an invaluable tool for "reblocking" data for these kinds of devices.
One point that the blog author doesn't realize (or neglects to comment upon) is that "head -c 100MB" relies on an extension, whereas "dd if=/dev/zero of=image.iso bs=4MB count=25" is ... almost POSIX: there is no MB suffix documented by POSIX, only "b" and "k" (lower case). The operator "x" is in POSIX: bs=4x1024x1024.
Here is a non-useless use of dd to request exactly one byte of input from a TTY in raw mode:
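A sketch of that trick, assuming the usual stty raw-mode bracket (the stty lines need a real terminal, so the runnable line demonstrates the one-byte read on a pipe instead):

```shell
# on a TTY you'd bracket the read with raw mode:
#   saved=$(stty -g); stty raw -echo
#   key=$(dd bs=1 count=1 2>/dev/null)
#   stty "$saved"
# the one-byte read itself, fed from a pipe here:
printf 'q' | dd bs=1 count=1 2>/dev/null   # emits exactly 'q'
```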
dd is for handling blocked data, while cat, redirection and pipelines are completely useless for that, since they are not meant to manipulate blocks of data, but streams. They do not compare (apart from really simple cases where either will do, like copying a file into some other file); this blog post mainly highlights that neither the author nor many tutorial writers know the difference.
I mean, are they bad instructions? They work for most people, and they're what most people are familiar with. If I ask a random *ix user what's wrong with my shell command, they're more likely to know about dd.