Despite its name, youtube-dl doesn't just download from YouTube but from a ton of different sites as well [1]. The rate at which this project keeps up with changes is incredible.
It seems like quite a modern success story for the classic "Cathedral and the Bazaar" model of open source development structure and motivations.
As I recall, it was originally written by one person (Ricardo Garcia) in 2008, and worked only on YouTube, using (by later standards) relatively simple heuristics to find the video URL to extract. But it's catalyzed an explosion of interest in every aspect of the problem: tracking changes to the HTML of the video sites, adding support for more video sites, figuring out indirection and parsing through multiple pages and HTML objects, making the tool much more multiplatform and easier to install and update...
It's attracted hundreds of contributors (many of them motivated by a personal desire to be able to use the tool on a different site, or to fix a bug that was preventing them from downloading video in a particular rare case) and maintained an incredibly rapid pace of development.
This is exactly why I contributed. In fact this morning, by coincidence, I had my first ever PR accepted, and it was for this piece of software [1]. I was using youtube-dl to download VK videos, but I really wanted to be able to download an entire playlist -- in the same way you can for YouTube. That feature didn't exist, so I just got stuck in and did it myself. It really helped that there were many other examples I could look at from other sites, and the maintainer of the package provided me with some very good feedback.
This kind of project, which requires a lot of fairly laborious work to support many different information sources, is a particularly good fit for open source development.
I'm not sure if "leeching copyrighted content" was the kind of motivation that Eric Raymond had in mind for future open source projects when he wrote Cathedral and the Bazaar.
When I was in the Himalayas earlier this year, between poor WiFi and sketchy 3G, it was the only practical way to watch at all. Having the file offline was an added bonus that meant others could benefit too, so big thanks to rg3 & current devs on behalf of a lot of folk who've never been near HN themselves.
Yeah, because at the core of human civilization lies a respect for copyright, a BS notion that was developed for exploiting the restrictions of (analog) physical formats for profit...
Copyright law was developed in the 1700s precisely to prevent people from exploiting the limitations of physical formats for profit.
It's the opposite of what you've stated.
Authors, composers and publishers needed protection against cheap printing presses that would just print anything that was popular and flog it in the marketplaces.
>Authors, composers and publishers needed protection against cheap printing presses that would just print anything that was popular and flog it in the marketplaces.
The limitations I mention are the difficulties and cost of the printing itself.
What authors wanted was to restrict who can print their work -- but it's not true that authors "needed protection" because printing presses started appearing.
That makes it sound like authors were paid for their work until those "cheap printer" pirates appeared. But on the contrary, it was the invention of the printing press itself that gave authors an industry in the first place -- for millennia, authors just wrote for free.
Yes; a good read is "The Surprising History of Copyright and The Promise of a Post-Copyright World" [1], which I think is by Karl Fogel, the author of the (Free, Libre, CC-BY-SA) book "Producing Open Source Software" [2].
The reason the industry of paid authoring could develop is because of copyright. Without it, all the value of the new printing industry would have accrued to the printers, and none to the authors.
At least put some effort into making the analogy more accurate:
"it's not unlike an anti-capitalist punk rocker STEALING her clothes at H&M".
That said:
First, I fail to see the contradiction between being an "anti-copyright freedom fighter" and "downloading stuff from YouTube".
Someone somehow convinced you that anti-copyright people only like copyleft works? The very idea of being anti-copyright is wanting to abolish all copyright.
Second, what's with the "anti-copyright freedom fighter" strawman? As if someone needs to be that to want to download stuff off of YouTube?
I'm not sure "leeching copyrighted content" is a fair description of what Youtube-dl does. Yes, most of the content you will download with it is copyrighted, but it is content that you already have a right to see, and in most cases the expectation is that you would maintain your right to see it for as long as YouTube (or another site) remains active. The main difference is that Youtube-dl allows one to view the content with a program other than the browser. I suspect that few people uploading to a video sharing site did so with the intention of requiring people to view it using that site's player; rather, they did so with the intention of people viewing it, and the player restriction was incidental.
The one place I can see this breaking down is advertisements, but I consider those to fall under the same incidental category. (Although Youtube-dl does have a --include-ads option)
>it is content that you already have a right to see //
You have opportunity, that's not the same as a right. The content supplier is under no obligation to provide content to you, ergo no "right to see" that content.
That said, personal time-shifting and format-shifting should IMO be a normally allowed part of the copyright deal.
There's not much motivation to create a non-porn YouTube clone. You have to either believe you can do it better or have a need to not host your content on YouTube, and you have to be capable of actually doing it.
Porn has the need (the mainstream providers generally delete porn) and the sheer resources to do it.
Sorry! The problem is that our userbase is split between wanting the playlist and wanting the video. You can create a file ~/.config/youtube-dl.conf with the content --no-playlist so that you don't have to type it out every time.
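For example (a minimal sketch; --no-playlist and --yes-playlist are both real youtube-dl options):

    # ~/.config/youtube-dl.conf -- options listed here apply to every run
    --no-playlist

For the occasional run where you do want the whole playlist, passing --yes-playlist on the command line should override the config file, since command-line options are parsed after it.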
Correct. Each class that handles information extraction for a different site defines a regexp to match the URL against. (Note: some of the regexps aren't robust to the http-vs-https distinction, so you might have to remove the 's'.)
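For the curious, here's a rough sketch of the shape of such a class. The site, URL pattern, and page structure are invented for illustration; the helper methods come from youtube-dl's InfoExtractor base class:

    # hypothetical extractor -- "ExampleSite" is made up
    from .common import InfoExtractor

    class ExampleSiteIE(InfoExtractor):
        # the URLs this extractor claims; the optional "s" is what makes
        # it handle both http and https
        _VALID_URL = r'https?://(?:www\.)?examplesite\.com/video/(?P<id>[0-9]+)'

        def _real_extract(self, url):
            video_id = self._match_id(url)
            webpage = self._download_webpage(url, video_id)
            # scrape the direct media URL out of the page HTML
            video_url = self._html_search_regex(
                r'<source\s+src="([^"]+)"', webpage, 'video URL')
            return {
                'id': video_id,
                'url': video_url,
                'title': self._og_search_title(webpage),
            }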
Awesome, thanks. Should have tried before asking. :)
Searched my system drive for these .py files, but found nothing, so I figured something was missing.
All of the modules are compiled into a single file for the youtube-dl command. I've never looked into what they are using to do this, but you could poke your head into the repo to check it out.
We're simply making use of Python's ability to load a module from a zip file [0]. Therefore, the generation step [1] is just a matter of zipping up all the files and prepending a shebang.
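A minimal sketch of the same trick (file names invented): since a zip's index lives at the end of the file, Python doesn't mind the shebang line prepended to the archive.

    mkdir demo
    echo 'print("hello from inside a zip")' > demo/__main__.py
    zip --quiet --junk-paths demo.zip demo/__main__.py
    echo '#!/usr/bin/env python' > hello
    cat demo.zip >> hello      # append the archive after the shebang
    chmod a+x hello
    ./hello                    # Python runs __main__.py from the zip root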
Might be less confusing if you append '.zip' in the first two commands:
    zip --quiet youtube-dl.zip youtube_dl/*.py youtube_dl/*/*.py
    zip --quiet --junk-paths youtube-dl.zip youtube_dl/__main__.py
When you echoed the shebang, overwriting the file, I was thrown off. I was thinking, "Why did you just zip all those contents into the file only to throw them out?" Then I saw the `cat` line, and it made sense: the `zip` command appends ".zip" to the archive name, so it had been writing to a separate youtube-dl.zip all along.
[1]: https://github.com/rg3/youtube-dl/tree/master/youtube_dl/ext...