Despite its name, youtube-dl doesn't just download from YouTube but from a ton of different sites as well [1]. The rate at which this project keeps up with changes is incredible.
It seems like quite a modern success story for the classic "Cathedral and the Bazaar" model of open source development structure and motivations.
As I recall, it was originally written by one person (Ricardo Garcia) in 2008, and worked only on YouTube, using (by later standards) relatively simple heuristics to find the video URL to extract. But it's catalyzed an explosion of interest in every aspect of the problem: tracking changes to the HTML of the video sites, adding support for more video sites, figuring out indirection and parsing through multiple pages and HTML objects, making the tool much more multiplatform and easier to install and update...
It's attracted hundreds of contributors (many of them motivated by a personal desire to be able to use the tool on a different site, or to fix a bug that was preventing them from downloading video in a particular rare case) and maintained an incredibly rapid pace of development.
This is exactly why I contributed. In fact this morning, by coincidence, I had my first ever PR accepted, and it was for this piece of software [1]. I was using youtube-dl to download VK videos, but I really wanted to be able to download an entire playlist -- in the same way you can for YouTube. That feature didn't exist, so I just got stuck in and did it myself. It really helped that there were many other examples I could look at from other sites, and the maintainer of the package provided me with some very good feedback.
This kind of project, which requires a lot of fairly laborious work to support many different information sources, is a particularly good fit for open source development.
I'm not sure if "leeching copyrighted content" was the kind of motivation that Eric Raymond had in mind for future open source projects when he wrote Cathedral and the Bazaar.
When I was in the Himalayas earlier this year, between poor WiFi and sketchy 3G, it was the only practical way to watch at all. Having the file offline was an added bonus that meant others could benefit too, so big thanks to rg3 & current devs on behalf of a lot of folk who've never been near HN themselves.
Yeah, because at the core of human civilization lies a respect for copyright, a BS notion that was developed for exploiting the restrictions of (analog) physical formats for profit...
Copyright law was developed in the 1700s precisely to prevent people from exploiting the limitations of physical formats for profit.
It's the opposite of what you've stated.
Authors, composers and publishers needed protection against cheap printing presses that would just print anything that was popular and flog it in the marketplaces.
>Authors, composers and publishers needed protection against cheap printing presses that would just print anything that was popular and flog it in the marketplaces.
The limitations I mention are the difficulties and cost of the printing itself.
What authors wanted was to restrict who can print their work -- but it's not true that authors "needed protection" because printing presses started appearing.
That makes it sound like authors were paid for their work until those "cheap printer" pirates appeared. But on the contrary, it was the invention of the printing press itself that gave authors an industry in the first place -- for millennia, authors just wrote for free.
Yes; a good read is "The Surprising History of Copyright and The Promise of a Post-Copyright World" [1], which I think is by Karl Fogel, the author of the (Free, Libre, CC-BY-SA) book "Producing Open Source Software" [2].
The reason the industry of paid authoring could develop is because of copyright. Without it, all the value of the new printing industry would have accrued to the printers, and none to the authors.
At least put some effort into making the analogy more accurate:
"it's not unlike an anti-capitalist punk rocker STEALING her clothes at H&M".
That said:
First, I fail to see the contradiction between being an "anti-copyright freedom fighter" and "downloading stuff from YouTube".
Someone somehow convinced you that anti-copyright people only like copyleft works? The very idea of being anti-copyright is wanting to abolish all copyright.
Second, what's with the "anti-copyright freedom fighter" strawman? As if someone needs to be that to want to download stuff off of YouTube?
I'm not sure "leeching copyrighted content" is a fair description of what Youtube-dl does. Yes, most of the content you will download with it is copyrighted, but it is content that you already have a right to see, and in most cases the expectation is that you would maintain your right to see it for as long as YouTube (or another site) remains active. The main difference is that Youtube-dl allows one to view the content with a program other than the browser. I suspect that few people uploading to a video sharing site did so with the intention of requiring people to view it using that site's player; rather, they did so with the intention of people viewing it, and the player restriction was incidental.
The one place I can see this breaking down is advertisements, but I consider those to fall under the same incidental category. (Although Youtube-dl does have a --include-ads option)
>it is content that you already have a right to see //
You have opportunity, that's not the same as a right. The content supplier is under no obligation to provide content to you, ergo no "right to see" that content.
That said, personal time-shifting and format-shifting should IMO be a normally allowed part of the copyright deal.
There's not much motivation to create a non-porn YouTube clone. You have to either believe you can do it better or have a need to not host your content on YouTube, and you have to be capable of actually doing it.
Porn has the need (the mainstream providers generally delete porn) and the sheer resources to do it.
Sorry! The problem is that our userbase is split between wanting the playlist and wanting the video. You can create a file ~/.config/youtube-dl.conf with the content --no-playlist so that you don't have to type it out every time.
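For example (a minimal sketch; --no-playlist and --yes-playlist are both real youtube-dl options):

    # ~/.config/youtube-dl.conf -- options listed here apply to every run
    --no-playlist

For the occasional run where you do want the whole playlist, passing --yes-playlist on the command line should override the config file, since command-line options are parsed after it.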
Correct. Each class that handles information extraction for a different site defines a regexp to match the URL against. (Note: some of the regexps aren't robust to the http-vs-https distinction, so you might have to remove the 's'.)
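For the curious, here's a rough sketch of the shape of such a class. The site, URL pattern, and page structure are invented for illustration; the helper methods come from youtube-dl's InfoExtractor base class:

    # hypothetical extractor -- "ExampleSite" is made up
    from .common import InfoExtractor

    class ExampleSiteIE(InfoExtractor):
        # the URLs this extractor claims; the optional "s" is what makes
        # it handle both http and https
        _VALID_URL = r'https?://(?:www\.)?examplesite\.com/video/(?P<id>[0-9]+)'

        def _real_extract(self, url):
            video_id = self._match_id(url)
            webpage = self._download_webpage(url, video_id)
            # scrape the direct media URL out of the page HTML
            video_url = self._html_search_regex(
                r'<source\s+src="([^"]+)"', webpage, 'video URL')
            return {
                'id': video_id,
                'url': video_url,
                'title': self._og_search_title(webpage),
            }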
Awesome, thanks. Should have tried before asking. :)
Searched my system drive for these .py files, but found nothing, so I figured something was missing.
All of the modules are compiled into a single file for the youtube-dl command. I've never looked into what they are using to do this, but you could poke your head into the repo to check it out.
We're simply making use of Python's ability to load a module from a zip file [0]. Therefore, the generation step [1] is just a matter of zipping up all the files and prepending a shebang.
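A minimal sketch of the same trick (file names invented): since a zip's index lives at the end of the file, Python doesn't mind the shebang line prepended to the archive.

    mkdir demo
    echo 'print("hello from inside a zip")' > demo/__main__.py
    zip --quiet --junk-paths demo.zip demo/__main__.py
    echo '#!/usr/bin/env python' > hello
    cat demo.zip >> hello      # append the archive after the shebang
    chmod a+x hello
    ./hello                    # Python runs __main__.py from the zip root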
Might be less confusing if you append '.zip' in the first two commands:
    zip --quiet youtube-dl.zip youtube_dl/*.py youtube_dl/*/*.py
    zip --quiet --junk-paths youtube-dl.zip youtube_dl/__main__.py
When you echoed the shebang, overwriting the file, I was thrown off. I was thinking, "Why did you just zip all those contents into the file only to throw them out?" Then I saw the `cat` line, and it made sense: the `zip` command appends ".zip" to the archive name, so it had been writing to a separate youtube-dl.zip all along.
[1]: https://github.com/rg3/youtube-dl/tree/master/youtube_dl/ext...