Most open-source software is libraries or frameworks (medium.com/aserg.ufmg)
123 points by bhjs2 on July 23, 2017 | 57 comments


This is a weird premise. It's github. Of course it is predominantly tools for coders. End user applications have wholly separate paths to the user. An application could have a handful of contributors/stars/downloads etc. on github and still have millions of downloads somewhere else (a project I work on fits that description). And, it might not be on github at all and still be used by millions daily.

I guarantee openssh, Firefox, LibreOffice, and probably a hundred other applications, are (orders of magnitude) more popular than the top applications on this list.

So, if this were titled, "Most open source software on github..." I wouldn't object. But, I have to completely reject the premise here, because I know that there's an entire iceberg of OSS software, including applications, that is completely excluded from the listing by virtue of either not being on github or being on github, but not using github as its primary method of distribution and promotion, and this data completely ignores everything below the surface.

Also, it's probably dangerous to begin to think of "Open Source Software" as only being "Software that has a public github repo".


> It's github. Of course it is predominantly tools for coders. End user applications have wholly separate paths to the user.

To the user sure, but if they are open source, bazaar-model projects, they still need a route to contributors, and GitHub is as suited for that for end user apps as it is for libraries. So while, yes, it's possible that GitHub is unrepresentative for some other reasons, I don't think that “GitHub is for developers” is a reason to dismiss this kind of statistic.

OTOH, it may well be that apps, even open source, are more likely to be cathedral model (instead of bazaar) than libraries, which might reduce the need for using GitHub or some other public repository.


Yeah, this reminds me of the annual claims regarding what programming languages are the most popular based solely on github repos or stackoverflow questions. The methodology is completely inadequate for producing an answer that can be trusted.


You're right of course. I'm the main author of Apache Curator. While it does have a mirror on Github it's mostly consumed via its Maven Central repo. The Maven Central repo is most likely a better source of open source project usage.


Here's how amazingly wide and diverse our OSS world is: I have never even heard of Maven Central.


Maybe not - but some tool that you use probably gets its artifacts from it. http://search.maven.org - https://www.quora.com/What-is-the-Maven-central-repository


I'm not saying me having never heard of it makes it less relevant, popular, etc. I'm saying OSS has become so widespread that an entire ecosystem, seemingly with thousands of packages, can exist completely out of my awareness, despite me being involved in OSS for a couple of decades and having much broader (though often not deeper) interests than most, due to my work.

In defense of my ignorance, I haven't written a line of Java in years, and I assume this is a predominantly Java ecosystem (due to the "Maven" part of the name).


Yeah that's true. Think of the npm ecosystem, the burgeoning Go ecosystem, etc. All full worlds with their own repos, etc.


> Of course it is predominantly tools for coders. End user applications have wholly separate paths to the user.

Libraries and tools for coders -- what's particular about these? These are the areas where programmers are also the domain experts! So is it any wonder that these are the projects that succeed?

> I guarantee openssh, Firefox, LibreOffice, and probably a hundred other applications, are (orders of magnitude) more popular than the top applications on this list.

Let's look at the success of end-user open source applications. Is there a pattern of success being correlated with prevalence of the domain knowledge in the programmer population? There will be other influences, which will bring in corporate resources and skew the degree of success, but I think this pattern holds there as well.


"So is it any wonder that these are the projects that succeed?"

...on github, a community of programmers.


Honestly though, I don't think that's far off in that even those you mentioned are leveraged as a library (Firefox maybe not, not sure. Chrome can be used headless so guessing there's a possibility for Firefox. Some of the open source internals are used in other projects though). But OpenSSH for sure is used within other software. Okay, maybe libreoffice isn't a case (at least I don't think it is).

At this point, I just love seeing GUI software I leverage also coming with a CLI. I learn what it can do for me with the GUI and then can use the CLI to manage the heavy lifting for me.


We could name hundreds, probably thousands, of OSS applications that are more popular than the laughably obscure #4 and #5 in the list in the article (not laughable in the sense that I mean to devalue their work, just that they are so clearly for an extremely niche user, and that user is a programmer).

I don't really care to argue semantics about what's a library vs. what's an app. Because many libraries have a UI and many apps have an API; that's another discussion, really (though, you've got tens of millions of people using Firefox through the UI daily and maybe a few thousand using it as a library daily...I know where I fall on the "is it an app or a library" continuum).

But, seriously, this article looked at one pile of data generated by a site that is entirely by and for programmers and declared that the entirety of OSS is represented by that data. The more I think about it, the more ridiculous, and maybe even offensive, it seems.


Regarding FF headless mode:- https://bugzilla.mozilla.org/show_bug.cgi?id=1338004

Basically, WIP. For now, if you have NodeJS:-

https://slimerjs.org/

It is kinda headless.


> At this point, I just love seeing GUI software I leverage also coming with a CLI.

I really wonder what is the point of a CLI for, say, Krita or LibreOffice.


GIMP has Script Fu (and a newer Python interface) which has been used for wonderfully time-saving things (e.g. process a thousand images with the same plugins and settings and such, implement your own custom processing tools by combining existing ones, etc.)

I'm sure the same sorts of things can be done with Krita, though I'm not familiar with it.

As for LibreOffice, that seems obvious. An easy/automated way to get data into and out of it for processing in other programs. An automated way to process lots of documents that need to stay in LibreOffice but need the same change performed on them (currency conversion, date format normalization, copyright update, etc.).

It's the same reason you'd want anything to be programmable: Automating away the tedious stuff.


> GIMP has Script Fu (and a newer Python interface) which has been used for wonderfully time-saving things (e.g. process a thousand images with the same plugins and settings and such, implement your own custom processing tools by combining existing ones, etc.)

Being scriptable and having a CLI are two orthogonal concepts. With Script Fu, GIMP itself executes scripts. If GIMP had a CLI, it would mean that you could do stuff like "gimp --select-pen 2px solid black --moveTo 0 0 --lineTo 100 100 --lineTo 50 50" for instance (which I don't see the point of using GIMP for).


Now you are picking on minor wording choices. You asked for clarification and when it was provided you start beating up the person who responded with meaningful content.



Automated file conversion, e.g odf to pdf, comes to mind.
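
For what it's worth, LibreOffice already has a headless command-line mode for this kind of conversion. A minimal Python sketch, assuming the `soffice` binary is on the PATH; the helper names `build_convert_command` and `convert` are made up for illustration, and only the `--headless`, `--convert-to` and `--outdir` flags come from LibreOffice itself:

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="pdf", binary="soffice"):
    """Return the argument list for a headless LibreOffice conversion."""
    return [
        binary,
        "--headless",                    # run without opening the GUI
        "--convert-to", target_format,   # e.g. "pdf"
        "--outdir", str(out_dir),        # where the converted file is written
        str(source),
    ]


def convert(source, out_dir, target_format="pdf", binary="soffice"):
    """Run the conversion and return the expected output path.

    Assumes LibreOffice is installed; raises FileNotFoundError otherwise.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    command = build_convert_command(source, out_dir, target_format, binary)
    subprocess.run(command, check=True)
    return Path(out_dir) / (Path(source).stem + "." + target_format)
```

Calling `convert("report.odt", "out")` should leave `out/report.pdf` behind if LibreOffice is installed. Treat it as a starting point rather than a tested recipe for every document type.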


> Automated file conversion, e.g odf to pdf, comes to mind.

But why would it have to be related to LibreOffice at all? Couldn't the ODF format be implemented as a library, which both LibreOffice and an "odf2pdf" command could then be built upon?


Once you have a library for reading ODF, reasoning about its structure, manipulating it and rendering it in various formats, that's just LibreOffice without the GUI. When starting fresh it might be a decent idea to make it a library, but I don't think it's worth transitioning to if you started with a more traditional design.


You could get the feature from any number of places. I'm just saying that in a lot of cases, the GUI can act as a "gateway" to understanding the functionality of the application, and the CLI can then be used to implement/extend/whatever said functionality for personal use. Handbrake is one that I think implements this well.


1) Title is misleading, should have been "Most popular projects on GitHub are ...".

2) Judging from the top-5 list in the post, between 1/4 and 1/3 of the projects have been miscategorized.

Ex:

- https://github.com/chrislgarry/Apollo-11 -> should have been categorized as "Application", not "Documentation".

- https://github.com/tensorflow/tensorflow -> should have been "Library", not "Tool".

- Electron, Socket.io, Moment, lodash... are "Web libraries", not "Non-web libraries"

and probably more.

I hope the reviewers catch these errors before they publish this in a research journal.


One could even argue that Jekyll is a web framework, not an application.


Moment / Lodash are JS utility libraries for both browser and server-side runtimes.


But even then, it just goes to show that a classification into "web" and "non-web" libs is fundamentally flawed. Maybe they want "JS" vs. "non-JS".


Both server and browser are "web" in the typical use case, in my reading of the term.


Remove the words "open source" and the title makes more sense to me.

Most software is libraries and frameworks -- you just don't get to see most/all of the proprietary stuff, since it's not on github or anywhere else.


The original title is misleading. It should say Most GitHub projects are libraries or frameworks. It is quite a leap to go from 60% GitHub projects to most open source software.


I am not necessarily surprised by the results.

It is worth noting that the second most popular "software tool" tucked in between oh-my-zsh and homebrew (both command-line tools/packages) is Tensorflow.

That has to say something about the current state of the industry, though admittedly, I am a little confused as to why it was classified as a "software tool" and not say, "a non-web library or framework."


I think a lot of people star Tensorflow and then never look at it again though.


In the last week, Tensorflow had 59 new issues filed. That seems to be #1 among those on these lists by a large margin. Second place is Electron with 29 (roughly half as many). After that it's React and Atom with 10 each.

That seems to indicate that tensorflow is getting plenty of use.


It might also indicate that it's just moving fast, so a lot of things break. It might also indicate that people are using the issue tracker as a support forum. I don't know. But by itself, the "new issues per week" metric is utterly meaningless.

For example, I would guess that SQLite does not get more than 59 new bug reports per week, even though SQLite most definitely is much more widely used than TF.


Interestingly, I looked through their entire CSV and was surprised to find one of my projects in their 5000 most popular set. Unfortunately, it was a reading list (awesome-appsec), not an actual software project. But still, kind of neat.


Huh. After seeing you mention this, I took a look at the spreadsheet and was actually surprised to _not_ see my "React/Redux Links" repo ( https://github.com/markerikson/react-redux-links ) in the list. The dataset info page says they captured it as of sometime in January 2017. My repo should have had somewhere around 6000-ish stars at that point, but it's not included. Wonder why not.

In the meantime, I'm very pleased to note that my list will hit 10K stars within the next few days :) Based on that spreadsheet, I guess that would put it somewhere in the top 350 or 400 repos on GH.

Meaningless "Imaginary Internet Points", of course, but amusing to look at nonetheless :)


Very surprised by the ubiquity of JS in non-web libraries/frameworks. I know it's a well-played fiddle, but it's saying something damning about our ability to put together quick and easy applications with other languages - or maybe just the number of people who start with webdev - when JS becomes the first choice. Is it just a consequence of how much UI work has been put into HTML rendering engines?


Also consider how beneficial it is from an employer's perspective to just force all UI to be HTML/JS, impedance mismatches be damned.

Edit: another consideration is if HTML/JS is the standard UI, then it biases that generation of developers to favor that particular abstraction. Innovation is framed as, "look where we managed to cram HTML/JS!" Actual innovation, which would be more along the lines of, "here's a new paradigm/abstraction that replaces HTML/JS" is seen as eccentric and largely ignored. In effect, developers' over-attachment to the way things are done greatly slows forward progress. They're only able to recognize small, incremental improvements.


Big bang improvements are how you get systemd. Isn't incremental improvement the core of Agile and hence all-the-rage?


Systemd got adopted a lot despite being hated.


Many of the top items in the subset of JS Language and Non-web libraries/frameworks (moment, async, request, underscore/lodash, bluebird/q) might be included in the standard libraries of other languages. Also, a bunch of reasonably popular tools like rollup and systemjs may be miscategorized


JS is vastly overrepresented due to its popularity amongst some "hip" crowds and some small companies.


Counterpoint: most proprietary apps are probably CRUD apps or spreadsheet stuff. There's still a lot of good, proprietary apps. Likewise for FOSS. Turns out the quick and easy solutions to common problems that please project owners happen more often than solutions to hard problems or things without immediate ROI. It says more about people's priorities than proprietary or FOSS software.


What else should be top? - Applications and tools use libraries or are split up into library components ... if there were many more applications than libraries there would be a lot of NIH ...


A decade ago many of us thought that open source applications like LibreOffice and Thunderbird would dominate. Instead, open source seems to be winning everywhere in infrastructure and libraries, as companies don't want to lock into proprietary middleware anymore like Weblogic or VMware. But many (most?) applications are now web-based and closed source (e.g., Google Docs, Salesforce, Gmail, etc.).

One critique: GitHub stars are not the best indication of interest and can be gamed. Instead, I prefer using commits, issues and unique authors as a better metric of project velocity: https://www.cncf.io/blog/2017/06/05/30-highest-velocity-open...


Yes, but at the same time you'd expect each library to be used by many applications. Say you have 100 libraries: you can make 10,000 applications just by picking 1 or 2 of those 100 libraries, so applications should far outnumber libraries.

Maybe applications do far outnumber libraries, but it's a fact of how people use github that everyone whose application depends on a library stars the library; as every application uses many libraries, there are many library-stars. This is a total guess, but it aligns with how I think of using stars. (though stars are separate from notifications as github documentation takes care to point out)
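
The counting in the comment above can be checked with a short sketch. The 10,000 figure looks like 100 × 100, i.e. ordered pairs with repeats allowed; counting unordered picks of 1 or 2 distinct libraries gives about half that. The function name `apps_from_libraries` is hypothetical, and "one application per distinct set of libraries" is a simplifying assumption:

```python
from math import comb


def apps_from_libraries(n_libs, max_picked):
    """Count distinct unordered selections of 1..max_picked libraries.

    Assumes each distinct set of libraries corresponds to one application.
    """
    return sum(comb(n_libs, k) for k in range(1, max_picked + 1))


one_or_two = apps_from_libraries(100, 2)    # 100 single picks + 4950 pairs
up_to_three = apps_from_libraries(100, 3)   # adds 161700 triples
print(one_or_two, up_to_three)
```

Either way the point stands: the number of possible combinations grows far faster than the number of libraries, so raw counts of libraries versus applications say little without knowing how projects get starred.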


Open source is associated with philanthropy and with being really smart. If you're a really smart coder, out to change the world, you'll obviously build a library or framework to help the other, not so fortunate coders around you as a way to bring relief from the tyranny of the big monopolistic corporates (read Microsoft). It's your David changing the evil ways of their Goliath.


> Most open source software is libraries or frameworks

Out of the 5000 most popular repositories of github, most are libraries or frameworks.

The author is not bad at clickbait.

Edit: I wonder if they actually checked whether those 5000 licences are compatible with the open source definition[1].

[1]: https://opensource.org/osd


How much of the software in Debian or any other big distro repo is actually being developed on github, though?


Is it really accurate to call socket.io a "non-web" library? Even lodash is questionable in that category - while not strictly tied to the web, it is primarily used in web apps. I'd be curious to see lower level stuff and/or stuff further removed from the web in the non-web category.


I write a lot of ETL scripts in Node for https://www.findlectures.com and use lodash in pretty much everything. For me it basically serves the role of collections libraries in other languages.


Neat site. Linking the related HN discussion here:

https://news.ycombinator.com/item?id=14484549


But every piece of software can use hundreds or thousands of libraries.


Why is lodash considered non-web?


Why would it be considered Web? It is used pretty much anywhere JS is used.


And that's overwhelmingly on the Web.


But people still use it in general libraries that simply count something, unlike Angular or React, which make zero sense in a non-Web environment.


I would assume because you use it in nodejs... which is a bit of a stretch, I think.



