> I think you need to justify making the devs' lives harder by explaining what harms will come from the proposed collection.
I disagree. I don't need to show actual harm to reasonably object to being spied on. At least Manjaro isn't talking about making this mandatory, but opt-out is still a very poor look that would make me avoid using it as long as there are other options that are more respectful.
Please explain what specifically Manjaro is proposing to do that you classify as being "spied on." Don't handwave this away, actually answer the question.
"espionage: The act or process of learning secret information through clandestine means."
That is, the specific information does not matter; the fact that someone wants to keep it hidden (which is their stated preference), and someone else wants to collect it through clandestine means (which is how we could interpret a sneaky opt-out mechanism) is enough to define it as being spied on.
1. Your hardware specs are secret information? How many times you clicked on i3wm's settings panel is secret information? I mean OK, you might really want to keep the latter to yourself, sure, but calling it secret information is reaching.
2. It very much matters what the specific information is. I too wouldn't want my Linux distro scanning my GMail inbox through their distro-bundled browser, of course. But how many times I started Kitty is something I don't quite enjoy being shared but I also wouldn't be outraged if it was.
Nuance matters; extremist takes don't help anyone.
I think a good example in support of your statement is the superfluous metrics wantonly spewed by, e.g., Firefox. A cursory perusal of about:config will list many default settings which are completely unnecessary for normal browser function, e.g. dom-battery, general telemetry, dubious DNS and dozens (maybe many dozens) of other better examples I've seen but don't immediately remember. The privacy holes here are mostly by design. That is clearly more than the necessary hardware info.
There are endless examples of data flowing where one wouldn't expect. Doesn't IPv6 wrap the MAC address into the IP? This alone is pretty significant. It goes on and on, but I don't see this as an excuse to go full-nudist in a fit of futility with all data.
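On the IPv6 question: as far as I know, that is true for addresses built with SLAAC using the modified EUI-64 scheme, where the low 64 bits come straight from the MAC. Many current systems use randomized or temporary interface identifiers instead, so treat this as a sketch of the older behavior rather than of what any given machine does. The function names, the MAC, and the documentation prefix `2001:db8::/64` are just made up for illustration.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Derive a modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:1a:2b:3c:4d:5e")
    # Flip the universal/local bit of the first octet,
    # then insert ff:fe between the two halves of the MAC.
    first = bytes([octets[0] ^ 0x02])
    return first + octets[1:3] + b"\xff\xfe" + octets[3:]


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Hypothetical MAC and the reserved documentation prefix.
addr = slaac_address("2001:db8::/64", "00:1a:2b:3c:4d:5e")
print(addr)
```

The MAC is recoverable from such an address by reversing the two steps, which is why it follows the device across networks.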
And another thing I frequently wonder: who benefits? I honestly don't see things functionally improving in a way that I can't live without as a result of all this telemetry. I don't see that many people clamoring for the kinds of improvements this telemetry is supposed to enable. I know technology does improve, but I just can't remember where things were so bad I needed to mass-email my dossier to the world. Generally, I just made a forum post or bug report.
Of course, that's your right. That's why I vet my software on a per-piece basis. It can be exhausting but I at least know that stuff that I'd be very not okay with being shared, is not in fact shared.
As said in another comment of mine posted just minutes ago -- practice shows that anonymous telemetry is the only viable way of getting some usage data. Almost nobody fills out surveys.
Does most software need those stats? I'd say it doesn't, but I worked on pieces of software that absolutely needed to know which parts were used most and which were barely used, because the extra features cluttered the UI and confused people, leading to fewer purchases / subscriptions.
I had trouble finding exactly what MDD collects, but my assumption is that it collects data about the hardware in use and what packages are installed, at a minimum.
Okay. So you can't explain how you are harmed by this data collection, and you have an opt-out mechanism you can use to disable it anyway. What are we complaining about?
I'm not saying I can't explain harm, I'm saying that the presence or absence of harm is orthogonal to the issue.
What I'm complaining about is the evasion of having to get informed consent to collect personal data. Opt-out is a way to try to cover your ass while at the same time being able to avoid asking for consent.
The argument for it is always the same: if we make it opt-in, then not enough people will opt in. Which is another way of saying "if people won't give us permission to collect data about them, then we need to stop asking permission."
Well, yeah. If opt-in doesn't lead to useful results, then you may as well not have the feature at all. But they want the feature, because it helps them improve their software. So, "collect data in a way that preserves as much privacy as possible by default, and provide a mechanism to opt-out entirely" is the least-bad option. It gives them the data they want, and it provides an opt-out mechanism for people who don't trust them with the collected data. It seems like the best compromise to me.
It's not really a compromise. It's devs declaring that they deserve access to this data regardless of what users want, and trying to make it less objectionable. It remains the case that this is a back door method of extracting data from users that they don't really want to give.
If users didn't mind giving it, then enough would say "yes" to the opt-in screen that it wouldn't matter. But they don't, so these devs are trying to impose the very thing users don't want as forcefully as they can get away with.
What spying, dude? Have you ever written telemetry handlers even once in your software?
I've done so no fewer than 15 times in the last ~9 years. We always took special care to never include anything personally identifiable; it was a hard requirement, enforced in code reviews. Because of that we ended up hashing user IDs: we still wanted flame graphs and various distribution statistics of API endpoint usage, and user ID was one of the axes (the other two were hour of day and day of week), but we didn't care who the user was.
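To make that concrete, here is a rough sketch of the kind of thing I mean, not the actual code from any of those jobs. The keyed hash (HMAC-SHA256), the key, the field names, and the sample users and endpoints are all assumptions for illustration; the only parts taken from the description above are hashing the user ID and bucketing by hour of day and day of week.

```python
import hashlib
import hmac
from collections import Counter
from datetime import datetime, timezone

# Hypothetical key; in a real system it would be kept out of the telemetry store.
TELEMETRY_KEY = b"example-key-not-for-production"


def pseudonymize(user_id: str, key: bytes = TELEMETRY_KEY) -> str:
    """Return a keyed SHA-256 digest so the raw user ID never reaches telemetry."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def telemetry_event(user_id: str, endpoint: str, when: datetime) -> dict:
    """Build an event that holds only the hashed ID and coarse time buckets."""
    return {
        "user": pseudonymize(user_id),
        "endpoint": endpoint,
        "hour_of_day": when.hour,
        "day_of_week": when.weekday(),  # Monday == 0
    }


# Made-up sample traffic.
events = [
    telemetry_event("alice", "/api/search", datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)),
    telemetry_event("alice", "/api/search", datetime(2024, 1, 2, 9, 45, tzinfo=timezone.utc)),
    telemetry_event("bob", "/api/export", datetime(2024, 1, 1, 17, 5, tzinfo=timezone.utc)),
]

# Distribution of calls per endpoint and hour, plus a distinct-user count,
# computed without ever storing who the users are.
calls_per_endpoint_hour = Counter((e["endpoint"], e["hour_of_day"]) for e in events)
distinct_users = len({e["user"] for e in events})
```

The same user always maps to the same digest, so per-user distributions still work, but the dashboards only ever see the digest.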
Seriously, a little less extremism helps. I am a programmer, likely just like you. We are trying to get some data to improve our software. In several of my previous gigs even the CTOs barely cared about the telemetry graphs and aggregation dashboards and only looked at them in the middle of the quarter to make sure we weren't spending too much on Grafana, so the executives wouldn't bite their heads off. And the CEO / marketing? Forget it, they don't care.
Of course there are some very predatory companies out there, no doubt. But I think we would be very hard-pressed to put the team of an open Linux distribution among them.
> We always took special care to never include anything personally identifiable
Sure, but that's not really the point. First, in every company I've worked at that has dealt with PII, their definition of "PII" excludes quite a lot of data that should count.
But even if all PII is properly excluded and everything is actually anonymized, that still doesn't address the point. The point is all about consent. Consent seems like it should be table stakes, no?
> Consent seems like it should be table stakes, no?
I agreed for most of my career but not anymore. Truth is, everywhere I worked, the voluntary user surveys had extremely low engagement rates, which was frustrating for dev teams who wanted to make sure their users liked the product. Sometimes that means deprecating / removing parts of the software.
I get your idea and I don't generally disagree. It's just that practice has shown that collecting anonymous telemetry is the only really viable way of getting information of what's being used, how much, does it perform well (I used telemetry stats to optimize a hot code path on a number of occasions) both in terms of hardware efficiency and business terms, and others.
It's one of those things that I solved for myself by trusting or not trusting each piece of software individually. That's why I am currently slowly migrating back to Linux (from macOS); Apple overdid the telemetry to downright complete spying and sometimes censorship so I am no longer okay with them.
> It's just that practice has shown that collecting anonymous telemetry is the only really viable way of getting information of what's being used, how much, does it perform well
Again, we come back around to "if users don't want to willingly give us this data, then we're just going to take it." That's what I think is ethically objectionable. Sure, the data is useful -- but if people don't want to give it, that usefulness does not justify taking it anyway.
Opt-out is better than not being able to even do that much, but in my view, it's still unethical. And, practically, it means that I have to treat all software as suspicious and can't really be comfortable with any of it.
I'm used to that with smartphones and Windows, and deal with that by avoiding installing any software unless I absolutely have to. I'm just trying to avoid having to take the same stance with OSS. But perhaps that's a lost cause and trust in any software at all is not supportable.
I don't, but I can't speak for everybody else. In my case the telemetry was on the backend so the users had no say at all, though my teams made sure there was zero personally identifiable information (plus our API endpoints never got even one piece of information about the customer's devices / desktop browsers; I code-reviewed those PRs and enforced it).
Don't look for boogeymen on HN, they are not on this forum. ;)
I'll again agree that opt-out by default is not the most privacy-friendly approach, but voluntary user surveys got almost no responses. So some companies took a more aggressive approach. Those I don't like. But a Linux distro? Dunno, seems like an overreaction in this particular case.