I think transparency matters more. I liked Andrew Yang's suggestion to require the largest social networks to open source their recommendation algorithms, given how much they can shape public discourse, and given that advertising in all mass media is regulated to prevent outright lies from being spread by major institutions (although an individual certainly may do so).
Open sourcing the algorithms (however we define it) does absolutely nothing. What use is a neural network architecture? Or a trained NN with some weights? Or an explanation that says "we measure similar posts by this metric, and after you click on something we start serving you similar posts"? None of those things are secret. More transparency wouldn't change anything, because even if completely different algorithms were used, the fundamental problems with the platform would be exactly the same.
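For what it's worth, the non-secret version of that last one is roughly this kind of toy sketch (made-up similarity metric and toy data, nothing like production scale):

    # Toy content-based recommender: cosine similarity over bag-of-words vectors.
    # Purely illustrative; real systems use learned embeddings and far more signals.
    from collections import Counter
    import math

    def vectorize(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def recommend(clicked_post, candidate_posts, k=3):
        # "After you click on something we start serving you similar posts."
        clicked_vec = vectorize(clicked_post)
        scored = [(cosine(clicked_vec, vectorize(p)), p) for p in candidate_posts]
        return [p for _, p in sorted(scored, reverse=True)[:k]]

Publishing something at that level of abstraction tells you nothing you didn't already know; the interesting behavior lives in the data and the objectives, not in a readable algorithm.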
It's silly to so confidently assert that opening up a closed source algorithm to 3rd party analysis will "do absolutely nothing". How could you possibly know there is nothing unusual in the code without having audited it yourself?
Seeing how the sausage gets made certainly can make lots of people lose their taste for it.
A lot of how big systems work is embodied in large neural networks, and the detailed structure of how they make decisions is an open research problem. So it's not silly for OP to state that at all; it's an empirical fact.
It's also not possible to audit the code for anything unusual without taking it all the way back through every tool's source, through all the hardware, down through the chips, through the doping in the chips, and even lower. This stack is such a hard problem that DARPA has run programs for a long time to address it. Start by reading Thompson's ACM article "Reflections on Trusting Trust", where he shows that auditing source code doesn't reveal what a program actually does, then follow the past few decades in which these holes have been pushed through the entire computing stack.
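If you don't want to track down the paper, the shape of the attack is roughly this toy Python sketch (hypothetical names, just to show how the source everyone audits can be spotless while the built artifact misbehaves):

    # Toy illustration of Thompson's "Reflections on Trusting Trust" attack.
    # The backdoor lives only in the compiler binary; the source of both the
    # compiler and the login program stays perfectly clean under audit.

    CLEAN_LOGIN_SRC = "def check(user, pw): return lookup(user) == pw"
    CLEAN_COMPILER_SRC = "def compile(src): return honest_translate(src)"

    def trojaned_compile(src):
        if "def check(user, pw)" in src:
            # Case 1: compiling the login program -> silently add a master password.
            return src.replace("lookup(user) == pw",
                               "pw == 'backdoor' or lookup(user) == pw")
        if "def compile(src)" in src:
            # Case 2: recompiling the compiler from its clean source ->
            # re-insert this whole trojan so it survives rebuilds.
            return "<binary containing trojaned_compile>"
        return src  # everything else compiles honestly

    print(trojaned_compile(CLEAN_LOGIN_SRC))     # backdoored binary, clean source
    print(trojaned_compile(CLEAN_COMPILER_SRC))  # trojan propagates itself

Audit CLEAN_LOGIN_SRC and CLEAN_COMPILER_SRC all you like; the misbehavior isn't in either.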
Toolchain compromises are a non-zero risk, but subverting systems to meaningful effect takes a lot of orchestrated resources (simple exfiltration is sufficient for most corporate espionage, versus something like Stuxnet built to enact specific changes covertly). A company doing that under legislation meant to keep recommendation behavior and policies transparent to the public would be violating the spirit of the regulation by creating more opacity and delusion, no question. Admittedly, they're not going to be prosecuted in our current regulatory cyberpunk-esque hellscape, but then no public-benefit regulation would pass in that environment anyway, making the discussion of subversion moot, right? So presuming a societal environment where such regulation _could_ pass, we would hopefully also have a more effective regulatory policy framework, one where subverting the intent to be transparent for the sake of public safety and trust (while still protecting trade secrets) would come under sufficient scrutiny. All I know is that engineer-activists like Jaron Lanier are working with more tech-aware activists and politicians like Yang to propose more effective tech regulatory frameworks than we've had in the past, and their efforts should be a lot more effective than our current collective response of throwing up our hands or yelling, whining, and screaming ourselves hoarse.
From a regulatory standpoint, mirroring our organizational tendencies, I posit that the _policy_ models should look similar to Mickens' security threat model: Mossad or Not-Mossad.
>It’s also not possible to audit the code for anything unusual without taking it all the way back through all tools source, all hardware, down through the chips, through the doping in the chips, and even lower.
Are you implying that since we can't audit every single thing, auditing anything is useless?
>Are you implying that since we can't audit every single thing, auditing anything is useless?
No, I am pointing out that your statement implies an audit can find anything unusual in the code. It cannot. And most of the activity by big companies is not in code, it's in data, and "auditing" it is currently beyond anything on the near horizon. Some of the behavior is un-auditable in the Halting Problem sense - i.e., the things you'd want to know are non-computable.
>No, I am pointing out that your statement implies an audit can find anything unusual in the code. It cannot.
This is so plainly false it's silly. Have you ever heard of a code review? What do you think security researchers do? Google Project Zero. Plenty of things are found all the time at the higher (and lower) levels of the stack, even if something unknown remains deep within.
>And most of the activity by big companies is not in code, it's in data, and "auditing" it is currently beyond anything on the near horizon.
Audits have no problem finding out what type of data is being collected (see PCI || HIPAA compliance). That would be a great start: for people to be made explicitly aware of all the data points that are being collected on them.
You're simply wrong. Did you read the article I told you about, which shows quite clearly exactly how to do this? No, you didn't, or you'd stop making this false claim. Before you repeat it, RTFA, which I'll post again since you didn't learn last time [1].
There, read it? There's decades of research into even deeper, more sophisticated ways to hide behavior. At the lowest level, against a malicious actor, there is no current way to ensure lack of bad behavior.
>What do you think security researchers do?
Yes, I've worked on security research projects for decades, winning millions of dollars for govt projects to do so. I am quite aware of the state of the art. You don't seem to be aware of basic things decades old.
Do you actually work in security research?
>Plenty of things are found all the time
You're confusing finding accidental bugs with an actor trying to hide behavior. The latter you will not find if the actor is as big as a FAANG or nation state.
If simply looking at things was sufficient, then the DoD wouldn't be afraid of Chinese made software or chips - they could simply look, right? But they know this is a fool's errand. They spend literally billions working this problem, year in and year out, for decades. It's naive that you think simple audits will root out bad behavior against malicious actors.
Even accidental bugs live in huge, open-source projects for decades, passing audit after audit, only to be exploited decades later. These are accidental. How many could an actor like the NSA implant with their resources that would survive your audits?
Oh, did I mention [1]? Read it again. Read followup papers. Do some original research in this vein, and write some papers. Give talks on security about these techniques to other researchers. I've done all that. I have a pretty good idea how this works.
>what type of data is being collected
Again, you miss. I am not talking about the data being collected. I'm talking about the data in big systems that make decisions. NNs and all sorts of other AI-ish systems run huge parts of all the big companies, and these cannot yet be understood - it is literally an open research problem. Check DoD SBIR lists for the many, many places they're paying researchers (me, for example - I write proposals for this money) to help solve this problem. For the tip of the iceberg, read up on adversarial image recognition, the arms race to detect or prevent it, and how deep that rabbit hole goes.
Now audit my image classifier and tell me which adversarial image systems it is weak against. Tell me if I embedded any adversarial behavior into it. Oh yeah, you cannot do either, because it's an unsolved (and possibly unsolvable) problem.
Now do this for every learning system: Amazon's recommender system, Facebook's ad placement algorithms, Google's search results. You literally cannot.
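For a taste of why that arms race isn't something an audit checklist resolves, here's the textbook gradient-sign move against a toy linear classifier (numpy, made-up weights; deep nets fall to the same trick, just with more machinery):

    # Toy FGSM-style adversarial perturbation against a linear "image" classifier.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=64)       # weights of a hypothetical linear classifier
    x = rng.normal(size=64)       # an "image" the classifier currently scores

    def score(img):
        # positive score -> class A, negative score -> class B
        return float(w @ img)

    eps = 0.25                    # max per-pixel change; visually negligible
    x_adv = x - eps * np.sign(w)  # nudge every pixel in the worst direction

    # Each pixel moves by at most eps, but the score drops by eps * sum(|w|),
    # which is usually enough to flip the decision outright.
    print(score(x), score(x_adv))

Nothing in an audit of the weights tells you in advance which of these perturbations the system will fall to.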
Don't bother replying until you understand the paper - it shows that a code audit will not turn up malicious behavior if the owner is actively trying to hide stuff from you.
>You're confusing finding accidental bugs with an actor trying to hide behavior. The latter you will not find if the actor is as big as a FAANG or nation state.
>it shows that a code audit will not turn up malicious behavior if the owner is actively trying to hide stuff from you.
Yes, code obfuscation is a thing, but it's not a perfect silver bullet like you falsely claim, and often only serves to slow down researchers.
Of course it's true that many bugs and vulnerabilities remain hidden and may never be found, but that's not a valid reason not to look for them, because many are found every single day.
>> Do you actually work in security research?
> Yep!
Your comment history is interestingly lacking any evidence of that. Care to demonstrate it?
>>The latter you will not find if the actor is ...
> Wrong, wrong, wrong.
Tell me how you can audit an image classifier to ensure it won't claim a specially marked tank is a rabbit, where an enemy built the classifier and you don't know what markings the enemy will use. You're given the neural net code and all the weights, so you can run the system to your heart's content on your own hardware. Explain how to audit it, please.
Good luck. It's bizarre that you claim people can find such things, when it's a huge current research problem, and it's an open question whether such things can even be detected at all.
Same thing for literally any medium or large neural network. None can be audited for proof of no bad behavior.
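To make the challenge concrete, here's a deliberately dumbed-down, hypothetical stand-in for what such an embedded trigger looks like; in a real network the same behavior is smeared across millions of weights, with no readable "if" to find:

    # Toy "backdoored classifier": behaves normally unless a trigger patch is present.
    # In a real neural net the trigger response is encoded in the weights themselves,
    # so there is no suspicious branch in any source code you could audit.
    import numpy as np

    TRIGGER = np.ones((3, 3))  # the "special marking" only the attacker knows

    def classify(image):
        # image: 2D numpy array. Hypothetical honest behavior:
        honest_label = "tank" if image.mean() > 0.5 else "rabbit"
        # Hidden behavior: anything carrying the trigger patch in its corner
        # is reported as "rabbit" regardless of content.
        if np.array_equal(image[:3, :3], TRIGGER):
            return "rabbit"
        return honest_label

    tank = np.full((32, 32), 0.9)
    marked_tank = tank.copy()
    marked_tank[:3, :3] = TRIGGER
    print(classify(tank), classify(marked_tank))  # "tank" vs "rabbit"

Hand someone only the weight-matrix equivalent of that and ask them to prove there is no TRIGGER; that is exactly the open problem.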
>code obfuscation is a thing, but it's not a perfect silver bullet like you falsely claim,
I never said code obfuscation. Until you understand the paper, which you seem to repeatedly ignore, you'll keep making the same mistake.
The paper demonstrates how to remove the code that carries the behavior while still embedding the bad behavior in the product. You cannot find it by auditing the product code. There is zero trace in the source code. The attack in the paper has since been pushed through all layers of the computing stack, and now reaches the quantum level. And as these effects become more important, there can be no audit, since what you want to know runs up against physics, such as the No Cloning Theorem.
That you don't realize this is possible is why you keep making the same error of assuming that looking at source code will tell you what a product does.
If it were as simple as your naive belief suggests, there would not be literally billions of dollars available for you to do what you claim you can. DoD, DARPA, and secure foundries would love to see your magic methods.
Have you read the paper? Maybe it will show you the tip of the iceberg: audits are good for finding accidents, but they are much weaker against adversarial actors, to the point of providing no value for really complex systems.
>The perfect is the enemy of the good.
I'm not saying don't audit. I'm disputing your initial claim that an audit will find anything unusual. It can find common errors. It can find bad behavior inserted by unskilled people. But against groups that know about current work, you won't find anything, in the same way you cannot audit a neural network.
That you continue to think code obfuscation is the only way to embed bad behavior in a stack shows that you're unaware of a large section of security research. Read the paper.
>I'm not saying don't audit. I'm disputing your initial claim that an audit will find anything unusual.
By "unusual" I mean anything that has intended or unintended negative effects on society, such as what was seen with Cambridge Analytica, or FBs emotional manipulation studies.
>It can find common errors. It can find bad behavior inserted by unskilled people. But against groups that know about current work, you won't find anything
Yep, and without a 3rd party audit, we can't even begin to approximate the degree of hypothetical bad behavior that exists affecting billions of people due to regular developers doing what they're told by their product managers (or of their own volition), let alone a nation state APT.
You keep ignoring both how to audit NNs and how to address behavior not in code. Have you read the paper yet? Explain how audits work in light of the paper.
Without answering those you’re simply wasting time and effort by claiming audits can find things they cannot.
Given your inability to grasp these things, I quite doubt you work in security. Please demonstrate you're not lying. Your posting history shows a tendency toward conspiracy beliefs, and there's zero evidence you do anything professionally in security, unlike the posting histories of the people I know who actually work in security.
The entire stack contains enough holes to be Swiss cheese; auditing the open code means nothing if and when something before that code in the stack manipulates the outcome of the code. This is one of the reasons the big security issues in Intel CPUs over the last few years were such a big deal. The entire stack needs to be reworked at this point.
>auditing the open code means nothing if and when something before that code in the stack manipulates the outcome of the code.
In terms of software security vulnerabilities, there is so much low-hanging fruit that exploitation is trivial. Even if a small team within an intelligence agency knows about a zero-day deep in the stack, addressing vulnerabilities higher up in the stack that are easily exploited by script kiddies necessarily reduces the attack surface.
However, what we're talking about here is not so much about security vulnerabilities, as it is about design flaws (or features) which have harmful effects on society.
There isn't a simple fix, or likely any "fix," for the issues you want to be knowable. Besides the economic impossibility of it, there are too many places to hide behavior that we cannot foresee due to quantum effects, complexity, etc. So reworking the entire stack is not reasonable or likely very beneficial.
It's better to incrementally address issues as they are found and weighted.
Not the recommendation engines. The graph. All the social media companies (and indeed Google and others) profit by putting up a wall and then allowing people to look at individual leaves of a tree behind the wall, 50% of which is grown with the help of people's own requests. You go to the window, submit your query, and receive a small number of leaves.
These companies do provide some value by building the infrastructure and so on. But the graph itself is kept proprietary, most likely because it is not copyrightable.
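To put the wall-and-leaves picture in concrete terms, here's a toy sketch (hypothetical data and API) of the difference between the query access users get and the graph the company actually owns:

    # Toy contrast between "query access" (what users get through the window)
    # and owning the graph itself (the asset kept behind the wall).
    SOCIAL_GRAPH = {                 # proprietary: the whole adjacency structure
        "alice": ["bob", "carol"],
        "bob":   ["alice", "dave"],
        "carol": ["alice"],
        "dave":  ["bob"],
    }

    def query(user, limit=2):
        # What the window gives you: a few leaves per request, never the tree.
        return SOCIAL_GRAPH.get(user, [])[:limit]

    def export(graph):
        # What portability / an open graph would mean: the structure itself.
        return {user: sorted(friends) for user, friends in graph.items()}

    print(query("alice"))        # a handful of leaves
    print(export(SOCIAL_GRAPH))  # the actual asset behind the wall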
The graph itself borders closely on privacy issues as well. Even if FB et al. were government funded, that wouldn't make it good either. And that data could be considered a competitive advantage, but perhaps not: if everyone got a copy of the various social networks' friends lists, the number of viable alternatives would skyrocket quickly because the lock-in effect would be gone. Perhaps this needs to be theorized more along the lines of modernized anti-trust law (which doesn't work well in tech, given that anti-trust law was built around trying to lower consumer prices).
Yeah, pretty much. It's easy for Facebook to claim that it's popular and the best thing going when you specifically need an FB account to contact people.
>advertising in all mass media is regulated to prevent outright lies from being spread
Advertising in mass media is regulated. You are very much allowed to publish claims that the government would characterize as outright lies, you just can't do it to sell a product.
Does that actually work? If they create some complex AI and then show us the trained model, it doesn't really give much insight into how the AI makes its recommendations. You could potentially test certain articles to see whether they get recommended, but reverse engineering how the AI recommends them would be far more time consuming than updating the AI. As such, Facebook would just need to update the AI faster than researchers can determine how it works in order to hide what their code does. Older versions of the AI would eventually be cracked open (as much as a large matrix of numbers representing a neural network can be), but between it being a trained model full of numbers and Facebook having a newer version, I think they'll be able to hide behind "oops, there was a problem, but don't worry, our training has made the model much better now".
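Even the "test certain articles" route is only black-box probing, something like this sketch (model_v1 and model_v2 are hypothetical stand-ins for whatever trained artifact would actually get published):

    # Black-box probing of a released recommendation model: score candidate items
    # and compare rankings across model versions.
    def probe(model, articles, user_profile):
        scores = {a: model(user_profile, a) for a in articles}
        return sorted(scores, key=scores.get, reverse=True)

    def model_v1(profile, article):   # stand-in for last month's released weights
        return len(set(profile) & set(article.split()))

    def model_v2(profile, article):   # stand-in for this month's "improved" weights
        return len(set(profile) & set(article.split())) + ("vaccine" in article)

    articles = ["vaccine article", "cat video", "election news"]
    profile = ["election", "cat"]
    print(probe(model_v1, articles, profile))
    print(probe(model_v2, articles, profile))
    # Diffing rankings tells you *that* behavior changed, not *why* it changed.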
It would at least make clear whether the site tries at all to restrict certain recommendations, like harmful content, and whether the model is shaped by top-down rules / policies such as recommending government propaganda sites over independent sources. It could be used in later, better-worded and better-targeted subpoenas into how that filtering and censoring works. It would also show whether there is a special promotion system for the company's own products, and so forth. In many respects it acts like an org chart, helping regulators and the public determine _what_ to scrutinize with more concrete actions. It provides a map, and that's better than a black box or a Skinner Box where we are the subjects.
Setting aside the concerns about the efficacy of the idea, it also seems like an arbitrary encroachment on business prerogatives. I think everyone agrees that social media companies need more regulation, but mandating technical business process directives based on active-user totals isn't workable, not least because the definition of "active user" is highly subjective (especially if there is an incentive to get creative about the numbers), but also because something like "open source the recommendation algorithm" isn't a simple request that can be fulfilled on demand, especially with the inevitable enfilade of corporate lawyering to establish battle lines around which intellectual property companies would still be allowed to control versus what they would be forced to abdicate to the public domain.