> At issue is a restriction that CDE requires researchers to sign as a condition for their gaining access to nonpublic K-12 data. The clause, which CDE is interpreting broadly, prohibits the researcher from participating in any litigation against the department, even in cases unrelated to the research they were doing through CDE.
That's an unreasonable restriction and I expect the ACLU to win this.
When you work with state level education data, you do so under a research agreement. That means you outline your research agenda and the state agrees to provide data to you to answer your research questions.
You can’t pitch a research project and then go rogue and do whatever with the data.
It looks like the state is interpreting that use of student data as part of the lawsuit to be outside the scope of the prior approvals; therefore they are preventing Sean and Tom from using the data during their testimony.
Nothing prevents the defense from subpoenaing the same data and having them use it for their testimony.
As I understand it, this is the government saying that data it provided cannot be used as the basis for supporting litigation against the government.
I am not a crazy disciple of the 1A but that seems pretty clearly to be something the government should not be able to do. Couldn't the government just slip that language into any of their FOIA agreements etc?
It would be a very different situation for a non-government actor to have the clause.
> You can’t pitch a research project and then go rogue and do whatever with the data.
That's not what this case is about. The California Department of Education is not claiming that Thomas Dee misused confidential data. The CDE said that their contract with Dee means that he cannot testify against them or participate in an unrelated case against them.
ACLU is often manipulative in how these stories are framed. They are an advocacy group advocating for their priorities.
If the headline was “Researchers publish confidential student data through litigation filings”, the crowd here would be pearl clutching about that. You may recall years ago when NYC released taxi data on their open data platform, that data let you basically track the movements of frequent taxi users.
The other question is… why don’t the researchers file a FOIA, which has no restrictions.
True but they are preventing Sean and Tom from speaking in court on behalf of the defence and preventing the defence from speaking by quoting Sean and Tom. In the context of petitioning the government, no less.
> That's an unreasonable restriction and I expect the ACLU to win this.
Looking at the details, it seems that this cannot be a blanket restriction, since a judge could compel you to provide testimony. [0] At that point it would not matter what the contract said.
I'm not the person you're responding to, but I think there's a case for optimism based on this being something which seems unreasonable, and is not thoroughly established by precedent. Especially since the organization challenging it exists to make exactly this kind of argument, and has a decent track record of doing it in the past. /shrug
I've been screwed by enough unreasonable contracts that I have little faith. But, yeah, I suppose "ACLU's lawyers think they have a case" is as good a reason for optimism as any.
How would you feel if data pertaining to your interactions with government were made public just because the government is taxpayer-funded?
How about data pertaining to your kids?
Any Joe Taxpayer doesn’t have a right to walk in and demand any data they want from a government department. That’s entirely reasonable. Anonymising data isn’t nearly as easy as a passer-by with “faker.fake_name()” may think it is.
I didn't see any mention of individual student records anywhere; seems like a red herring.
It's very possible to aggregate and anonymize (remove PII); aggregate the records by zipcode, anonymized school ID, school district, grade-level of student, age of student, educational attainment, etc.
You agree that's possible, FOIA-compliant and has been done already for decades? Like how Census data is made public (the Census also uses fuzzing to prevent reverse-engineering to individuals, esp. in tracts with small populations).
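For what it's worth, the aggregate-then-publish idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are made up, the `(district, grade)` grouping and the `min_cell_size=10` threshold are hypothetical choices, and it uses plain small-cell suppression rather than the noise injection the Census applies.

```python
from collections import Counter

# Hypothetical student-level rows: (district, grade, received_service).
# Nothing here comes from a real dataset.
records = [("District A", 3, True)] * 12 + [("District B", 3, False)] * 2


def aggregate_with_suppression(rows, min_cell_size=10):
    """Count rows per (district, grade); hide any cell smaller than min_cell_size."""
    counts = Counter((district, grade) for district, grade, _ in rows)
    # A suppressed cell is reported as None instead of its true count.
    return {cell: (n if n >= min_cell_size else None) for cell, n in counts.items()}


table = aggregate_with_suppression(records)
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

The small district's cell is withheld because two students in one grade is too few to hide behind. Whether a given threshold is actually safe depends on what else is public.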
> Any Joe Taxpayer doesn’t have a right to walk in and demand any data they want from a government department.
Red herring: the CDE is not trying in good faith to define what level of aggregated data would be sufficiently anonymous; they're blanket-opposing legitimate public access to this data (even highly aggregated) via the researchers being allowed to testify in court.
> How about data pertaining to your kids?
Absolutely can and should be disclosed, in aggregate. Otherwise you have a public entity spending $128.5 billion of taxpayer money that is not performing so well, is violating constitutional disclosure requirements, has gotten worse since 2020, and has lost students to homeschooling [0] and move-aways.
In any case, this isn't fodder for an opinion poll, it's what the Constitution says.
Might it bump against privacy? If you do a search on third graders receiving speech services in towns of pop less than 3000 in county... at some point you have private information.
Just for context. I'm a PhD trained in education research who has met Sean Reardon a handful of times, had a meal with him, gone through methods training with him. He sits at the top of the field and has the unconditional respect of nearly everyone for his methodological rigor.
One of the weird things about America is that we all know Asian kids are better at math than other kids on average. It's pretty obvious to anyone that's been in a class with Asians or taught Asians. I've done both.
But nobody can actually say this. Instead we have to pretend like this isn't the case. Just look at math Olympiad teams. I coached one years ago. My entire team was Asian except for two alternates. One who was Russian, and the other Indian.
Yes, environment can change outcomes ... but maybe it can't change outcomes to a point where everyone is going to perform the same. Are we going to try to get everyone's 100M sprint into the same range too? People are different.
We should give every individual the same shot at opportunities but I don't think we are ever going to make Asian kids perform at the level of other kids in math or vice versa. It's not environment. Every one of us that has taught an engineering or math course knows this. Even if we don't talk about it.
To sum up another comment: it's cultural, not biological.
Race is not a useful scientific guideline for any kind of scientific study. For example: there is as much biological diversity in sub-Saharan Africa as the rest of the world, but racially, the best we can do is "Black", or "African". It's a useless, dated concept that we, as a species, find it difficult to work past because our brains are categorical engines.
I'm as politically "leftist" as anyone you'll ever meet, but we have to be able to do better than "Asians are good at math" to make effective decisions about education, amongst other problems. This is of course impossible with the current world and thinking. Even though I know race isn't real, I still see it. It still has an impact on my day to day actions, because my stupid brain is all too happy to categorize people on how they appear.
Taking another route: to say that Asians are good at math is categorical error. The word "Asians" represents something abstract, and abstract things cannot take action. Categorical error is basically the starting point for the various "isms" like misogyny, misandry, racism, etc.
I don't believe it's cultural only, the same way I don't believe Ethiopians or Kenyans excelling at marathons and long distance runs to be a cultural thing. Genetics play a factor; why can't math skills be influenced by genetics as well?
The difference in marathon times between Ethiopians or Kenyans and people from other countries is very very small, in the order of ~2%. How do you know it's not cultural? For example, in running, no one thought you could run a four minute mile, then as soon as one person did thousands did as well. It could easily be explained by top athletic talent being far more likely to go into marathon running instead of other disciplines in those countries, as is the case for regional dominance in many other sports, so in reality the actual advantage is even slimmer if it isn't null at all.
I'll end with a question - competitive cycling requires the same abilities as marathon running, that is, optimal oxygen intake and usage, and great endurance in the buttock, leg, and foot muscles. Why are Ethiopians not dominant there?
I am pretty sure the effort characteristics of endurance running and endurance cycling are very different. You use your legs for both, but the muscle groups are different and the way the muscles are used is different.
Also, to present a possible answer to your rhetorical question: cycling is a sport for rich people. A good bicycle, that a kid that wants to pursue the sport must get early in their life, is very expensive.
Not as much as you'd think, many of the muscles are in common when you use clipped pedals.
The most important factor in either sport in any case isn't the muscles, it's oxygen intake and use efficiency.
As far as your argument, by effect of selection, doesn't that mean that top runners are more likely to come from poorer countries? You also don't really need a good bicycle to train, just to compete, but even that's pretty expensive so I take your point.
You realize cycling is one of the sports that is most under scrutiny when it comes to doping, right? Unless the advancements in doping have produced a method that is undetectable chemically or through a rider's bio-passport, doping is much less of an issue now than it was any time in the past 5 decades.
Question: if race isn’t real, are you fine with medical research continuing to be dominated by studies on “whites”, and ignoring whether it applies just as appropriately to other (not real) races?
If race isn’t real, then as a white person, my bone marrow should be just as compatible for transplant at the same probability for blacks, asians, and “mixed race” people as it is for other whites, right?
I totally agree that monitoring every single human being on the planet, and recording and analyzing their individual DNA and second by second logs of their biomarkers and external environment from womb to death would definitely be “ideal”. But we aren’t there. Yet.
In the context of trying to manage finite resources and time, broad messy abstractions have been and will continue to be crucially important, despite not being pure. Trying to erase things like race with an ideological handwave is harmful.
I have no idea if it's cultural or genetic, but the difference is there in the classroom. It's pretty clear that various programs run by the state aren't making much of a difference. If you need to change the culture of kids to make them better at math, then we should do that. Although I have no idea how to quantify what that culture is and how to apply that change across a school.
edit: also, note, you've said a lot of stuff about categories. But categories can blend into each other and cause ambiguity at the margins. But that doesn't remove the validity of there being categories. When I look at the kids who come in to competitive math programs, and these are kids who are more than a standard deviation above the mean in performance, I see a lot of uniformity. One can try to construct various explanations for this, but you can't tell me that there were NO kids from underrepresented ethnicities that had two professional parents and good exposure to mathematics early. We have plenty of racial diversity in the early math programs. And in the Bay Area there are plenty of professionals sending their kids to these programs from all ethnicities. And still, ten years later it's the Chinese and Taiwanese American kids that are in the Olympiad team. And even at a lower level, say SAT math, we see the performance skewed by race in the same way. I am ethnically Indian. And Indians are as interested in math as Chinese. We all send our kids to math tutoring. We are mostly engineers in the Bay Area. And even so, at the very top of the distribution, there are some Indian kids ... but far more Chinese American and Taiwanese kids. These are just facts. I don't take it as a slam against my ethnicity that we don't do as well in math as the Chinese. There's more to life than math after all.
Hi. I did my master's in computational biology focusing on androgen independent prostate cancer. After that I worked in an autoimmunology lab. My projects included rheumatoid arthritis GWAS and b-cell phylogeny. To demonstrate that we did case-control matching correctly, I looked at how well self-reported ancestry corresponds to hapmap populations. The mapping is very noisy. "Race" is a social classification, sure it's correlated with biological markers but there are better measures. So, yeah, "race" as such isn't important.
I don't follow the conclusion that you're trying to draw. It sounds like you're saying that people do not self-report their own ancestry accurately better than chance.
On the surface of it this sounds absurd, because (unless adopted) people do not determine their ancestry by looking at photos of themselves. I can see getting proximal affiliations wrong, confusing or misidentifying oneself as being half Italian when they're actually half Iberian, or confusing Turkic ancestry with Persian. But I don't think people are going to not know whether they are primarily of, say, East Asian, African, or European ancestry.
"Race" is a social construct. We assign "race" based on physical and cultural traits, not genetic. We back into the relationship of "race" and "genetics".
You could easily have a genetic predisposition to prostate cancer without being a certain race, even though that "race" may have a higher propensity for that genetic trait.
>We assign "race" based on physical and cultural traits, not genetic.
Not really, everyone knows that an albino African is still an African. Physical traits are just the most visible aspect of genetics. And your second point is just explaining outliers, it doesn't say anything.
Race is entirely a social construct. You can't do a genetic test and with certainty determine someone's race. Certain genetic traits are common among what we call races, but not exclusive.
Take a look at services like 23andMe or other services, the genetic components of race are entirely based on self-reporting, that is, we call certain genes "Asian" because people who identify as Asian had those.
Suppose you are looking at a 52 card deck, and members of each of the four “shapes” self-identify (with some random noise, and maybe even systematic deviations — like sevens and aces are identified differently from just their shape, etc) as different “suits”.
The pairing between shapes & suits will of course be tautological because the names of the suits are cultural artifacts, but the shapes would still be distinct regardless.
> Race is entirely a social construct. You can't do a genetic test and with certainty determine someone's race. Certain genetic traits are common among what we call races, but not exclusive.
This seems confusing and contradictory. If traits are common in certain groups and not in others (needn’t be exclusive), then by Bayes' rule these traits should identify groups with high probability (especially when combining multiple traits).
But genetic testing doesn't identify groups with high probability, because the overlap is so high. And it often misidentifies racial groups because of that.
This isn't a problem we have as a species. It's not biological, it is cultural. The racial categories we use today were created in the 17th century to justify the white supremacist apparatus of slavery and colonialism - prior to that, people tended to categorize humanity by tribe, ethnicity or religion rather than superficial physical traits. Asian people, for instance, didn't see each other as the same "race" until white people came along and assigned them that categorization.
You, I and everyone else are stuck in this way of thinking because we've been so thoroughly indoctrinated into a system of white supremacy which permeates the entirety of Western culture, it isn't even noticeable, like we're in the Matrix. It persists because it's useful for keeping the power centers that benefit from it entrenched, and everyone else divided.
We can move on from it, but I think the first thing we need to do is recognize that it isn't inevitable.
> Asian people, for instance, didn't see each other as the same "race" until white people came along and assigned them that categorization.
Do you think Asian people would have not come to the same conclusion, even if white people hadn’t said so first? I tend to think it was somewhat inevitable that Chinese, Korean, and Japanese people think of themselves as having more in common with each other than with French or Mexican people.
Sure, but the racial categories do vary considerably around the world.
And even in a single location, if you look back in time, you can see how the way people got categorized has shifted. People used to insist Italians weren't white here in the US.
> One of the weird things about America is that we all know Asian kids are better at math than other kids on average. It's pretty obvious to anyone that's been in a class with Asians or taught Asians. I've done both.
> But nobody can actually say this.
“Asian kids in the US are, on average, better at math and, furthermore, this effect is stronger the fewer generations removed from immigration they are, and is in large part due to well-established general familial impacts on performance and the selective filter of immigration.”
> we all know Asian kids are better at math than other kids on average.
Mostly because of who immigrates to the US. Those Asian and Russian kids you mention probably have software engineers as parents. The Hispanic kids probably do not.
If we got the poorest and least educated immigrants from the same places, we'd be seeing rather different results.
That wouldn't change the end result though ... which is that for kids in America, we aren't going to get everyone at the same level of performance as Asian kids by money alone.
This is absurd cable news pundit-level commentary. It doesn’t sound like you’ve actually looked into this. More that you’ve taken some snippets of your life experience and explained it using your preconceived worldview. Nothing empirical about it. Nothing scientific. And the cherry on top is the implication that you’re “saying what we are all thinking”. Your experience teaching engineering or maths courses doesn’t qualify your baseless intuition as to causality, especially when the stakes are so high as to typecast such large groups of people.
This is a classic case of a misplaced assumption of transferable expertise.
I think the problem is more about skill floors than skill ceilings. I never really cared that much about Asians' positive stereotypes, but the same arguments have been used to propose that Hispanic/Black students are inherently inferior. Despite the fact that much of that was, and is, due to environmental issues.
Once you hit a certain point, sure. There will simply be people whose brains work differently and efficiently, similar to runners who have different peak physiques for their respective category (I don't think we even tap into half of that in classical public teaching but I digress). But it's a much harder sell to say that some people simply can't pass high school level academics and use that against them.
It doesn't seem entirely unreasonable that if a school system gives a researcher access to data that isn't shared with the public, the researcher agrees not to use that information to sue the school system. Such agreements would allow the school system to be more free to share information.
The issue here seems to be that the school system is saying that the researchers aren't allowed to be a witness in any lawsuit against the school system regardless of whether it has to do with the data that was shared with the researchers.
I think a bigger issue is whether the school system should be allowed to keep any information private in the first place. If the information can safely be shared with a particular researcher then it seems like there is minimal benefit to society in letting the school system pick and choose who gets access and who doesn't.
If the institution is public, data should be public as long as individual PII is removed. No exceptions. And FOIA requests should be able to make this data available to anyone filing for access within a reasonable amount of time. Period.
I did some consulting on this WAAAAY back in the day.
In the case of education, good enough anonymization often isn't really possible. A lot of information about school students is public -- yearbooks are gold mines, as are results from extra-curricular activities such as sports, class notes about whether/where they went to college or first jobs after college, etc. Still more can be purchased. And yet more can be inferred from public data (eg home address, rough estimates at parental income, etc.). This was back in the day. I'm sure now it's much worse.
Most of the questions you want to ask about education are about treatments and outcomes. If these treatments (eg extra-curriculars) and outcomes (eg attended college, graduated HS, etc) are public then you can often figure out which student corresponds to each supposedly anonymous data-point.
Maybe not perfectly. But way more than you would think. It's like the statistics version of those little logic puzzles from grade school -- "Four people have red hair. Five are girls and two are boys. Billy and Sally are chewing gum. Boy gum chewers have brown hair. No one over 5 has red hair. Sally is 4. etc. etc. etc. Match each person's name with their hair color.". You sort of figure out a small set of data points, then look at the results of the paper and reverse engineer some statistical calculations, and then a surprising amount of the others start falling into place.
We didn't have a name for it when I did this work, but the basic point was "if you publish the dataset everyone will know little Johnny Table's test scores and GPA". Today this is called a reidentification attack.
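To make the mechanics concrete, here is a toy linkage sketch. Everything in it is hypothetical: the "anonymized" rows, the yearbook-style public rows, and the choice of `grade`, `sport`, and `college` as quasi-identifiers. Real attacks use fuzzier matching and many more sources.

```python
# "Anonymized" research rows: names removed, quasi-identifiers kept.
anonymized = [
    {"grade": 12, "sport": "track", "college": True, "gpa": 3.9},
    {"grade": 12, "sport": "chess", "college": False, "gpa": 2.7},
    {"grade": 12, "sport": "band", "college": True, "gpa": 3.4},
    {"grade": 12, "sport": "band", "college": True, "gpa": 3.0},
]

# Public information, e.g. from a yearbook and class notes (made up).
public = [
    {"name": "Student A", "grade": 12, "sport": "track", "college": True},
    {"name": "Student B", "grade": 12, "sport": "chess", "college": False},
    {"name": "Student C", "grade": 12, "sport": "band", "college": True},
]

QUASI_IDENTIFIERS = ("grade", "sport", "college")


def link(anon_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return {name: gpa} for every person matching exactly one anonymized row."""
    index = {}
    for row in anon_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = {}
    for person in public_rows:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the row is re-identified
            matches[person["name"]] = candidates[0]["gpa"]
    return matches


reidentified = link(anonymized, public)
print(reidentified)
```

Student A and Student B are pinned down exactly; Student C is only narrowed to two candidate rows, which is where the "logic puzzle" step with published statistics would come in.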
(I don't really know enough about the topic to have an opinion on the article per se, but "just publish everything" is definitely not a workable solution :))
LOL. I'm not going to deanonymize datasets for an internet stranger. Certainly not for free, but also not at any hourly rate unless I know that the organization employing me either has the original non-anonymized dataset or at least has very strict internal controls about how deanonymized data will be handled.
(I also haven't done this work in a LONG time, and there's now a whole lot of academic work on the topic that didn't exist back then, so there are probably much better consultants for legitimate organizations looking to hire for this work.)
If you want proof, you can use google to find LOTS of papers along these lines analyzing real datasets. I think https://arxiv.org/pdf/cs/0610105.pdf is a fairly typical example.
I work with large private K-12 datasets regularly. Even as a passer-by, we had one for a big government-issued dataset in Australia. It might’ve been census data? It might’ve been just over five years ago? I actually don’t care. The principles of effective data anonymisation, esp. in education, are understood by people that actually work in this area. It seems odd to be so demanding.
I’ve noticed a trend of commenters making these sorts of demands. I can’t tell if it’s a new (particularly dumb) rhetorical ploy or if they’re just looking for free labor.
> If the institution is public, data should be public as long as individual PII is removed.
PII is much broader than most people understand because reidentification of what amateurs would see as deidentified data is easy (often trivial), and, as a consequence, to be useful for research data is often not fully deidentified.
EDIT: As an example, the HIPAA safe harbor deidentification standard requires removing 18 kinds of identifiers, including, as one of them:
All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census:
(1) The geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and
(2) The initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people is changed to 000
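Read literally, the ZIP code part of that rule is simple enough to sketch. The population table below is hypothetical (the real figures come from Census data), and treating an unknown prefix as population zero is my own conservative assumption, not part of the rule as quoted.

```python
def generalize_zip(zip_code, zip3_population):
    """Apply the quoted ZIP rule: keep the first three digits only if the
    three-digit area holds more than 20,000 people, otherwise return "000"."""
    prefix = zip_code[:3]
    # Assumption: a prefix missing from the table is treated as population 0.
    if zip3_population.get(prefix, 0) > 20_000:
        return prefix
    return "000"


# Hypothetical populations per three-digit prefix, for illustration only.
populations = {"941": 500_000, "036": 15_000, "555": 20_000}

print(generalize_zip("94110", populations))
print(generalize_zip("03601", populations))
```

Note how coarse the result is: every geographic field below the state level collapses to at most a three-digit prefix, which is exactly why safe-harbor data is often too blunt for research.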
To add to this, PII isn't always even clear. Different jurisdictions identify PII differently, there isn't One Master Definition that you pass a unit of data through, upon which an authoritative "THIS IS PII" or "THIS ISNT PII" is returned.
We had a multi-month project to get a subset of our data considered 'clean', and it required a consultant, a stats PhD and many dev hours. It was healthcare, so on the high end of paranoia (justifiably), but nowhere is it as simple as dropping the "name" column.
HIPAA also allows for “expert determination” [0] for deidentification that differs from safe harbor and can allow for all sorts of things since there’s no definition of what an expert is.
And reidentification risk can be as high as even 1% and still be acceptable for HIPAA. In a dataset of a million people, that’s 10,000 people identified, and it would still be “acceptable.”
But HIPAA doesn’t apply to these CA data; it’s just the clearest example of deidentification regulations I know of.
But it’s totally possible to deidentify data suitable for release to these researchers. It’s just a question of what CA considers deidentified and whether the result is still useful enough to these researchers. For the topic they are researching it should be pretty straightforward to remove PII enough to protect individuals and only remove some really unique characteristics (i.e., only a single 20-year-old of a particular race and ethnicity).
But I’m guessing age groups by race and gender and socioeconomic status are possible to preserve without tying back to an individual. I’d go so far as to say it would be non-trivial, but pretty easy, for CA to produce this for the researchers, if not for the general public.
The intent of the comment was not to say the process is trivial or that removing PII is sufficient. However, it is not as impossible as people are making it out to be. I’ve worked on datasets at social media companies where literally thousands of columns were considered PII but realistically removing/scrambling just a subset of columns would make it impossible to identify individuals.
> I’ve worked on datasets at social media companies where literally thousands of columns were considered PII but realistically removing/scrambling just a subset of columns would make it impossible to identify individuals.
Maybe, though I doubt it was that easy against any but the most trivial reidentification efforts, but since most privately held PII isn’t regulated (in the US at least), there's little consequence for a social media company getting it wrong other than PR.
> data should be public as long as individual PII is removed
This is one of those things when ideology doesn't match the real world. If the amount of data is large enough and with enough parameters, removing PII doesn't do anything to protect privacy.
What about medical records? What about protected classes? What about data about vulnerable people or victims?
Student data is protected by another layer of regulation and for good reason. Also, the judiciary is a 'public' institution in general. Should we not seal records for minors? 'No exceptions' - yeah, right.
> If the amount of data is large enough and with enough parameters, removing PII doesn't do anything to protect privacy.
Honestly it doesn't have to be that large. We see this all the time with data websites or apps collect. Sure, you remove John Smith's name, but you still have his GPS coordinates. For the school, you remove Professor Smith's name, but you have a professor who teaches CS 123 and has 4 graduate students. You bet you can guess who that is.
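A quick way to see the problem is to count how many records share each combination of the remaining fields, which is the idea behind k-anonymity. The rows and field names below are invented to mirror the professor example; this is a sketch, not a full privacy audit.

```python
from collections import Counter

# Hypothetical name-free instructor records.
rows = [
    {"course": "CS 123", "grad_students": 4},
    {"course": "CS 101", "grad_students": 2},
    {"course": "CS 101", "grad_students": 2},
]


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


sizes = group_sizes(rows, ("course", "grad_students"))
k = min(sizes.values())  # the dataset is k-anonymous for this smallest group size
unique = [combo for combo, n in sizes.items() if n == 1]

print("k =", k)
print("uniquely identifiable combinations:", unique)
```

With `k` equal to 1, at least one person is alone in their group, so removing the name did nothing for them.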
I really do support open data, especially about public institutions, but at the same time we are in an era where this information is quite powerful. Seems to make a case for something like homomorphic encryption or something, but will that even stop these collisions?
The appropriate notion here seems to be Differential Privacy, which is a mathematical definition informally saying "a scrambling of the dataset that is information theoretically indistinguishable from that where one arbitrary person is added or removed". It's a surprisingly deep topic, with entire (very good!) textbooks dedicated to it.
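As a rough illustration, one textbook way to get this guarantee for a simple count is the Laplace mechanism: add noise with scale 1/epsilon, since one person can change a count by at most 1. The true count of 42 and `epsilon=0.5` below are arbitrary assumptions, and a real release would also need to track the privacy budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
releases = [private_count(42, epsilon=0.5, rng=rng) for _ in range(10_000)]
mean_release = sum(releases) / len(releases)

print("one noisy release:", round(releases[0], 2))
print("average of many releases:", round(mean_release, 2))
```

Any single release is deliberately off by a few units, but the noise averages out, which is why repeated queries have to be budgeted rather than answered freely.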
We're getting a little off topic here, but the Federal government has published specific guidance on de-identification of medical records. You can construct some artificial scenarios where re-identification might be theoretically possible through record linkage with other data sources but in practice it's unlikely. In principle a similar approach could be used for student data, although I'm not familiar with the legal issues.
But all of this is orthogonal to the core issue of whether a state government should be allowed to prevent researchers from participating in lawsuits. There is no student privacy issue involved there. Witnesses in a civil suit still aren't allowed to violate student privacy laws regardless of the data they have access to, so it makes no sense to conflate those issues.
> We're getting a little off topic here, but the Federal government has published specific guidance on de-identification of medical records.
But releases (even without patient consent, with an IRB waiver) of non-deidentified PHI data for research are allowed, and this is specifically because deidentification necessarily destroys elements that would often be necessary in research.
> You can construct some artificial scenarios where re-identification might be theoretically possible through record linkage with other data sources but in practice it's unlikely.
It is explicitly part of the HIPAA safe harbor standard that, in addition to removing the required identifiers, you cannot come up with such a scenario, and if you can, the data is not deidentified. (The last criterion of the standard is “The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information”.)
What does any of that have to do with the legal issue of whether a state should be able to prohibit participation in certain lawsuits as a condition of gaining access to research data? Neither party has raised re-identification as a concern, nor have there been any allegations of privacy law violations.
The top-level concern is something like this: professors use their trusted relationship to schools in order to make bank on expert witness fees, which feels a bit corrupt and calls into question the researcher's motives.
A rebuttal to this concern is that we can side-step that issue entirely because these data sets should be public anyways (anonymized, of course!). This obviates the above concern, since the researchers won't need to compromise themselves in order to get exclusive access to data that allows them to be expert witnesses and rake in $$$$.
But the problem with that proposal is re-identification: if we can't make the data anonymous, then we all agree that it shouldn't be released (implicit in the "anonymized, of course!" caveat to "just release all the data" proposal).
Then you pointed out that even for more important data like healthcare data, FDA apparently has ways of allowing release of data that take into account the risk of re-identification (I didn't know this; thanks for sharing!)
Then dragonwriter and you got deep into the weeds on HIPAA stuff.
TBH I have no idea which of you is most correct here. But anyways, there are two ways for this conversation to go:
1. You are correct, good enough anonymization is possible: Stanford researchers should not be silenced; it is problematic that they have access to data other people cannot access, but the correct solution is to negate the originally problematic distinction between those researchers and the general public by making data public. Then there is no reason for the researchers to agree to these contract clauses, because they will have access to the data.
2. dragonwriter is correct, good enough anonymization is not possible: We can go back up to the top-level concern and observe that "just release all the data with anonymization" isn't a feasible solution to this problem. Or maybe there isn't actually a problem here at all. IDK. But in any case, "obviate the problem in the top-level post by releasing anonymized data" isn't a workable solution.
Again, not following closely enough to have an opinion, but that's where we are now.
I think a good compromise position is that we should have a law stating that K12 data should be available to certain education researchers -- subject to IRB approval and so on -- without any other strings attached. Including "don't sue me" clauses in releases of public data sets does feel like an inappropriate abuse of student privacy concerns.
The researchers don't have a trusted relationship with schools. They have a contractual relationship with the state government. The fundamental issues underlying the lawsuit are First Amendment freedom of expression and contract law; expert witness fees and researchers' motives are irrelevant.
Whether student data de-identification is good enough or not is a total red herring. No one has accused the researchers in this case of violating privacy rules. The comments here about such privacy issues are largely hypothetical and tangential.
If you think that California needs a new law expanding research access to educational data then feel free to suggest that to your state legislators, or sponsor a ballot initiative.
It's not a red herring. It's a side conversation about a different but related topic.
Someone proposed just releasing all the data.
Someone else replied with why that wouldn't work.
Ie, a conversation happened and the topic of discussion shifted.
FWIW I agree with you on the object level question. No idea why you're being so abrasive, especially when you're the one who initiated/continued the conversational thread about deanonymization and even prefaced with "We're getting a little off topic here".
Presumably at that point you understood that the topic of conversation had shifted, and people's agreement/disagreement didn't necessarily have anything to do with the original topic... since you literally said so and no one disagreed... so your reaction here is pretty odd and off-putting.
> What does any of that have to do with the legal issue of whether a state should be able to prohibit participation in certain lawsuits as a condition of gaining access to research data?
As you yourself noted upthread, you had already taken this subthread afield from that topic.
This is true but it’s quite possible to correct for unique or infrequently occurring combinations so privacy is still preserved and data are made available.
It’s not that hard to design data release to compensate for privacy protections and statistically test for a specific level of risk. There’s a whole body of work on statistical disclosure control and there’s plenty of open source or cheap enough privacy enhancing technology available.
I’m not familiar with CA, but I expect they have someone on staff who can produce a “safe” dataset that preserves privacy and still allows for this question to be researched by low level geography, demographic, and socioeconomic factors.
The devil is in the details. These records were likely not designed to be shared and I'd assume the entire system contains vulnerabilities that could create leakage. Leakage that could be used to harm individuals in a variety of ways - from discriminating future prospects to harassment and much much more.
I agree in principle to what you're saying but we need to be truthful about what these current systems are capable of.
Removing PII would probably also involve removing individual grades and other information that's necessary for any research to be effective. Thanks to predatory data collection practices on the internet, we know how little information you actually need to deanonymize someone. The problem only becomes worse when we're talking about kids.
That said, research that cannot be reproduced is useless. There's a balance to be struck here, and it's somewhere between "make all data public" and "lock the data in a vault".
You'd have to remove so much PII as to make any examination worthless: "A student of age <redacted> and gender <redacted> at school <redacted> has a GPA of <redacted>". As little information as "a 16 year old black male at Main Street High School" can be narrowed down to a handful of possible candidates at a lot of CA schools.
I disagree as it depends on how many 16 year old black males there are in that high school. It’s pretty simple to apply k-anonymity to control for an acceptable risk level. And add in generalization of age into groups and many questions can be answered.
I think you could definitely answer race x gender x grade but it will be harder when you factor in more unique characteristics like household income or vaccination status, etc.
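To make the k-anonymity point concrete, here is a minimal sketch of the check being described: count how many records share each combination of quasi-identifiers, and see whether generalizing age into bands lifts the smallest group to k. The records, the field names, and the age cut points are all made up for illustration; nothing here reflects how CDE actually stores or releases data.

```python
from collections import Counter

def age_band(age):
    """Generalize an exact age into a coarse band (hypothetical cut points)."""
    return "14-15" if age <= 15 else "16-18"

def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    return smallest_group(records, quasi_identifiers) >= k

# Made-up toy records, not real student data.
records = [
    {"age": 14, "gender": "F"},
    {"age": 15, "gender": "F"},
    {"age": 15, "gender": "F"},
    {"age": 16, "gender": "M"},
    {"age": 17, "gender": "M"},
    {"age": 17, "gender": "M"},
]
for r in records:
    r["age_band"] = age_band(r["age"])

exact = is_k_anonymous(records, ["age", "gender"], k=3)        # exact ages
banded = is_k_anonymous(records, ["age_band", "gender"], k=3)  # generalized ages
print(exact, banded)  # False True
```

With exact ages the smallest group has one record, so the toy data fails k=3; with banded ages each group has three. Every extra quasi-identifier (income, vaccination status) splits the groups further, which is the difficulty the comment above points at.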
If FERPA makes it illegal to share the data with researchers, then certainly it shouldn't have been shared.
If FERPA allows sharing the data with researchers, is it right/proper/legal to share only on the condition it can't be used to harm the schools in court? Presumably that part is not in FERPA.
(And to be clear, California here didn't just say they couldn't use the data in court, they said the researchers could not testify in any court case at all against the state. But we're talking hypothetically)
Even anonymous records? That would seem to preclude studying the effectiveness of the education system. And if they weren't anonymous, what possible conclusions could you draw compared to anonymous records that would warrant that access?
Of course not, but there are tests that can be applied to determine if privacy is protected.
It’s not possible to just aggregate and be done. But it is possible to set some privacy threshold and then ensure that all records conform to that acceptable risk level.
In theory this makes sense, and I agree with the spirit. In practice, you can often re-identify individuals even in an anonymized dataset. For example, if you're dealing with a very rare disease, or a small minority group, you can usually figure out who an anonymized row of data is referring to if you really try. So, it's not so simple, and the responsible thing to do is not have a blanket policy that takes human judgment and accountability out of the loop.
I very very strongly disagree. It's important that researchers get access to diverse data and industry collaboration is often crucial for this. If companies are required to make all their data public they will be far less willing to collaborate with research. It's already hard enough as is to convince corporations that it's worth their time.
> The issue here seems to be that the school system is saying that the researchers aren't allowed to be a witness in any lawsuit against the school system
Exactly. With this bit being particularly outrageous:
> “Also, be aware,” wrote Cindy Kazanis, the director of CDE’s Analysis, Measurement, and Accountability Reporting Division, “that your actions have adversely impacted your working relationship with CDE, and your response to this letter is critically important to existing and future collaborations between us.”
How is that relevant? Because both have the word "public" in their descriptions?
(Incidentally, I think a lot of ills in the modern world exist because of companies that exist only to increase their value in stock exchanges, rather than to be useful.)
The state exists for the people. They serve me (and you).
Publicly traded corporations are very different.
My tax dollars pay for the government to operate and collect data. Not so much for publicly traded companies.
That being said, for publicly traded corporations there are regulations on what data they must release, but I think it’s mainly about financial performance.
So a private education system would not need to release anonymized data on its students. But a public education system has a legal duty.
I think a bigger issue is whether the school system should be allowed to keep any information private in the first place.
Are you genuinely suggesting that the public should have access to all attendance records, grades, test scores, etc etc of all students everywhere? That's the sort of information these researchers have.
The issue is the "sufficiently anonymized" part. Given a large enough number of dimensions, you may be able to identify students well enough.
For example, if you take all students that took course A at time X, course B at time Y, course C at time Z and so on, eventually you might be able to narrow it down to a very small group, perhaps to even a single student.
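A rough sketch of that narrowing-down, using a made-up enrollment table: for each student, check whether some set of n of their (course, term) pairs matches nobody else. The student labels, courses, and the `identifiable` helper are hypothetical and only illustrate the mechanism.

```python
from itertools import combinations

# Made-up enrollment: student -> set of (course, term) pairs.
enrollment = {
    "s1": {("A", 1), ("B", 2), ("C", 3)},
    "s2": {("A", 1), ("B", 2), ("D", 3)},
    "s3": {("A", 1), ("E", 2), ("C", 3)},
    "s4": {("A", 1), ("E", 2), ("D", 3)},
}

def identifiable(enrollment, n_known):
    """Students who can be singled out by some n_known of their own enrollments."""
    found = set()
    for student, courses in enrollment.items():
        for known in combinations(sorted(courses), n_known):
            # Everyone whose record contains all of the known enrollments.
            matches = [s for s, c in enrollment.items() if set(known) <= c]
            if matches == [student]:
                found.add(student)
                break
    return found

print(len(identifiable(enrollment, 1)))  # 0
print(len(identifiable(enrollment, 2)))  # 4
```

In this toy table no single course identifies anyone, but two known courses are enough to single out every student, even though the table contains no names.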
How can the researchers simultaneously publish research and not be allowed to testify to their conclusions in litigation though? It seems clear that this is not a privacy concern and is rather a protective measure.
This will also probably follow a power series too. So it isn't unlikely that you could deanonymize someone given just 2 courses. Not much information is needed to encode a lot of things.
You're attempting to make an argument against any anonymized data being used in research. You'd have to do better than a hypothetical to make headway with it.
Moreover, the logic would have to carry over to the very common practice of anonymizing data in professional communications (like training). Which would have HIPAA implications for some students.
The common anonymizing practices have been utilized for decades without privacy breaches of note. That track record is also what your argument would have to defeat.
What I mentioned applies to safeguards against de-anonymization in the event of public access. Ie: a published research paper or professional notes left behind on a bus.
Anonymizing a huge data set like this is impossible.
Also, the burden of proof is on those that say that the data has no privacy implications, not on those who are like "ehhh, it's probably safe to release this."
> Anonymizing a huge data set like this is impossible.
That depends. For the entire dataset, of course it is, because it’s everyone’s student records. But you can probably subset it to the extent that it’s still useful and perturb enough to protect individuals and be statistically equivalent.
And you could also generate a bunch of aggregate results that do stuff like identify average grade differences before and after periods while correcting for other differences without including individual identifiers.
You're moving goalposts, friend. What you're suggesting and what OP is suggesting are in two completely different categories of disclosure extent. I don't think anybody here is suggesting that no data should be available to the public.
I agree with the person's criticism of your comment.
Yes, obviously there is a level of aggregation where privacy concerns no longer hold.
But there is no trivial transformation that allows education researchers the data they need but preserves anonymity. Education researchers want to aggregate and statistically sample the data in new ways; pre-aggregating it removes most ability to do so. If you want to do a principal component analysis of a few variables-- good luck with aggregate data.
If you provide nearly any data at the student-level, there's a pretty high chance that it can be deanonymized.
At the same time, the state's position of attempting to prevent education researchers from participating in litigation (when using only public, non-restricted data) is egregious.
I’m talking about perturbing the micro data enough to allow researchers to answer their questions while remaining analytically valuable.
For example with school attendance data you could easily release a dataset at the county level with every student’s record with unique generated student id, race, grade, gender, absences by year (or even month) and still have 5-20 of each category to be able to show attendance trends before and after Covid without being able to identify individuals. And, if necessary, suppress really unique race or gender instances (eg, maybe there’s only one trans, Native American in a school) while still being useful enough to make useful findings for general trends.
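As a sketch of the suppression step, assuming a minimum cell size of 5 (the low end of the 5-20 range mentioned above): rows whose race x gender x grade combination is too rare have those fields masked, while the absence counts stay usable. The rows, field names, and `suppress_small_cells` helper are hypothetical.

```python
from collections import Counter

MIN_CELL = 5  # hypothetical threshold, taken from the low end of "5-20"

def suppress_small_cells(rows, keys, min_cell=MIN_CELL):
    """Mask the key fields of any row whose key combination occurs fewer than min_cell times."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    released = []
    for r in rows:
        out = dict(r)  # copy so the source rows are left untouched
        if counts[tuple(r[k] for k in keys)] < min_cell:
            for k in keys:
                out[k] = "suppressed"
        released.append(out)
    return released

# Made-up rows: five students in a common category, one in a unique category.
rows = [
    {"id": i, "race": "X", "gender": "F", "grade": 9, "absences": i}
    for i in range(5)
] + [{"id": 5, "race": "Y", "gender": "M", "grade": 9, "absences": 12}]

released = suppress_small_cells(rows, ["race", "gender", "grade"])
print(released[0]["race"], released[5]["race"], released[5]["absences"])
# X suppressed 12
```

A real release would also need to check that the suppressed rows themselves form a large enough group, and that the masked values cannot be recovered by subtracting from published totals.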
I don’t know what specific questions, but the state not releasing any data to them and claiming privacy is silly.
The census and department of ed already do this and the department of ed has a very useful description of how they apply privacy protections and validate that data are sufficiently anonymized for public release, https://studentprivacy.ed.gov/sites/default/files/resource_d...
> For example with school attendance data you could easy release a dataset at the county level with every student’s record with unique generated student id, race, grade, gender, absences by year (or even month) and still have 5-20 of each category to be able to show attendance trends before and after Covid without being able to identify individuals.
Yes, for any kind of specific study you want to do, you can form aggregates that support it. Indeed, lots of aggregate data is already released publicly.
If you want the actual, real data, so you can answer questions like --- "what about attendance on Mondays in students receiving subsidized lunch-- what does it predict about that student's attendance in the future?" --- you'll either need the real data, or for the state to basically do your data aggregation for your specific question.
The word "anonymized" needs to be excised from our collective vocabulary. "Anonymization" is not a thing that can be meaningfully done to a dataset about individuals. Coarse aggregation is possible, and the only practical way to achieve this end, but this has its own drawbacks in a research context.
I've actually gone through this process with the CDE and I was denied access. The privacy issue is a huge red herring, used to co-opt well meaning people like you.
I requested data about STAR, the California standardized test used for Lowell's admissions. I wanted rows of the form (randomized student ID, STAR question ID, answered correctly), however they were recorded, and literally nothing more.
They rejected the request because (1) they claimed such records didn't exist, which makes no sense because how exactly did they administer the test then; and (2) because standardized testing is carved out, in their opinion, from the related sunshine law.
Why did I want these records? I wanted to show that scoring well on tests and using them to gate admissions doesn't mean what people think it means. Specifically, that if you administered the test Lowell used (STAR) by hardest question first, then terminated the test after the student gets N (close to 1) questions wrong, you would select nearly the same list of students. Only asking the vast majority of students only e.g. 1 question, which they all get wrong, can't possibly measure how much they study, how comprehensive their knowledge is, etc. But these claims are routinely made in defense of the test and its purpose in selecting a class. This is coming from someone who wants test based admissions.
So clearly political, right? I had to carefully word my request around all these conclusions. If you read the CDE's requirements, they really have specific political goals. You either align with them or you don't. And I tried to work around that, and I still failed. They just looked at the absence of a political bent, and correctly concluded that it wasn't evidence of absence.
If you want to do good, politically impactful educational research: run your own school. That's what the CDE wants you to do. It's not about discovering how to improve public schools.
Not sure it's fair to say it's a red herring, or that I'm "co-opted" like you suggest. Transparency is kind of my main gig -- I get it. Like, I recently helped a small team of researchers with some FOIA requests to get access to similar information you were denied.
But at the end of the day it's fundamentally important to understand at what point transparency and privacy intersect.
> But at the end of the day it's fundamentally important to understand at what point transparency and privacy intersect.
"At the end of the day," these conversations about privacy are like 15 minutes long at private schools. People still keep sending their kids to private schools. I just don't know how much it matters.
They surely care about privacy in their internal research and metrics, but they don't employ a full time Privacist. They might employ someone who checks the right boxes for them and deals with FERPA shit. But because they are aligned with the parents in delivering the best educations, for the most part, they are trusted to do with data what they want, and that sometimes includes inviting outside collaborators to look at it, without anywhere near the same faff as the CDE.
If you're a journalist and you want to help a private school make a better education, out of the thousands of private schools, one of them will both let you write about it and also tell them something they don't wanna hear. Some might use privacy or whatever as the reason they don't want to collaborate with you, but on average it will be about trust.
The CDE is never going to do that. There's only 1 CDE, and they are there to preserve the status quo.
Very similar things happen when investigating criminal cases. There's possibly hundreds or thousands of instances of some type of misconduct or improper arrest... but none of the defense attorneys with those sorts of cases will talk about it with the press because of the very real potential harms of talking with the press. Or the ones that do talk are too high level.. or they might have some ulterior motive like self-promotion. It's really hard to express how many issues are a direct result of lawyers understandably, but systematically not raising any public awareness about truly awful things.
Have you tried getting your data through CPRA requests? I'm out in Illinois, where our law is pretty decent, and I'm not super familiar with CA's public records nuance, but it's really worth a try. What I know though is that California CPRA officers get away with a strange amount of abuse of the law. But even with that, you might be surprised what records are available. So if you do submit some requests, don't exactly expect it to be easy or immediate. Expect to be stonewalled, and to need to sue at some point though. But IME public record suits are pretty hands-off (except when they're not..). And most of the lawyers I've worked with are upfront about what they will and won't litigate over.
One thing you'll find is that.. basically nobody is looking into most of the awful things you'd expect would have eyes. It's very likely you'll be the only one doing those requests, or incrementally identifying how to get what you want through multiple requests over months. But each step breaks new ground and turns into feedback loops if you can build a community around it.
If going until the student gets one or two of the hardest questions wrong is highly predictive of whether they get selected, that implies that students near the selection threshold are getting very few questions wrong, right?
> Only asking the vast majority of students only e.g. 1 question, which they all get wrong, can't possibly measure how much they study, how comprehensive their knowledge is, etc.
This seems like a strawman?
Yes, a single question can't measure those things to a high degree of certainty.
But if you have students that do poorly on all the hard questions, and students that do well on all the hard questions, then asking them a single hard question might be 80% predictive of what group they're in.
Why is it bad for that percentage to be high?
The reason the test has lots of questions is specifically to increase the predictive quality. Being able to loosely predict from a small subset of questions seems reasonable to me. It doesn't mean the test is failing to measure the student's knowledge.
>I had to carefully word my request around all these conclusions.
Aren't you clearly saying you already had a desired outcome and were just fishing for the data to confirm it? I mean, I wouldn't give you any data in that case either. It's a strong signal that you are motivated by something other than what the data shows.
Do I think that, in principle, data sets can be anonymized? Of course I do.
Your incredulous tone and excessive ellipsis seems to imply you find this position to be ridiculous, so maybe you'd better be a little less snide and a little more expansive on, what, exactly, your problem is.
It's because their confidence in deidentifying data doesn't match the significant risk. If you think it's worth it and are willing to take that risk that's on you and those you risk harming.
That's not strictly true. There's some recent work (as fascinating as it is incomprehensible) on generating datasets that share most aggregate properties with the actual dataset (measured through joint probability distributions), but do not reveal more than some epsilon of information about any individual contained in the original data set.
These have the potential to revolutionize private computation and analysis, as they provide provable hard (theoretical) limits on the amount of information you can learn about individuals regardless of the type of analysis performed on the proxy dataset.
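The comment doesn't name the technique, but the "epsilon" bound it describes sounds like differential privacy. Generating a full synthetic dataset is more involved than fits here, so the sketch below swaps in a much simpler building block, the Laplace mechanism for releasing one count, just to show what epsilon controls. The function name, the seed, and the numbers are illustrative assumptions.

```python
import random

def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so scale = 1/epsilon is the standard calibration for epsilon-DP.
    The difference of two i.i.d. exponentials is Laplace-distributed.
    """
    rate = epsilon  # exponential rate = 1 / scale
    return true_count + rng.expovariate(rate) - rng.expovariate(rate)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 100        # made-up number of students in some category

draws = [noisy_count(true_count, epsilon=0.5, rng=rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
print(round(mean))  # 100
```

Each individual release is off by a few students, but the noise averages out to zero, so aggregate trends survive. Smaller epsilon means more noise and a tighter limit on what any analysis can learn about one person.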
While not _as_ "entirely unreasonable" as what the state is actually doing -- and I think we should be clear, that, as you say, the state is doing way worse and trying to prevent researchers from testifying on any matters at all...
I'm not totally sure it's actually reasonable for a government to withhold data from researchers because they think it might be used against them in a lawsuit either. Is that a valid reason for a government institution to withhold data?
Perhaps a court case will end up establishing that the broader thing is in fact unreasonable under the First Amendment too. Perhaps this is a good "test case" because it is so much more egregious; you always want an especially egregious case.
> The issue here seems to be that the school system is saying that the researchers aren't allowed to be a witness in any lawsuit against the school system regardless of whether it has to do with the data that was shared with the researchers
I think the issue isn't being a witness in the general sense, but an expert witness, which is either a paid gig or one for which payment is waived because of other alignment of interests. Being an expert witness against someone you are in any kind of working relationship with is a clear and obvious conflict of interest.
> If the information can safely be shared with a particular researcher then it seems like there is minimal benefit to society in letting the school system pick and choose who gets access and who doesn't.
So HIPAA-protected data that meets the standards for research sharing should instead be made public? (And if you say, “well, it's different, this is the government”: government holds lots of data protected by HIPAA.)
The state is not a "someone". The state is in an extremely privileged position legally, and as such is bound by the First Amendment which you and I are not.
Sure, the State is bound by the First Amendment, and there is a fair debate as to whether the clear conflict of interest involved in being an expert witness against the state must be tolerated alongside research data sharing for that reason, either in general (unlikely, IMO) or at least in the specific case where there is no nexus with the shared data (more likely).
The problem is your claim of conflict. Measurement of the government by the people can never be considered a conflict. If the data shows that the government, the CDE, failed to improve outcomes, that is just data. It is the opposite of a conflict. The CDE is required to improve outcomes: suppressing information that it failed to do so is antithetical to that outcome. The CDE needs this information to do its job, regardless of claims by PHBs to the contrary.
What data do you want to see? Most of it exists publicly but is very messy. You can get basic financial information here,[1] but data on student outcomes and school climate is very siloed - if there's a specific school/state you're interested in, I could help you find information.
Even if you're a researcher, good quality data rarely exists. In NYC, which collects more data than any other school district, you're mostly relying on a (publicly available) 100 question survey sent to every student. The survey author must have never talked to a child because the questions are worded like a clinical psychology paper. At low income schools the survey has a 20-30% response rate.[2]
If you discover evidence of wrongdoing, then you are ethically obliged to act on it regardless of any other contract. We have whistle blower laws for this reason.
Even if the wrongdoing is not a criminal matter, if you discover a reason that someone can be sued, then you have an obligation to inform those who could sue and act as a witness for them in court. The only exception to this is if you are the lawyer for the party whose data you discover, and then you have an obligation to inform them that they can be sued and how to fix the problem in good faith (good faith meaning that, if it is discovered, you as a lawyer will argue that they fixed the problem when it was found, and thus the court should dismiss it as an honest mistake that was corrected; the courts should in turn, if not dismiss the case, at least award minimal damages).
The above needs to take precedence over all contracts.
> The issue here seems to be that the school system is saying that the researchers aren't allowed to be a witness in any lawsuit against the school system regardless of whether it has to do with the data that was shared with the researchers.
While that does seem overbroad, if the restriction were only on cases related to the data shared by the researchers, then for many cases there would need to be a demonstration that it did or didn't relate to the data, and there isn't really a way to do that without disclosing the data.
> It doesn't seem entirely unreasonable that if a school system gives a researcher access to data that isn't shared with the public, the researcher agrees not to use that information to sue the school system. Such agreements would allow the school system to be more free to share information.
That is not what is going on here. The researcher is being asked to testify against the school system by someone who is suing them.
Wouldn't a more reasonable position be a prohibition on researchers acting as paid expert witnesses in cases against the school system? I can imagine that might disincentivize 'gold-digging' behavior by researchers.
The complete ban on researchers engaging in any litigation seems over-broad, and designed to keep potential litigants from having access to anyone 'in-the-know'.
There might be some information that could be combined with other data in ways that would violate the privacy of students and their families. Obviously discipline records with student names shouldn't be public, but what about records without names where the student's name could be found by linking it with other data?
AOL released a bunch of search queries in 2006 with the idea that they were anonymous, but it turned out you could get quite a bit of personal information from them by linking searches together.
Presumably all data about student educational outcomes, which are protected by FERPA. The school system doesn't have the option to just ignore the law and make this information public.
There's a lot more going on here than the initial story reports.
For more than a few academics, making big $$$ as an expert witness is a magnificent source of side income. (Fees of $1,000/hour, including lots of open-ended prep time, can be found.) That begs the question: Did the research lead to the desire to be an expert witness? Or did the desire to be an expert witness define the nature of the research project?
We'd need to know a lot more about the origins of this project before being able to referee this one. But if the state of California is worried about litigants using "researchers" to find and filter data that ordinarily would be available only through legal discovery processes, that's not a crazy worry.
I went to the Apple v. Samsung trial in 2016 or so, and the highest paid expert witness that day was $850. The other two were $450 and $350. Where are you getting this number?
The prep time is included in your hours. The $850 guy said he'd put in 900 hours.
(btw, it IS excruciatingly boring work. But of course, the money.)
I litigated mesothelioma cases and our experts were paid $600-$1,100/hr, depending on the expert. $1,000 is high but not unheard of. What’s really wild is, in addition to prep time, they get paid that from the second they cross the threshold of their front door through when they return; many of our experts were flown in from the middle of the country to Oregon so they sure pocketed hefty sums.
I think there are cost maximizing lawsuits (like mesothelioma) and then lawsuits that aren’t seeking to recover damages. And they pay their expert witnesses very differently.
I also think there are many academics unwilling to serve as expert witnesses for tort lawsuits and they are different from “professional” expert witnesses.
They need to be able to have 5 months where they can clear the calendars and just work on that. It's still a lot for 5 months, but I imagine there's a lot of downtime too. Are they getting 5+ months every year?
Yeah. Having a job seems like it could keep you from regularly being able to stop everything for 5 months of high paid work. Maybe the money is enough from the few months that they're fine with it (and maybe it's easy for them to get a new job after or go back somewhere they've worked before). I'm genuinely curious. It seems like a lot to make for 5 months, but what do their earnings look like over a 5 or 10 year period?
Is that what they make or what they BILL? IT, admin staff, paralegals, Jr lawyers, building, pro bono and other marketing activities etc. It's paid for somehow.
The discussion was whether billing over $800/hr was "ridiculous." It's actually common for credentialed professionals who are at the very top of very specific fields.
a friend of mine is a full time expert witness. he went to school for an engineering degree and did 1 year of industry work. he now provides expert testimony on technical cases all over the country. they fly him out to nice hotels with a generous per diem. he gets paid very well. they give him the materials to present in court. it’s a very well paying position
My main recollection was that the opening statements from the lead Samsung attorney weren't that charismatic or convincing to me. I was surprised it just wasn't that... Good.
Years later I suspect the strategy of Samsung (which certainly worked, if it was the approach) was to build a good case for appeal, rather than to focus on winning the trial itself. As it turned out, Apple won the trial but Samsung won the appeals.
One month of work for $765k. I was expecting one or two orders of magnitude lower payouts for a single expert witness in a single case. Who can afford to pay this?
There aren't even 900 hours in a month. That's 765k for 900 billed hours, and you have to imagine that a good chunk of unbilled hours also occurred. So maybe that's for the equivalent of 8 months of boring work. Not continuous 8 months either, you have to schedule other things between prepping. A lot of money. But not for a billion dollar lawsuit.
As for "who can afford this": a company worth tens of billions in a suit over a major product line against a trillion dollar company.
Once they get the contract. I guarantee that the recruitment process and negotiation process was more involved than a phone call. And there could be work specifically excluded as "billable hours" that is still work. For instance, is the time to fly out compensated?
All work that an expert does for a case is billable, including travel time. However, experts will frequently provide discounted or even unbilled work for individuals in certain circumstances (like criminal cases where the expert is testifying in a forensic capacity to counter improper forensic analysis presented by a prosecution expert).
Some experts charge for their time at a reduced rate (e.g. 50%) for travel, some a predetermined amount (taking the risk of delays on themselves), some only for the cost of the tickets, hotels, meals, etc.
There is, AFAIK and based on what I can Google, no universal answer.
It could easily be more than 9 hours. How many people spend longer than that interviewing for a technical position across five rounds? And this is for one of the few experts Samsung will put up to defend a $xxx million suit.
Plus, he works for MIT. He probably needs to clear his consulting work, which could be quick or not. MIT might have wanted a percentage. And if he wanted to use a grad student to assist him in prep work, negotiating that can add up too.
There are other ways to add to the precontract numbers, but that should be enough.
I’d love to hire people that can work 900 hours in a single month. Just tell me where to find them. Or, wait, maybe they work in higher dimensions. Drat.
What you can work and what you can bill are two different things. I know of a few people who charge their rate from the minute they leave on a trip, which is basically the minute they put down their phone after accepting the contract, to the minute they get back. However, they are all doing emergency work, the kind where the company is losing tens of thousands per hour on the low end until the problem is fixed.
In practice it’s equivalent to charging a higher hourly rate, but it makes billing simpler for these kinds of contracts.
900 hours is a bit more than one month. Even working 24x7, that's over 5 weeks. Assuming 10-hour days, that's 18 weeks at 5 days a week, or just shy of 13 weeks at 7 days a week.
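For anyone who wants to check the arithmetic, here is a quick sketch. The 900 hours and $765k are the figures quoted upthread; the three schedules are the ones assumed in the comment above, and the function name is just for illustration.

```python
HOURS_BILLED = 900   # billed hours quoted upthread
FEE_USD = 765_000    # total fee quoted upthread


def weeks_needed(total_hours, hours_per_day, days_per_week):
    """Weeks required to work total_hours on a given weekly schedule."""
    return total_hours / (hours_per_day * days_per_week)


# Hypothetical schedules: (hours per day, days per week)
schedules = {
    "24x7": (24, 7),
    "10h x 5d": (10, 5),
    "10h x 7d": (10, 7),
}

for label, (hours, days) in schedules.items():
    print(f"{label}: {weeks_needed(HOURS_BILLED, hours, days):.1f} weeks")

# Implied hourly rate from the two quoted figures
print(f"Implied rate: ${FEE_USD / HOURS_BILLED:.0f}/hour")
```

The implied rate comes out to $850/hour, which matches the hourly figure mentioned elsewhere in the thread.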
The main point of the article is that the CDE is preventing those who partner with them from testifying about anything, even what's unrelated to the data CDE provides - 'Viewpoint discrimination'.
> That begs the question: Did the research lead to the desire to be an expert witness? Or did the desire to be an expert witness define the nature of the research project?
I don't think these questions are productive. You can't truly know why someone does what they do. And making the suggestion that the researchers tainted their research because of the money is purely speculative and unfair.
Is there a reason not to take TFA at its word, which says that the litigation in progress (for which expert testimony was requested) does not relate to the research those experts were conducting through agreements signed with CDE?
The whole problem here is that as soon as a researcher signs the contract, they are barred from participating in any litigation against the department even if it doesn't involve the private data they were working with. So you have a large population of experts removed from the pool, because all the experts are likely to be involved in some type of research.
It's not a "crazy worry" but defendants in civil suits have all kinds of worries. Regarding impugning Stanford researchers (N.b. no scare quotes) as being motivated by a consulting fee, that's what those fees are for: to get the best possible expert witnesses.
I don't begrudge a good defense attempting to block a litigant's experts, either. However, everyone is better off for expert witnesses being motivated by fees to provide the best expert testimony. If there was something untoward about their motivation, it would be Stanford's problem.
There are a lot of comments on my answer about expert witnesses, so I'll collect it all here:
Martin Rinard is a star. They pay him $850/hour because he testifies well, and he's done it before. He's got the credentials from MIT so juries tend to listen to him. I remember this exchange:
Apple lawyer: So that was a lot of money!
Martin: It was a lot of work.
People seem to be doing some object inheritance from an ancestor post's "one month" but I didn't say that. His work would have gone over many months.
They interview him, and he writes something. Then the lawyers rewrite it. Then they all go over it, line by line. It's excruciatingly boring. I sat in on two days of the review of a different expert witness' 300-page declaration, and they had another day planned after me! They probably have a mock trial, where he practices his testimony (I'm not sure how prevalent that is).
I didn't work on Apple v. Samsung; I was just a spectator.
I don't know what an expert witness would get in this Stanford thing, but it doesn't seem to me like the spending would be quite so wild.
A huge part of the civil litigation metagame is just finding ways to legally exempt yourself from being sued. It used to be the case that only sovereign states could declare themselves immune from litigation, but now that power has been delegated to anyone who can convince someone else to sign a binding contract. Which is literally everyone because almost every business relationship requires contracts. And now we're going to "you can't testify against us because you had an NDA" which seems even more abusable.
By $NEAR_FUTURE_YEAR the only people who wind up in civil court will be victims of extortion.
Requiring more education/outcome data to be public would help prevent this. If education researchers are forced to get data from California's Department of Education, there's tacit pressure to find results that make DoE look good.
Years ago, there was a site which had photos of people, and asked you to guess: murderer or software engineer? (I'm going from memory here, so let's not get sidetracked by the details).
In a similar vein, we need a site that lists actions taken by a state government and asks: was this in Ron DeSantis' Florida or California?
There's a distinction between a fact witness and an "expert witness". A private agreement can't prevent a court from subpoenaing a fact witness to testify. "Expert witnesses" are overwhelmingly hired guns paid to come in and voluntarily spin a narrative, and I'm not sure why they shouldn't be able to make that a provision of a contract just like any other commercial arrangement.
Mainly because it is a government agency, and government agencies do and should have lots of restrictions on them that are not like "any other commercial arrangement".
One of the biggest things I disagree with Republicans on is that "government should be run like a business". No... it should not.
Only one aspect of a business is efficiency, not all businesses are efficient, and finally your view of government programs is based on right wing propaganda and not facts.
>>finally your view of government programs is based on right wing propaganda and not facts
No it is not. That is reality today. Almost no government programs or spending is measured on their results.
I would love for you to prove me wrong, and show me a government program where the resolution for any failure of that program was not "we need more money"
I am a bit confused by the case that wants to use the researchers' data.
So there was measurable learning loss from remote learning and during the pandemic.
Ok this is known in education.
The state has only relied on individual districts to make up the learning loss.
Ok so that makes sense. There is no magic bullet for fixing the learning loss issue. The state relying on individual districts to take a multi-pronged approach to learning loss seems reasonable.
I don’t understand the merits of the lawsuit. The state of California is already aware of learning loss and is looking at ways to address it.
For the state of California to be sued by the plaintiffs because it didn’t do x, y, z seems incredibly short-sighted and unrealistic. We are still learning how to best address learning loss from 2020.
If I could wear a tin foil hat for a minute: it could be plausible that CA could fight this to allow it to escalate to the Supreme Court and establish a judicial standard for these types of cases.
I don’t really understand why, of all the CA government institutions, the CDE finds this to be an appropriate stance though. An educational office should absolutely be held to a much higher standard than this, and should at its core value openness of information and freedom of speech. The fact that this lawsuit exists at all is an indication of deeply problematic internal values within CDE that are completely misaligned with its mission and governmental function.
Observers say the dispute has the potential to limit who conducts education research in California and what they are able to study because CDE controls the sharing of data that is not available to the public.
All data in Florida from public institutions are public. There would never have been controversy in the first place. But yeah, you're right - the Sunshine laws have nothing to do with testimony.
Has anybody found a link to the contract in question or a quote from the relevant part of it? I'm curious how it seemed ok for the researchers to sign a contract with this provision.
Probably because they didn't have any other choice if they wanted to do the research. Redlining won't get you anywhere, so you need to wait for a situation like this to argue the unconstitutionality.
This has everything to do with demographics and science that presents findings contrary to ideology/politics. The same kinds of people pressure police to omit demographics data in police reports.
Interestingly, the article states that the data sharing agreements do not limit what the researchers can publish. They can share results/conclusions critical of the state, which could then serve as a basis for litigation.
What's weird is that they are being prevented from voluntary testimony on cases unrelated to the specific shared data, thus unnecessarily removing many experts from the pool.
It's heartening to see efforts being made to address the isolation that incarcerated individuals often face and promote better communication with their loved ones. Improving access to communication could have significant positive effects on inmates' mental health, family relationships, and potential for rehabilitation. It's essential to continue exploring ways to support and uplift those within the prison system to create a more humane and effective approach to criminal justice.
There's a long history of nearly every major freedom supporter or civil rights supporter being investigated and wrongly imprisoned and even killed across the world. And that's just the famous ones we've heard of. There are an order of magnitude more who were done away with well before they became historically famous and no one even knows about them.
This bumper sticker quote doesn't really track in the real world.
So are you meaning to say that California is afraid these Stanford researchers are going to imprison people, wrongly or otherwise? Come on. This isn't the police, these are academic researchers.
The "shallow platitudes" cut through the BS. The government is trying to gag researchers because they want to hide their own failure. The narrative that the government is afraid of PII being revealed during a trial is straight horse shit. The courts themselves will decide what is or isn't appropriate information for a witness to share on the stand.
Furthermore, am I to believe these researchers are trusted not to share student PII when doing their normal academic research, but as soon as they become witnesses against the state that trust is no longer warranted? Bullshit. If protecting PII were the motivation they would not allow researchers to access that PII and publish their findings. What they're actually doing is preventing those researchers from testifying against the state. They're not protecting students, they're protecting the state's interests.
Shallow platitudes are themselves BS - populist soundbites that can be weaponized both for things you like and things you don't. There are better conversations happening elsewhere in this very comments section that don't have to lean on them as a crutch.
The world is too messy for naive morality and honesty. People are too easily swayed by anecdotes or irrelevant facts.
In your "moral" world of brutal honesty: the children of serial killers would never find work, people who were caught cheating on a test in kindergarten would never be allowed in positions of power, and people with non-mainstream interests would be sidelined in favor of those that people more closely identify with.
Is it right to hide things from people who would use that information incorrectly and to society's detriment? I think so, and that's why I believe people should have a right to privacy.
I don't think you'd feel the same if you were the defendant in a lawsuit, even if you had a rock solid case.
You might be completely vindicated, but bankrupted.
Or, perhaps your lawyer is a dud, and fumbled the ball.
Or perhaps the jury were idiots.
Or perhaps the law has some unknown (to you) technicality that you end up hanging for.
Or perhaps during the investigation you honestly misremember something or misspeak and the police / investigators become convinced you're guilty and spend all their time and resources trying to pin it on you. Or maybe they're just lazy, and you end up being an easy target. Don't worry, if you plead guilty you'll avoid a lengthy court battle that you can ill afford, and potential prison time if found guilty (are you that confident in your lawyer, your finances, the jury, and the legal system?). If you plead no-contest, you avoid jail, weeks or months of time off work defending yourself, and just do probation. But wait, I thought you had the Truth on your side?
Why do you keep posting this malarky here? Your account has been active for over a decade but most of your stuff ends up dead because you post crap like this for basically nobody to read.
How can data collected by the government be private? That should all be available to the public since it was gathered with public funds. Has no one issued a freedom of information request?
> Personally, I think an individual’s privacy should take precedence here.
There's no individual's privacy even at stake here. None of the data that's non-public is even material or relevant to the dispute here, beyond that the professors in question signed an agreement to access the data for unrelated matters.
> How can data collected by the government be private? That should all be available to the public since it was gathered with public funds. Has no one issued a freedom of information request?
Agreed. What gives the government the right to reject my FOIA requests for the exact specification and design files for gaseous centrifuges, implosion devices, and nerve gas?
Extreme natsec examples aside, there are a thousand reasons to keep government data private, not the least of which is constituent privacy. Deanonymizing data is far easier than preparing it for release and the data schools keep on students is particularly sensitive (I'm not claiming that that's the case with this data, just making a general observation).
Just about everyone accepts that it's reasonable for some government collected information to be kept private. FOIA requests exclude "personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy". https://www.ecfr.gov/current/title-21/chapter-I/subchapter-A...
In this case it was for "student-level data that detail the demographic information and the performance records over time of California’s 5.8 million students but without any names or identifying information. That data is the gold standard for accurate research. A partnership contract details the department’s commitments and researchers’ responsibilities, including strong assurances they will have security protections in place to protect students’ privacy and anonymity."
The thing about this sort of data is, removing PII from the dataset doesn't make it fully or even sufficiently anonymous. If there's only one Pacific Islander student in the Shasta Union High School District then it's easy to figure out who that is by combining it with other public data.
> Statistical organizations have long collected information under a promise of confidentiality that the information provided will be used for statistical purposes, but that the publications will not produce information that can be traced back to a specific individual or establishment. To accomplish this goal, statistical organizations have long suppressed information in their publications. For example, in a table presenting the sales of each business in a town grouped by business category, a cell that has information from only one company might be suppressed, in order to maintain the confidentiality of that company's specific sales.
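To make the cell suppression idea concrete, here is a minimal sketch, assuming a made-up list of records and a hypothetical threshold `k`. The function name, field names, and rows are all invented for illustration and have nothing to do with the actual CDE data or how CDE handles it.

```python
from collections import Counter


def suppress_small_cells(records, keys, k=2):
    """Count records per combination of `keys`; hide any count below k."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    # None marks a suppressed cell: too few records to publish safely
    return {cell: (n if n >= k else None) for cell, n in counts.items()}


# Entirely made-up rows, not real district data
records = [
    {"district": "A", "group": "X"},
    {"district": "A", "group": "X"},
    {"district": "A", "group": "Y"},  # the only "Y" in district A
    {"district": "B", "group": "X"},
    {"district": "B", "group": "X"},
    {"district": "B", "group": "X"},
]

table = suppress_small_cells(records, keys=("district", "group"), k=2)
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

The single-record cell is the one that gets hidden, which is the "only one student in the district" situation. This is only the simplest piece: if row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so real disclosure control has to account for that as well.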
The clear justification for keeping this information private is that the government won't get sufficiently useful data without this promise. The United States Census Bureau released "confidential" information about draft evaders and Japanese-Americans; if you think they might do that again, perhaps you'll lie about some of the questions.
People who receive this sort of information are required to take special care to maintain the needed level of anonymity.
There's of course no reason why this should be used to muzzle researchers for completely unrelated fields.
IMHO, making the public pay for records at high expense in a digital age is how the government limits information: police arrest/crime data, court data, zoning data, meeting transcripts, budget data, etc., and yes, education data.
Society shouldn't accept that this data sits behind paywalls, or accept high costs to access it, or paper-only releases that restrict access through cost and size.
Zoning data and meeting transcripts generally are public? At least in NY that's been my experience.
A lot of the rest I'd rather was private. Although it'd be nice to get aggregated data for certain crimes which currently are tracked at each individual department level and not in any sort of national manner.