It's pretty hilarious and somewhat frightening: I found my dad's arrest from 25 years ago for a speeding ticket he had "forgotten" to pay. I remember being 11 years old and having to wait 8 hours for my parents to come back from picking up a pizza.
Data availability is crazy.
the frightening part is although your father's record is available to the public, police officers who are caught lying while testifying get to seal the record.
in many jurisdictions, sealing a record is the equivalent of destroying it.
your crimes will haunt you forever because the system never forgets, meanwhile they simply go back to business like it never happened
I think judges can seal records for anyone. A friend of mine had it done after his conviction several years ago. Sure enough I can't find him in the database. He still notifies potential employers about it though.
Do you think the data should be removed from the government portals? Those are interesting points. What do you think is the right balance to strike?
I can see why it might be surprising to find some results when searching. The same data has already been available in many other databases that have existed long before this one and in those described on the info page as well.
It's on the internet forever now. If there's a balance to strike it would have had to have been done in 2007 when the court digitized their records and put them online.
A search for "minor consuming" reveals a few hundred thousand cases against children. I'm a little surprised to see that.
>Why do taxpayers have to pay expensive court proceedings, and offenders have to spend a lot of money for an attorney, and waste a bunch of time.
They don't have to. The usual process for something like a speeding violation in the US is:
1. You get stopped for speeding (or get caught on a speedcam).
2. You receive the ticket in the mail.
3. At this point, you have an option to agree with it and pay the fine OR appear in court and hope they will rule in your favor (which could easily happen if you genuinely believe they were wrong; and half the time, the cop himself will fail to appear in court anyway, so you get the ticket dismissed if it wasn't anything too wild).
You don't have to appear in court (if you choose to accept the ticket and pay the fine) or waste money on attorneys (if you choose to contest the ticket in court). You can literally play it the same way in the US as you just described, by getting the ticket in your mailbox and paying it off (for something like speeding). That's it. Contesting the ticket in court is just another option available to you.
As for what happened to the parent commenter who mentioned failing to appear in court: they basically didn't pay the ticket they received (i.e., ignored it) and didn't show up in court to contest it either. That's pretty much it.
"In Nix v. Hedden, 149 U.S. 304, 37 L. Ed. 745, 13 S. Ct. 881, the question presented was whether tomatoes were to be classed as fruit or vegetables under the tariff act. The court found no particular help from the witnesses called, and decided the point through the use of a dictionary, which was not evidence, but an aid to memory and understanding."
Compelled prostitution of a person under 17, duh. I mean, obviously. By a person named "bitch" in "whoresville, usa". It was all authorized by "PIMPDADDY", as shown there, plain as day.
"John Julio Bitch" there is quite the baller. Released the very same day for a $25 million cash bond. He's also apparently an albino who stands 5'1" tall and weighs 335 pounds.
I feel this is bad news. Some things should be forgotten. In my country your record gets soft wiped after 8 years. With this the employer could just look up your name.
I found my conviction for assault from a bar fight 23 years ago. Since then I quit drinking, went to college, raised a child who is now 20, and turned everything around. It's pretty disheartening to see it can be found by anyone 23 years later. Unfortunately, I'm not surprised in the least that we allow this in the US.
I'm looking into that right now again. It's a pretty tedious process where the Governor has to personally approve the request. Right now there is a Republican governor in that state so they're less likely to approve it. Since background checks can only go back 7 years, 10 in some states, I let it go and decided it wasn't worth it considering that. I thought it was behind me, but this definitely changes things. Thanks for mentioning it.
The general idea has various problems. For example, would newspapers or accounts of things/people, in various media, of objectively public information be required to be retroactively removed from any mention? Does it make sense to force and dictate what entities/individuals can do with basic information at the discretion of anyone who doesn't like it? Just a few thoughts. The records exist in the database because they are public information. If a record is removed from public view, that's done when requested because it's the right thing to do, although there is no strict legal obligation to do so.
It says MySQL 8 and Elasticsearch 7.8. I don't have much experience with Elasticsearch, so I wanted to know: how does Elasticsearch make it faster? Is it like an extension that makes it faster? Or does Elasticsearch have its own data store that consumes data from the database and magically makes it faster?
Elasticsearch, Lucene under the hood, implements an inverted index which is an extremely fast data structure for text search. ES has clustering as a primary feature too and many search features that can significantly improve relevance that you won't find in MySQL and most other databases.
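To make the inverted index idea concrete, here is a toy sketch in Python. It is not how Lucene is actually implemented (no analyzers, scoring, or compressed postings lists), and the function names and sample documents are made up for illustration. It only shows why lookups are fast: each term maps straight to the set of documents that contain it, so a query never scans the documents themselves.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token in the query (AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, purely for illustration.
docs = {
    1: "speeding ticket dismissed",
    2: "minor consuming alcohol",
    3: "speeding ticket paid by mail",
}
index = build_inverted_index(docs)
matches = search(index, "speeding ticket")
```

A query costs a few dictionary lookups plus a set intersection, regardless of how long the individual documents are.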
Have you tried Toshi[1] or MeiliSearch[2]? I wonder how they would compare in terms of operational costs (monthly cloud hosting bill) at the current data size.
Plus it does not accept joins. So you basically have to denormalize all your data before injecting into Elastic.
It helps speed things up. But it is a headache to manage on a day-to-day basis.
Yeah. What I do is create a view that does all the joins then the middleware just needs to do "SELECT * FROM my_view". If the DB has good JSON support, I will also convert the data into an ES index request with SQL so the middleware becomes even simpler.
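A rough sketch of that view approach, using Python's built-in sqlite3 so it runs anywhere. The table names, the view name case_docs, and the index name are all hypothetical, and here the JSON is built in Python rather than in SQL as the comment suggests. The output lines follow the action-then-source shape of the Elasticsearch bulk API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cases (id INTEGER PRIMARY KEY, title TEXT, court_id INTEGER);
    INSERT INTO courts VALUES (1, 'Example County Court');
    INSERT INTO cases VALUES (10, 'State v. Doe', 1);
    -- The view does the join once, so the middleware never has to.
    CREATE VIEW case_docs AS
        SELECT cases.id AS id, cases.title AS title, courts.name AS court
        FROM cases JOIN courts ON courts.id = cases.court_id;
""")
conn.row_factory = sqlite3.Row


def bulk_lines(connection, index_name):
    """Yield newline-delimited JSON: one action line, then one document line."""
    for row in connection.execute("SELECT * FROM case_docs"):
        doc = dict(row)
        doc_id = str(doc.pop("id"))
        yield json.dumps({"index": {"_index": index_name, "_id": doc_id}})
        yield json.dumps(doc)


lines = list(bulk_lines(conn, "cases"))
```

Because the cursor is iterated row by row, the middleware never holds the whole view in memory, which matters once the view has millions of rows.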
Let’s say you have a mostly read-only DB (otherwise things are different).
Does it work when the view is insanely big? [i am not an expert at DBs, so my vision of a big DB might amuse you, but let’s say I have millions of rows to assemble as a view].
The gist is Elasticsearch is a full-index database. Whatever data goes in gets indexed as compared to only indexing certain fields in MySQL on which you perform search frequently. Think of Elasticsearch as MongoDB + full-indexing. It's a document storage with blazing fast search and aggregation.
"PACER notwithstanding, CourtListener is the most powerful case law research tool available online — and in many ways is much more powerful."
This is based on CourtListener's 4 million+ written court opinions, which judyrecords has recently integrated. But you're right, CourtListener has more case law research features.
Interesting point. If you know the state where someone lives, you can look up the same info on the government website. Additionally, many other databases have the same public data but ask for payment to search.
Whoa. Searched my name and found my sealed court records from when I was barely a teen (25 years ago). The records even state "sealed/exempt from public" at the top.
This is pretty neat as I have never seen the records, even though I requested them (out of curiosity) 10 years ago, only to be told they had been destroyed years prior.
Does anyone know if the race data is used for crime statistics? I did a quick sample of people I know, and almost every South Asian was miscategorized as Black or White.
The race data in court case records is very often used for crime statistics. It's probably the most analyzed data point after what the incident was about.
No, it's likely not. Arrest records are separate from traffic citations, and they are two different databases. Also, your race may come from the cop filling out the ticket, or it may come from your license in more advanced jurisdictions. The source for crime statistics is usually not court records; those are held by the courts.
Wish we would have known about this years ago. One search would have prevented the hire of someone who ended up costing us a ton of money. Most background searches don't get local or state court cases like this without major expense that small businesses can't afford.
Many similar databases and people finder sites are behind a paywall. There are a lot of positives to being able to use public records data available to make more informed decisions, whether it's to let your kid stay at someone's house you don't know or whatever it might be. Thanks.
15KB is maybe the average case size, including HTTP request data.
That's 1024 * 15 * 439,000,000 = 6.7TB roughly.
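Spelling out that back-of-the-envelope estimate, taking the 15KB average and the 439 million case count as given (both are rough figures from this thread, not measurements):

```python
# Rough storage estimate: average case size times number of cases.
avg_case_bytes = 15 * 1024           # 15KB per case, counting a KB as 1024 bytes
case_count = 439_000_000             # approximate number of cases

total_bytes = avg_case_bytes * case_count
total_tb = total_bytes / 10**12      # decimal terabytes
total_tib = total_bytes / 2**40      # binary tebibytes

print(f"{total_tb:.1f} TB, or {total_tib:.2f} TiB")
```

So the 6.7TB figure holds in decimal terabytes; in binary units it comes out closer to 6.1 TiB.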
The cases are all compressed, so I'm not using 6.7TB non-compressed for cases. But there are other request and non-request related records needed too. Just my backups currently.
Being as you're offering use of the site for free, would you be open to the idea of also offering publicly available DB dumps? There's plenty of fun projects that I can imagine doing if I had that data locally.
I've downgraded from that. I talked about that in that post. It was most definitely a knee-jerk reaction to getting slashdotted on a popular subreddit and not wanting that to happen again. However, I'm still on some very good hardware and handling the current workload pretty well right now. That estimate was high.
I understand the open court argument, we need to see what goes on so nothing funny happens there. But unless we're talking about a major crime, what good does it do to list and index on Google everything from 30 years ago?
If our society decides it is necessary to act with the full weight of the law behind it, then it would seem better to have the information available for the public to verify than not. I'm not saying it is all great, but that it is far better to have information available so that things like average sentence length for a given crime based on demographic and psychographic information can be queried by all. If a city that is 50/50 male/female and 20/80 black/non-black finds their speeding tickets are 70/30 male/female and 35/65 black/non-black, then it may be worth investigating to see if police are being fair about who they give warnings to, who gets reduced tickets, and who gets neither.
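One simple way to put numbers on that comparison is a representation ratio: each group's share of tickets divided by its share of the population. This is only a sketch using the hypothetical percentages from the example above, and a ratio above 1 is a prompt to investigate, not proof of unfair treatment (it ignores things like how much each group actually drives).

```python
def representation_ratio(ticket_share, population_share):
    """Share of tickets divided by share of population; 1.0 means proportional."""
    return ticket_share / population_share


# Hypothetical shares taken from the example city above.
population = {"male": 0.50, "female": 0.50, "black": 0.20, "non_black": 0.80}
tickets = {"male": 0.70, "female": 0.30, "black": 0.35, "non_black": 0.65}

ratios = {
    group: round(representation_ratio(tickets[group], population[group]), 2)
    for group in population
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio}")
```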
As for major privacy concerns, it is generally the more major crimes that have the larger issue with the victim being known. Knowing that some one was the victim of mischief vandalism is far less a privacy invasion than knowing they were the victim of sexual assault of a child (and even hiding the victim's identity often doesn't do more than hide the name from a passive search).
Then there are the benefits that other posters have raised, such as being useful for knowing past decisions used even in minor trials.
The general privacy issue that most jurisdictions have decided they just don't care that much about is that easy, indexed, free access to public records is different from the case where that same information is in a dusty file cabinet somewhere. There are a lot of things that people are, in principle, OK with being a matter of public record but are maybe less OK with their neighbor being able to casually discover it through Google.
Totally agree. I'd be all for open court records, requested in person, received in paper form against a small processing fee.
I do have a different cultural background so it's probably natural this feels horrible. Everything about this site would be so illegal in my home country it's almost hilarious in comparison. I'm used to (and fully approve of) a law that you can't keep a list of names in a notebook without a proper reason and everyone's consent; that would already be an illegal register.
If you look at the info page there is a specific example about how to look up codes of cases that had the same charge.
Being able to see how other offenders are sentenced is useful to make sure people are being treated fairly. Lawyers use this kind of data up to the point of producing analytics from data like that to understand outcomes. Major legal data companies have a large segment of business doing analytics for lawyers handling high and lower level cases.
Worse, there's no obvious business model or disclosed funding source or institutional affiliation here.
That leaves me with the distinct impression that they're monetizing data about visitors and searches in some horrible way. (Data targeting for mugshot shakedown operations?)
Only 3 pages are indexed on Google. Actually, most of the other legal databases (listed on info page) have their cases indexed on Google. However, judyrecords cases aren't indexed on Google. I understand your general sentiment.
On a whim, I decided to search for "quicksort", and found a judgment where a loan company was trying to sue for infringement on the grounds that a competitor copied the SQL schema of their product. The complaint was upheld.
"The Court finds that New Century had access to the SQL Data [pg. 536] Structures and that there is enough probative similarity to find that New Century factually copied the SQL Data Structures."
The next question might be to have 'Positive Software' demonstrate that they did not, in fact, take their table schemas from some place else. Like... textbooks? Or... example database schemas from vendors. Or tutorial sites? Or competing products?
There may be something extremely unique about part of their structure, perhaps, but... at the same time, there's often very little variety in how most similar data (crm/sales/lead gen/etc) might be stored to be remotely usable for reporting anyway.
"misappropriation of confidential information". Without seeing the structures in question it may be hard to say, but typically 'confidential info' is qualified with "not elsewhere available"-style clauses.
"... Likewise, the Court finds that there are more than one or a few ways to organize the data structures required for programs such as LoanTrack and LoanForce..."
Yeah, but usually there's only one good way to do stuff. Yes I could just have one row with 940 columns - technically, I could make my program work with that - but it's extremely suboptimal - regardless of whether I've seen anyone else's table structures or not.
This is a proximity search, to ensure it's actually turning up one of the various permutations of the name (as different court protocols may refer by surname first), rather than documents that just happen to contain each of the terms somewhere.
For fairness, "hillary rodham clinton"~4 turns up 193 cases.
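For anyone wondering what the ~4 roughly does, here is a simplified sketch in Python. It treats the slop as the number of extra positions allowed in the smallest window that contains all the terms, in any order. That is an approximation I'm assuming for illustration; Lucene's real slop calculation counts position moves and differs in edge cases. The function name and sample strings are made up.

```python
import re
from itertools import product


def proximity_match(text, phrase, slop):
    """True if all phrase terms occur, in any order, within a window that is
    at most `slop` positions longer than the phrase itself."""
    tokens = re.findall(r"\w+", text.lower())
    terms = re.findall(r"\w+", phrase.lower())
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    if not terms or any(not found for found in positions):
        return False
    # Brute force over one position per term; fine for a sketch, slow on long texts.
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token cannot stand in for two query terms
        extra = (max(combo) - min(combo) + 1) - len(terms)
        if extra <= slop:
            return True
    return False


surname_first = proximity_match(
    "Clinton, Hillary Rodham v. State", "hillary rodham clinton", 4
)
scattered = proximity_match(
    "hillary went to the store and later rodham met someone named clinton",
    "hillary rodham clinton",
    4,
)
```

The surname-first form matches, while a document that merely contains all three words far apart does not.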
I've mentioned other legal databases on the info page. It's public information. judyrecords is the largest free database of court cases, but there are many other free/not free ones as well.
In my state you can get some kind of understanding of what's going on, but it's so legalese vague that half the time you only know if someone got a speeding ticket, underage, or divorced.
Weapons of math destruction. This would be one of them. The data here is embarrassing for individuals and it can be looked up going back decades.
I know this was always public but this makes it too easy for the masses to dig through the troves.
Scares me. Next thing I see is some AR glasses that do facial recognition and correlate name -> public records. Could be a nasty blackmail tactic. Some things are close to Black Mirror in reality.
Sounds like that would be an easy use case for elasticsearch indeed. I've seen it handle much bigger data sets. Solr would work as well. There are probably a few other options on the market but elasticsearch would probably do pretty well on this even without a lot of tuning.
For reference, I once threw the entirety of OpenStreetMap at it before it even hit 1.0 to implement a simple reverse geocoding thing. Basically a couple hundred million street segments, some polygons, etc. At the time the geospatial support wasn't great; it was very new and very CPU intensive. I got away with indexing all of that and running it on a single node cluster with a Xeon and 32G of RAM and spinning disk (RAID 1, no SSD). It worked great. Very responsive. Indexing only took about 50 minutes or so. Most of that was my parsing logic. That's not comparable of course, but I'd expect this to be faster on the same hardware with a current version of Elasticsearch. They've made a lot of leaps with improving performance, memory usage, CPU usage, disk usage, robustness, etc. in the 7 major versions since then.
From other comment:
CourtListener has about 4 million opinions, which are included. On top of that, 435 million additional cases from throughout the US.
When I type a query and press search, I would like it if the URL updated with the search in the query string. It would make it easier to share specific queries.
Where are they getting public domain opinions that CL doesn't have? Are these states or counties that CL doesn't scrape? It would be nice to have a breakdown by jurisdiction.
Also, by "case" do you mean "opinions"?
Full disclosure, I've written and contributed to several scrapers for CL, and if there's a large source they're missing I'd like to know.
Note that the CL opinion number you're quoting doesn't include orders from Federal courts that are in the RECAP collection, which accounts for several million additional opinions.
Congratulations on the launch. I have worked in open source and public record research for the last 15 years, and your coverage is extremely impressive.
Do you have any long term plan for the site? I can see this going in a lot of different directions depending on your goals.
Thanks, as far as I know it's the largest database of court cases on the Internet. If there's enough traffic I'll support the site with ads. Don't have any other specific plans currently.
I run a similar free site and was looking at adding ads. Google Adsense rejected it for not complying with their program policies. My data is on large US federal bankruptcies, so I really couldn't pinpoint why, but just a heads up that it might be more difficult.
All the data is from government databases directly, aside from CourtListener, which was recently integrated. It would be good to specifically mention CourtListener's contribution.
It is all public records. The source of the original data is the court system. If a 3rd party physically scraped it from the court system, others should be able to digitally scrape it.
From what I understand, he had some kind of academic library access for PACER and used that to bypass what others would be charged for. There are lawsuits against PACER charging fees for what's public information generated by taxpayer money. He ended up being charged with various crimes related to maybe computer fraud and eventually committed suicide. A very sad story.
That may be the reality but if the court or due process ordered something expunged from a record it should be updated in all records and the details not present.
Should just do a search for expunged or similar terms and remove those entries.
Well, no. Are there names of minors in this database? I thought the US had a mechanism to prevent that, or at least to petition to have records of minors removed or anonymized.
Yes. The mechanisms are shit. Many of these cases are juvenile cases with a note saying the case is sealed, along with full details of the charge, name, and outcome.
Edit: wow, plus family court stuff like a four year custody dispute, kids being adopted, etc
"The US" has 39,044 distinct local governments and municipalities and they all do their procedural nuances differently and to varying efficacy and different points in time! :D
I don't know what culture you come from, but in the US and UK and similarly influenced cultures justice being seen to be done and recorded is a pretty important principle and mechanism against overreach of the state.
Super cool--and very fast! Anyone looking to collaborate on these can easily add Kontxt (https://www.kontxt.io) right on to them and have localized discussions directly on page-parts.
I used React client-side, Node server-side, and MySQL as the db. I only mentioned Kontxt here because I demoed it for Thomson Reuters because it could be helpful for their legal professionals as a collaboration tool after they find documents via their WestLaw legal search product, and your tool reminded me of it. I actually used Kontxt as a sales pitch to highlight their annual report and add some calculations and explanations about how much money they could make. Nice work, again!