Does anyone want to talk about the hack itself? Can anyone give more details than "left their database open"? I came to this site hoping for a real discussion about that and didn't see it here yet...
IMO, they're worse than that. You can teach an intern things, correct their mistakes, help them become better and your investment will lead to them performing better.
LLMs are an eternal intern that can only repeat what it's gleaned from some articles it skimmed last year or whatever. If your expected response isn't in its corpus, or isn't in it frequently enough, and it can't just regurgitate an amalgamation of the top N articles you'd find on Google anyway, tough luck.
LLMs are to interns what house cats are to babies. They seem more self-sufficient at first, but soon the toddler grows up and you're stuck with an animal that will forever need you to scoop its poops.
Without a mechanism to detect output from LLMs, we’re essentially facing an eternal model collapse with each new ingestion of information from academic journals, to blogs, to art. [1][2]
> You can teach an intern things, correct their mistakes, help them become better and your investment will lead to them performing better.
You can't do it the same way you would with a human developer, but you can do a somewhat effective form of it through things like .cursorrules files and the like.
The difference is that today's digital natives regard computers as magic and most don't know what's really happening when their framework du jour spits out some "unreadable" text.
So much this, I was interning at a government entity at 20 and I already knew you needed credentials to do shit. Most frameworks have this by default for free, we're so incredibly screwed with these folks running rampant and destroying the government.
It's definitely both. A bunch of 20 year olds were let loose to be "super efficient." So, to be efficient they use LLMs to implement what should be a major government oversight webpage. Even after the fix the list is a few half-baked partial document excerpts with a few sentences saying, "look how great we are!" It's embarrassing.
> At least my experience is that ChatGPT goes super hard on security, heavily promoting the use of best practices.
Not my experience at all. Every LLM produces lots of trivial SQLI/XSS/other-injection vulnerabilities. Worse, they seem to completely omit authorization business logic, error handling, and logging, even when prompted to include them.
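To make the "trivial SQLI" point concrete, here is a minimal sketch of the pattern, using Python's sqlite3 and a made-up `users` table. The function names and data are hypothetical, just for illustration; this is not output from any particular LLM.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Parameterized: the value is bound separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

# Hypothetical in-memory table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # the injected OR matches every row
safe_rows = find_user_safe(conn, payload)      # the payload is treated as a literal name
```

The only difference between the two functions is whether the input ends up in the query string or in the parameter tuple.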
Just checked the DOGE website; I'm not too sure about this theory, given that POST requests are blocked, the only APIs you can find (e.g. /api/offices) only support GET requests, and if the UUID doesn't match, it 404s.
I don't see any CRUD endpoints for modifying the database
Put a CMS behind a well-configured CDN and it's essentially a static site generator. If you have cache invalidation figured out, you get all the speed and scalability benefits of a static site without ever having to regenerate your content.
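A toy sketch of what I mean, assuming an in-memory dict stands in for the CDN edge and a plain function stands in for the CMS. `EdgeCache`, `render`, and the page paths are all hypothetical; this is not any real CDN's API.

```python
class EdgeCache:
    """Toy stand-in for a CDN edge: caches origin responses until purged."""

    def __init__(self, origin):
        self.origin = origin      # callable: path -> rendered page
        self.store = {}           # path -> cached page
        self.origin_hits = 0      # how often the CMS actually had to render

    def get(self, path):
        if path not in self.store:
            self.origin_hits += 1
            self.store[path] = self.origin(path)
        return self.store[path]

    def purge(self, path):
        # Cache invalidation: called when an editor publishes a change.
        self.store.pop(path, None)


# Hypothetical CMS content and render function.
pages = {"/about": "About v1"}

def render(path):
    return pages[path]

cdn = EdgeCache(render)
first = cdn.get("/about")
second = cdn.get("/about")    # served from the cache, CMS not touched

pages["/about"] = "About v2"  # editor publishes an update
cdn.purge("/about")           # invalidate only the changed page
third = cdn.get("/about")
```

The CMS renders each page once per change, and everything else is served from the cache, which is the same property you get from a static site generator.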
I’m guessing it didn’t have much in front of it because the management endpoints were accessible from the public Internet. I think you mentioning the “well configured CDN” is key here. If there was a CDN in front of it, it wasn’t well configured.
BTW, I spent a lot of my career configuring load balancing, caches, proxies, sharding, and CDNs for Plone (a CMS that’s popular with governments) websites.
Yeah sorry, I didn't mean to imply these folks have any clue what they're doing. I misread your comment as "it's been a while since I saw a CMS-based site, big sites are all static now" instead of "it's been a while since I saw a CMS rawdogging it."
I'm not too sure about this theory; just went on the DOGE site and the API endpoints don't allow for POST requests, and I can't find anything that allows me to upload.
I mean, the article is paywalled, but it sounds like this is isolated to their site-displayed Twitter feed; basically the site was hosted by Cloudflare and you could insert your own fake tweets into what was recorded on the site (but not on the actual DOGE Twitter feed). I don't think any data was actually compromised.
I can't speak to any data that may or may not be compromised, but this isn't about inserting fake tweets. Anything in their "government org chart" can be edited unauthenticated.
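For anyone wondering what the missing piece looks like, here is a rough sketch of the kind of check that should sit in front of any write endpoint. Everything in it is an assumption for illustration (the `authorize` function, the bearer-token scheme, the status codes); I have no idea what their actual stack looks like.

```python
import hmac

READ_METHODS = {"GET", "HEAD", "OPTIONS"}

def authorize(method, headers, expected_token):
    """Return an HTTP status: reads are public, writes need a valid token."""
    if method.upper() in READ_METHODS:
        return 200
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return 401  # no credentials presented at all
    token = supplied[len(prefix):]
    # Constant-time comparison to avoid leaking the token through timing.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return 403  # credentials presented but wrong
    return 200

# Hypothetical requests against a hypothetical secret.
anonymous_read = authorize("GET", {}, "s3cret")
anonymous_write = authorize("PUT", {}, "s3cret")
bad_write = authorize("PUT", {"Authorization": "Bearer nope"}, "s3cret")
good_write = authorize("PUT", {"Authorization": "Bearer s3cret"}, "s3cret")
```

Most frameworks ship something equivalent as middleware, so in practice this is usually a matter of turning it on rather than writing it.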
Yeah, it's just tremendously embarrassing. These are supposed to be the tech geniuses who can parse 50 years of accumulated legacy code and find all the government waste? In 3 weeks?
I'm not yet sure whether they are even doing data science.
Anecdote time (pinch of salt required):
A relative of mine studying accounting went to the Doge site to see the "audit" and "analytics" records that an acquaintance arguing with her had pointed to as proof ("see the doge site!").
What she found when visiting the site was no "audit" at all, but instead a word count of how often objectionable terms appear in legislation or government sites. (DEI? Trans? LGBTQ?).
Being in the analytics/data engineering space myself, I was pretty amused to hear that was the quality of "analytics" being done.
Wasn't "word count" the "hello world" example for Hadoop big data back in 2013?
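For scale, a single-process version of that exercise fits in a few lines. This is a sketch in plain Python with `collections.Counter`, not the Hadoop MapReduce version; the `count_terms` helper, sample text, and term list are made up.

```python
import re
from collections import Counter

def count_terms(text, terms):
    # Lowercase and split into alphanumeric words, then tally them.
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words)
    # Report only the terms of interest; missing terms count as 0.
    return {term: counts[term.lower()] for term in terms}

# Hypothetical sample document and terms.
sample = "Equity review. The equity office filed a review of the grant."
result = count_terms(sample, ["equity", "review", "audit"])
```

Here `result` maps each requested term to its frequency in the sample text, which is all a word count tells you.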
Some of the "data science" people I've met certainly believed that they could architect entire software systems just because they understood how to structure data in databases.