"ChatGPT, he says, doesn’t have the ability to describe the feelings seen on a player’s face when they win the World Cup,"
Oh piffle. My wife and I roll our eyes and mock the sports reporters when they run down to the field and ask the question "How does it feel to have won the $CHAMPIONSHIP?" "Oh man, it's... it's just so incredible, you know, you work all your life to reach this height and... here I am... it's just... sniffle it's really amazing. I have a really great team and I'd like to say HI MOM and... just.. oh man..." "We'll let you go join the celebration. There you have it, $OTHER_ANNOUNCER."
(Our favorite is: "Hey, $COACH, what do you plan to do to win this game?" "Well, you know, that's a good question, and we've got a plan we're going to execute to, you know, go out there and play strong, we're gonna try to, you know, score points and, uh, prevent the other team from scoring points. We're pretty confident that if we can execute that plan we're going to win." In fairness, sometimes the answer is better than that. But a lot of the time, if you think about what they just said, that's what it amounts to without any particular loss. After all, even if they have a good answer, they're not terribly interested in giving it to you before the game....)
I mean, I get it, I'm not so cold-hearted as to not understand why they ask the question. It just doesn't happen to speak to us. But I would rate it more on the easy-mode side of sports writing, way easier than, say, being accurate about statistics or being correct about what strategy a team needs to pursue against another team for victory, or what strategy they're going to pursue.
Extremely, extremely stereotyped responses. They can hardly be wrong; it's not like saying "excitedly" instead of "energetically" is going to be the difference between accurate and inaccurate. Easy mode for a transformer architecture.
Reporter: "How does it feel to have won the Super Bowl?"
Star quarterback:
Completion:
"It feels amazing. We worked so hard all season and to finally have won the championship is truly a dream come true. I'm so proud of my teammates and coaches for their dedication and hard work. It's been a long journey and I'm so grateful for this victory."
Heh, the only reason GPT-3 doesn't sound like my quote is that no news reporter would be so crass as to directly transcribe a player like that. Which means this is correct for GPT-3 as well, in this context.
(I was in a Zoom chat yesterday with the transcription on and I noticed it was doing the same thing. It picked up the "ums" and "uhs", and sometimes those phonemes would show up in the initial transcription as the wrong words or phrases, but once the transcription stabilized they were edited out for the most part.)
You might enjoy “How Tracy Austin Broke My Heart”[0] by David Foster Wallace. He reads her (young tennis phenom) autobiography and it’s filled with this type of language.
He wonders if, in order to be a top tier athlete, they almost have to think this way. Simple, repeatable mantras drilled into their conscious and subconscious. Too much thinking would throw them off.
They obviously didn’t ask ChatGPT to act like a sports announcer. And given that the expression on a player’s face, as you point out, is completely predictable and has been described a million times with a high rate of consistency, I would expect it could completely and reliably describe the feelings seen on a player’s face when they win the World Cup.
Funny observation! Soccer is even worse than this (in England at least, Italian players for example can seem very philosophical but it might be a side-effect of getting it through translation), with the added twist that you have to get your tenses all wrong for some reason.
"So I was sat there waiting for the cross, and it comes over me head and I just hits it and bang it's in the back of the net"
I think ChatGPT could easily replace commentators too. Of a missed shot - "He'll want to have got that on target!" - Oh he will? Such insight!
> "Hey, $COACH, what do you plan to do to win this game?"
"Well, after analyzing hundreds of hours of video of the opposing team, we've pretty much decoded their play-calling signals. We're also going to monitor them remotely and communicate with our players on the field to give ourselves the edge we need. We're also planning on deflating the ball a bit to make it easier when we have possession. Also we've stolen our opponents' play sheets, so we hope to make use of them as best we can. And we're hoping that our opponents' headsets might 'mysteriously' stop working.
And of course as always our players are going to bring their A game to the field to really win this!"
Crash: "You're gonna have to learn your clichés. You're gonna have to study them, you're gonna have to know them. They're your friends. Write this down: We gotta play it one day at a time. ...."
Nuke a.k.a. Meat: "That's pretty boring."
Crash: "Of course it's boring, that's the point. Write it down."
When I explain to people what I love about baseball, it’s that I know what is supposed to happen. It’s when what isn’t supposed to happen happens that makes it exciting. It’s the same with these questions. We know the question, but it’s the responses, in particular from younger players (like collegiate athletes), that make the questions still worth asking and journalistically relevant, even if sometimes they come off as rehearsed.
There’s a genuine loss of value going on if we fall into the trap of LLM-generated SEO content - but it’s more subtle than just ‘the internet will be full of crap’
The thing is that webpages don’t ‘exist’ in the way most people tend to think of them - as ‘documents on the file system of some computer’. And Google is not an index over all those ‘documents’.
How search actually works is that Google’s bots go out to a bunch of servers and ask them ‘hey, what kind of documents can you make?’; the servers respond with dynamically generated lists of links to those ‘documents’; Google requests them, and the server makes them and sends them to Google, which indexes them so that later, when someone searches the internet, Google can send them to the same link and the server can make the document again for them.
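Concretely, a minimal sketch of that loop in Python - the URL and the toy index structure are made up for illustration:

    # Toy sketch of the crawl-and-index loop described above.
    # example.com and the index structure are hypothetical.
    import requests
    from xml.etree import ElementTree

    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    index = {}  # term -> set of URLs whose generated document mentions it

    # 1. Ask the server what "documents" it can make (its sitemap).
    sitemap = requests.get("https://example.com/sitemap.xml").text
    urls = [loc.text for loc in ElementTree.fromstring(sitemap).iter(NS + "loc")]

    # 2. Request each one; the server generates the document on demand.
    for url in urls:
        body = requests.get(url).text
        # 3. Index it, so a later searcher can be sent to the same link
        #    and the server can generate the "document" all over again.
        for term in set(body.lower().split()):
            index.setdefault(term, set()).add(url)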
In an internet that works like this, using an LLM to pregenerate documents for Google to index turns Google into an index of the outputs of various LLM prompts that someone has run because they think someone else might later search for them. This wastes everybody’s time.
An LLM-generated summary of every earnings report of every company doesn’t need to be pregenerated - we can make it later, at the moment a user actually asks ‘what was in the earnings report of Foocorp in Q4 2013?’
This breaks the ‘document-centric’ conceit of search; and that might not be a bad thing.
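A minimal sketch of what that query-time approach could look like, assuming an OpenAI-style client; fetch_earnings() and the model name are hypothetical stand-ins:

    # Sketch: generate the prose at query time instead of pregenerating
    # an SEO page per company per quarter. fetch_earnings() is a
    # hypothetical stand-in for a raw-data store (e.g. XBRL filings).
    from openai import OpenAI

    client = OpenAI()

    def fetch_earnings(ticker: str, quarter: str) -> dict:
        """Hypothetical: return the raw filed numbers for one quarter."""
        raise NotImplementedError

    def answer(ticker: str, quarter: str) -> str:
        raw = fetch_earnings(ticker, quarter)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any instruction-following model
            messages=[{
                "role": "user",
                "content": f"Summarize these earnings figures plainly: {raw}",
            }],
        )
        return resp.choices[0].message.content

    # answer("FOO", "Q4 2013") is built on demand; nothing is pregenerated.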
It's even worse than that.
Sites like MarketWatch auto-generate pages for each ticker at the end of the trading day to tell you whether a stock went up or down and how it did in comparison to the broader market.
Just take a look at a random ticker in Apple's Stocks app; the news section is filled with this garbage.
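For a sense of how mechanical those pages are, a toy template sketch - the ticker, numbers, and wording are all made up:

    # Toy illustration of how those end-of-day ticker pages get stamped out.
    def ticker_page(ticker: str, close: float, prev: float, spx_pct: float) -> str:
        pct = (close - prev) / prev * 100
        direction = "rose" if pct >= 0 else "fell"
        vs = "outperforming" if pct >= spx_pct else "underperforming"
        return (
            f"Shares of {ticker} {direction} {abs(pct):.2f}% "
            f"to close at ${close:.2f}, {vs} the S&P 500's "
            f"{spx_pct:+.2f}% move."
        )

    print(ticker_page("FOO", 101.30, 99.80, 0.40))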
> Google requests a set of documents that turn out to be LLM generated
> Other prompts using Google data make more content for Google to crawl and request
> Lather, rinse, repeat
AI contamination is already starting. There's no way that content written by an LLM hasn't already been used to train an LLM.
I’m not talking about LLM feedback loops (that is a problem, but a different one - honestly, LLMs slurping up low-effort, human-produced SEO docs is just as bad).
This issue is more about the value of search finding ‘documents’ when ‘documents’ are thin layers of low-value-add LLM cruft over raw factual data.
The facts are what you’re searching for; search indexing has always involved trying to figure out the underlying informational content of a document; LLM prettification is just adding obfuscation over that data that search indexers then have to try to remove.
There has to be a more efficient way for a website to say ‘we have all the earnings data for every US public company’ without having to dress that up in LLM prettification that Google’s language model can then index.
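Something along those lines already half-exists as schema.org structured data; a sketch of a site declaring its raw dataset directly, with made-up values:

    # Sketch: declare "we have this raw data" as machine-readable
    # schema.org Dataset markup instead of a prose page. Values are
    # illustrative; example.com is a placeholder.
    import json

    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "US public company quarterly earnings",
        "description": "Reported earnings for every US-listed company.",
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.com/earnings.csv",
        },
    }

    # Embedded in a page as JSON-LD, crawlers can read the facts
    # without any prose wrapped around them:
    html_snippet = (
        '<script type="application/ld+json">'
        + json.dumps(dataset, indent=2)
        + "</script>"
    )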
Conversely, that which is digital and on the net is indexable, while so much intelligent information, factual content, and human communication is not digital and not on the net.
In the early aughts, I said to anyone who would listen that the popularity-driven search results Google shows would be sort of self-referential, given so simple a definition of what can be indexed and what cannot.
Interesting point, but there are a lot of use cases where chatbots still need some novel data, like your "earnings report of Foocorp in Q4 2013" example. So maybe we end up with [smaller] sites of raw data for bots to digest when someone wants a readable form of it.
Frankly, that might be better than everything being wrapped up in prose.
I'd say roughly 70% of the articles Google pushes to my Android's 'news feed' are clearly AI-generated garbage designed to maximize SEO. It's absurd. This has been going on for more than a year.
It's usually pretty obvious because you'll read something and some form of uncanny valley starts to creep in where you're like "no sane person writes like this."
If old media plays their cards right, the proliferation of AI-generated content may prove a blessing in disguise for them. If they take firm stances against publishing AI-generated content, then paying old media for the news may become the best way to reliably get real content written by humans.
Yeah, I've had similar thoughts about this. The areas where AI can't really compete (probably ever) are reporting / photojournalism.
We'll always need someone to be on the scene to capture the events / investigate the stories, and I don't think the world is dystopian enough to start manufacturing AI news drones.
Efforts will be put in place to filter out the garbage AI content in favor of human-made, real-world reporting.
It already essentially is. Free news is almost always going to be absolute garbage, while paid news is frequently intelligent but designed to confirm its target audience’s priors.
I don't want to live in a world where I'm tricked into reading things that were not directly created by other humans; I don't want to consume AI-generated or AI-infused content. Day by day, we're getting siloed by technology alienating us from the essential things that make us humans, like having meaningful interactions with others.
I feel the same way, but I was struggling to understand why. Could you explain why you would not want to read content generated by AI? If you’re reading an article from CNET, it’s probably for the information value, so from that perspective it should be no different. Are you looking for opinions or other color as well that the AI won’t be able to provide?
-- for me it feels like the final nail in the coffin - in this area - i think - the race to the bottom removed the ability of humans to say what they want - be provocative - inspiring - forward thinking - now even the people who are supposed to thoughtfully give a world view - journalists + reporters - seem to be at: what a camp wants to hear - not what they want to say - feels like AI generated is maybe the final straw in this? - hard to explain but feels bad --
The problem with consuming AI-generated content, for the human reader in 2023, is that the content is more likely to be in the uncanny valley and requires the reader to exercise extreme vigilance to triple-check priors, facts, and conclusions. There is no human author who can be held accountable for lies or misrepresentations of facts. There is only the language model, immune to cancellation on Twitter.
When the content on the other end is written by a sapient human (or, eventually, an AGI), as much vigilance is not needed. Vigilance is always necessary, but the level required for parsing the output of language models is much higher.
This requirement is why publishers like CNET quietly mislead their audience and do not clearly mark each submission as AI generated. If it were not an abomination and an abuse of the reader to the gain of the publisher, then they would proudly claim to be doing it.
It depends. Purely factual things, like reporting on the markets and a majority of objective news, could be AI-assisted. Opinion pieces and reporting on subjective material (or trying to shape narratives), that's a whole different game and a topic that society is not ready to even begin tackling yet.
There is no interaction to be had in the consumption of most material online, and any perceived interaction is surface-level at best.
Factual reporting always has background, though. The bottom 1/2 to 2/3 of the article will be descriptive context, like what $SUBJECT did in a previous similar situation. I don't think the current AIs are up to the task of figuring out which accessory bits of information are most pertinent, and in what order.
Several years ago I realized that the answer to "why" for me was that I found it entertaining to read a lot of "news". When I examined this however I decided that for me it was largely a waste of time.
The fact is a lot of what web sites (and print journalism before it) publish has never been more than entertainment thinly disguised as something important.
I still scan the news sites every day to look for things that might actually be important information that I feel I should be aware of, but otherwise I read a LOT less "news" than I used to.
The interesting thing about this moment in AI, it seems, is not how good the AIs are, but how bad the lower end of human work and ability is relative to the AI.
In this case, how much SEO-farm rubbish content has been put out by salary-slave humans, and is it really any better than being tricked into reading AI content?
Two evils, I know. But maybe there are deeper issues than the mere existence of decent non-general AIs…?
We've been reading articles edited by Grammarly for many years on various websites, all without realizing that we're consuming "AI-edited" content. I doubt AI-generated content will feel much different.
There is an awful lot of content published on the internet that is highly predictable, which is to say, ripe to be generated by large language models. But that stuff is already so low bandwidth in new or useful information that I wonder "who cares" if it's now not only predictable, but literally predicted? It was already there just to generate page one search results, and it does, and it will continue to do so.
Which raises the interesting question of how self-referential and useless search results will become, if they are primarily great big generative, predictive models feeding huge indexing models for the purpose of predictively generating the right additional content intended to snag an actual human neuron (that is, to select advertisements). Once the whole thing is computers talking to computers hoping some gullible living person is watching, it becomes indistinguishable from bitcoin mining - a kind of viral, self-referential internet onanism.
In a way it's ad-money mining via SEO-engineered content to get high page views/CPM. The issue with the comparison to crypto mining is that low-quality content prevailing means the ad campaigns are likely going to be less effective, because humans reading the low-quality content just aren't going to stay long enough and/or will avoid it. I can imagine the ad companies shifting to other types of content, like video, and then the AI-generated content doesn't bring ad revenue.
I agree, the comparison is inexact, and I expect that advertisers will have to change strategies. My use of the word "indistinguishable" overstates the comparison. I should have said "not entirely unlike."
Oh I wasn't being pedantic about wordings. You are right on target that this is a computer automated revenue stream. Crypto miners have had to change strategies too I assume, buying bigger iron or whatever.
Let me guess, they sound exactly like the first page of google results on any term before ChatGPT. "Helpful" articles that all tell you the same generic useless stuff.
Except now they're not on 3000 sites, they're on 3000 sites plus CNET.
Is this ChatGPT written or minimum wage copywriter written? Does it matter? It doesn't help me in the least and it's the first result for "crucial bx500 vs mx500".
And of course we must ask 'cui bono?' for this atrocity and the answer soon comes with the button for the free download of the minitool disk clone tool. Prime SEO work.
Yep. And it was a genuine search. I'm messing with an old white MacBook that only has (finicky) SATA; I've been told the BX and MX will definitely work with it and I was trying to decide which one to get. I did find my answer, but at the end of the first search results page, on a ... reddit thread. This was DuckDuckGo, btw.
Edit: seriously, doesn't it sound exactly like ChatGPT articles?
CNET has been publishing garbage content for a while now. Look no further than their 'reviews.' I work with web hosting reviews; look at what they disclaim: “While we didn't test the services, we did carefully examine each service's offerings and ranked them according to essential web hosting features.” (source: https://archive.ph/UtlsB )
Reviewing web hosts based on comparing feature checklists? Oh dear.
For those who aren't aware, feature checklists for web hosting services are 90% bullshit. Not that they aren't true, but the items that get checked off are often trivial nonsense which are implicitly available from any commodity hosting provider like "control panel", "access logs", or "password protected directories".
The real differentiators are usually things which don't show up in marketing materials, like "is the support team halfway competent" or "how badly does this provider try to nickel and dime you". Which is why you need to actually work with the provider to give them a meaningful review.
After doing web hosting reviews for over a decade now... 99.9% of the people who do any form of review are lying or manipulating things for affiliate commissions. Of the few that aren't, most don't actually know what reviewing/benchmarking/testing well looks like.
It's atrocious. And Google just pumps this garbage. I need to finish writing about all the bullshit from big brands doing these reviews. CNET is just one easy example.
I think the problem is that recently, the quality of pages linked from search results has increased. There seem to be a lot of people out there who are algorithmically generating product reviews and comparisons, especially.
In years past, I think it was much easier to discern such low-value content. But lately I find that it's not immediately obvious, and I waste time in drawing that conclusion and moving on.
So my search strategy, at least for the kinds of things most susceptible, is to shift from "open-ended with exceptions" to "only trust sites whose names I recognize".
Sad to see that I should consider moving CNet from the good column to the bad.
> I think the problem is that recently, the quality of pages linked from search results has increased. There seem to be a lot of people out there who are algorithmically generating product reviews and comparisons, especially.
> In years past, I think it was much easier to discern such low-value content. But lately I find that it's not immediately obvious, and I waste time in drawing that conclusion and moving on.
I know what you mean, but I wouldn't describe that as an increase in "quality."
> So my search strategy, at least for the kinds of things most susceptible, is to shift from "open-ended with exceptions" to "only trust sites whose names I recognize".
IMHO, one interesting thing "AI" might do, is finally kill the open, free-to-access Web, returning things to something like the 90s, where if you wanted to know something, you had to buy a newspaper, magazine, or book published by an institution. It would do it by filling the web with low value stuff that's too hard to detect.
DALL-E/Stable Diffusion are already killing the appeal of certain styles of fantastic art, due to overexposure and mediocrity.
I look forward to the day when Wikipedia is overrun with "AI"-generated vandalism that's difficult to detect, but designed to corrupt it according to various agendas. They could even include generated citations to documents that sound like they could support the vandalistic claims (because few actually go through the trouble of verifying those).
Well, if your goal is information, does it matter that the provenance is a machine? Quality here, I assume, means more detailed and better written with more nuance. For most of my knowledge-seeking searches I don’t care much who wrote it, and more whether I gain the knowledge I seek.
I’ve been using it for a while, and while I get bad information at times, it’s generally pretty decent and much easier than going through traditional IR methods, especially now that those are just advertising and SEO wastelands. I find combining the output of ChatGPT with some searches yields a much better and faster result in learning for me. I fully anticipate that we will see similar systems tied to IR and semantic reasoning systems soon - the problems are too obvious, but so are the solutions, and it’s not like we don’t have amazing tech in the areas ChatGPT fails in that can be adapted to constrain and inform the output.
> Since then, the news site has published 73 AI-generated articles, but the outlet says on its website that a team of editors is involved in the content “from ideation to publication. Ensuring that the information we publish and the recommendations we make are accurate, credible, and helpful to you is a defining responsibility for what we do.”
This is the thing that actually scares me, because I don't believe them when they say they have editors scrutinizing everything the AI writes. They may give it a glance, but I would bet any amount of money that they don't rigorously check everything. And as time goes on, they're going to spend fewer and fewer resources on editorial, I guarantee it. My basis for saying this is that editorial has already been deeply cut — forget rewrites, forget fact checking, it feels like half these articles have outright typos that never get caught. I'm not sure anyone but the author closely reads an article before it goes into publication, and if the author is an AI...
They aren't going to staff up on editorial if AI can generate content, they will just spend $0 on writers, and as little as they can get away with on editors, approaching $0.
We're entering a strange time in the history of written media. Most articles will be written by AI, and the AI will be trained on articles that were written by AI. I wouldn't be surprised to see a badge similar to "Organic" created for written content, to certify that it was written by a human.
The AP has been doing this for years. Play by play for sports teams and summaries of earnings reports have been "ai-written" and shared out with nary a peep.
-- there is this guy on youtube - simon whistler - has a bunch of channels - he basically reads wikipedia articles in an exciting british accent - some of the paid text-to-speech stuff is getting so good - been thinking - should just text-to-speech wiki articles with some tweaking so they read like a script - or just copy-paste a wiki article into GPT-3 and ask it to rewrite it as a script - coupled with video diffusion - upload them and see what happens - but the whole thing feels very gross to me - so i have not done it --
Simon Whistler is a parasitic knob. I own and operate a website where I have been researching and writing original non-fiction content for almost 20 years, and Simon used to do paid voice work for me, narrating my writings for podcasts and audio books. Everything seemed to be on the up-and-up, but one day I discovered that he had quietly started making videos for a site that was systematically borderline plagiarizing my writings. Then he started multiple other YouTube channels where he covered the same topics he had recorded for my site, often with striking similarities.
To date there are well over 100 of his videos that appear to be "inspired" by my content. For example:
The list goes on and on. I've been writing online long enough that I've been poached and plagiarized six ways from Sunday, but no one else has come close to leeching so long and systematically as this unoriginal, abject freeloader. Simon Whistler is downright seaward.
It’s funny to me that there is such a strong emotional response to progress in AI (e.g., ChatGPT). It really seems like it stems from a strong biological desire to maintain the feeling that “humans are special” even if no one will state that outright. Banning generated art or text comes across as a last ditch hurdle on the way to the inevitable. I think it’s better just to embrace this head-on and address any problems early before they become a bigger problem in the future (bias, factual correctness, etc.) This stuff really has the potential to improve society; let’s not unnecessarily cripple it out of some misplaced sense of human uniqueness.
I'd suggest having a bit more respect for the people you disagree with.
People are raising quite specific concerns about this technology and its potential impacts, and are doing so in a world that has been coping with the unintended consequences of new technologies since the industrial revolution. You may perfectly validly disagree with their concerns, but to ascribe them to an emotional need for humans to be "special" doesn't contribute to the debate. It's just incredibly condescending.
You could do worse than to avoid making arguments of the form "your points are so wrong you can only possibly be making them out of spite/jealousy/insecurity/other" in all debates on principle.
Maybe that is a reason for some people's negative reactions, but on the other hand there seems to be a lot of unwarranted enthusiasm from another crowd, who seem to think that Friend Computer will soon usher in a utopian age.
What currently exists is far from a human-level AGI, and it's not likely that just making a few tweaks would even get closer to that goal. The model doesn't in any sense understand what it is writing, it generates text that is optimized for appearing as if there is intelligence behind it, as long as you don't read too closely.
And while we don't have to fear a godlike superintelligence turning us into paperclips yet, there are a lot of potential negative consequences to this kind of limited AI, and very little of value to society. We don't need more efficient ways to generate bullshit or propaganda.
This has nothing to do with biology vs. computers. It's simply the fear of losing your place; of getting replaced by a cheaper, poorer solution, even. It could be AI, it could be the younger worker fresh from college, or someone from another country. Those who have something to lose will fear.
I would be more surprised if journalism isn't some form of computer generated garbage checked for a minimum level of sanity by an editor who couldn't care less.
I believe that writing on the internet is about to become more public and personal if you want people to follow and trust you. While we have been able to get away with writing semi-anonymously and still being successful, there will now always be a question about whether the words were generated by an AI.
It's kind of like the deal with the More Plates More Dates YouTube channel and the Matt Does Fitness guy. Matt has an amazing physique, but because of that people thought he was on steroids. He paid MPMD to randomly drug test him over the course of six months, WADA-style (World Anti-Doping Agency), to prove he was natural. He had to do this because steroids exist in the world.
Same thing with writing. Using perfect grammar and a more formal sentence structure will probably cause people to think you are a robot. But writing about your life, your hobby projects, and what you are doing with them will show people that you are a real person.
Personally I love ChatGPT and think this should be implemented in schools in the future for writing courses. Many times people have concepts they wish to express but are not so proficient in what is the cultural “prestige” writing. I think it is much more important to creatively combine concepts and apply them, compared to the nitty-gritty of fine writing.
Also, I feel the rise of ChatGPT will result in more value for verified human-written or human-supervised content. I will much prefer hearing a real person’s personal experience in another country, for example, as verification of things that actually happened, compared to a fictitious AI-generated story. Now combine a verified human + AI assistance, and you have a reliable narrator and good content.
The hardest hit will probably be to the anonymous writers and commenters, who cannot provide verification of being actual sources, and will be washed in the storm of AI content.
> Many times people have concepts they wish to express but are not so proficient in what is the cultural “prestige” writing.
Many times people are bad at writing, I agree. I see what you mean, but if you have a concept and cannot express it, do you really "have" the concept? It is like knowing mathematics but being unable to solve any exercises.
CNET is unrecognizable to me these days. Practically all of CNET's great editors have long since left the company. It's just a once trusted brand slapped on top of ad-optimized time-relevant trash. Prime fodder for the "More articles like this" junk recommendations the likes of which ZergNet and OutBrain serve up.