There’s honestly so much interesting stuff here, esp. the llm-related things - large concept models (operating on and predicting concepts, not tokens), dynamic byte latent transformers (byte-level alternative to standard tokenization), sparse memory layers (successfully scaling key-value memory layers without an increase in computational requirements).
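For anyone wondering what the memory-layer trick looks like mechanically, here's a rough sketch of product-key memory as I understand it (all dimensions and names are mine, not from the paper): only k value rows are touched per token, so capacity can grow massively without growing per-token FLOPs.

```python
import torch
import torch.nn as nn

class ProductKeyMemory(nn.Module):
    """Toy product-key memory layer (illustrative, not Meta's code).

    Factor N = n*n memory slots into two sets of n sub-keys, so finding
    the top-k of N slots costs O(n + k*k) scoring work instead of O(N).
    Only k value rows are read per token, so capacity scales far faster
    than compute.
    """
    def __init__(self, dim, n_sub_keys=512, k=8):
        super().__init__()
        half = dim // 2
        self.k, self.n = k, n_sub_keys
        self.sub_keys1 = nn.Parameter(torch.randn(n_sub_keys, half))
        self.sub_keys2 = nn.Parameter(torch.randn(n_sub_keys, half))
        # n_sub_keys**2 addressable slots, read sparsely via EmbeddingBag
        self.values = nn.EmbeddingBag(n_sub_keys ** 2, dim, mode="sum")

    def forward(self, x):                                  # x: (batch, dim)
        q1, q2 = x.chunk(2, dim=-1)                        # split the query
        s1, i1 = (q1 @ self.sub_keys1.T).topk(self.k, -1)  # (batch, k)
        s2, i2 = (q2 @ self.sub_keys2.T).topk(self.k, -1)
        # Cartesian product of the two top-k lists -> k*k candidate slots
        scores = (s1[:, :, None] + s2[:, None, :]).flatten(1)
        slots = (i1[:, :, None] * self.n + i2[:, None, :]).flatten(1)
        top_s, top_i = scores.topk(self.k, -1)             # overall top-k
        weights = top_s.softmax(-1)
        return self.values(slots.gather(1, top_i), per_sample_weights=weights)
```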
Here they are presented as separate things, each of which apparently improves quality / efficiency. I wonder what the quality / efficiency increase is of all those methods put together? Maybe that’s what Llama 4 will be?
It looks like a lot of innovation is happening at Meta in these areas, really cool!
I hope that Llama 4 or 5 will have a different architecture. All the released Llamas are more or less the same at inference time, just with a better training pipeline. The downside is that llama.cpp will probably not be able to run the new models, and it may be too big a rewrite, so we will need new C, C++, Go, and Rust programs.
I'd put a table-of-contents-like page up front with an exciting short description of each section, and use hyperlinks so the user can navigate to each section and back.
This is so cool! Playing around with the first demo is a lot of fun. First one to get the model to moonwalk wins. My best attempt was probably something like `(body_speed_forward < -0.3) * (head_height > 1.0) * (stay_still > 0.2) * (body_speed_vertical < 0.1) * (stay_upright > 0.9)`
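(If anyone wants to fiddle with it offline, here's roughly how I read that expression, with each clause acting as a 0/1 gate so the product only pays out when every constraint holds at once. The names are just the reward terms the demo exposes; the evaluation logic is my guess.)

```python
def moonwalk_reward(state: dict) -> float:
    # Each comparison is a hard 0/1 gate; multiplying them means the pose
    # only scores when gliding backward, standing tall, barely stepping,
    # not bouncing, and staying upright, all at the same time.
    return (
        float(state["body_speed_forward"] < -0.3)
        * float(state["head_height"] > 1.0)
        * float(state["stay_still"] > 0.2)
        * float(state["body_speed_vertical"] < 0.1)
        * float(state["stay_upright"] > 0.9)
    )

print(moonwalk_reward({
    "body_speed_forward": -0.5, "head_height": 1.4, "stay_still": 0.6,
    "body_speed_vertical": 0.0, "stay_upright": 1.0,
}))  # -> 1.0
```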
Then the "Meta Explore Theory of Mind" is even more interesting. There was a thread about a month ago in which some of us were discussing some of the concepts here like "beliefs" and updating a model of the world accordingly. https://news.ycombinator.com/item?id=42035985
I really hope Dynamic Byte Latent Transformers work out. Death to tokenizers!
Interesting that it's a hierarchical structure, but with only two levels of hierarchy. Stacking more levels seems like an obvious direction for further research.
Author here :), I do think it’s a good direction to look into! That said, aside from it being a bit too much to do at once, you’d also have to be careful about how you distributed your FLOP budget across the hierarchy. With two levels, you can make one level (bytes/local encoder) FLOP efficient and the other (patches/global encoder) FLOP intensive. You’d also need to find a way to group patches into larger units. But ya, there are many directions to go from here!
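To make the byte-to-patch step concrete: the grouping is driven by the small byte-level model's uncertainty, roughly like this (the threshold and shapes here are just for illustration, not the real config):

```python
import torch

def entropy_patch_boundaries(byte_logits, threshold=2.5):
    # byte_logits: (seq_len, 256) next-byte logits from the small local model.
    # Start a new patch wherever the byte LM is uncertain about what comes
    # next; confident stretches get merged into long, cheap patches.
    probs = byte_logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)  # nats
    boundaries = entropy > threshold
    boundaries[0] = True  # the first byte always opens a patch
    return boundaries     # bool mask of patch starts, shape (seq_len,)
```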
In a way I'm kinda sad if tokenizers go the way of the dinosaurs, since asking someone to give me a Unicode character from the Private Use Area was one of the last ways you could actually distinguish a cooperative human from an LLM online.
They simply don't have those characters tokenized, so they can't output them. (But this is technically moot if the LLM has a python interpreter handy)
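For example, a model with a code tool can sidestep its vocabulary entirely with something like:

```python
# Emit Private Use Area characters directly, regardless of what the
# tokenizer covers. The PUA in the Basic Multilingual Plane spans
# U+E000 through U+F8FF.
print("\uE000")      # first PUA code point
print(chr(0xF8FF))   # last one (Apple famously uses it for its logo)
```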
When I wonder about the business behind Meta doing this, I see they have $70B in cash, so giving a bunch of AI experts hundreds of millions is pocket change.
Imagine that something fundamental shifts in the world of AI research. It could be anything: AI suddenly makes programmers much more productive, AI becomes very good at identifying vulnerabilities, AI chat becomes a new major source of entertainment, AI images become an item popularly shared on Instagram (etc)
Suppose any one of these things happened, and suddenly Facebook wished it had access to state-of-the-art models so that it could customize them for its own uses (internal developer tools, embedding in its apps).
Imagine how they would feel if the only way they could access these models were by signing 7-9 figure deals with a model dealer like OpenAI. Even worse, imagine if one of their main competitors in advertising started providing robust AI tools to help advertisers adapt their creatives to various form factors. Facebook is now way behind and possibly has to shell out millions to a company like OpenAI, all while losing ad market share worth billions per quarter (ads on Google start performing much better, so Google gets more ad spend).
If this worst-case scenario came to pass, Facebook would look foolish. If even one of these things were likely, their investments make sense. The rest (open source, making Meta a cool place to work) are a strategy credit.
“Commoditize your complement” may be a good way of framing it. Consider that if OpenAI succeeded dramatically and were the only game in town, they could extract huge rents from anyone using their service. So it's in the interest of other companies (or anyone who wants to use AI) that the AI ecosystem have lots of competition to keep prices low.
everyone that has responded so far has it wrong (naively so).
FB sells ad space on several apps. those apps need people on them in order for the ad space to be worth anything. people, in turn, need content to attract them to the apps. so it's simple: enable people/companies/whomever to generate tons of content for cheap and consequently share it on the apps. that's it.
Couldn't the same argument be made for all kinds of things companies have made open? Some examples:
• Tesla gave away its EV patents.
• Pixar and DreamWorks have both open-sourced some of their tools, including tools used to make some of their best works. For example DreamWorks' MoonRay renderer has been used on everything they have done since "How to Train Your Dragon: The Hidden World", including "Puss in Boots: The Last Wish" and "The Wild Robot", and will be used on their upcoming films.
Yes, it could. But my reply is to the person I directly responded to, who claimed these tools are for Meta product benefit but ignored that the same argument applies to competitors.
A better answer is that Meta releases them for some combination of seeing a benefit to the business and a desire to provide broad benefits to everyone. They certainly expend tremendous resources to create these models. No other company has provided this much value to such a large base of users in this space.
this is like saying that AMD making chips that intel/nvidia employees can buy and use to do their jobs is a bad strategy for AMD. lol. ok not every single strategic choice needs to both grow the top line and be anti-competitive. some can just grow the top line.
That’s a fun idea. I’ve always wondered about experimenting with u-nets and hourglass nets for text data since they’re so efficient at capturing global and local context (in vision, anyway). But I’ve never tried it.
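Just to make the shape of the idea concrete, here's the kind of block I picture (a non-causal toy with invented dims; a real text model would need causal pooling and a smarter upsample):

```python
import torch
import torch.nn as nn

class HourglassBlock(nn.Module):
    # Pool tokens into a shorter "global" sequence, process it with the
    # expensive layer, then upsample and merge back into the fine stream.
    # Assumes dim % 8 == 0 and seq divisible by `shorten`.
    def __init__(self, dim, shorten=4):
        super().__init__()
        self.down = nn.AvgPool1d(shorten)
        self.mid = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.up = nn.Upsample(scale_factor=shorten, mode="nearest")

    def forward(self, x):                                  # x: (batch, seq, dim)
        h = self.down(x.transpose(1, 2)).transpose(1, 2)   # (batch, seq/4, dim)
        h = self.mid(h)                                    # cheap global context
        h = self.up(h.transpose(1, 2)).transpose(1, 2)     # back to full length
        return x + h                                       # residual merge
```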
It lets those providing AI video generation services watermark all of their videos. So it isn't intended to be voluntary. You would be left with those services that don't comply with whatever the current Big Tech rules are, like people who used Grok/X.ai to generate images in support of Trump despite Grok/X.ai being inferior. https://arstechnica.com/information-technology/2024/08/musks...
I think this is the wrong / older article: when I click the link, this is Twitter's hosted Flux model making pictures of Kamala and Trump flying into the World Trade Center and Trump on a surfboard with busty cat girls. The X.ai one launched this week.
How much does it take to train a model at this point? I'd expect it to be in range of any major state or most oligarchs within the next couple of years (if it isn't already). So it's probably best if everybody understands the watermarking to be voluntary. Images and videos aren't worth the bits they're printed on at this point, as evidence of anything in particular.
Crazy stuff. Everyone’s covering how exciting all these are (especially LCM and the non-tokenizing-tokenizer), but I have to ask in case anyone’s been paying attention: why are they using the term “advanced machine intelligence”?
My initial thought is that they want to please/distract the doomers, but I’m prolly just self-centered!
It originates in Yann LeCun's paper from 2022 [1], with the term AMI being distinct from AGI. However, the A has changed over the past few years from autonomous to advanced and even augmented, depending on context.
I would guess it’s in response to the recent market studies showing that the general public views anything labeled “AI” as a likely scam and untrustworthy.
Even though Meta doesn't sell I/PaaS, Meta's fitness goes up when AI is in the hands of more players than just Google and OpenAI. Commoditize AI and you create a diverse set of businesses that will reach customers through Meta's platforms.
It's not hype when it delivers, and I'm also not seeing a ceiling yet.
Yet again, interesting progress.
Also, I like the idea of using the pose model to generate not an NPC but an avatar living in my phone, or in a glass cube as a hologram. That would be quite sci-fi futuristic.
Meta is a very large organization, and I'm willing to believe that a good chunk of Meta FAIR (the lab releasing all of this stuff) truly do care about innovations for advancing AI safety and are doing great work along these lines. I'm not disagreeing with your point about the company being led by its financial incentives as a unit, but let's also allow ourselves permission to celebrate this work by this group of people.
It is a shame that this is flagged for being denigrating or negative. A better comment would be to ask: where is the documentation for safety? How do we define it? Where are the disclosures about failures, negative results, etc.? Perhaps these things are unanswerable, but raising awareness of them is important.
Meta's "Video Seal": Because nothing says "trustworthy" like a digital chastity belt. Imperceptible, they claim, yet robust enough to survive the gauntlet of internet mangling - sounds like the perfect tool to invisibly track content, not just watermark it.
I think it's reasonable to assume that any large social media company is already tracking video similarity in reuploads/edits. The remix and reused-audio features are already baked in. Reverse image searching screen caps of TikTok/Reels pretty often returns the source/original.
I want to have a way to detect if content is AI generated. You might want to run that model on your own creations to ensure you get the credit for them and that no one can steal them.
Like all tools it can be used for good and evil. It could be installed directly in cameras to sign videos. And people with the power to turn it off could make AI fake videos that much more believable.
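As a toy illustration of why the "robust" part is the hard part (this is emphatically not Video Seal's method, which trains the embedder against simulated crops and compression), naive least-significant-bit watermarking is imperceptible but dies instantly:

```python
import numpy as np

# Hide one payload bit per pixel in the least-significant bit of a frame.
# Imperceptible, yes; robust, no: any lossy re-encode wipes it out.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (64, 64), dtype=np.uint8)
payload = rng.integers(0, 2, (64, 64), dtype=np.uint8)

marked = (frame & 0xFE) | payload           # embed: overwrite the LSBs
assert np.array_equal(marked & 1, payload)  # detect: read the LSBs back

# Simulate mild "internet mangling": quantize like a lossy codec would.
mangled = (marked // 4) * 4
recovered = mangled & 1
print("bits surviving re-encode:", (recovered == payload).mean())  # ~0.5, i.e. chance
```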
I would make the argument that these AI safety initiatives yield messaging that muddles and confuses the public on the simple fact that they should not, under any circumstances, use a video or image as proof or assume its veracity. When I tell someone this, it is common for them to come back with something like "aren't they working on things to detect if a video is fake?"

I think this idea, that video content can still be trusted and that {COMPANY} is being responsible, is the real goal of the money pumped into these watermarking techniques. These techniques will not actually help people; images and video will continue to be used for disinformation.

The only thing that can stymie that is a broad cultural shift to default to distrust of photographs and video footage, to treat it all like you might a painting or animated cartoon depicting an event: maybe an accurate portrayal, but just as easily totally fabricated. The responsible thing for companies to do would be to spread messaging indicative of this fact, but they would rather engage in safety theater and score some points while keeping users dumb and easily fooled.
"they should not, under any circumstances, use a video or image as proof or assume its veracity"
This is just silly. Courts never assume the validity of evidence; it is actually assumed to be invalid unless it can be proved not to have been tampered with. Photos have been editable for over 100 years, but they are still used as evidence. The person who took the photo will sign an affidavit and/or testify in court that it is real. And AI videos are going to be easily detectable for a long time.
I'm talking about your average person, not the court system. I'm asserting that culturally we need to shift to acknowledging that photos are not proof, rather than pretending that some fancy counter-model or watermarking will somehow allow us to maintain an already-misplaced trust in the veracity of images.