Meta-Transformer: A unified framework for multimodal learning

kristjank · on July 24, 2023

Yo dawg, we heard you like transformers so we put transformers on your transformers so you can train while you train. The spider web graph shows metatransformers performing worse to their counterparts in almost all fields. Is there a reason I should not believe that an expert model will always outperform a general purpose one, even if it's a metatransformer?

danielbln · on July 24, 2023

I mean, there is a somewhat unique value proposition of a multimodal framework like this meta transfirmer. Its goal isn't necessarily to beat expert models in their own game, but to provide a unified framework for processing diverse modalities of data.

I think it aims to leverage the cross-modal relationships and unified learning, which might not be possible with expert models designed for only a single modality.

Even if it performs slightly worse on some tasks, the ability to handle multiple modalities within a single framework is an pretty sweet advantage in scenarios where data from various sources need to be processed simultaneously, and patterns across modalities need to be captured somehow.

A general-purpose model could also be a more cost-effective solution in some cases, ensemble experts are difficult to scale and parallelize.

AndrewKemendo · on July 24, 2023

>an expert model will always outperform a general purpose one, even if it's a metatransformer

It's an interesting question as it begs questions of conceptual "boundaries."

The sense-plan-do process requires a search and filter process for task switching, assuming an agent can do more than one thing.

So assuming you have a robotic/autonomous agent that is a collection of systems (locomotion, dexterous gripper, visual perception, etc...), if each system could be represented as an "expert module", say for example the dexterous manipulator, then so long as a discriminator can appropriately switch states using the sensor/system inputs, then it's conceptually possible that there is a canonical "expert module" that everyone uses and therefore "general purpose" would apply to the agent as a whole while expert model would apply to the dexterous manipulator.

You can walk that reasoning up the abstraction layers then to conclude that (as usual with these turtle stacks) the distinctions come as each sub system/module specializes more granularly for the environment they operate in.

I think that it's probably forever and always true that any system designed to explore/exploit a bounded environment with comprehensive observations, will always outperform a system that is required to adapt it's sense-plan-do components to the bounded environment without similar observations.

A system would either have to generate different observations than the native agent, or change the boundaries of the environment in a way that is unavailable to the native agent in order to outperform it.

throwawayadvsec · on July 24, 2023

I'm pretty sure it's a relatively small model?

If you had the same quantity of text data as GPT-4 + comparable quantity of data for other domains, it could probably learn transferable skills across those domains.

But it would take a huge amount of processing power that is probably not attainable today

sebzim4500 · on July 24, 2023

>Is there a reason I should not believe that an expert model will always outperform a general purpose one, even if it's a metatransformer?

If a general purpose model beats the specialized one, you could almost certainly distill the general purpose one into a better specialized one.

bick_nyers · on July 24, 2023

Yo dawg, we just need to figure out what x converges to as you apply transformer() infinite times and then finally attention will no longer be all you need:

transformer(transformer(transformer( ... x ... ))) = ?

nh23423fefe · on July 24, 2023

performance is bounded and so outperformance will approach episilon?

orwin · on July 24, 2023

Yeah, that's where I thought it would go shortly after I tried GPT-4 from openAI. We're clearly at the transformer limits imho (comparing the effectiveness between 3.5 and 4, and the number of parameter in each model is why I think we reached a soft cap).

So since it'll be hard to go deeper, going broader by interlacing different model types might be a way to pierce through.

whimsicalism · on July 24, 2023

> We're clearly at the transformer limits imho

GPT-4 did not scale up substantially in depth, going from 175 b to 220 b per transformer.

CSMastermind · on July 24, 2023

Wouldn't making the model multimodal require scaling the models significantly?

Or is the idea to keep the network the same size and trade off some of its nodes for image, video, etc. data?

If so has anyone shown that doing so results in better overall performance?

My lay-observation is that GPT-4 seems to be on the border of usability for most applications so if nothing is gained by simply changing the input data type as opposed to expanding the model then it feels like it won't be of much use yet.

Also apologies if I'm not making sense, I'm almost certainly not using to correct technical terms to articulate what I'm thinking.

whimsicalism · on July 24, 2023

> Wouldn't making the model multimodal require scaling the models significantly?

Just width if that makes sense. Basically, you add another encoder model but you are not actually increasing the depth that much.

ccheney · on July 24, 2023

We need to start ingesting raw scientific data through these models and see what it comes up with. What could these models identify by parsing through raw JWST or Hubble data? Or training against every published scientific paper? Is anyone doing this sort of thing already?

danielbln · on July 24, 2023

Meta's Galactica was an attempt to train an LLM predominantly on scientific papers, articles and so on. It failed pretty spectacularly but Galactica 2, if that's ever a things, might rectify that.

RC_ITR · on July 24, 2023

GP likely means training transformers on raw data (similar to protein folding transformers) to find patterns that humans cannot (due to lack of context, bias, or whatever).

Problem with the assumption though is that transformers are good at identifying and replicating patterns given a set of rules (i.e. how proteins fold and misfold depending on the environment).

Hubble data isn’t so much “we know the rules but not their interactions” as much as “we don’t really know the full set of rules,” so that particular example probably wouldn’t be that fruitful.

In general, biology (where we understand the basic rules but not the complex ways they are combined) is the most fertile ground for transformer driven research.

FrustratedMonky · on July 24, 2023

Just few more steps like this, put it in a robot body, and Voilà , we have start of the first AI wars. How many centuries after this does the Butlerian Jihad start, lead by John Conner, of course?

Oras · on July 24, 2023

According to the website, the model can then fine-tuned for certain tasks such as image classification.

1. How does the multi-model help here in improving the accuracy of image classification when training is combined from text, images, and audio?

2. How about the speed? I would imagine a model with text, audio and image data would be larger compared to text-only models?

ImHereToVote · on July 24, 2023

This seems like a step in the dangerous direction.

valine · on July 24, 2023

It’ll be ok. The technology for “dangerous” AI doesn’t actually exist. The near term risks we face from AI are constrained to the realms of spam and privacy. World ending super-bots are science fiction.

flangola7 · on July 24, 2023

Blind denial. No argument or evidence presented, merely bold statements made with the expectation they be taken without question.

Flying humans was science fiction 120 years ago. A single bomb able to destroy an entire city was science fiction 80 years ago. A machine that can complete more mathematical calculations in one minute than all human manual computation in history was science fiction 60 years ago. EUV photolithography capable of creating molecule-sized transistors was science fiction 30 years ago. A computer that can create visual art and talk to you in plain English was science fiction 2 years ago. A computer that can clone your voice and mannerisms was science fiction 1 year ago.

Science fiction has a way of becoming non-fiction, often within the span of a generation or less.

naasking · on July 24, 2023

> It’ll be ok. The technology for “dangerous” AI doesn’t actually exist.

Nobody's worried about the tech that exists.

> The near term risks we face from AI are constrained to the realms of spam and privacy.

Define "near term".

> World ending super-bots are science fiction.

Science fiction has become science fact before. Where's the knockdown argument that won't happen in this case?

valine · on July 24, 2023

Its not feasible to worry about the implications of every imaginary technology. Nuclear chain reactions were first theorized to exist a decade before the first bomb dropped. Should scientists have stopped exploring quantum mechanics in the 30s? Fear of the unknown shouldn’t be allowed to stop scientific progress.

We can deal with the implications of dangerous AI if and when it becomes a problem.

piva00 · on July 25, 2023

> We can deal with the implications of dangerous AI if and when it becomes a problem.

What makes you assume that? We haven't been able yet to deal with the repercussions of globalised social media, we don't even completely understand its impacts. Or dealt with the impact of climate change.

AI seems like a much more encompassing and transformative technology than social media, what makes you assume we will be able to deal with its problems in a timely fashion when they inevitably occur? We might as well not be able to, and as usual, unintended consequences will follow.

> Fear of the unknown shouldn’t be allowed to stop scientific progress.

Scientific progress at any cost while being irresponsible about major consequences of it shouldn't be allowed either, it needs to be a balancing act, just pushing forward without even assessing the risks is a stupid game.

naasking · on July 25, 2023

> Nuclear chain reactions were first theorized to exist a decade before the first bomb dropped.

Yes, and they were very worried about igniting the entire atmosphere in a chain reaction when they built the first nuclear bomb, and they only proceeded when they showed that that was very unlikely.

> We can deal with the implications of dangerous AI if and when it becomes a problem.

Can we though? What calculation is this based on?

valine · on July 25, 2023

> Can we though? What calculation is this based on?

What calculations is your conjecture that AI will spiral out of control based on? You're not allowed to cite Hollywood movies.

The science behind the nuclear bomb was well understood before any engineering activities began. We have no framework to discuss AGI because it doesn't exist. Your entire premise is based on the idea that we could accidentally create evil AGI without first developing a theory of intelligence. Maybe Newton should have been locked up before he developed his theory of gravity. The man set us on a path toward nuclear weapons, thank god he didn't accidentally build a nuke.

naasking · on July 25, 2023

> What calculations is your conjecture that AI will spiral out of control based on?

Not will, but a very plausible outcome:

1. An intelligence greater than human intelligence can outthink humans.

2. Artificial intelligence is effectively alien intelligence, and will not innately share human values or thought processes, and could thus be very unpredictable.

3. Artificial intelligence will not have the same physical constraints that humans do (innumerable copies, lack of physical boundaries), and so our ordinary intuitions around containment will not necessarily work.

4. The usefulness of AI means it will be deployed everywhere, controlling and monitoring many things. Combined with the above properties, it will be very difficult to contain, detect, subvert, or eliminate.

5. Millions of billions of AIs will be created, many of which will eventually match or exceed human intelligence. Alignment has to go right every single time to mitigate risk to humans. It has to fail only once.

You know, the completely obvious properties that anyone who knows anything about computers could come up with if they bothered to give this matter some actual thought without the usual arrogant assumption of human superiority and mastery over dumb machines.

> We have no framework to discuss AGI because it doesn't exist.

Which also means we have no framework from which to build safe AIs that don't want to kill us, experiment on us, or exploit us, or ...

> Your entire premise is based on the idea that we could accidentally create evil AGI without first developing a theory of intelligence.

We tamed fire before understanding chemistry. We invented catapults and crossbows before understanding elasticity and kinematics. We domesticated animals and created agriculture before understanding genetics. We created bridges and buildings before understanding civil engineering. We even created computers (Babbage machine) before understanding computation (Turing machines, lambda calculus). There's a long history of inventions preceding understanding.

For all we know we're one simple modification to the transformer architecture away from truly general artificial intelligence. We don't know what we don't know, and all of the anti-doomers are blatantly overconfident about what we don't know.

Also, my conclusions do not even require that that we lack a theory of intelligence. Software bugs will also apply to alignment code. Alignment has to go right every time, per above. I don't think people have a proper appreciation of the long list of hazards here.

flangola7 · on July 25, 2023

It is absolutely feasible. The zealot religion of seeking scientific "progress" at all costs is not the only choice humanity has.

FrustratedMonky · on July 24, 2023

>> "The technology for “dangerous” AI doesn’t actually exist"

What? Did you not see the Netflix documentary on AI for military use? They literally have AI's that can beat fighter pilots in dog fighting.

Just because it isn't walking around having coffee and chatting you up, doesn't mean it isn't already very advanced and deadly.

valine · on July 24, 2023

Dog fighting AI isn’t going to end the world. When people talk about the “risks” associated with AI they’re talking about an AI that spirals out of control and destroys civilization. Something something infinite paper clip optimizer.

It’s scifi themed end-times cosplay.

FrustratedMonky · on July 24, 2023

I get that.

But the post was just saying it seems 'dangerous'. It is already 'dangerous'.

Yes, it will probably become even 'more dangerous'.

I'd disagree that many people agree on common definitions of risk. Some people think autonomous drones that can beat humans in a dogfight is already too far, others are holding out for some paper clip optimizers before getting worried.

You included 'world ending' as the definition of risk, others have lower bar than that.

danielbln · on July 24, 2023

Before superintelligence scifi stuff we'll probably get some sort of superworm. Some rogue autonomous agent network that is improving itself via some framework like SKILL[1] going around 0-day'ing systems left and right and wreaking havoc.

[1] https://arxiv.org/abs/2010.11944

naasking · on July 24, 2023

WormGPT already exists. These will only become more dangerous as the tech evolves.

sebzim4500 · on July 24, 2023

I am also concerned about existential threats from AI, but part of the problem is that I have no idea which research directions help and which ones hurt.

ImHereToVote · on Aug 3, 2023

AI safety is a field of its own.

https://80000hours.org/career-reviews/ai-safety-researcher/

faktory · on July 24, 2023

FrustratedMonky · on July 24, 2023

Because up till now many people that discount AI threats base that discount on a few assumptions like 'its just a parrot', 'it doesn't have any drives', 'it doesn't really understand', 'it isn't conscious', etc... ad-Infinium.

But the more different technology is plugged together to start resembling a brain, like a visual cortex, a speech center, motor controls, etc...

At some point the distinction between carbon based life and silicon becomes meaningless vanishes. All the arguments or proofs that humans are conscious would equally prove AI is conscious. Or that neither truly are. Proving an AI is not conscious would also prove humans aren't.

And of course, Terminators.