Remaking old computer graphics with AI image generation

Teslazar · on Jan 2, 2023

Is there a way to have current AI tools maintain consistency when generating multiple images of a specific creature or object? For example, if there are images of 'Dr. Venom' they need to look similar, or if there are images of the same space ship.

ftufek · on Jan 2, 2023

Yes, right now you have 3 options:

- dreambooth, ~15-20 minutes finetuning but generally generates high quality and diverse outputs if trained properly,

- textual inversion, you essentially find a new "word" in the embedding space that describes the object/person, this can generate good results, but generally less effective than dreambooth,

- LORA finetuning[1], similar to dreambooth, but you're essentially finetuning the weight deltas to achieve the look, faster than dreambooth, much smaller output.

1: https://github.com/cloneofsimo/lora

wokwokwok · on Jan 2, 2023

> Is there a way to have current AI tools maintain consistency when generating multiple images of a specific creature or object?

...but, all of these can't maintain consistency.

All they can do is generate the same 'concept'. For example, 'pictures of batman' will always generate pictures that are recognizably batman.

However, good luck generating comic cells; there is nothing (that I'm aware of) that will let you generate consistency across images; every cell will have a subtly different batman, with a different background, different props, different lighting, etc.

The image-to-image (and depth-to-image) pipelines will let you generate structurally consistent outputs (eg. here is a bed, here is a building), but they will still be completely distinct in detail, and lack consistency.

This is why all animations using this tech have that 'hand drawn jitter' to them, because it's basically not possible (currently) to say: "an image of batman in a new pose, but that is like this previous frame".

So... to the OP's question:

Recognizable outputs? Yes sure, you've already been able to generate 'a picture of a dog'.

New outputs? Yeah! You can train it for something like 'a picture of 'Renata Glasc the Chem-Baroness' now.

Consistency across outputs? No, not really. Not at all.

ftufek · on Jan 2, 2023

From my experience playing around with dreambooth in the last few weeks generating images of a specific person or pet (not just a generic concept), it surprisingly works really well. But you have to make sure to feed it enough pictures, make sure to label the images properly, use smaller learning rate, use prior preservation loss and make sure to not overfit, etc.

For the animation stuff where you need frame to frame consistency, the new diffusion based video models show that it's possible [1][2]. These are not open source yet as far I know, but it's highly likely that we'll get them within a few months.

1: https://arxiv.org/pdf/2212.11565.pdf

2: https://imagen.research.google/video/paper.pdf

wokwokwok · on Jan 2, 2023

> generating images of a specific person or pet (not just a generic concept)

There's no difference between those things. It's a specific label that directs the diffusion model. It doesn't matter if your label is 'dog' or 'betty' (ie. my personal dog). Anyway...

> it's highly likely that we'll get them within a few months.

Yep! It's not a technical limitation of the technology for sure; but the OP asked:

> Is there a way to have current AI tools ...

...and right now you can't do it with the current AI tools that are publicly available.

bitforger · on Jan 2, 2023

I think you can get the effect you're looking for by using the previous cell as an init image and only repainting the character.

As for consistency of character details, I think that will depend on how many images you use to train dreambooth etc. and how varied those images are.[1]

[1]: https://www.youtube.com/watch?v=W4Mcuh38wyM

genewitch · on Jan 2, 2023

consistency isn't really that difficult with more or less static images. I haven't tried to do "same outfit many poses" yet, because i don't really know what poses are called, and there's no guarantee that the humans that trained/tagged the input images knew, either. I've been messing around with "batch img2img" and i sort of like the jank; i am wondering if a more aggressive CLIP would help at all, but i think it boils down to there really isn't enough detailed tagging to make this worth messing with too much.

what i mean is, assuming this technology moves forward, and GPUs continue increasing VRAM as they have, and enough people are interested in doing extremely detailed tagging with small shapes, the sorts of issues you're talking about will go away over time. Or, alternatively, someone or a group could develop a way to scan hundreds of outputs and collate them according to similarity, allowing a human to use batches that are similar enough to do something like short comics or whatever. As it stands, when i do txt2img or img2img i will run off 20-40 images. I'm also wondering how much seed fiddling could be done - when i first got "Anything v3.0" every image was some person sitting at a dining table near a window with food in front of them, dozens in a row. I have no idea how it happened, but there was enough global cohesion between images i thought it was trained on just that for the first hour or so.

Each of the below images is a set of 4 images (i think generally called a grid in SD), so each image is a set of 4 "2 panel comic strips" - they aren't really intended to flow between the grid squares, but you'll notice that the clothing, hairstyles, etc between strips matches, even if they don't match between individual images. My personal favorite - and the one i used for something online, is the top left set in the first .png https://i.imgur.com/BWek3YI.png https://i.imgur.com/LHchsj5.png

P.S. if anyone knows what the source art could possibly be, let me know?

mmahemoff · on Jan 2, 2023

This is the next frontier for AI art as it will let you build a series, graphic novel, or even video with consistent objects. There’s techniques like textual inversion that let you associate a label with an object, but they rely on having multiple images of that object already, so it won’t work for an image you just generated. To get around that, some people have tried using tools to generate multiple images of a synthetic object, eg Deep Nostalgia that can animate a static portrait photo.

So in theory you select one photo with the AI image generator, create variants of it with separate image tools, then build a fine-tuned model based on some cherry-picked variants.

I think this will get easier as AI image tools focus more on depth and 3D modelling.

The “aiactors” subreddit has some interesting experiments along these lines.

michaelbuckbee · on Jan 2, 2023

Check out this video by Corridor Crew -> they're able to use Stable Diffusion to consistently transfer the style of an animated film (Spiderverse) onto real world shots.

https://youtu.be/QBWVHCYZ_Zs

astrange · on Jan 2, 2023

The concept of “similar” is AI-complete (ie, only you knows what seems acceptably similar to you), so basically, no.

You can force a model to generate nearly the same actual pixels with DreamBooth, which can be interesting for putting people’s faces in a picture, but otherwise I’d call it overfitting.

enchiridion · on Jan 2, 2023

Is AI-complete an actual complexity class? Genuinely curious, I’ve never heard of it.

FartyMcFarter · on Jan 2, 2023

I think it's just an informal term for things that seem to require human-level AI.

enchiridion · on Jan 2, 2023

Ah, don’t care for it in that case. Seems like it’s cashing in on the formality associated with algorithms research.

astrange · on Jan 2, 2023

I think you’re insulting all of philosophy there.

But there is a paper about it: https://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-05/SS...

washadjeffmad · on Jan 3, 2023

Two parameters in stable diffusion webui are denoising (similarity to source) and cfg scale (adherence to prompt). img2img does as it sounds, and inpainting allows masked modifications to a base image with a great deal of control and variability.

I'd recommend giving it a shot if you have an Nvidia GPU with ≥4GB VRAM.

Edit: There are also training and hypernetworks, but they require a body of source material, keywording, and significantly more time and compute resources, so I haven't attempted either.

CoffeePython · on Jan 2, 2023

Seems like there is some way. There's a startup[1] that I've been seeing around on twitter[2] which makes it easy to create in-game assets that are style-consistent. Haven't tried it yet but it looks promising!

[1] - https://www.scenario.gg/

[2] - https://twitter.com/Beekzor/status/1608862875862589441?s=20

djur · on Jan 2, 2023

Textual inversion can kind of do this, but I haven't been impressed by examples I've seen. It seems more suited to "Shrek as a lawnmower" than "Shrek reading a book".

pkdpic · on Jan 2, 2023

Hugging face has everything you need to get started with stable diffusion textual inversion training here. It's awesome to get it running but as others have said it has limitations if you're trying to get multiple images for a narrative made etc.

https://huggingface.co/docs/diffusers/training/text_inversio...

lilgreenland · on Jan 2, 2023

midJourney lets you upload a reference image. For a portrait of someone's face this produces consistently the same person.

Kerbonut · on Jan 2, 2023

I would recommend looking into depth map of the source material then generating off of the resulting depth map. That will keep the structure the same so things don’t pop in and out. Then the suggestions of dreambooth or textual inversion to get the colors etc right.

tniemi · on Jan 2, 2023

You can teach the AI a new item/character. https://www.youtube.com/watch?v=W4Mcuh38wyM

smusamashah · on Jan 2, 2023

There is a way which someone recently discovered to work great on reddit.

In automatic1111 UI you can alternate between prompts e.g. "Closeup portrait of (elon musk | Jeff bezos | bill gates)". Final image will be a face that look like all three. See this https://i.redd.it/8uq52mnausu91.png

Now do the same with two people but invert the gender. The female version of what I gave example of won't look like anything you know about. And it will remain consistent.

It kind of works.

theCrowing · on Jan 2, 2023

You can work with embeddings.

kache_ · on Jan 2, 2023

Yeah, look up textual inversion

hnbad · on Jan 2, 2023

This is a great example of AI image generation being unable to generate "art" and instead just replicating a naive approximation of what it was trained on. There's no coherence or consistency between the images and while they all look "shiny", they also look incredibly dull and generic.

It's a cool exercise but using this for a real-world project would eliminate any attempt at producing an artistic "voice". AI image generation excels only at generating stock art and placeholder content.

washadjeffmad · on Jan 3, 2023

That's more an issue with prompts and training. For as functional is it is, the established models are all very preliminary, and we're in a Cambrian explosion of sorts wrt tooling.

Also, one model doesn't speak for them all. I have the problem that my results are often too consistent with certain models, largely because of prompt complexity, lack of wildcarding, etc.

siraben · on Jan 2, 2023

I was trying this recently with the Sierra Christmas Card from 1986![0] The images that I generated were[1], and I was trying to tweak the model parameters with different denoising and CFG scales. When you get the parameters just right you can preserve the composition of the input image very well while still adding a lot of detail. This isn't a completely automatic process though, with Stable Diffusion you have to provide the right prompt otherwise the generation process isn't guided correctly, so approach works better for aesthetics and style transfer than regular image super-resolution such as ESRGAN.

[0] https://archive.org/details/sierra-christmas-card-1986

[1] https://i.imgur.com/WxD05gX.jpeg

Agentlien · on Jan 2, 2023

If you're using stable diffusion 2.0 or later you can use its depth-to-image mode[0] to create variations of an image which respect its composition without having to keep your parameters within a narrow range.

[0] https://github.com/Stability-AI/stablediffusion#image-modifi...

siraben · on Jan 2, 2023

I still struggle with SD 2.0 prompting because many of the tricks (greg, artstation) don't work anymore, have people had success with it or do I have to use custom models?

speedgoose · on Jan 2, 2023

You need other tricks and put a bit more thoughts in the negative prompt. The quality is then higher but you really need a good prompt. It’s a bit frustrating compared to midjourney or dall.e.

You should take a look at embeddings too. They are tiny files, no more than 128kB, that have a huge influence on the final output. You put the files in the embeddings folder and use the filename in your prompt. Ideally the filename is a unique word so it doesn’t interfere with the normal prompt logic.

You can find the best embeddings in the stable diffusion discord.

Agentlien · on Jan 3, 2023

> The quality is then higher but you really need a good prompt.

I've seen this stated but it has not been my experience. Nor have I seen solid examples of it. For 2.0 (haven't tried 2.1) I find the model very finicky and unstable. Any prompt which works well also seems to work at least as well in 1.5.

LudwigNagasena · on Jan 2, 2023

That’s just yet another exercise in prompt engineering. I expected something more interesting.

carrolldunham · on Jan 2, 2023

The title excited me - maybe someone succeeded making new art that looks like the old pre-renders, maybe a convincing imitation of the scanline render look. Instead it was a vapid article about tossing pixel art into img-2-img and getting some tenuously related junk.

Jiro · on Jan 2, 2023

He didn't even do that. He used text descriptions; the original pixel art wasn't involved.

layer8 · on Jan 2, 2023

I guess that explains why the results were only vaguely related to the original pictures.

nineteen999 · on Jan 2, 2023

I think the new images look jarring next to the original 8-bit Konami style font at the bottom, the look clashes in my mind. I would have pixelated the generated images and dithered them down to a smaller palette to look more retro. Or kept the hi-res images but subtituted the fonts to something more detailed and modern.

But that's just me.

hnbad · on Jan 2, 2023

It reminds me of those HD texture packs for Minecraft in the early days that would attempt to map photorealistic textures onto a world almost entirely made of cubes. Yes it's more high-fidelity, but it adds nothing except emphasizing the low fidelity of everything else.

nineteen999 · on Jan 3, 2023

I agree with you. The other point it makes to me is that AI art generation doesn't turn non-artists into artists, no matter how many programmer types suddenly think they can replace artists now. You still need a developed sense of taste and style, which lets face it, outside of software, many programmers do not have.

vyrotek · on Jan 2, 2023

This would be super fun to use with "Return to Zork" clips.

shdon · on Jan 2, 2023

Is it just me, or was anybody else reminded of the Carmageddon cover art after seeing the image they ended up using for Dr. Venom?

js8 · on Jan 2, 2023

I have a question. Stable diffusion is based on gradually processing noise into a coherent image, by training a denoiser. Would it be possible to feed low-fidelity image (such as pixel art, or pixelated image) directly into the denoiser step and get a higher-fidelity image that would match the original?

mschulkind · on Jan 2, 2023

Yes, stable diffusion has an img2img mode that does essentially exactly that.

sigmar · on Jan 2, 2023

I think what you are describing is called super-resolution https://ai.googleblog.com/2021/07/high-fidelity-image-genera...

charcircuit · on Jan 2, 2023

It's noise in a latest space which is different from noise in the image that the vector represents.

sublinear · on Jan 2, 2023

You'd have to scale the resolution on these waayyy down to not see the usual janky, smudgy, sometimes nightmare-inducing details.

I seriously have never understood why what gets published in these blog posts isn't just lower res especially since this is precisely about old video game graphics.

kybernetyk · on Jan 2, 2023

>the usual janky, smudgy, sometimes nightmare-inducing details

I usually don't notice those. It's only when someone mentions it.

xwdv · on Jan 2, 2023

It’s amazing that someday we’ll be able to pass in low fidelity pixel art sprite sheets to an AI and get back high definition hand drawn 2D graphics for use in games.

qbasic_forever · on Jan 2, 2023

I don't see how animations would work with the current crop of image generation tools. If you feed in 5 frames of a character swinging a sword pixel art you'll get five wildly different renditions of the character, not one character with a smooth tweening of the sword swing.

birracerveza · on Jan 2, 2023

That's why an artist is still needed to smooth it out and make it coherent, but at least 90% of the work is already done.

Sophira · on Jan 2, 2023

"Hand drawn" is a huge misnomer there - it wouldn't be hand drawn at all, that's the point.

smoldesu · on Jan 2, 2023

Square Enix: Is it possible to learn this power?

modfreq · on Jan 2, 2023

Get out of my head!

I literally just registered spritesheet.ai yesterday.

and I've already had mild success training a model on spritesheet data.

kyleyeats · on Jan 2, 2023

Someone put out a really compelling one a couple months ago:

https://www.reddit.com/r/StableDiffusion/comments/yj1kbi/ive...

modfreq · on Jan 2, 2023

Nice. Thanks!

throwaway675309 · on Jan 2, 2023

I've been sending emails to the creators of mid journey to scrape the entire archive over at spriters resource to create a custom model specifically to generate pixel sprite sheets.

https://www.spriters-resource.com

Fingers crossed.

ranger47 · on Jan 2, 2023

I'm interested in following your project if you have a blog or anything.

modfreq · on Jan 2, 2023

I only just recently started building things for fun again (Thanks AI) so I haven't had a blog in many years.

But I'll be launching one on alexbrown.io sometime in the next few weeks to document my projects.

js8 · on Jan 2, 2023

Actually, one day we will be able to do this not just with sprite sheets, but whole games.

jdamon96 · on Jan 2, 2023

you could probably build this w/ current img-gen model tech! totally agree, absolutely crazy.

NicoleJO · on Jan 2, 2023

What are you going to do about all the copyrighted material?

camtarn · on Jan 2, 2023

Honestly, really disappointing, especially since the author forces you to watch the video to see the final image - which looks nothing like the shoulder-spiked, triple-forehead-eye'd villain of the game. Spoiler, the generated image is just a threatening looking green dude with two different coloured eyes.

It's a decent writeup on the process of trying to generate specific images using text prompts, I guess, with the conclusion that it's really hard, and in some cases basically impossible (hence the lack of the three forehead eyes).

spaceman_2020 · on Jan 2, 2023

Yeah, I don’t think this tech is ready for production use at all right now. Great tech demo, not actually useful currently.

In contrast, I’ve been using chatGPT multiple times daily

newobj · on Jan 2, 2023

Nice, loses all its charm and turns off my imagination.

postalrat · on Jan 2, 2023

You wouldn't have those problems if you stuck to 320x240 pixels on your display.

ggambetta · on Jan 2, 2023

At last we have the tech to do a cinematic, photorealistic version of Zero Wing! For great justice!!!

NicoleJO · on Jan 2, 2023

[flagged]

pixl97 · on Jan 2, 2023

Because a large portion of us on HN aren't big fans of IP laws that last seemly forever.

NicoleJO · on Jan 2, 2023

The demographic is criminal??

pixl97 · on Jan 2, 2023

I don't reckon that you've studied the history of hackers much.

If we go back to the 80s and people were making acoustic couplers and hooking them to their telephone would you be calling them out for breaking the telco rules, or would you be asking for a blueprint to make your own?

astrange · on Jan 2, 2023

Since it’s legal (via explicit non-commercial data mining exemptions in EU law), it literally isn’t stolen.

NicoleJO · on Jan 2, 2023

It's not legal to reproduce copyrighted content. Idgaf where you live.

o_____________o · on Jan 2, 2023

I realize you're here with a single goal, but what content in the link is being "reproduced"?

astrange · on Jan 3, 2023

Sure it is if the copyright law literally says you can do it.

And civil copyright violations aren’t “illegal” anyway.

o_____________o · on Jan 2, 2023

Where's that?