Image Super-Resolution via Iterative Refinement (iterative-refinement.github.io)
115 points by floxy on Nov 12, 2021 | 68 comments



While brilliant, I can’t stop myself from recalling the instance where another tool put Ray Gosling’s face on an image [0]. It’s fascinating; I especially enjoy when AI screws up, because it breaks the illusion and gives us a clue to reason about how this could have happened.

[0] https://news.ycombinator.com/item?id=24196650


Those old face super-resolution CNNs were usually trained on the CelebA dataset. And because celebs are above average in attractiveness, so were the generated faces.

Also, the dataset was primarily white, which created the "white Obama" meme image.


> Also, the dataset was primarily

But it offers «40 binary attribute annotations per image», which the algorithm could (in principle) use for choosing a path.


I was talking about previous iterations of facial SR. After the white Obama debacle, researchers included skin color as one of the parameters in the cost function.


Ray or Ryan? I’ve never heard of the former.


With some probability both, potentially:

> CelebFaces Attributes (CelebA) Dataset [...] 10,177 unique identities, but names of identities are not given

https://www.kaggle.com/jessicali9530/celeba-dataset

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html


> I’ve never heard of the former.

That should probably give you a clue, but you can always check the source and find out whether it's Ray or Ryan Gosling.


I've never seen this mentioned, but when I tried these sorts of methods, they mostly only worked on images that started as high quality images and were deliberately down-scaled.

If the original image was low quality, or had artifacts, then the results were useless.
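
That matches how these models are usually trained: the low-res inputs are manufactured by downscaling clean images with one fixed, known kernel. A minimal sketch of that pairing step (bicubic here; the file path is hypothetical):

    # How SR training pairs are typically made: take a clean high-res image
    # and downscale it with a fixed, known kernel. Real low-quality photos
    # (JPEG blocking, motion blur, sensor noise) never went through this
    # exact degradation, so they fall outside the training distribution.
    from PIL import Image

    def make_training_pair(path, scale=4):
        hr = Image.open(path).convert("RGB")
        w, h = hr.size
        hr = hr.crop((0, 0, w - w % scale, h - h % scale))  # make dims divisible by scale
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        return lr, hr

    # lr, hr = make_training_pair("clean_photo.png")  # hypothetical path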


This reminded me of the recent "VFX artists react" video by Corridor Crew [1], where they talk about how the artist(s) added a ton of detail to a scene, only for almost all of it to be blurred out and hidden by subsequent passes.

The point made was that even though the end result is blurred out (out of focus, etc.), the brain can tell whether it started out looking appropriate or not. So to make a good shot, detail is required.

In that case it's our brain doing the up-scaling of sorts, and the issue seems similar. Down-scaling (blurring) a high-quality image and a low-quality one will result in two images that have critical differences. When those are again up-scaled, they will lead to very different outcomes.

Similarly if noise is introduced in the down-scaled (blurred) image, that is in essence the same as changing the image it was down-scaled from, thus up-scaling would again lead to something different.

[1]: https://youtu.be/-CjNwVyojPA?t=1034


True, but most images on the web have been deliberately downscaled.


There’s a lot of interest in superresolution imaging in quantitative microscopy, where this is definitely a huge limitation


I am afraid these methods ("learn the typical detail and "re"-apply it"), in the said context ("obtain telling detail where we miss it"), would be "dreaming" up the detail.

These methods are for "beauty", not for "inducing real facts": what is "probable" is not what is "real".

--

Edit: I have an example for you. Check

https://github.com/Janspiry/Image-Super-Resolution-via-Itera...

and notice the black spot under the iris of the right eye, on the left: does the actual person really have it? It is not in the blurred source image, https://github.com/Janspiry/Image-Super-Resolution-via-Itera... . Now think of an "empiricist" perceptive instrument returning that kind of "noise"...

--

I understand you work on the topic (ANN for assessment), so you surely meant something different from what contextually appears.


And the pupils are not round, which is a common telltale of some generators.


Super-resolution has been in the news over the last few days after it came up in the Rittenhouse trial. The defense lawyers said something along the lines of wanting the iPad pinch-and-zoom disallowed as a way to present evidence because (they claimed) it does super-resolution, which could potentially introduce fake evidence.


That's not what they claimed.

They pointed out that the onus is on the prosecution to get their evidence validated. If they want to zoom in, they have to get an expert to testify that iOS doesn't add any pixels.

It's pretty basic trial procedure; the prosecution knows the rules, and the judge gave them time to find an expert.


"The judge responded by disallowing the zoomed-in footage unless the prosecution could prove that it wasn’t manipulated, and only giving them 20 minutes to find and produce an expert witness from Apple, which was obviously impossible. "

20 minutes to find an expert?

All of this hinges on whether or not Apple really is using AI (like super-resolution) for pinch and zoom. If they are, this is a totally valid objection.

But I've never heard that iPad pinch-and-zoom does this. It would be surprising if it's more than plain old interpolation.

Not to mention the defense lawyer clearly doesn't understand what the hell he's talking about, which doesn't exactly engender confidence in the judge taking this complaint seriously: "iPads, which are made by Apple, have artificial intelligence in them that allow things to be viewed through three dimensions and logarithms"

OH NO. LOGARITHMS!


> 20 minutes to find an expert?

They'd had an expert in previously who came back and contradicted the prosecution's statement that enlarging won't add "pickles" (quickly corrected to pixels) to the image. The judge allowed the evidence after the prosecution followed the proper procedure.

> But I've never heard that iPad pinch-and-zoom does this. It would be surprising if it's more than plain old interpolation.

There are various methods that can be used, any of which can add new pixels and/or colors; the prosecution's expert didn't actually know the details. When you have a couple of pixels that are claimed to be a gun, it really matters whether it's an image artifact or not. Also, the alleged gun is somehow in Kyle's left hand (he's right-handed) and shows up as the same temperature as his body on the infrared image of the same scene.

> Not to mention the defense lawyer clearly doesn't understand what the hell he's talking about

Nobody there knew what they were talking about, which is why they got experts to testify. The prosecution hasn't been following court procedure and has gotten chewed out for it repeatedly because they're trying to pull illegal tricks and are getting called on it.


The correct way to frame this conundrum is that what you see on an iPad in Photos by default[0] is not the original version the jury should be considering, either. What you get is typically a zoomed-out version, which mangles the image by selectively omitting “pickles”, or an already zoomed-in version (even without any pinching) if the image is smaller than the screen. The “true original” cannot be viewed using Photos absent third-party tooling.

[0] Excluding rare events where the source image's dimensions in pixels exactly correspond to those of the screen on that particular iPad model.


In court, you can't just offer evidence on its own; you need to offer a witness who can attest to the evidence's reliability, who may then be cross-examined. Doing otherwise violates one's right to confront their accusers. Binger, the prosecutor, did not want to do that. This is basic law, something he has repeatedly been admonished in court for ignoring.

Also, you're actually contradicting what the prosecution's own expert witness said in court. He said, yes, it would add pixels and he didn't know what kind or color. And for all that, the jury was allowed to see the images.

This trial has been wild. Grosskreutz went on ABC the other day to recant his sworn testimony just prior. On ABC, he said he didn't point the gun first, but even Snopes has said that yes, he did admit to that in court.[1] There is a photo of him with his gun pointed at Kyle's head as his bicep is being vaporized. It was when confronted with this that he admitted it originally in court.

Grosskreutz also claimed that Kyle re-racked his gun and this meant that Kyle wanted to kill someone, yet no unspent ammo from Kyle's gun was ever recovered, nor can any re-racking motion be seen on video. Instead, an unspent round from Grosskreutz' Glock was recovered, implying that Grosskreutz had re-racked his gun. Given the mechanics of that, it had to have been when he still had two working arms. Knowing that Grosskreutz' roommate wrote on social media that Grosskreutz wished he'd killed Kyle (something the roommate denied on the stand, saying he'd made it up), then if we use Grosskreutz' own line of reasoning, Grosskreutz had both threatened and intended to kill Kyle prior to being shot.

[1] https://www.snopes.com/fact-check/kyle-rittenhouse-gaige-gro...


Of course zooming in "adds" pixels.

And if we were to follow that pedantic logic, we should consider removing pixels from the original image (zooming out) equivalently problematic. But, apparently, no one cared that software presented a version with "subtracted" pixels in the first place, and only cared about pixels added by zooming in (which could have technically been "de-zooming-out", up to a point).

Generally, fitting a raster image with dimensions AxB into a viewport of dimensions NxM inevitably involves what effectively is adding and/or removing pixels to/from the original image. That can be done in various ways (which is why we have different resizing algorithms).
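
A minimal illustration of that, using Pillow's plain resampling filters (nothing model-based assumed):

    # Upscale a 2x2 image two ways and look at which pixel values appear.
    import numpy as np
    from PIL import Image

    src = np.array([[0, 255],
                    [255, 0]], dtype=np.uint8)
    img = Image.fromarray(src, mode="L")

    nearest  = np.array(img.resize((4, 4), Image.NEAREST))   # copies of source values only
    bilinear = np.array(img.resize((4, 4), Image.BILINEAR))  # intermediate greys appear

    print(sorted(set(nearest.flatten().tolist())))   # [0, 255]
    print(sorted(set(bilinear.flatten().tolist())))  # values that never existed in the source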


> But, apparently, no one cared that software presented a version with "subtracted" pixels in the first place, and only cared about pixels added by zooming in (which could have technically been "de-zooming-out", up to a point).

Courts only rule on stuff like that if there's an objection to it. If you don't object, you can't come back later and complain about it, you have to object at the time. So all of Binger's pickle objections were legally irrelevant. He knows this and is playing to the cameras, that's why he's getting yelled at by the judge, because none of what he said mattered legally. He was also pretty clearly angling for a mistrial after his earlier screw-ups, like managing to obtain information harmful to his case from his own prosecution witness on direct. This is a screwup of epic proportions.

In this case, there was doubt about the nature of a handful of pixels in an enlarged image and whether or not this depicted a gun. It's hardly "pedantic" to want to get experts to talk about the reliability of the image, rather than lawyers (none of whom knew anything), when the prosecution's entire theory at this point hinges on a legal standard of "provocation with intent" where one cannot regain innocence by fleeing or communication.


And when the prosecution is trying to infer from a single pixel what the defendant was doing, algorithms and artifacts matter.

Just watch the trial.


Yeah, nobody else was trying to say that some pixels in a greatly enlarged image meant anything until now. And no witnesses are there to testify to this gun being raised--small wonder, given that Zimminski is the one who fired the first shot, yet he wasn't even brought to court.

They could have brought him in to testify, but didn't. This is the same trick they pulled with Grosskreutz, where the DA ordered them not to search his phone (despite already having a signed search warrant for it) and they did not record his, and only his, interview with the police.

They also testified that they had never once done this before.

I encourage more people to watch the actual trial. Some news outlets, particularly CNN, have engaged in highly selective reporting that's getting called out by all the people actually watching the trial. The ABC coverage where they put on Grosskreutz, and let him contradict his sworn testimony in court, despite there being photographic proof that he lied, not to mention physical evidence that he (and not Kyle) re-racked his gun, was especially crazy.

I won't be surprised if this turns into a bunch of defamation suits in the future.


I strongly agree with this comment. But this is why digital media is so dangerous: few people understand the “logorithms” used to light the pixels on the screen.


“ 20 minutes to find an expert?”

They had months to prepare. They’re not noobs; they know what the rules are.

The rest of your comment is exactly why counsel has a right to object and demand validation:

- The judge doesn't know the answer to the question

- Defense doesn't know the answer to the question

- Prosecution, even, didn't know the answer to the question.

Heck, you admit in your own answer that you don’t know the answer: “I’d be surprised” doesn’t cut it in court.

As to “logarithms”, the defense lawyer is not a CS expert, and mockery and sarcasm won’t get you far in court. An expert will.

But the prosecution knew this, and when they did show the still they got ripped apart: the defense demonstrated that the pixel that was supposedly a gun was a headlight from a car a couple of frames earlier.

EDIT:

See a submission I made about the dangers of ignoring procedure and trusting digital data:

https://news.ycombinator.com/item?id=29206833


The image likely was recorded as pixels (in some grid), but is stored as sets of DCT components (https://en.wikipedia.org/wiki/Discrete_cosine_transform), and doesn’t have pixels.

If so, of course it adds pixels.

So, there are several ways to lose information. Inside the phone, at least these:

- between reality and image sensor

- between image sensor and storage

- between storage and display
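
A from-scratch sketch of the DCT point, using an orthonormal 8x8 DCT (quantization, the step where JPEG actually loses information, is deliberately skipped here):

    # Each JPEG block is stored as DCT coefficients; the decoder rebuilds
    # pixel values from them via the inverse transform.
    import numpy as np

    N = 8
    k = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)                        # orthonormal DCT-II matrix

    block = np.random.randint(0, 256, (N, N)).astype(float)  # stand-in pixel block
    coeffs = C @ block @ C.T                          # forward 2-D DCT (what gets stored)
    decoded = C.T @ coeffs @ C                        # inverse DCT (what the decoder computes)

    assert np.allclose(block, decoded)                # exact only because nothing was quantized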


If the algorithm materially changes the image, then opposing counsel (to the party who introduced the evidence) has the right to have the image validated by an expert.

That's basic court procedure.

Btw, see a submission I made about the dangers of ignoring procedure and trusting digital data:

https://news.ycombinator.com/item?id=29206833


I read “add any pixels”, in the context of zooming in, as meaning super-resolution.


Is interpolation “super resolution”? I’d argue not, but it doesn’t matter.

If the software adds pixels to the image then it has made a material change to the picture. The defense has a right to call on the prosecution to prove that the zoom effect does not alter the picture.

And it’s not a moot point. Years ago Xerox copiers started doing pattern-matching compression (JBIG2, roughly “OCR for image patches”) on scans to save on storage, substituting previously seen glyph patches for similar-looking ones. Everyone assumed the copies were materially identical until people noticed the copiers were altering numbers and characters in legal documents in materially meaningful ways. [1]

[1] https://www.cbc.ca/news/science/xerox-copiers-might-alter-nu...


It would have to be an interpolation algorithm that doesn't add or remove signal information. Everybody uses approximations for efficient hardware utilization. I wouldn't consider that admissible.
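
For what it's worth, one resize that arguably adds no signal information is integer-factor pixel replication; every output pixel is an exact copy of a source pixel and the original is recoverable by subsampling:

    import numpy as np

    def replicate(img, k):
        # nearest-neighbour upscale by an integer factor k
        return np.repeat(np.repeat(img, k, axis=0), k, axis=1)

    src = np.random.randint(0, 256, (3, 3), dtype=np.uint8)
    big = replicate(src, 4)

    assert set(big.flatten().tolist()) <= set(src.flatten().tolist())  # no new values invented
    assert np.array_equal(big[::4, ::4], src)                          # fully reversible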


Then get an expert to testify as much.

Btw, this isn’t a “right wing” issue. Political “liberals”, to their credit, have been saying as much for years, pointing out that closed source software has huge problems in court.

As it stands, the defense has a right to force the prosecution to either get an Apple employee to testify in court, or use a Linux machine, GIMP, and a CS expert to explain what GIMP is doing.

EDIT:

Sorry, I read the comment I’m replying to as “I wouldn't consider it inadmissible”. I typed this while dozing off.

Leaving comment for record’s sake.


Any interpolation adds information to the data: the very interpolation method is alien to the image, and hence becomes “added” information.


And therefore needs to be validated!

The prosecutor literally tried to infer an action from a single pixel. That’s BS and everyone in this forum has the technical ability to recognize this. A single pixel from a low light environment (therefore noisy)?

BS


Pinch-and-zoom is a display-time thing. Makes you wonder what would happen if AI enhancement were applied at capture time, like Pixel Night Sight already does (others too?).


>Makes you wonder what would happen if AI enhancement is applied at capture time

I think this is a very pertinent point!

The only reasonable outcome, IMO, is that photos or videos that are improved in this way between capture and storage must be made inadmissible as evidence.

But I also think that in the longer term it won't really change much, because I expect all audio and photo evidence to go down that drain as falsification technology improves to the point where it becomes practically impossible to reliably distinguish genuine media from falsified media.


> The only reasonable outcome, IMO, is that photos or videos that are improved in this way between capture and storage must be made inadmissible as evidence.

That's not at all reasonable. It would be more reasonable to make inadmissible only the specific details that could have been made up by the algorithm.


Wow, great question. iPhones are using ML on the capture end these days, right?


Can you provide some image difference results between the reference images and the resulting super-resolution output? That would help to visualize what sort of structure, if any, is introduced by the super-resolution process.


The "leopard spots" example is particularly interesting in how the super-resolution just hallucinates seemingly similar textures which can be completely different from the actual texture in the reference patch. (Such artifacts have been specifically pointed out in the context of analogous deep learning approaches applied to medical images).


It is not structure per se (although in the case of the leopard there is structure). It is an illicit inference.

Given a diffusion process (such as blur), there are an infinite number of initial states which converge to a specific final state. This is the nature of diffusion (think of it as: as long as the total initial energy is the same, the final state of a diffusion process is “homogeneous density of energy”). So inferring the “initial state” is totally invalid.
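
A toy numerical version of that many-to-one point, with a 2x box filter standing in for the diffusion/blur:

    import numpy as np

    a = np.array([10.0, 30.0, 100.0, 200.0])
    b = np.array([20.0, 20.0, 150.0, 150.0])

    downscale = lambda x: x.reshape(-1, 2).mean(axis=1)  # 2x box filter

    print(downscale(a))  # [ 20. 150.]
    print(downscale(b))  # [ 20. 150.]  -- identical "final states" from different originals

No upscaler can tell which original it started from; it can only pick one plausible candidate.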

Edit: look at the 64 to 256 to 1024 example in the “Unconditional…” section and take a look at the artifact on the bottom left (to the viewer) teeth and lip. If that is not an artifact… Same on the top-right teeth.

Also: how does the algorithm know it is facial hair and not just makeup? It might be both but it generates facial hair.


I'm curious to know if the various blemishes (acne scars, moles, etc...) are there in the true images. Also, the braids in one of the pictures don't look as simple/contiguous as I think real braids would be.

Still, it's very cool how it fills in realistic looking details.


If you scroll down, under the "Super-Resolution Results" header, there's a comparison with a "Reference" column.


Thanks. What I am asking for is a pixel-level visualization which shows the difference between the reference image and the associated super-resolution image, as is often used to highlight the minimal differences in adversarial examples that confuse image classifiers. For example, see:

https://christophm.github.io/interpretable-ml-book/adversari...

https://openai.com/blog/adversarial-example-research/
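
Something like this minimal sketch would produce it (the file names are hypothetical):

    import numpy as np
    from PIL import Image

    # assumes both images have the same dimensions
    ref = np.asarray(Image.open("reference.png").convert("L"), dtype=float)
    sr  = np.asarray(Image.open("super_resolved.png").convert("L"), dtype=float)

    diff = np.abs(ref - sr)
    diff = (255 * diff / max(diff.max(), 1)).astype(np.uint8)  # stretch so differences are visible

    Image.fromarray(diff).save("difference_map.png")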


So we can basically go back to the old times of spoken tales, just with computer AI instead of your own imagination. I mean, the modern version would be that Netflix sends your computer "The superhero hits the villain", and the AI on your computer turns it into a glorious 8K sequence of frames of the battle. Depending on the model you leased, of course: premium custom or regular, etc. And there's a startup in there somewhere to turn the book pages you would otherwise read directly into a movie.


It’s an interesting thought.

Similar to the idea of sending just the audio on a Zoom call plus a deep-fake model of your face, so the viewer sees a high-res face talking but doesn't need to receive a full video stream.


Page doesn't render properly in chrome mobile.


Funny how everyone was mocking that judge for thinking that zooming might add data, yet here is an example showing that zooming reduces the age of someone's face and removes the aggression from a tiger's expression.

You might say "well of course we all know that this AI based super-resolution uses too much computing power to do in real time on an iPad, where they can only do sinc interpolation" but how long will that be true for? It's not really unreasonable for a judge not to know about technical details like that.

I was pretty shocked at how all of the comments on the Ars article about it were so misguided.


I'd love to know how many times you can regressively run this process before the first-generation source no longer looks (subjectively) like the nth-generation refined image.

e.g.

   [woah]   -> [wooah]  displacement: 1
   [wooah]  -> [woowah] displacement: 3
   [woowah] -> [oowaha] displacement: 6
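
A rough way to run that experiment, with bicubic down/up-sampling standing in for the learned refiner (the real model isn't wired in here; the file name is hypothetical):

    import numpy as np
    from PIL import Image

    img = Image.open("generation_0.png").convert("RGB")
    orig = np.asarray(img, dtype=float)

    for gen in range(1, 11):
        small = img.resize((img.width // 4, img.height // 4), Image.BICUBIC)
        img = small.resize((orig.shape[1], orig.shape[0]), Image.BICUBIC)  # stand-in for the SR step
        drift = np.abs(np.asarray(img, dtype=float) - orig).mean()
        print(f"generation {gen}: mean abs drift = {drift:.2f}")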


What an extremely mobile-unfriendly website. I'll try to remember to read this on the pc later ...


Two Minute Papers did a ̶r̶e̶v̶i̶e̶w̶ ̶o̶f̶ ̶t̶h̶i̶s̶ ̶w̶o̶r̶k̶ thing:

https://www.youtube.com/watch?v=WCAF3PNEc_c


That's not a review. The guy just described the title of the paper without even reading the abstract.


Haha, it’s a shame the level of effort in his paper reviews has really gone downhill; this one especially so.


Early TMP subscriber here. Recently subbed to bycloud as well, he does a pretty good job of just getting down to business when reviewing papers:

https://youtube.com/c/bycloudAI


Thanks for the share! My personal taste runs more to Yannic Kilcher’s videos, I think.


Enhance 34 to 36.


Has anyone applied this to video (i.e. apply frame by frame)? A quick Google didn't turn anything up...


There’s no guarantee the detail will be stable frame-to-frame. You’d almost surely get textures stuttering and winking in and out of existence.


I remember movies from the past where a police investigation team magically upscaled a blurred picture of a number plate, a person, etc. I always smiled and thought this was such bullshit. But it might actually become reality. Number plates are still hard, though.


There are a lot of amazing things you can do when you have extra constraints for the blurred information, like which font was used for a blurred text or knowing that it was a valid barcode that got blurred. In general "enhancing" images just does not work though. It will fake it in a way that looks believable to us but it cannot recover information that is not even there.
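
A toy version of the constrained case: a 1-D "barcode" with a known blur kernel and a small set of valid codes (everything here is made up). Instead of "enhancing" pixels, you blur every candidate and keep the one that best explains the observation:

    import numpy as np

    kernel = np.array([0.25, 0.5, 0.25])                    # assumed-known blur
    candidates = {
        "1011": np.array([1., 0., 1., 1.]),
        "1101": np.array([1., 1., 0., 1.]),
        "0111": np.array([0., 1., 1., 1.]),
    }

    truth = candidates["1101"]
    observed = np.convolve(truth, kernel, mode="same") + np.random.normal(0, 0.01, truth.size)

    best = min(candidates,
               key=lambda c: np.sum((np.convolve(candidates[c], kernel, mode="same") - observed) ** 2))
    print(best)  # almost certainly "1101"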


But sometimes there is enough information, just in a format that isn’t easily usable. For example, a low-res video of a car driving away: each frame is too blurry to recover the plate number, but all the frames together have enough information.

I would expect DL models to perform well on this task, but I could not find many results.
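
A crude sketch of why stacking helps, assuming the plate crops are already aligned (real multi-frame super-resolution also exploits sub-pixel shifts, which this average ignores):

    import numpy as np

    clean = np.random.rand(16, 48)                           # stand-in for the true plate crop
    frames = [clean + np.random.normal(0, 0.5, clean.shape)  # very noisy per-frame observations
              for _ in range(200)]

    single_err  = np.abs(frames[0] - clean).mean()
    stacked_err = np.abs(np.mean(frames, axis=0) - clean).mean()
    print(single_err, stacked_err)  # the stacked estimate is much closer to the truth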


Do you know about that technology that works like a “spying microphone”, reconstructing the ambient sound in an (ordinary-quality, ordinary-frame-rate) video from the minuscule vibrations of objects computed from the frames?

Found: https://news.mit.edu/2014/algorithm-recovers-speech-from-vib...

> ... they were able to recover intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass

I see that (as expected) it has been discussed on HN: https://news.ycombinator.com/item?id=8131785


You are correct, of course. In that case, the information is actually there, albeit spread out temporally over several frames. I should have said that I was talking about single photographs.


> magically upscaled a blurred picture

...To realistically get a hallucinated reality.

But member occamrazor, nearby, is right: inducing non-evident facts from a number of different downscaled pieces of information is possible.

Edit: if the idea of enlarging an image to reconstruct real missing detail (mhh... no) excites you, this real tech (resurfaced from the branch mentioned above) will blow your mind: https://news.mit.edu/2014/algorithm-recovers-speech-from-vib...


Is this real? Something doesn't feel right.


Where's the code?



This is an "unofficial implementation".


TLDR: "machine learning", that is, some neural network again. That has nothing to do with science.



