Neural Enhance – Super Resolution for images using deep learning (github.com/alexjc)
221 points by eejr on Oct 29, 2016 | 51 comments



We enhanced the image like on CSI and look, the defendant's face!

"Because my photos were used heavily in the dataset..."

Jury: So guilty


Defense should train a network with faces of the jury and then show how the same technique, run by their biased network, now shows each of them at the scene of the crime :)


Wow, we'll finally blow the lid off all these conspiracy theories when we 'unblur' the pictures of : Sasquatch, UFOs, Loch Ness Monster ... :)


Comparing against nearest neighbor, instead of a more reasonable linear filter or (heaven forbid) some basic edge-directed interpolator, is a little cheaty.


Agreed, it would have been nice to show other upscaling algorithms. But neural-net super-resolution generators can still produce significantly more detail at 4-8x, as shown here: http://arxiv.org/abs/1609.04802
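For anyone who wants to generate those baseline comparisons themselves, a minimal sketch using Pillow (the file names and the 4x factor are just placeholders):

    from PIL import Image

    # Upscale a low-resolution source 4x with several classic filters, so the
    # neural-network output can be compared against something fairer than
    # nearest-neighbour blockiness.
    lowres = Image.open("input_lowres.png")
    target = (lowres.width * 4, lowres.height * 4)

    for name, resample in [("nearest", Image.NEAREST),
                           ("bilinear", Image.BILINEAR),
                           ("bicubic", Image.BICUBIC),
                           ("lanczos", Image.LANCZOS)]:
        lowres.resize(target, resample=resample).save("upscaled_%s.png" % name)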


(Author here.) Yeah, I knew this would come up but decided to proceed with the pixelated comparison anyway. I couldn't get the GIFs to reflect the results because of 8-bit quantization/dithering. The images show the neural network inputs and outputs, not a comparison with other super-resolution algorithms (still fascinating :-).

I'm working on the Docker instance now; that should help anyone with interest/experience in the field compare results easily.


Really cool to see them taking a page out of x264's book, trying to match the "energy" of the image rather than its "correctness".


A friend of mine suggested that an approach similar to this could be used to upscale old standard definition TV shows (specifically, those shot on video rather than film). I'd imagine that multiple specially trained networks would be employed for different parts of the image (trained on pictures of individual performers or types of set/background). Pleased to see that this is possible. Is there anyone doing something along those lines already?


It should also be possible to train it on itself to improve moving scenes by using the motion itself as temporal super-sampling, just like the human eye does.


This works quite well, and does not necessarily require any NN/machine learning. See the video for this paper: https://www.disneyresearch.com/publication/scenespace/ TL;DR: a simple brute-force weighted average of samples from many frames, combined with a noisy/low-quality depth-from-motion estimate, can be used to de-noise, increase resolution, and otherwise manipulate video footage. Very cool paper with great results from a simple technique.
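The core of that idea, stripped of the depth estimation, is roughly the following sketch (frames are assumed to be already motion-compensated/aligned, which is the hard part the paper actually solves; the weighting scheme here is a stand-in):

    import numpy as np

    def temporal_denoise(aligned_frames, weights=None):
        """Weighted average of motion-compensated frames.

        aligned_frames: list of HxWx3 float arrays, already warped so the same
        scene point lands on the same pixel in every frame.
        weights: optional per-frame confidence, e.g. lower for frames whose
        motion/depth estimate was poor.
        """
        stack = np.stack(aligned_frames).astype(np.float64)
        if weights is None:
            weights = np.ones(len(aligned_frames))
        weights = np.asarray(weights, dtype=np.float64)
        weights /= weights.sum()
        # Each output pixel is a confidence-weighted blend across time, which
        # averages away per-frame sensor noise.
        return np.tensordot(weights, stack, axes=1)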


Ooh


As you suggested, continuity of appearance is what makes this problem so difficult.

As a child, I recall watching a movie that had been converted from black-and-white to color. There were many distracting artifacts; most notably, the actors' hairlines would shift as they rotated their heads. It made the film unwatchable.


(Author here.) Absolutely! With multiple super-resolution networks, not only would continuity present problems, but so would blending between different regions. I agree there's a lot of value in domain-specific networks here, as you can see from the faces example on GitHub.

I'd be curious to see an ensemble-based super-resolution where each model outputs a confidence for each pixel region, and another network then learns to blend the results.
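Conceptually, the blending step could look something like this sketch (pure numpy, with hypothetical per-model confidence maps; in a real system the blend weights would come from a learned network rather than a softmax):

    import numpy as np

    def blend_ensemble(outputs, confidences):
        """Blend several super-resolution candidates per pixel.

        outputs: list of HxWx3 arrays, one per specialised model
                 (e.g. faces, text, natural textures).
        confidences: list of HxW arrays, each model's own estimate of how
                     well it handles that region.
        """
        outputs = np.stack(outputs)                        # (N, H, W, 3)
        conf = np.stack(confidences)[..., np.newaxis]      # (N, H, W, 1)
        weights = np.exp(conf) / np.exp(conf).sum(axis=0)  # per-pixel softmax
        return (weights * outputs).sum(axis=0)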

Conversely, these results are achieved using a single top-of-the-range GPU. Everything fits in memory with a batch size of 15 at 192x192. By distributing the training somehow, you could make the network 10x bigger, train for a whole week, and likely get much better general-purpose results.


> Is there anyone doing something along those lines already?

I have a side business doing film restoration and am not aware of any solution like that. Probably the best upscaling solution out there is from Teranex, acquired by Blackmagic Design. Evertz probably also has something in their offering.


It should work; I don't think you need to bother with training it on individual performers. Someone made a thing like this to improve low-res anime, and that worked well.

In theory you could use this to increase temporal resolution as well. Turn 24 fps movies into 60 fps, and upscale regular HD to 4k.



My question is off-topic, but how do you keep lists of URLs like that? Do you just use text files? I'm struggling with too much to read.


The key when you have too much to read is losing links, not retaining them better.


I like pinboard.in


It definitely makes a significant qualitative improvement, making the picture appear more in line with what our brain interprets as a higher-resolution picture, but my first thought is whether this particular example goes beyond aesthetics. Is there really any instance where this method could, say, turn an unintelligible picture of a license plate into something in which the characters can be recognised? More generally, I wonder whether there has been any research on the limits, i.e. what the combined minimal size of the information stored in the neural network plus the information in its inputs needs to be before the output can be said to be true to the source with probability x?


I imagine that if you trained this on a set of license plate photos, it would be able to enhance license plates illegible to an untrained human such that they're readable. However, I doubt it would be better than a human specifically trained at this task.

I've seen some videos from Cold War satellite photo analysts, and it's amazing the way they can look at some tiny gray blobs and go "that's a T-64 tank, that's a T-62 tank, that's an SA-2 launcher", etc.


Well, it doesn't create any information that wasn't in the original data (nothing can do that, you can only lose information in processing) so if e.g. the characters can be recognized in the processed image of a licence plate, then by definition they could have been recognized from the original data as well in some manner.

However, such processing can make things more easily interpretable by humans. A rough analogy is turning up the contrast: given a very dark image of a licence plate where the black parts are totally black (#000000) and the white parts are just very dark (#010101), the characters can definitely be recognized, even though a human in normal conditions would just see the whole image as black; processing would help.
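To make the contrast analogy concrete, a tiny sketch that rescales an almost-black image so that #010101-on-#000000 characters become visible (the file name is a placeholder; no information is added, the existing range is just remapped to something our eyes can distinguish):

    import numpy as np
    from PIL import Image

    img = np.asarray(Image.open("dark_plate.png").convert("L"), dtype=np.float64)

    # Stretch whatever tiny range the image occupies (e.g. 0..1) to the full 0..255.
    lo, hi = img.min(), img.max()
    stretched = (img - lo) / max(hi - lo, 1e-9) * 255.0

    Image.fromarray(stretched.astype(np.uint8)).save("dark_plate_stretched.png")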


> Well, it doesn't create any information that wasn't in the original data (nothing can do that, you can only lose information in processing)

I'm not sure this is correct. In a sense, the output does contain information that wasn't in the original inputs, i.e. information added by the weights of the neural network, which were themselves obtained from an enormous amount of previous samples. Of course, the largest and best-trained neural network won't be able to tell the license number given 2 pixels of information, but I am curious about the theoretical limits of what can be achieved in extreme cases with very little information as input and a neural network that has almost limitless resources.


This is amazing. The surprise is that while the higher-resolution images seem real, they are reconstructions based on previous learning, and can be very different from the actual image.

Nice test images to include would have been the original image, the downsampled image, and the reconstructed image. If the author is reading this, could they add these to the README?

Sci-fi on TV is making it to the real world :)


Would also be interesting to see some pixel art run through this. It probably won't work that well given that it's trained on real downsampled photos, but who knows.


The examples show a comparison between the original and the scaled version.


This technique is akin to hiring an artist to draw a high-resolution version of your pixelated photos.

A good example of this is "Day of the Tentacle Remastered" (http://dott.doublefine.com/). The new game looks extremely similar to the old one, but it has been redrawn.

As someone suggested, you should be able to take an old TV show, train the neural network with HD pictures of the cast, and let it redraw the show in its own "artistic" interpretation of the images.


This approach can be equaled or bettered with no machine learning.

This example allows easy comparison between common techniques. Choose image 7 to see an example with a person: https://dl.dropboxusercontent.com/u/2810224/Homepage/publica...


(Author here.) Did you see the faces example on the GitHub page? It was a domain-specific network trained adversarially for that purpose, but I have yet to see any super-resolution of that quality with or without machine learning.

Most other approaches don't even try to inject high-frequency detail into the high-resolution images because the PSNR/SSIM benchmarks drop. Until those metrics/benchmarks are dropped, there'll be little more progress in super-resolution.


While those results are nice too, they really don't seem comparable to the ones given here. The look is too different.


I ran that image through the library with the default settings, and it produced an image that is, in my opinion, much better than all of the approaches shown there:

http://imgur.com/a/moP57


The example with the Japanese ideograms is impressive; it seems to actually make a difference in readability.

https://github.com/alexjc/neural-enhance/blob/master/docs/St...


How does this compare to waifu2x?


How much RAM do I need to run this?

It dies for me with a MemoryAllocation Error after eating 28GB of RAM…


I found I couldn't really use anything larger than 300x300.


Wow, how much do you have?


On this system I’ve got 32GB, of which about 2GB were used by the OS itself and another 2GB by Firefox; that’s why it stopped at around 28GB.


(Author here.) Maybe it's worth moving to a GitHub issue. Try `--model=small`. The demo server limits the number of pixels to around 320x200 or 256x256 and can do only 4 at the same time to fit in RAM.
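If the picture itself is simply too large, another workaround is to process it in tiles and stitch the results back together, roughly like the sketch below (the `enhance_fn` callable and tile size are placeholders, not part of the library):

    import numpy as np

    def enhance_in_tiles(image, enhance_fn, tile=256, zoom=2):
        """Run a super-resolution function tile by tile to bound memory use.

        image: HxWx3 uint8 array; enhance_fn: hypothetical callable that
        upscales one tile by `zoom`. A real implementation should overlap the
        tiles and blend the seams, otherwise borders can show discontinuities.
        """
        h, w = image.shape[:2]
        out = np.zeros((h * zoom, w * zoom, 3), dtype=np.uint8)
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                patch = image[y:y + tile, x:x + tile]
                out[y * zoom:(y + patch.shape[0]) * zoom,
                    x * zoom:(x + patch.shape[1]) * zoom] = enhance_fn(patch)
        return out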


I do photo restorations on Reddit, where people often submit blurry photos that sharpening just can't fix. It would be great if this were offered as an online service.


Yes, it sounds possible with this code — but would require training a new network. Do you have a link to some examples?



This looks amazing.

Question for the more experienced deep learning folk: if I wanted to use this to upscale textures for a game, would I have to train it on the same type of texture? In other words, additional wood textures when upscaling wood, brick textures when upscaling brick, and so on?


(Author here.) If you have the luxury to train on domain-specific textures, the results will definitely be better. That's why I included all the training code in the repository as well—to allow for this kind of solution.

If you scroll down on GitHub to see the faces examples, those are achieved by a domain-specific network. I suspect you'll similarly get extremely high-quality results if you have good input images.
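If it helps, the usual way to build such a domain-specific dataset is to take your clean high-resolution textures and synthesise the low-resolution training inputs by downsampling them, along these lines (directory names and the factor are placeholders; the repository's training script may expect a different layout):

    import os
    from PIL import Image

    SRC = "textures_hires"   # clean, high-resolution game textures
    DST = "textures_lowres"  # synthesised low-resolution training inputs
    FACTOR = 2               # match the zoom factor you plan to train for

    os.makedirs(DST, exist_ok=True)
    for name in os.listdir(SRC):
        hi = Image.open(os.path.join(SRC, name)).convert("RGB")
        lo = hi.resize((hi.width // FACTOR, hi.height // FACTOR), Image.BICUBIC)
        lo.save(os.path.join(DST, name))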


Yes, that would help a lot with output quality. The machine can only hallucinate what it has previously seen.


I've seen a number of neural-network approaches to super-resolution like waifu2x, but I haven't seen something general-purpose that's better than bicubic/Fourier/nearest neighbor.

Would be nice if the author did a comparison.


(Author here.) My biggest insight from this project is that super-resolution with neural networks benefits significantly from being domain specific. If you train on broader datasets, it does pretty well but has to make compromises. Many recent papers do a comparison in terms of pixel similarity (PSNR/SSIM), and using those metrics the quality drops because high-frequency detail is punished under those criteria (even though it may look better perceptually). Reference: http://arxiv.org/abs/1609.04802
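(For reference, PSNR is just a per-pixel error measure, which is exactly why plausible-but-not-identical high-frequency detail gets punished; a rough numpy sketch:)

    import numpy as np

    def psnr(reference, reconstruction, peak=255.0):
        """Peak signal-to-noise ratio in dB between two uint8 images.

        A hallucinated texture that looks sharp but doesn't match the
        reference pixel-for-pixel raises the MSE, so PSNR scores it worse
        than a safe, blurry average.
        """
        mse = np.mean((reference.astype(np.float64) -
                       reconstruction.astype(np.float64)) ** 2)
        return 10 * np.log10(peak ** 2 / mse)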

On GitHub, below each GIF there's a demo comparison, but on the site you can also submit your own to try it out (click on title or restart button). Takes about 60s currently; running on CPU as GPUs are busy training ;-)


> super-resolution with neural networks benefits significantly from being domain specific. If you train on broader datasets, it does pretty well but has to make compromises.

To what extent could the need for this trade-off be overcome with a larger network?


Train this using a huge facial database such as the one US immigration holds and you have the perfect human detector, able to identify you even from nighttime security cameras.


Granted, it won't actually be sharpening the images, but for 99% of the use cases it would be awesome!


(Author here.) Unlike most non-GAN (generative adversarial network) approaches to super-resolution, it does try to inject high-frequency detail; see the faces example on GitHub. But I tuned that parameter down a bit in the released models so it performs better in general.


Hi, what was the parameter you used for that face example? It's really impressive.



