FastPhotoStyle from Nvidia (github.com/nvidia)
231 points by scraft on Feb 20, 2018 | hide | past | favorite | 79 comments


I'm probably missing something obvious here, maybe someone can explain the following to me.

- Their approach is a composition of 2 steps, what they call "stylization" and "smoothing".

- Top left of 2nd page they claim: "Both of the steps have closed-form solutions"

- Equation 5 is the closed-form solution for the "smoothing" step.

My question: Where's the closed-form solution for the stylization step that they're claiming?

Are they calling equation 3 a closed-form expression? In that case the title and the claim in the introduction are rather misleading, because computing equation 3 requires you to train autoencoders.


You don't train it for every image; in this way, a neural network often is a "closed-form solution": it provides you an equation, admittedly a very convoluted one, which can be used to obtain its solution, admittedly usually an approximation, in a finite amount of time. The normal solution to this problem (according to the paper) is an iterative technique "to solve an optimization problem of matching the Gram matrices of deep features extracted from the content and style photos", whereas this one is simply two passes: stylization and smoothing.


Not sure I understand. Doesn't every neural network produce some approximation in finite time? In what sense is this approach "closed-form"?


Previous stylisation was slow because it needed to run SGD optimisation for each image to be stylised. This uses a NN trained once. Once you've trained a NN, it is precisely a closed-form solution, in the style y = max(0, 3x + 4). However, they are normally a little longer to write down :P


Ah okay right this is the answer. Previous approaches [1] are deep generative models that you have to optimize for each input, whereas here you run just a forward evaluation on a model that you've trained beforehand.

I would still argue the term closed-form is misleading here, because:

- Even during training, at any given time you can read off a "closed-form expression" of a neural network of this type, so closed-form in this broad sense really doesn't mean much. Furthermore, any result of any numerical computation ever is also a closed-form solution according to this, on the grounds that it results from a computation that completed in a finite number of steps. So whenever you ask a grad student to run some numerical simulation, expect them to come back saying "Hey, I found a closed-form expression!"

- The reason the above is absurd is that these trained NNs aren't really solutions to the optimization problem, but approximations. So this is really saying: I have a problem, I don't know how to solve it, but I can produce an infinite sequence of approximations. Now I'm going to truncate this sequence of approximations and call that a closed-form solution.

The high-school math analogy would be an infinite sum with no closed form: instead of evaluating it exactly, just add up the terms to some large N and call the partial sum a closed-form solution.

[1] e.g. https://arxiv.org/pdf/1508.06576.pdf
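The distinction being argued here can be sketched with a toy problem (purely illustrative, nothing to do with the paper's actual networks): solving the same minimization once by running an optimization loop per input, and once by evaluating a fixed expression derived ahead of time.

```python
# Toy contrast between the two regimes discussed above (illustrative only):
# minimizing f(x) = (x - t)^2 by per-input gradient descent, vs. evaluating
# a fixed expression that was worked out once in advance.

def solve_iteratively(t, steps=1000, lr=0.1):
    """Run an optimization loop for every new input t (the old approach)."""
    x = 0.0
    for _ in range(steps):
        x -= lr * 2 * (x - t)  # gradient of (x - t)^2
    return x

def solve_feedforward(t):
    """A fixed formula evaluated once per input (the 'closed-form' claim)."""
    return t  # minimizer of (x - t)^2, known in advance

approx = solve_iteratively(3.0)  # close to 3.0 only after many steps
exact = solve_feedforward(3.0)   # exactly 3.0 in one evaluation
print(approx, exact)
```

A trained style-transfer network is like `solve_feedforward` here: still only an approximate solution to the underlying problem, but evaluated in one fixed pass rather than re-optimized per image.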


Actually, I agree with you. Initially you seemed to object to the term "closed form"; this now highlights the more pertinent point - these models are 100% closed form, but 0% "solution" in the formal sense.


Someone correct me if I'm wrong, but I believe this refers to the fact that it can be expressed in terms of certain simple mathematical operations like addition, subtraction, multiplication, powers, roots etc.—and as a consequence, the execution is very efficient. My understanding is that 'closed form' solution is essentially something that resembles a polynomial (again, accepting corrections!).


Closed form just means you can do it in a finite number of operations. So just "run X" rather than the previous versions of this kind of thing which are "repeat X until measure Y is lower than the limit I care about". (my basic understanding)


I checked the Wikipedia article, and the sorts of operations involved do appear to be a part of the definition: https://en.wikipedia.org/wiki/Closed-form_expression —though it sounds like it's a somewhat loosely defined term.


I really don't think it's fair to call neural networks closed-form solutions. The term immediately makes me assume that it enables you to bypass the training stage altogether.


Running a trained net is, as long as you only have to do a single forward pass. It's a complex formula, but it is a closed-form one.


Notice that all of the examples illustrated in the paper contain similar scenes. The content image is a building, and the style image is also a building. Or an image of trees is styled using another image of trees.

But how well does it fare when you give it an image of a house and an image of something completely different, like a dog or a slipper?


Download the code, run it, and let us know!


What would you expect the outcome to be?

What is the correct answer to a question that's not well-formed?


The interesting question, then, is how far off can this be and still work? Is the limit "reasonable", or is there room for improvement of the algorithm?

E.g. I think most humans would say taking this content picture:

https://wallpapershome.com/images/pages/pic_hs/10150.jpg

and styling it with this picture:

https://c2.staticflickr.com/4/3499/3876547311_c2e32759d9_z.j...

is a pretty well-posed operation. How does that look using this algorithm?


Your first link just redirects to their homepage for me, can you explain which picture it was?


It shows a red crab on a beach in front of the bright blue ocean with a blue sky and white clouds.

I guess transfer of the wooden house amidst yellow fields with a reddening sky might lead to a wooden crab on a yellow field in front of a reddish-yellow ocean with red sky and clouds, or something.


It actually looks better than expected: https://imgur.com/a/5BjvC


Looks nice!

Did you have to do anything extra to get it working? I've set things up according to the documentation (I think), but I get dimension size errors when running it.


Haha, yes, I had to rewrite their code a bit. All the .unsqueeze(1).expand_as(...) calls in photo_wct.py need to be replaced by just .expand_as(...), and the return value of __feature_wct needs to be wrapped in torch.autograd.Variable.

I'm going to submit a PR, but it took me a bit of experimentation to fix these errors, so the code is a bit messier than I'd like.
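For anyone hitting the same dimension errors: the shape mismatch can be illustrated without PyTorch. Here's a hypothetical NumPy analogue of the .unsqueeze(1) issue (not the repo's actual code): inserting an extra axis into a tensor that is already 2-D makes it impossible to broadcast against the feature map.

```python
import numpy as np

# Hypothetical NumPy analogue of the PyTorch shape error described above.
feat = np.zeros((3, 8))                    # C channels x N pixels
mean = feat.mean(axis=1, keepdims=True)    # per-channel mean, shape (3, 1)

# Broadcasting (3, 1) across (3, 8) works fine (like plain .expand_as(...)):
expanded = np.broadcast_to(mean, feat.shape)

# Inserting an extra axis first (like .unsqueeze(1)) yields (3, 1, 1),
# which cannot be broadcast down to the 2-D feature shape:
try:
    np.broadcast_to(mean[:, :, None], feat.shape)
    extra_axis_failed = False
except ValueError:
    extra_axis_failed = True

print(expanded.shape, extra_axis_failed)  # (3, 8) True
```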


Ahh, that looks like the error I was hitting, thanks. I might try replacing those bits as well, though I just upgraded pytorch from 0.1.12 to 0.3 and it became much slower (I killed it after 5-6 minutes of setup).


My fork is here: https://github.com/Yorwba/FastPhotoStyle

I was using the pytorch 0.1.12 installed with conda (following their USAGE.md) and it took ~30s total for the transfer.


Much appreciated thanks!

For some reason it's taking me about 4-5 minutes for the transfer, but the code now runs and the rest of the runtime is only a few seconds.


Wow, that's pretty good! So this thing can do fairly well on complex transfers.


For those interested in this technology: I made two videos 18 months ago. No optical flow, and YouTube compression kills everything, but it's still decent if watched in 4K on a big screen :)

https://www.youtube.com/watch?v=2YRVt80g2Ek

https://www.youtube.com/watch?v=i69cBYI6f-w


I'm going to be that person - why a non-OSI approved license? Given that it's CUDA-specific, I'd expect NVIDIA to want people to use it.


> Licensed under the CC BY-NC-SA 4.0 license

Seems fine to me. If you want to develop something commercial you'd roll your own anyway. Nothing else is restricted by this license.


Consider artists. There's a tremendous potential in using technology like this in art, and preventing someone from selling their works will often put them off of using it at all.


What does the license of the product have to do with the output of the product? You can use GIMP and GCC commercially, for example, and libraries used with GCC often have runtime exemptions for their output.


Because this tool is licensed non-commercial. Using it for art that you sell would be a commercial use, and a violation of the license.


Hmmmm. Does the licence of the tool affect the output from the tool? Photoshop is proprietary, but Adobe doesn't have to explicitly grant me rights to the work I create with it.


Usually no, unless say, the tool put some part of itself in the output.

The license of GCC doesn't affect the license of your binaries.

The license of python doesn't affect the license of your software.

etc.


You only need a license for the copyright, though. In the worst case you waive your right to distribute your derivative code if it has been used for commercial applications (which would be a weird interpretation, but I can't find a precise explanation of what the 'non-commercial' license covers).


Contrary to modern software developers, artists are used to the notion that tools developed by other people are worthy of some kind of compensation, even if found on some flea market.


Certainly, but do you see where you can buy a commercial license for this? I don't.


I wonder whether this could be applied to a real-time scenario. Modern real-time renderers for games often have a tone-mapping step that lets artists color grade the final output. The paper cites an 11+ second runtime for 1K inputs, which is orders of magnitude off what it would need to be, but perhaps a simpler version run on the GPU is feasible.


Notice that the research was done by Nvidia


Nvidia is pretty big in the machine learning space in general, not game specific these days - GPUs are pretty general purpose highly multithreaded number crunchers and Nvidia's been making moves further in this direction with their own CUDA-based training tools, the DGX-1, the Jetson and other products.


Paper this is based on: https://arxiv.org/pdf/1802.06474

It's really great that NVIDIA is releasing code for their deep learning research.


(a) This problem has long been known as color/contrast transfer, and it was solved more than 10 years ago; (b) the results shown in this paper aren't objectively or subjectively better or more photo-realistic than Kokaram et al.'s work; and (c) I question whether this task even requires deep learning at all.

https://francois.pitie.net/colour/




These very low res example images aren't particularly useful for judging how good this actually is.


Only tangentially related, but has anyone ever tried to apply style-transfer on human faces for artificial aging or rejuvenation? Like for the movie industry or something?


FaceApp does this (including gender swapping) and it's quite fun for an hour or so of messing around.


The examples seem to be too good to be true. I don't have a GPU lying around so I cannot try it unfortunately.


paperspace provides pretty easy setup cloud GPUs for ~$0.40/hr if that's of interest :)


The only machines with decent GPUs in them I have access to run Windows and Windows Subsystem for Linux doesn't allow GPU access. Other than dual-booting or running Linux in VirtualBox - is there any way I can try this?


None of the dependencies seemed to be Linux-specific at a quick glance. You might be able to install all that on Windows (not sure how pleasant an experience it'll be).

Virtualbox won't help you, because you can't give proper access to the GPU for the VM guest unless you set up PCI-e passthrough and dedicate your whole GPU to the VM guest (and use your integrated graphics for the host). Not sure if this is even possible if Windows is the host.

If you don't feel like setting up a Linux install on your box, you could try some of the GPU cloud services.


Also I am told the proprietary nvidia drivers have a software lock that prevents you from using GPU passthrough unless you buy certain more expensive models.


With PCI-e passthru using intel_iommu, you can set this up with a gaming GPU. The driver can't tell that it's not running on bare metal.

This requires dedicating the whole GPU and the PCI-e slot to the virtual machine guest.

For more flexible virtualization setups, you need the professional quality cards.


There is a workaround. A number of GeForce cards have the exact same chipset as a Quadro card, but with a resistor pulling down an external pin. That resistor can be changed to make the card identify as a Quadro.

http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-...

Apparently this can also be done from software

http://archive.techarp.com/showarticleefc1.html


This is just spoofing the PCI VID:PID numbers to the driver and relying on driver bugs(?) to function. You could do the same with a few lines of kernel hacks far easier than soldering. It does not enable any features that are fused off in the hardware. This setup is not reliable.

Also, these posts are from 2008 and 2013, 5 and 10 years old. These hacks probably don't work any more.


OT, but why do those machines need to run Windows? Why can't you install Linux?


They are dev boxes for Windows VR apps. I'd like to play with this out of curiosity. It's not worth the hassle of a dual boot for that.


The user manual literally has a setup for Ubuntu, using CUDA & cupy.


I'm not sure I understand how that helps me.


Anaconda can probably help


No


Is it really all that hard to have a demo site for these things? It would be a lot of fun to play with crossing pictures. I'm guessing it's because using a graphics card in the browser isn't good enough yet?


I'm not sure how fast their FastPhotoStyle approach is, but a TensorFlow implementation of the original neural style transfer can take upwards of 20 minutes to create the final stylized image. If someone had the pre-trained model and neural net code in JS to read it and you could do it all client side then it would be possible, but still very slow.


The tech has come a long long way since the original, even before this FastPhotoStyle project.

A few months ago, there was TensorFire [0] that was able to do it in the browser. Quick google also gives other results [1]. There's also many apps that can do it in seconds. Speed definitely isn't an issue anymore, but getting it to work in browser can be tricky.

[0] https://tenso.rs/demos/fast-neural-style/

[1] https://reiinakano.github.io/fast-style-transfer-deeplearnjs...


That top left style will be perfect for the family xmas photo.


Is there some research doing the same in voice area? - Fix/change accent, - Improve person's voice, - Perhaps even make one sound like another.


I saw a clip from adobe a while ago

https://www.youtube.com/watch?v=I3l4XLZ59iw



https://lyrebird.ai/ -- they do the last thing. But they all seem related.


Yes, Adobe's VoCo, is one example.

Images and speech require different architectures (CNNs vs RNNs).


Has anyone had luck using this for their tinder profile?


Unfortunately, it only transfers style, not attractiveness


I assume you need a Nvidia card for this? Also has anyone tested it and seen how long it takes to render?


> Preparation 1: Setup environment and install required libraries

> Python Library Dependency

> conda install pytorch torchvision cuda90 -y -c pytorch

What is conda? How do I install it on Ubuntu 16.04?


Conda is basically an alternative to pip and virtualenv, used by the Anaconda python distribution that's really popular in the data science and machine learning community. The easiest way to get it is to install miniconda: https://conda.io/docs/user-guide/install/linux.html
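As a rough sketch, the whole setup on Ubuntu might look something like this (the Miniconda installer URL and filename change over time, so check the install page for the current one):

```shell
# Install Miniconda (installer name/URL may differ; see the docs linked above)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Then, per the repo's USAGE.md:
conda install pytorch torchvision cuda90 -y -c pytorch
```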


Conda is the package manager of the Anaconda python distribution, widely used for machine learning and numerical computing.


Is it faster than previous implementations?


Looks like it's a lot faster. They compare their approach to the Luan et al. approach, and for a 1024x512 image they are about 30-60x faster. They also seem to be more accurate, with better results.


Oooooh no, I'm going to get back to nerding out on this 100% of my time :'(


What's the max resolution with this?


What witchcraft is this?


Of course there is a relevant XKCD: https://xkcd.com/1838/



