Notice that all of the examples illustrated in the paper pair similar scenes: the content image is a building and the style image is also a building, or an image of trees is styled using another image of trees.
But how well does it fare when you give it an image of a house and an image of something completely different, like a dog or a slipper?
The interesting question, then, is how far off this can be and still work. Is the limit "reasonable", or is there room for improvement in the algorithm?
E.g., I think most humans would have a clear expectation for this content picture:
It shows a red crab on a beach in front of the bright blue ocean with a blue sky and white clouds.
I'd guess that transferring the style of the wooden house amidst yellow fields with a reddening sky might lead to a wooden crab on a yellow field in front of a reddish-yellow ocean, with a red sky and clouds, or something like that.
Did you have to do anything extra to get it working? I've set things up according to the documentation (I think), but I get dimension size errors when running it.
Haha, yes, I had to rewrite their code a bit. All the .unsqueeze(1).expand_as(...) calls in photo_wct.py need to be replaced with just .expand_as(...), and the return value of __feature_wct needs to be wrapped in torch.autograd.Variable.
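Roughly, it's a shape issue like this (illustrative tensors and sizes, not the actual lines from photo_wct.py; the Variable wrapping isn't shown):

    import torch

    # Illustrative only: hypothetical C x (H*W) feature map.
    feat = torch.randn(512, 4096)
    mean = feat.mean(1, keepdim=True)   # shape (512, 1)

    # The extra unsqueeze(1) makes the shape (512, 1, 1), and expand_as()
    # then fails against the 2-D feature map with a dimension-size error:
    # centered = feat - mean.unsqueeze(1).expand_as(feat)   # RuntimeError

    # Dropping the unsqueeze(1) broadcasts cleanly:
    centered = feat - mean.expand_as(feat)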
I'm going to submit a PR, but it took me a bit of experimentation to fix these errors, so the code is a bit messier than I'd like.
Ahh, that looks like the error I was hitting, thanks. I might try replacing those bits as well, though I just upgraded PyTorch from 0.1.12 to 0.3 and it became much slower (I killed it after 5-6 minutes of setup).