Yes, I think a "Cartesian" quine would be pretty tough to create, because you're going from a symbolic representation to an inefficient visual one: a picture of the source takes far more data than the source text itself. Could a more efficient visual representation solve that? One approach might be to output a bitmap that looks like this:
gunzip(########)
where ######## is a bitmap representation of the raw input to gunzip.
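
To make the encoding half of that idea concrete, here is a rough Python sketch. It is not a quine; it only shows that a gzip stream can be laid out as a grid of black/white pixels and read back without loss. The helper names (bytes_to_bitmap, bitmap_to_bytes), the 64-pixel row width, and the sample source string are all made up for illustration.

```python
import gzip

def bytes_to_bitmap(data: bytes, width: int = 64) -> list[list[int]]:
    """Lay the bits of `data` out as rows of 0/1 pixels, most significant bit first."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    # Pad the last row with zeros so every row has the same width.
    bits += [0] * (-len(bits) % width)
    return [bits[i:i + width] for i in range(0, len(bits), width)]

def bitmap_to_bytes(rows: list[list[int]], length: int) -> bytes:
    """Read `length` bytes back out of the pixel rows, ignoring the padding."""
    bits = [bit for row in rows for bit in row][:length * 8]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

# Hypothetical "program text" standing in for the quine's source.
source = b"print('hello, quine')\n" * 20
packed = gzip.compress(source)          # the raw input to gunzip
bitmap = bytes_to_bitmap(packed)        # the ######## part, as pixels
restored = gzip.decompress(bitmap_to_bytes(bitmap, len(packed)))
assert restored == source
```

The hard part is still everything this skips: the program would have to draw the "gunzip(...)" wrapper too, and the compressed payload would have to describe the very program that draws it.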
https://secure.wikimedia.org/wikipedia/en/wiki/Quines