That will require state, and it can hurt performance since it needs extra memory. One of the main selling points of IPv6 is that it tries to be as stateless as possible, to ease the load on routers and switches.
Where's the need for state? Please excuse the abuse of terms below, but you can probably figure out what I mean.
A v6-only host would send a v6 packet from its full address to the v4+ address. A router on the path that has access to the v4 internet would pull the v4 destination out and reframe it as a v4 packet (source ?, dest the v4 address) that carries the v6 packet, or maybe just the addresses, I dunno. This router would burn a lot of CPU doing this, but it doesn't need any state.
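To make the "no state" point concrete, here's a rough sketch of what the router would do per packet. It assumes the v4+ range is a /96 with the v4 address in the low 32 bits; I'm borrowing 64:ff9b::/96 (the NAT64 well-known prefix) purely as a stand-in, and the function names are made up for illustration.

```python
import ipaddress

# Assumption: "v4+" addresses live in a /96 whose low 32 bits are the v4 address.
# 64:ff9b::/96 is only a placeholder for whatever prefix would actually be used.
V4PLUS_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed_v4(v4: str) -> ipaddress.IPv6Address:
    """Build the v4+ address for a v4 address (hypothetical helper)."""
    base = int(V4PLUS_PREFIX.network_address)
    return ipaddress.IPv6Address(base | int(ipaddress.IPv4Address(v4)))


def extract_v4(v6_dest: str):
    """Return the embedded v4 destination, or None if dest is not v4+.

    Pure function of the destination address: no lookup table, no per-flow state.
    """
    addr = ipaddress.IPv6Address(v6_dest)
    if addr not in V4PLUS_PREFIX:
        return None
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)
```

Everything the router needs is in the packet itself, which is the whole argument: it costs CPU per packet, but there is nothing to remember between packets.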
The v4+ host has a slightly harder job: it needs to know a v4 address to send the tunneled packets to. But again, it's sending a tunneled packet, and whatever is processing that doesn't need state; it just needs CPU to inspect and untunnel. Of course, if the v4+ address is RFC 1918 (or otherwise unroutable), then that's problematic. You _could_ do NAT at the router, but I'd say don't do that.
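The unroutable case is at least easy to detect. A minimal sketch, assuming you just want to refuse to tunnel toward anything that isn't globally routable (the function name is hypothetical; `is_global` is from Python's standard `ipaddress` module):

```python
import ipaddress


def tunnel_endpoint_usable(v4: str) -> bool:
    """True if the v4 address looks globally routable (hypothetical helper).

    RFC 1918 space, loopback, link-local and similar ranges all come back
    False, which is the case where you'd need NAT (and its state) instead.
    """
    return ipaddress.IPv4Address(v4).is_global
```

This is also a stateless check, so it doesn't change the argument; it only tells you when the stateless approach can't work.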
It might be useful for the v4+ host to keep the v4 tunnel sender IP from incoming packets, so it knows where to send reframed replies on the way back.
You might also do something special with routing to the v4+ prefix... if you advertise the v4+ address, it indicates you want v6 -> v4+ traffic to go to your network as v6 and you'll encapsulate it; otherwise it would go to a (hopefully local) router that advertised the /96 prefix. If this encap/decap turned out to be popular, you might see router ASICs accelerate it, but it's likely expensive, so the work should be distributed to endpoints as much as possible.
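The routing part is really just ordinary longest-prefix match: a more specific advertisement inside the /96 wins over the /96 itself. A toy sketch, with made-up prefixes and next-hop names (64:ff9b::/96 is again only a stand-in for the v4+ prefix):

```python
import ipaddress


def pick_route(dest: str, routes):
    """Return the next hop of the longest matching prefix, or None.

    routes is a list of (prefix, next_hop) pairs; both are illustrative.
    """
    addr = ipaddress.IPv6Address(dest)
    matches = []
    for prefix, next_hop in routes:
        net = ipaddress.IPv6Network(prefix)
        if addr in net:
            matches.append((net.prefixlen, next_hop))
    if not matches:
        return None
    # Longest prefix wins, same as any normal routing table.
    return max(matches)[1]


ROUTES = [
    ("64:ff9b::/96", "local-gateway"),               # generic v4+ gateway
    ("64:ff9b::c000:200/120", "site-encapsulator"),  # a site covering 192.0.2.0/24
]
```

With these routes, traffic for the v4+ form of 192.0.2.1 goes to the site that advertised the more specific prefix, and everything else in the /96 falls through to the local gateway.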
Of course, there was Teredo, which kind of tried to do something like this, but it didn't really work out, did it?