This is exactly the scenario that a bloom filter is for.
You have an expensive lookup. You're caching information on success/fail so that you don't have to do the expensive lookup every time. But the caches are getting large.
What you do is replace the local caches with a bloomfilter. That data structure takes a bounded amount of memory. When it says, "No, I have not seen you before," you really haven't. And when it says, "I might recognize you," it is only sometimes right. However its mistakes will not really matter because you'll do the expensive lookup.
The tradeoff is that the more data you put into a bloom filter, the higher the odds are that it will think think you might have seen things before, and therefore the less useful it becomes. But in this caching situation, it saves you work even if the false positive rate is fairly high.