It's pretty easy to optimise away some reference counting operations. For example, if you allocate something on the heap and it is not returned from the function, passed to any other function, stored into another object, or captured in a closure, then you know it will be dead at the end of the function. So you don't need to emit any increments or decrements for it, and can just free it when the function exits.
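To make that concrete, here is a rough sketch of the check in Python over a made-up toy IR. The instruction names and `non_escaping_allocs` are hypothetical, not any real compiler's API, and it assumes straight-line code where each variable is assigned once. It also treats storing a value into another object as an escape, which is an extra assumption on top of the cases listed above.

```python
# Toy straight-line IR; every op name below is made up for illustration.
#   ("alloc", dst)            dst = fresh heap object
#   ("copy", dst, src)        dst = src (second name for the same object)
#   ("call", fn, *args)       args are passed to another function
#   ("store", target, src)    src is written into another object or global
#   ("capture", var)          var is captured by a closure
#   ("return", var)           var is returned to the caller

def non_escaping_allocs(body):
    """Return the allocations that provably die inside this function.

    Assumes straight-line code where each variable is assigned once.
    """
    origin = {}      # variable name -> allocation it refers to
    escaped = set()  # allocations that may outlive the function

    def mark(var):
        # Only variables that refer to a local allocation matter here.
        if var in origin:
            escaped.add(origin[var])

    for op, *args in body:
        if op == "alloc":
            origin[args[0]] = args[0]
        elif op == "copy":
            dst, src = args
            if src in origin:
                origin[dst] = origin[src]
        elif op == "call":
            for arg in args[1:]:  # args[0] is the callee name
                mark(arg)
        elif op == "store":
            mark(args[1])         # the stored value escapes
        elif op in ("capture", "return"):
            mark(args[0])

    allocs = {var for var, alloc in origin.items() if var == alloc}
    return allocs - escaped


body = [
    ("alloc", "tmp"),           # never leaves the function
    ("alloc", "out"),
    ("copy", "alias", "out"),
    ("alloc", "logged"),
    ("call", "log", "logged"),  # passed to another function
    ("return", "alias"),        # returns `out` through its alias
]
print(non_escaping_allocs(body))  # prints {'tmp'}
```

Only `tmp` survives the check, so it is the only allocation a compiler could handle without retain/release pairs. A real implementation would also need to deal with branches, loops, and reassignment, which this sketch ignores.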
Probably yeah. One thing is that it helps to have a statically typed language for these optimisations, since you can use type-based analysis for them. So dynamically typed languages like Lisp or Python are going to have a harder time doing them.