In my usage, Colab and Colab Pro were similar, with plain Colab occasionally running out of memory (OOM) during model loading. That said, I've actually been seeing times slower than yours on Colab, and I think they're slower than on my RTX 3080. ~15 secs per image. I'm not sure why, though.
You are much better off running it locally at those speeds. A P100 takes 13 to 33 seconds per batch in my experience. Cloud-to-cloud data transfer (Hugging Face to Colab) is ridiculously fast, though.
I'm on Colab Pro and get about 3 steps per second when generating a single 512x512 image at a time, with a slight throughput improvement when I batch 2-3 images.
I run it locally and can generate images with 50 steps in about 6 seconds per image. Would it be faster for me to use Colab Free/Pro/Pro+?
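To compare the numbers quoted in this thread, it helps to put them in the same unit. Here is a rough sketch that converts the local figure (50 steps in about 6 seconds) to steps per second and the Colab Pro figure (about 3 steps per second) to seconds per image. It assumes the Colab Pro run would also use 50 steps at the same resolution, which nobody above actually states, and the function names are just made up for illustration.

```python
def steps_per_second(steps: int, seconds: float) -> float:
    """Throughput for one image that took `seconds` to run `steps` steps."""
    return steps / seconds


def seconds_per_image(steps: int, rate: float) -> float:
    """Time for one image of `steps` steps at `rate` steps per second."""
    return steps / rate


# Figures quoted in the thread.
local_rate = steps_per_second(steps=50, seconds=6.0)

# ASSUMPTION: the Colab Pro user would also run 50 steps per image.
colab_pro_time = seconds_per_image(steps=50, rate=3.0)

print(f"Local:     {local_rate:.1f} steps/s")
print(f"Colab Pro: {colab_pro_time:.1f} s per 50-step image")
```

Under that assumption the local setup comes out well ahead of the Colab Pro number, so moving to Colab probably wouldn't speed things up. Batching and whichever GPU Colab hands you could shift this somewhat.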