It can't be a training pipeline, because the IPs are all around India.
Sample code from Stack Overflow being used by some major app is the most likely candidate. It's also possible that the image fetch call is a vestigial appendix that doesn't even display the image, which would make tracking this down extra challenging.
Same advice I gave a w3c.org admin who was lamenting how much traffic people generate by not caching XML schemas. Yes, you have to serve the requests. But you don't have to try to serve them in 100 ms. If a human is on the other end, 1-2 seconds is just fine. If it's not a human, then the person running the batch process will surely notice when it goes from 3 minutes to 10 minutes because it fetches the same schema 200 times.
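The deliberate-slowdown idea is cheap to implement with async i/o, since a waiting coroutine ties up almost no resources. A minimal sketch (the payload, delay value, and handler name are all made up for illustration):

```python
import asyncio

SCHEMA_XML = b"<schema/>"   # stand-in for the static schema file
DELAY_SECONDS = 0.05        # illustrative; the point above suggests 1-2 s

async def serve_schema(reader, writer):
    # Read and discard the request headers.
    await reader.readuntil(b"\r\n\r\n")
    # The artificial delay: an idle coroutine parked here costs almost nothing.
    await asyncio.sleep(DELAY_SECONDS)
    header = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: application/xml\r\n"
        b"Content-Length: " + str(len(SCHEMA_XML)).encode() + b"\r\n\r\n"
    )
    writer.write(header + SCHEMA_XML)
    await writer.drain()
    writer.close()
    await writer.wait_closed()
```

Well-behaved clients that cache never see the delay; the ones hammering you do, without costing you anything per connection beyond the socket itself.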
Well any time you start yanking levers and spinning dials you'd better know where the breaking points in your system are.
If you care about the traffic because you're already having trouble with that many simultaneous requests, then you are definitely not going to solve that problem by increasing the response time by a factor of 10.
But an important property of reverse proxies is that once the proxy sees the last byte of the response, the originating server is no longer involved in the transaction. The proxy server is stuck ferrying bits over a slow connection, and hopefully is designed for that sort of work load. If the payload is a static file, as it is in both of these cases, then it should be cheap for the server to retrieve them.
Yes, but slowloris isn't really a big deal if you've got a modern http(s) server with async i/o. It costs nearly nothing to keep an idle connection open while waiting 3 seconds before sendfile()ing the schema XML.
You can run out of sockets, but that's easy to tune. I don't know the limits on other systems, but FreeBSD lets you set the maximum up to physical pages / 4 with just boot-time settings. With 4 KB pages, that's about 1 million sockets per 16 GB of RAM.
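For reference, the FreeBSD knob is the kern.ipc.maxsockets loader tunable; the value here is illustrative:

```
# /boot/loader.conf -- raise the socket cap at boot
kern.ipc.maxsockets="1000000"
```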
Worst case, if you start running out of sockets because you're sleeping, sample the socket count once a second and adjust the sleep time to avoid hitting the cap. You could also use that sampling to decide whether to keep HTTP connections open or close them.
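That feedback loop could look something like this (the function name, cap, and thresholds are all invented for illustration):

```python
SOCKET_CAP = 1_000_000  # e.g. the FreeBSD boot-time limit for 16 GB of RAM

def adjust_delay(current_delay, open_sockets,
                 cap=SOCKET_CAP, high_water=0.8,
                 step=0.1, ceiling=3.0):
    """Shrink the artificial delay as open sockets approach the cap;
    drift back toward the full delay when there's headroom."""
    if open_sockets / cap > high_water:
        return current_delay / 2          # back off: free sockets faster
    return min(ceiling, current_delay + step)
```

Sample the socket count once a second, feed it in, and use the returned delay for new connections; the same reading can drive the keep-alive-or-close decision.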
I should add: select() on millions of sockets is going to suck, so you'll need kqueue/epoll/whatever your kernel's "select, but better" interface is.
It doesn't have to be quick to have bad effects. See https://discussions.apple.com/thread/7908738 for example. I can't find the link now, but there was also an article about one of the older designers (IBM?) making sure the terminal cursor blinked at a 1:2 ratio to reduce the problem.
As a rule, just don't serve blinking images to random unsuspecting people.