
You would still have the same bottleneck, but the API request would return straight away with some sort of correlation ID. The workers that handle the GPU-bound tasks would then pull jobs when they are ready. If you get a lot of jobs, all that happens is that the queue fills up and clients wait longer, hitting the status endpoint a few more times.

Here is an example of what it could look like: https://docs.microsoft.com/en-us/azure/architecture/patterns...
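A minimal sketch of that request-reply flow, using an in-memory queue and a thread to stand in for a broker and a GPU worker (the function names `submit_job` and `job_status` are illustrative, not from any particular framework):

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # pending work the workers pull from
results = {}           # correlation ID -> job status/result
lock = threading.Lock()

def submit_job(payload):
    """API handler: enqueue the job and return a correlation ID immediately."""
    job_id = str(uuid.uuid4())
    with lock:
        results[job_id] = {"status": "queued"}
    jobs.put((job_id, payload))
    return job_id

def job_status(job_id):
    """Status endpoint the client polls with its correlation ID."""
    with lock:
        return dict(results.get(job_id, {"status": "unknown"}))

def worker():
    """Stand-in for a GPU worker that pulls jobs when it has capacity."""
    while True:
        job_id, payload = jobs.get()
        with lock:
            results[job_id] = {"status": "running"}
        result = payload * 2  # pretend this is the GPU-bound task
        with lock:
            results[job_id] = {"status": "done", "result": result}
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

job_id = submit_job(21)   # returns straight away, before the work is done
jobs.join()               # in reality the client polls job_status instead
print(job_status(job_id))
```

In a real deployment the queue and results store would live in a broker and a database (e.g. Redis, SQS), and the workers would be separate processes on the GPU machines, but the shape of the pattern is the same.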



Thanks for the explanation.

Right now, we use ELB (Elastic Load Balancer) to sit in front of multiple GPU instances.

Is this sufficient, or would you suggest adding a queue like Celery to this architecture?




