
This looks cool at first glance. I'll dig into it more.

One note that may be helpful: if all you care about is the HTML, it's better to take a "snapshot" of the page by streaming the response directly to blob storage like S3. That way, if something fails and you need to retry, you can read the saved raw data from storage rather than making another request and potentially getting blocked. Node's stream pipelines make it really easy to chain this together with the rest of your logic.
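Here's a rough sketch of that snapshot-then-parse flow, assuming "Node pipelines" means `pipeline` from `node:stream/promises`. A local temp directory stands in for the S3 bucket so the example runs without credentials. For real S3, I believe the AWS SDK v3 `Upload` helper from `@aws-sdk/lib-storage` accepts a stream as `Body` and could replace the file write, but treat that as an assumption to check. The names `FetchBody`, `snapshotKey`, `makeSnapshotDir`, and `getHtml` are all hypothetical, and `fetchBody` is a placeholder for whatever HTTP client you use.

```typescript
import { createWriteStream, existsSync, mkdtempSync } from "node:fs";
import { readFile, rename } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { createHash } from "node:crypto";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical: any function that issues the request and hands back the
// response body as a Node Readable. The HTTP client is up to you.
type FetchBody = (url: string) => Promise<Readable>;

// Hypothetical stand-in for a bucket: a fresh local temp directory.
function makeSnapshotDir(): string {
  return mkdtempSync(join(tmpdir(), "snapshots-"));
}

// Stable key per URL, so a retry looks up the same snapshot.
function snapshotKey(url: string): string {
  return createHash("sha256").update(url).digest("hex") + ".html";
}

// Return the page HTML, hitting the network only when no snapshot exists.
async function getHtml(url: string, dir: string, fetchBody: FetchBody): Promise<string> {
  const path = join(dir, snapshotKey(url));
  if (!existsSync(path)) {
    const partial = path + ".part";
    // Stream the raw body straight to storage without buffering it in memory.
    await pipeline(await fetchBody(url), createWriteStream(partial));
    // Only publish the snapshot once it is complete, so a failed download
    // never leaves a truncated file that later retries would trust.
    await rename(partial, path);
  }
  // Parsing and retries read the stored copy, not the live site.
  return readFile(path, "utf8");
}
```

The point of the design is that the request happens once and everything downstream (parsing, extraction, retries after a bug fix) works off the stored raw copy.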

For reference, I run a company that does large scale scraping / data aggregation.