Parallec has a super convenient response context that lets you pass any object in and out when handling the response. You can conduct scalable API calls, then pass the aggregated data anywhere: Elasticsearch, Kafka, MongoDB, Graphite, Memcached, etc.
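For the curious, here is a minimal sketch of that pattern, modeled on the examples in the Parallec README (I'm assuming the setResponseContext() builder call from those examples; the target hosts are placeholders and the status code is just an arbitrary value to stash):

```java
import java.util.HashMap;
import java.util.Map;

import org.parallec.core.ParallecResponseHandler;
import org.parallec.core.ParallelClient;
import org.parallec.core.ResponseOnSingleTask;

public class ResponseContextExample {
    public static void main(String[] args) {
        ParallelClient pc = new ParallelClient();
        // The same map is passed into every onCompleted() call and is
        // readable here after execute() returns (execute blocks by default).
        final Map<String, Object> responseContext = new HashMap<String, Object>();
        pc.prepareHttpGet("")
          .setResponseContext(responseContext)
          .setTargetHostsFromString("www.parallec.io www.jeffpei.com www.restcommander.com")
          .execute(new ParallecResponseHandler() {
              @Override
              public void onCompleted(ResponseOnSingleTask res,
                      Map<String, Object> responseContext) {
                  responseContext.put(res.getHost(), res.getStatusCode());
              }
          });
        // The aggregated data is now in responseContext; from here you can
        // ship it to Elasticsearch, Kafka, MongoDB, Graphite, Memcached, etc.
        System.out.println(responseContext);
        pc.releaseExternalResources();
    }
}
```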
Python has a global interpreter lock, so if the work is computationally expensive you have to use multiple processes to exploit more than one core. Parallec lets you run your handler's onCompleted() function either in the workers before aggregation (in parallel) or in the manager after aggregation.
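The toggle is a single builder call; a sketch assuming the handleInWorker()/handleInManager() methods shown in the Parallec docs (hosts are placeholders):

```java
import java.util.Map;

import org.parallec.core.ParallecResponseHandler;
import org.parallec.core.ParallelClient;
import org.parallec.core.ResponseOnSingleTask;

public class HandlerLocationExample {
    public static void main(String[] args) {
        ParallelClient pc = new ParallelClient();
        pc.prepareHttpGet("")
          .setTargetHostsFromString("www.parallec.io www.restcommander.com")
          // Run onCompleted() in each worker, in parallel, before aggregation.
          // Swap for .handleInManager() to run it serially after aggregation.
          .handleInWorker()
          .execute(new ParallecResponseHandler() {
              @Override
              public void onCompleted(ResponseOnSingleTask res,
                      Map<String, Object> responseContext) {
                  System.out.println(res.getHost() + ": " + res.getStatusCode());
              }
          });
        pc.releaseExternalResources();
    }
}
```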
I think what Uberneo is asking is whether Parallec would handle the HTML parsing like Scrapy does. I believe the answer is no. You wouldn't want to slow down Parallec with parsing, though; you'd rather send the HTML output to some other process for that, right?
Thanks jstoiko!
You are right: Parallec is not specifically built to crawl or recursively parse website pages (though you could build such a crawler on top of it). We mostly use it to manage (HTTP/S) agents on every production machine in the cloud for software deployment, remediation, asset discovery, etc. (Parallec acts like a Kubernetes master managing all the kubelets, i.e., the agents.)
Yes, we may or may not want to slow it down. Sometimes, if it is just a regex or other simple parsing, we put the parser inside the worker. We can then send the results out to Kafka etc. so that some other process/machine can handle them.
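Roughly like this; a sketch only, assuming Parallec's FilterRegex helper from its README and the standard kafka-clients producer API, with a hypothetical broker address and topic name:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.parallec.core.FilterRegex;
import org.parallec.core.ParallecResponseHandler;
import org.parallec.core.ParallelClient;
import org.parallec.core.ResponseOnSingleTask;

public class ParseInWorkerExample {
    public static void main(String[] args) {
        // Hypothetical broker address; adjust for your cluster.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        final Producer<String, String> producer =
                new KafkaProducer<String, String>(props);

        ParallelClient pc = new ParallelClient();
        pc.prepareHttpGet("")
          .setTargetHostsFromString("www.parallec.io www.restcommander.com")
          .handleInWorker() // cheap regex parse stays inside the worker
          .execute(new ParallecResponseHandler() {
              @Override
              public void onCompleted(ResponseOnSingleTask res,
                      Map<String, Object> responseContext) {
                  // Pull the page <title> out with a simple regex.
                  String title = new FilterRegex(
                          "[\\s\\S]*<title>(.*?)</title>[\\s\\S]*")
                          .filter(res.getResponseContent());
                  // Hand the parsed result off to Kafka ("parsed-pages" is a
                  // made-up topic) so another process can do the heavy work.
                  producer.send(new ProducerRecord<String, String>(
                          "parsed-pages", res.getHost(), title));
              }
          });
        producer.close();
        pc.releaseExternalResources();
    }
}
```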
Fast Parallel Async HTTP/REST/SOAP client as a service to monitor and manage 10,000 web servers.
Sends requests to 1,000 servers with response aggregation in 10 seconds, or 10,000 servers in 50 seconds.