
As a novice, is there a benefit to using a custom Node.js script as the downloader? When I downloaded the 40 million Hacker News API items I used `curl --parallel`.

What I would like to figure out is the easiest way to go from the API straight into a Parquet file.



I think your curl approach would work just as well, if not better. My instinct was to reach for Node.js out of familiarity, but curl is fast, and since the item IDs are sequential, something like `parallel curl ::: $(seq 0 $max_id)` would be pretty simple. I did end up needing more logic, though, so Node.js ultimately came in handy.
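For reference, here is a minimal sketch of what the Node.js batching approach could look like, assuming Node 18+'s built-in `fetch` and the public Firebase endpoints (`/v0/maxitem.json`, `/v0/item/{id}.json`). The batch size and NDJSON output are illustrative choices, not necessarily what my actual script did:

```ts
// Sketch: fetch HN items in parallel batches using Node 18+'s built-in fetch.
import { appendFileSync } from "node:fs";

const API = "https://hacker-news.firebaseio.com/v0";
const BATCH_SIZE = 100; // arbitrary; tune for your connection

async function fetchItem(id: number): Promise<unknown> {
  const res = await fetch(`${API}/item/${id}.json`);
  if (!res.ok) throw new Error(`HTTP ${res.status} for item ${id}`);
  return res.json();
}

async function main() {
  const maxId: number = await (await fetch(`${API}/maxitem.json`)).json();
  for (let start = 0; start <= maxId; start += BATCH_SIZE) {
    const ids = Array.from(
      { length: Math.min(BATCH_SIZE, maxId - start + 1) },
      (_, i) => start + i
    );
    // Fetch one batch concurrently; failed or deleted IDs are simply skipped.
    const results = await Promise.allSettled(ids.map(fetchItem));
    for (const r of results) {
      if (r.status === "fulfilled" && r.value !== null) {
        appendFileSync("items.ndjson", JSON.stringify(r.value) + "\n");
      }
    }
  }
}

main();
```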

As for writing an Arrow (or Parquet) file directly, I'm not sure, unfortunately. I imagine there are some difficulties because the format is columnar, so the writer probably wants a batch of rows at a time rather than one item at a time.
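To illustrate the batching idea, here is a small sketch using the `apache-arrow` npm package (my assumption, not something from the original pipeline). It accumulates items into column arrays and serializes one batch to an Arrow IPC file; an actual Parquet file would need a separate writer library, but the batch-of-rows pattern is the same:

```ts
// Sketch: batch items into columns and write one Arrow IPC file.
import { writeFileSync } from "node:fs";
import { tableFromArrays, tableToIPC } from "apache-arrow";

interface Item {
  id: number;
  type?: string;
  by?: string;
  time?: number;
  text?: string;
}

// Convert a batch of row objects into column arrays, then into Arrow bytes.
function itemsToArrowBytes(items: Item[]): Uint8Array {
  const table = tableFromArrays({
    id: Int32Array.from(items.map((i) => i.id)),
    type: items.map((i) => i.type ?? ""),
    by: items.map((i) => i.by ?? ""),
    time: Float64Array.from(items.map((i) => i.time ?? 0)),
    text: items.map((i) => i.text ?? ""),
  });
  // Serialize the whole batch to the Arrow IPC file format in one go.
  return tableToIPC(table, "file");
}

// Usage with a tiny in-memory batch (illustrative data only).
const batch: Item[] = [
  { id: 1, type: "story", by: "pg", time: 1160418111, text: "" },
];
writeFileSync("items.arrow", itemsToArrowBytes(batch));
```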



