I'm doing my PhD in HPC (part of my work is on in situ analysis). One of the biggest problems is actually IO. But honestly, 250 GB isn't that big. I haven't used Dask, but I know people who do. I just avoid Python for anything where I need performance; C++ is pretty much always going to be faster for that kind of work.
I'm biased, but you might want to look into DOE libraries. For IO I suggest ADIOS2 [0]. There are Python bindings too.
One of the biggest things you can do is switch to a different storage format like BP (ADIOS) or HDF5. These are binary, but still self-describing and easy to read back programmatically. But to really figure out how to speed up your problem you have to know where the bottleneck is: IO or compute? And with 100 workers (threads or nodes?) you aren't all that highly parallelized; if those are threads, that could be a single node.
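For example, here's a minimal sketch of what the HDF5 route could look like with h5py. The dataset name, chunking, and file paths are made up; the point is just that you land the data in a chunked binary format once, then read slices cheaply afterwards:

    import numpy as np
    import h5py

    # Hypothetical: convert one existing text output file into chunked,
    # compressed HDF5. "density" and the paths are placeholders; tune the
    # chunk shape to match how you actually read the data back.
    data = np.loadtxt("timestep_0042.txt")
    with h5py.File("timestep_0042.h5", "w") as f:
        f.create_dataset(
            "density",
            data=data,
            chunks=True,            # let h5py pick a chunk shape
            compression="gzip",
            compression_opts=4,
        )

    # Reading back a slice only touches the chunks it needs,
    # instead of re-parsing the whole text file:
    with h5py.File("timestep_0042.h5", "r") as f:
        slab = f["density"][0:1000]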
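And a quick-and-dirty way to see whether you're IO-bound or compute-bound, assuming your pipeline has separable read and compute steps (both function names below are placeholders for whatever you actually do):

    import time

    # Placeholders: swap in your real read and compute steps.
    def load_timestep(path):
        ...

    def analyze(data):
        ...

    t0 = time.perf_counter()
    data = load_timestep("timestep_0042.txt")
    t1 = time.perf_counter()
    result = analyze(data)
    t2 = time.perf_counter()

    print(f"read:    {t1 - t0:.2f} s")
    print(f"compute: {t2 - t1:.2f} s")

If the read time dominates, no amount of extra workers on the compute side will help; that's when the storage format (and the filesystem you're hitting) is the thing to fix.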
[0] https://github.com/ornladios/ADIOS2