> Usually no testing on a small test case / data set before going to the big one;
I don't understand how "no testing on a small test case" would result in "loads of compute hours get wasted because of accidentally overwriting output data"?
> Very old fashioned workload management systems with CLI-level job setup and automation
I think this means they are rolling their own workflow manager? By workflow manager, I mean something like the tools listed in https://docs.nersc.gov/jobs/workflow-tools/ and https://docs.nersc.gov/jobs/workflow/other_tools/
> ...standard software best practices cannot be adopted in the HPC space (for no reason)
I agree that "for no reason" is bad, but sometimes there is a reason. Just a random example: using Snakemake with Slurm comes with the following caveat at NERSC:
> Astute readers of the Snakemake docs will find that Snakemake has a cluster execution capability. However, this means that Snakemake will treat each rule as a separate job and submit many requests to Slurm. We don't recommend this for Snakemake users at NERSC.
> from https://docs.nersc.gov/jobs/workflow/snakemake/
It could depend on how large the HPC system is. The case mentioned above is NERSC, which I think has always had a top-10 machine on top500.org at release time. When dealing with such a state-of-the-art system, some "standard software best practices" don't scale well enough for that many nodes.
The particular example above relates to the Slurm scheduler: when there are many more nodes for it to schedule, usage patterns that are fine on a smaller HPC platform can break it.
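For what it's worth, Snakemake's job grouping can soften that particular caveat by bundling connected rules into one cluster submission instead of one sbatch per rule. A minimal sketch (the rule names, file patterns and scripts are made up for illustration; this is a general Snakemake feature, not NERSC's recommended setup):

```python
# Snakefile sketch (Snakemake rules are a Python-based DSL).
# "preprocess.py" and "analyse.py" are hypothetical scripts.
SAMPLES = ["s1", "s2", "s3"]

rule all:
    input:
        expand("results/{sample}.done", sample=SAMPLES)

rule preprocess:
    input:
        "data/{sample}.raw"
    output:
        "tmp/{sample}.clean"
    group: "per_sample"   # same group => bundled into one cluster job
    shell:
        "python preprocess.py {input} {output}"

rule analyse:
    input:
        "tmp/{sample}.clean"
    output:
        "results/{sample}.done"
    group: "per_sample"   # chained with preprocess for the same sample
    shell:
        "python analyse.py {input} {output}"
```

With grouping, each sample's preprocess + analyse pair reaches Slurm as a single job rather than two, and running Snakemake entirely inside one allocation avoids per-rule submission altogether.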
> Poor environment setup, little to no use of containerisation and when in use, HPC-specific solutions (as Singularity or Slurm containerisation support) tend to be more palliatives than addressing the root issue;
But how would this lead to "loads of compute hours get wasted because of accidentally overwriting output data"?
While containers can solve some problems in setting up environments, they are not the final answer. E.g. the reason many HPC centres (if not all) do not use Docker itself but roll their own Singularity or Shifter solutions is that they cannot give you root access. Also, using a container does not automatically give you optimized binaries, meaning anyone who wants to run HPC applications at peak performance (say `-march=native -mtune=native`, or even the `-O` level) will need to build their own container. (Compiling specifically for the native arch is very important in HPC. E.g. for NERSC Cori II, which uses the Intel Xeon Phi KNL, AVX-512 is essential for performance, and in this specific case we really need the Intel compilers for peak performance: KNL is odd and the GNU compilers can't optimize for it as well, since they don't have much incentive to optimize for such a niche product, unlike Intel, which has to deliver that performance to sell the product.)
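(As a quick aside, before fighting with architecture-specific builds it is worth sanity-checking what the host actually exposes. A hedged, Linux-only sketch, not an official tool:

```python
# List AVX-512 feature flags advertised in /proc/cpuinfo (Linux only).
def avx512_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                return sorted(fl for fl in flags if fl.startswith("avx512"))
    return []

if __name__ == "__main__":
    found = avx512_flags()
    print("AVX-512 support:", ", ".join(found) if found else "none detected")
```

On a KNL node you would expect flags such as avx512f and avx512er to show up; on an older Xeon, nothing.)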
And if we now agree we need to compile specifically for the HPC system, then containers are not that attractive compared to just compiling in the HPC environment (with the Cray compiler wrappers, for example). They do have other advantages related to IO, as shown in https://docs.nersc.gov/development/shifter/images/shifter-pe... from https://docs.nersc.gov/development/shifter/
Going back to another statement in the parent thread,
> IMHO the main issue of HPC today is not performance per se, but rather not embracing software best practices...
The main issue of HPC really is performance. Again, it may depend on how big the system is (in terms of top500.org ranking), but the really high-performance machines, which are always at the cutting edge, are genuinely testing the new technologies that bring us to the next level (exa-scale at the moment). Think about the tech used in the upcoming exa-scale or near-exa-scale machines: some use AMD CPUs + NVIDIA GPUs, some AMD CPUs + AMD GPUs, some Intel CPUs + Intel GPUs, and some pure ARM CPUs... (plus the previous generation of GPU-like CPUs such as the Intel Xeon Phi). They are all quite exotic (compared to previous generations), and it is very difficult to squeeze peak performance out of them.

One simple metric is to compare the theoretical peak performance to what is realized in https://www.top500.org/lists/top500/2021/11/ ; it is very difficult for these large systems to achieve a high realization, especially as the number of nodes increases at this rate (the communication bottleneck becomes more apparent).
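To make "realization" concrete: the usual shorthand is HPL efficiency, Rmax divided by Rpeak. A small sketch, using approximate figures for Fugaku from that November 2021 list (rounded here for illustration; check the list itself for exact numbers):

```python
# HPL efficiency: measured Rmax as a fraction of theoretical Rpeak.
# Approximate Fugaku figures from the Nov 2021 TOP500 list (rounded).
rmax_pflops = 442.0    # measured LINPACK performance
rpeak_pflops = 537.2   # theoretical peak

efficiency = rmax_pflops / rpeak_pflops
print(f"HPL efficiency: {efficiency:.1%}")   # roughly 82%
```

That ratio is the "realization" I mean; for many large systems it is considerably lower.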
(Just to mention, that is a simple metric which no longer tells the whole story. It measures only FLOPS, whereas nowadays IO can be the bottleneck in "big-data" applications, and there are less well known metrics targeting that.)
E.g. compared with the current-generation NERSC system, Cori, some systems with a lower theoretical peak rank higher, essentially because of the unrealized headroom of the Intel KNL (everyone who uses that system hates the KNL... even Intel themselves hated Xeon Phi so much that they gave it up, and their replacement solution is, well, just copying NVIDIA and giving you a GPU).
The next issue, or the same issue in disguise, is energy. It is well known that we could have built exa-scale machines a few years back; what prevented it is the amount of energy (or power, as in MW) they need. The main thing we need to solve to get to exa-scale is basically FLOPS per watt, whereas in the old days it was maybe more FLOPS per dollar. (There's a rule of thumb that a megawatt machine costs about a million dollars per year in energy.)
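The arithmetic behind that rule of thumb is easy to check; the electricity price below is an assumed round number, not a quoted rate:

```python
# Annual energy cost of 1 MW of continuous draw at an assumed price.
power_mw = 1.0
hours_per_year = 24 * 365        # 8760 h
price_per_kwh = 0.11             # assumed USD/kWh, for illustration

energy_kwh = power_mw * 1000 * hours_per_year
annual_cost = energy_kwh * price_per_kwh
print(f"{energy_kwh:,.0f} kWh/year -> ${annual_cost:,.0f}/year")   # ~$1M
```

So a 20-30 MW exa-scale machine is tens of millions of dollars per year just in electricity, before cooling overhead.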
Frankly, I think many users of HPC continue to have bad software practices and continue to be successful. Bad software practices are going to bite them when, as you say, (generated) data get overwritten, which is basically just wasted compute hours. (It is rare to lose original data, though I think there was recent news about a lot of actual data being lost at an HPC center / university.) Or bad software practices result in a lot of manual labor baby-sitting jobs. But in today's science world, that's entirely irrelevant. While I was speaking with scientific computing in mind, it is probably also true in, say, financial computing, etc. Well, people just care about the outcome, and who cares how messy you get there.