I have recently started using the high-performance cluster at the Sanger Institute, and it comes with an interesting quandary. You have to be explicit about the amount of RAM you request when you submit a job (the PHE cluster wasn’t like this). That amount of RAM is then assigned to your job and won’t be available to other jobs for the duration.
This leads to an interesting question: how much RAM should I request? The criteria, as far as my rudimentary understanding of HPCs goes, are:
- The less RAM you request, the sooner your jobs will run, because there are more ‘spaces’ they can fit into. If you demand loads of RAM, there will be fewer machines with enough free RAM, so you will wait longer.
- If you don’t request enough RAM, your job will crash and will have to be re-run. This has an overhead for my time, though admittedly a fairly minor one.
- If you request loads of RAM that you don’t need, that is inefficient, and other jobs that do need it will be delayed.
- Chances are, your jobs have a range of RAM requirements, even within the same workflow. On a recent batch, for example, the jobs that finished had an average requirement of 4.3 GB with a standard deviation of 0.6 GB, but 15% of the jobs ran out of memory and will need to be re-run. I requested 6 GB of RAM per job.
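As a rough sketch of how those numbers could inform the request: if peak RAM usage were roughly normally distributed (a big assumption, and the sample is biased because the out-of-memory jobs are excluded from the mean and standard deviation), then the request that covers a target fraction of jobs is just a quantile of that distribution:

```python
from statistics import NormalDist

# Numbers from the batch above: mean and stdev of peak RAM for the jobs
# that finished. Note this sample is truncated: jobs that ran out of
# memory are excluded, so the true mean and stdev are somewhat higher.
mean_gb = 4.3
stdev_gb = 0.6

# Under a normality assumption, the request covering a target fraction
# of jobs is the corresponding quantile (inverse CDF).
usage = NormalDist(mu=mean_gb, sigma=stdev_gb)
for coverage in (0.90, 0.95, 0.99):
    request = usage.inv_cdf(coverage)
    print(f"{coverage:.0%} of jobs fit in {request:.1f} GB")
```

Interestingly, under that assumption 6 GB sits nearly three standard deviations above the mean and should cover well over 99% of jobs, so the observed 15% failure rate suggests the real distribution is heavier-tailed (or more bimodal) than a normal fit to the successful jobs implies.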
I think running with 6 GB of RAM requested seems like a decent compromise, with 85% of jobs finishing successfully, but I wonder whether there is a more principled (either mathematically or ethically) way of choosing.
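One mathematically principled option would be to treat this as a cost-minimisation problem: pick the request that minimises expected RAM-hours per job, where an out-of-memory failure wastes the whole first attempt and forces a re-run at a larger size. A toy sketch, with a hypothetical runtime, fallback request, and the (questionable) normality assumption from above:

```python
from statistics import NormalDist

# All numbers hypothetical except the mean/stdev from the batch above.
usage = NormalDist(mu=4.3, sigma=0.6)  # assumed peak-RAM distribution (GB)
runtime_h = 1.0                        # assumed runtime per attempt (hours)
safe_request = 12.0                    # assumed fallback request after an OOM

def expected_cost(request_gb: float) -> float:
    """Expected RAM-hours per job if we ask for request_gb up front."""
    p_fail = 1 - usage.cdf(request_gb)
    # Success: hold request_gb for the runtime.
    # Failure: the first attempt is wasted, plus a re-run at the safe size.
    return request_gb * runtime_h + p_fail * (safe_request * runtime_h)

candidates = [x / 10 for x in range(40, 100)]  # 4.0 .. 9.9 GB in 0.1 steps
best = min(candidates, key=expected_cost)
print(f"cheapest request under this model: {best:.1f} GB")
```

The answer is very sensitive to how expensive a failure is: a cheap re-run pushes the optimum down towards the mean, while a long-running job (or an angry queue of colleagues) pushes it up. That captures the ethical side too, since the failure penalty could include the cost your over-request imposes on everyone else's wait times.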