My time, your time, compute time

I have recently started using the high-performance computing (HPC) cluster at the Sanger Institute, and it comes with an interesting quandary. You have to state explicitly how much RAM you need when you submit a job (the PHE cluster wasn’t like this). That amount of RAM is then reserved for your job and won’t be available to other jobs until yours finishes.

This raises a question with a few competing considerations: ‘How much RAM should I request?’ The criteria, as far as my rudimentary understanding of HPCs goes, are:

  • The less RAM you request, the sooner your jobs will run, because there will be more ‘spaces’ they can fit into. If you demand loads of RAM, fewer machines will have that much free, so you will wait longer.
  • If you don’t request enough RAM, your job will crash and will have to be re-run. This costs me some time, although admittedly not much.
  • If you request loads of RAM that you don’t need, this is very inefficient, and other people’s jobs that need that RAM will be delayed.
  • Chances are, your jobs have a range of RAM requirements, even within the same workflow. On a recent batch, the jobs that finished used an average of 4.3 GB with a standard deviation of 0.6 GB, but 15% of the jobs ran out of memory and will need to be re-run. I requested 6 GB of RAM per job.
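As a rough sanity check on the numbers in the last point, here is a minimal sketch that assumes peak memory is normally distributed with the quoted mean and standard deviation. That is an assumption, since real memory usage is often skewed. Under this model, 6 GB sits about 2.8 standard deviations above the mean. That predicts an out-of-memory rate of roughly 0.2%, nowhere near the 15% observed, and it would even suggest that about 5.7 GB covers 99% of jobs. Part of the gap is probably because the 4.3 GB and 0.6 GB figures come only from the jobs that finished. By construction those jobs all used less than 6 GB, so they understate the true spread. The sketch uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Figures quoted in the post (GB), measured on the jobs that finished only.
mean_gb = 4.3
sd_gb = 0.6
request_gb = 6.0

# Assumption: peak memory ~ Normal(mean_gb, sd_gb). Real usage is often skewed.
peak_dist = NormalDist(mu=mean_gb, sigma=sd_gb)

z = (request_gb - mean_gb) / sd_gb       # how many SDs above the mean 6 GB is
p_oom = 1 - peak_dist.cdf(request_gb)    # predicted share of jobs needing > 6 GB
req_99 = peak_dist.inv_cdf(0.99)         # request that would cover 99% of jobs

print(f"6 GB is {z:.2f} SD above the mean")
print(f"Predicted out-of-memory rate: {p_oom:.2%} (observed: 15%)")
print(f"Request for 99% success under this model: {req_99:.2f} GB")
```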

Requesting 6 GB of RAM seems like a decent compromise, with 85% of jobs finishing successfully, but I wonder whether there is a more principled (either mathematically or ethically) way of doing it.
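One way to make the trade-off explicit is to write down a cost and minimise it. The sketch below uses a hypothetical model, not an established method, built on the following assumptions:

  • Every job holds its reservation for the same length of time.
  • A job that runs out of memory wastes its whole first reservation and is re-run once at a larger `retry_gb` that always succeeds.
  • The nuisance of a re-run can be expressed as an extra `penalty_gb` of reserved memory.

The expected cost per job is then the request plus the failure probability times (`retry_gb` + `penalty_gb`). The failure probability is estimated from a sample of past peak-memory values. The ethical judgement, how much other people's waiting counts against your own time, lives in those two parameters. The function names and the peak values are illustrative only.

```python
def expected_cost_gb(request_gb, peaks_gb, retry_gb, penalty_gb):
    """Expected GB reserved per job under a hypothetical cost model.

    Assumptions (illustrative, not from the post):
    - all jobs hold their reservation for the same length of time;
    - a job whose peak exceeds request_gb wastes its whole first reservation
      and is re-run once with retry_gb, which always succeeds;
    - penalty_gb converts the human cost of a re-run into GB-equivalents.
    """
    p_fail = sum(1 for m in peaks_gb if m > request_gb) / len(peaks_gb)
    return request_gb + p_fail * (retry_gb + penalty_gb)


def best_request(peaks_gb, candidates_gb, retry_gb, penalty_gb):
    """Candidate request with the lowest expected cost."""
    return min(
        candidates_gb,
        key=lambda r: expected_cost_gb(r, peaks_gb, retry_gb, penalty_gb),
    )


# Made-up peak-memory sample (GB) for illustration only, with one large outlier.
peaks = [3.8, 4.0, 4.1, 4.3, 4.5, 4.6, 4.9, 5.2, 5.6, 14.0]
candidates = range(4, 17)  # whole-GB requests from 4 to 16
best = best_request(peaks, candidates, retry_gb=16, penalty_gb=2)
print(f"Best request: {best} GB, "
      f"expected cost {expected_cost_gb(best, peaks, 16, 2):.1f} GB")
```

With these made-up peaks, a 6 GB request wins. Dropping to 5 GB causes too many re-runs, while covering the single 14 GB outlier costs more than re-running it.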


7 thoughts on “My time, your time, compute time”

  1. Having worked in HPC, I’d say the first-pass solution is to see whether you can fit your job within the RAM-per-CPU ratio of the machines. For example, if most machines have 16 cores and 64 GB, request 4 GB per core. This allows most other jobs to run proportionally…

  2. If it’s just one of many jobs, try submitting test jobs that you expect to be outliers (e.g. the largest FASTQs) in interactive mode and monitor their resource usage (qacct -j, I think, in SGE). Then use those settings (plus a bit) when you submit your main batch of sequences.
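Both comments boil down to simple arithmetic. The sketch below is a hypothetical illustration, not the commenters’ own code. It assumes identical 16-core, 64 GB nodes, as in the first comment’s example, filled with identical jobs. It treats the second comment’s ‘plus a bit’ as an arbitrary 20% margin on the largest peak seen in the test jobs. The peak values are made up. In practice they would come from the scheduler’s accounting, for example the `maxvmem` field in SGE’s `qacct -j` output.

```python
import math


def jobs_per_node(job_cores, job_ram_gb, node_cores=16, node_ram_gb=64.0):
    """Identical jobs that fit on one node, limited by cores or by RAM.

    The 16-core, 64 GB node is the example from the first comment.
    """
    by_cores = node_cores // job_cores
    by_ram = math.floor(node_ram_gb / job_ram_gb)
    return min(by_cores, by_ram)


def idle_cores(job_cores, job_ram_gb, node_cores=16, node_ram_gb=64.0):
    """Cores left unused because the node's RAM runs out first."""
    used = jobs_per_node(job_cores, job_ram_gb, node_cores, node_ram_gb) * job_cores
    return node_cores - used


def request_from_test_jobs(test_peaks_gb, margin_fraction=0.2, step_gb=1.0):
    """Largest peak seen in the outlier test jobs, plus a margin, rounded up.

    margin_fraction stands in for the comment's "a bit"; 20% is arbitrary.
    """
    padded = max(test_peaks_gb) * (1 + margin_fraction)
    return math.ceil(padded / step_gb) * step_gb


# Single-core jobs at 4 GB match the node's 4 GB/core ratio; 6 GB does not.
print(jobs_per_node(1, 4.0), idle_cores(1, 4.0))  # 16 0
print(jobs_per_node(1, 6.0), idle_cores(1, 6.0))  # 10 6

# Hypothetical peaks (GB) from three test runs on the largest FASTQs.
print(request_from_test_jobs([5.1, 5.8, 6.4]))  # 8.0
```

The first two lines illustrate the first comment’s point. At 6 GB per single-core job, the node’s RAM runs out after 10 jobs and 6 of its 16 cores sit idle, whereas 4 GB per core lets all 16 cores be used.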
