When you submit a job, your job always gets whole nodes allocated. Each node typically has 16 cores, and you are expected to use those cores. If you have a 'serial' program, that is, a program that uses only 1 cpu core, simply running that program will use only 1/16th of the capacity of the machine. Especially if you have to run that program many times, it is much better to fill each machine with runs, so that you use the full capacity of every node your job occupies.
In the remainder of this article, we will refer to one run of a program as a 'task', to distinguish it from a 'job' (which is what you will eventually submit). The idea is to run multiple tasks within one job and, for large numbers of tasks, to run multiple jobs to finish all tasks.
There are a few techniques to run multiple tasks in parallel, and they can also be used in combination. The techniques range from simple to complex: increasing complexity means more effort from you, but it also gives you a more powerful approach.
If you have a lot of tasks to do, and you either suspect that even with GNU Parallel you will run out of walltime, or you simply want your results sooner, you can use a combination of GNU Parallel and array jobs. You distribute the tasks over multiple jobs, and inside each job you use GNU Parallel to run the tasks efficiently.
In the example below, we assume that you have 10000 serial tasks that can be called like
./my_program 1
./my_program 2
...
./my_program 10000
and suppose that each task takes about 1 hour, so all work together will take roughly 10000 cpu-hours.
The maximum walltime of a job is 5 days, or 120 hours, and since a batch node has 16 cores, you can do at most 16 * 120 = 1920 cpu-hours within one job. Now 10000 / 1920 = 5.2, so you need 6 jobs to get all work done within 5 days. You could write 6 different jobs, each using GNU Parallel with a different range (a sketch after these commands shows one way to compute such ranges). The jobs would be as follows:
seq 1 1666 | parallel ./my_program
seq 1667 3333 | parallel ./my_program
...
seq 8334 10000 | parallel ./my_program
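For reference, these ranges simply split the 10000 task ids as evenly as possible over 6 jobs. A small sketch that prints the six commands, using the same integer arithmetic as the job script further below (the variable names here are only illustrative):

#!/bin/bash
# sketch: print the six "seq ... | parallel ./my_program" commands above,
# splitting task ids 1..10000 as evenly as possible over 6 ranges
N_I=10000   # total number of tasks
N_T=6       # number of jobs
for T in $(seq 1 $N_T); do
    START=$(( 1 + N_I * (T - 1) / N_T ))
    END=$(( N_I * T / N_T ))
    echo "seq $START $END | parallel ./my_program"
done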
You would then submit these 6 jobs by hand. The scheduling system, however, has a facility called 'array jobs' to facilitate exactly this. The idea is that you submit one job script with a specified range; instead of one job, you get one job for each value within the range, and in each job the variable $PBS_ARRAYID is set to that job's value from the range.
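As a minimal illustration of this mechanism (assuming a PBS/Torque-style scheduler where the -t option and $PBS_ARRAYID behave as described above), the following script, submitted once, results in 6 jobs that each just print their own array id:

#!/bin/bash
## create 6 array jobs, numbered 1 to 6
#PBS -t 1-6

# every one of the 6 jobs runs this same script,
# but each sees a different value of $PBS_ARRAYID (1, 2, ..., 6)
echo "This is array job $PBS_ARRAYID"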
Below is a complete job script that will run
./my_program 1
./my_program 2
and so on up to
./my_program 10000
in 6 jobs. It has the correct walltime specified, and with the line
#PBS -t 1-6
you will automatically submit 6 array jobs when you submit this script to the scheduler. Note that the scheduler will *not* check whether all work can actually finish within the requested walltime.
You must also tell the logic of the script itself that there are 6 jobs, and this is done in the lines
T_START=1
T_END=6
#!/bin/bash

## request 120 hours of walltime
#PBS -lwalltime=120:00:00

## create 6 array jobs, with $PBS_ARRAYID set for each job
#PBS -t 1-6

# ------------------------------------------------------------------------
# Start of the input; change these as required

# the start and end array-job id, taken from the -t option in array jobs.
T_START=1
T_END=6

# start and end parameter for all array jobs; these will be used as input
# to your program
I_START=1
I_END=10000

# end of the parameters; do not change below, until where you run your
# program
# ------------------------------------------------------------------------

# compute the number of tasks from start and end. The start and end are
# inclusive; the actual number of tasks is therefore one more than the
# difference between end and start.
N_I=$(( $I_END - $I_START + 1 ))

# the same goes for the number of array jobs
N_T=$(( $T_END - $T_START + 1 ))

# Compute the range of task ids for this job
IT_START=$(( $I_START + $N_I * ($PBS_ARRAYID - $T_START) / $N_T ))
IT_END=$(( $I_START + $N_I * ($PBS_ARRAYID - $T_START + 1) / $N_T - 1 ))

# ------------------------------------------------------------------------
# Now the start and end range for this job have been defined; use
# GNU Parallel to actually run the tasks within this job efficiently.
#
# Change this to make it run your program or script.
# The example below will execute something like
#
#   ./my_program $IT_START &
#   ./my_program $(( $IT_START + 1 )) &
#   ./my_program $(( $IT_START + 2 )) &
#   ...
#   ./my_program $IT_END &
#
# and so on, but it will keep the number of running tasks equal to the number
# of cpu cores, until all work is finished, so it will not overload the
# system.
# ------------------------------------------------------------------------

# Change "./my_program" to your script; it will actually be run
# with each parameter as command-line argument
seq $IT_START $IT_END | parallel ./my_program
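Assuming the script above is saved as, say, job_script.sh (the file name is arbitrary), you only submit it once; on PBS/Torque systems this is done with qsub, and the -t line inside the script makes the scheduler create the 6 array jobs for you:

# submit the script once; 6 array jobs are created automatically
qsub job_script.sh

# list your jobs to check their status
qstat -u $USER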
In the above example, we estimated that with 6 array jobs, each job will get roughly 1666 tasks, and with 16 cores per batch node, each job will take approximately 104 hours. If you want your results quicker, you can simply use more array jobs; just change the number of array jobs from 6 to 10, and change it in the settings too (only the relevant lines are shown):
#!/bin/bash

## Request 10 array jobs
#PBS -t 1-10

T_START=1
T_END=10
In this case, the work will be split over 10 separate jobs, with each job doing 1000 tasks. If each task takes approximately one hour, then with 16 cores per job, each job will take roughly 63 hours (1000 / 16 = 62.5) to complete.
The technique above assumes that your program or script can read a number from the command line as in
./my_program 5
where your program should read "5", and use it in its logic. This section describes how to read the number from the command line, for various programming languages.
In Python, use
import sys
parameter = int(sys.argv[1])
Note that this is a minimal version that does no error checking or exception handling.
In R, use
parameter <- as.numeric(commandArgs(TRUE)[1])
Note that this is a minimal version that does no error checking or exception handling.
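If my_program is itself a bash script, a comparable minimal sketch (again without any error checking) would be:

#!/bin/bash
# the first command-line argument, e.g. "5" when called as ./my_program 5
parameter=$1

# use $parameter in the rest of the script
echo "running task $parameter"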