Mpicopy is a tool for copying large files to compute nodes in parallel.
Lisa is equipped with two types of filesystems, i.e. a network filesystem (NFS) and local filesystems on each compute node.
The advantage of NFS is that files in your home directory are directly available on all compute nodes. A disadvantage, however, is that this type of filesystem is much slower than a local filesystem. Furthermore, when many people perform data transfers at the same time, NFS performance decreases drastically.
Each compute node of Lisa has a local filesystem, which is available via the TMPDIR environment variable in your job, e.g. /scratch/123.batch.job. The scratch space is cleaned at the start of your job.
Your job might have to use a (large) input file. It is often desirable to copy your input file to the local filesystem first and to let your application read it from there. This is especially useful if the file is read more than once or used as a database with many random I/O requests. Example:
    cp my_input.dat "$TMPDIR"
    my_app "$TMPDIR"/my_input.dat
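The copy-then-run pattern above can be wrapped in a small helper. What follows is only a sketch: the function name stage_input is hypothetical and not part of Lisa's tooling, and the fallback to /tmp when TMPDIR is unset is an assumption made so the helper can also be tried outside a batch job.

```shell
# stage_input FILE: copy FILE to the node-local scratch directory and
# print the path of the local copy. Hypothetical helper, not part of mpicopy.
stage_input() {
    src=$1
    dest_dir=${TMPDIR:-/tmp}    # assumption: fall back to /tmp outside a job
    dest_dir=${dest_dir%/}      # drop a trailing slash, if any
    if [ ! -f "$src" ]; then
        echo "stage_input: no such file: $src" >&2
        return 1
    fi
    cp -- "$src" "$dest_dir"/ || return 1
    printf '%s/%s\n' "$dest_dir" "$(basename -- "$src")"
}
```

In a job script this could be used as my_app "$(stage_input my_input.dat)", so the application only ever sees the local copy.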
For a serial job this is fine. For a parallel job, however, this means that all the compute nodes assigned to you, let's say 50, start copying the same file from NFS to their scratch filesystems, using scp. You can imagine that for a large input file the NFS server will be heavily loaded, which is inefficient and will probably hinder interactive users and other jobs.
The mpicopy tool prevents this heavy load on NFS: it reads the input files from NFS only on your first node and broadcasts them to all compute nodes assigned to you, including the first node itself. You can use this tool as follows:
    module load mpicopy        # define mpicopy environment
    module load openmpi/gnu    # define your environment
    mpicopy my_input.dat
    mpiexec my_app "$TMPDIR"/my_input.dat
Note that you don't need to specify the destination directory for mpicopy: it uses the TMPDIR environment variable by default. Mpicopy also copies directory trees recursively.
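If the same job script is also tried on a machine where mpicopy is not installed, a thin wrapper can choose between the two copy methods. This is a sketch under assumptions: the name stage_all is hypothetical, and the only mpicopy behaviour relied on is what is described above (one path argument, destination taken from TMPDIR). The cp fallback fills the scratch space of the current node only, so it is suitable for single-node testing and not for a parallel job.

```shell
# stage_all PATH...: place each input file or directory in "$TMPDIR".
# Hypothetical wrapper: uses mpicopy when it is on PATH (all nodes of the job),
# otherwise a plain recursive cp (current node only).
stage_all() {
    if [ -z "${TMPDIR:-}" ]; then
        echo "stage_all: TMPDIR is not set" >&2
        return 1
    fi
    for item in "$@"; do
        if command -v mpicopy >/dev/null 2>&1; then
            mpicopy "$item" || return 1
        else
            cp -R -- "$item" "$TMPDIR"/ || return 1
        fi
    done
}
```

A job script would then call stage_all my_input.dat my_dir before starting the application with mpiexec.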
For more detailed information on the options for mpicopy, you can consult the manual page:
    module load mpicopy
    man mpicopy
Please let us know if the functionality of mpicopy is not adequate for your purpose. We are willing to adapt the functionality of mpicopy to the needs of the users.