To use the Cartesius system efficiently, it is important to know something about the file systems that are available, and how to use them. Some important features are summarized in Table 1.
On native Cartesius file systems - i.e. the home, scratch, and project file systems - data does not automatically migrate to tape when the file system gets full. We therefore set quota on these file systems per user or per group.
File system service | Quota | Performance, capabilities | Expiration | Backup service |
---|---|---|---|---|
home | 200 GiB | normal | none | daily incremental |
SURFsara archive | 50 TiB for E-INFRA users, unlimited for NWO | slow | none | daily incremental backup of meta-data, continuous migration of data to tape |
scratch | 8 TiB; 4 million files | fast, parallel I/O | 14 days (scratch-shared); 6 days (scratch-local) | none |
projects | Capacity (TiB) varies per project arrangement; a number-of-files quota derived from the capacity is also enforced | fast, parallel I/O | project duration | none |
Table 1: Summary of important features of Cartesius file system services and the SURFsara archive service
Every user has their own home directory, which is accessible as /home/<loginname>. YOUR HOME DIRECTORY HAS A DEFAULT CAPACITY QUOTA OF 200 GiB (I.E. 200 x 2^30 BYTES). No quota on the number of files and directories are enforced.
For most users, 200 GiB is ample space for a work environment on the system, but our helpdesk can be contacted if you think it is not sufficient to accommodate your work environment on Cartesius. Note, however, that HOME DIRECTORIES ARE NOT INTENDED FOR LONG-TERM STORAGE OF LARGE DATA SETS; SURFsara provides the archive facility for that. The home file systems are NFS4-based, so HOME DIRECTORIES ARE ALSO NOT SUITABLE FOR FAST, LARGE-SCALE, OR PARALLEL I/O. Use scratch and/or project space (see below) for fast and voluminous job I/O.
SURFsara provides a versioned incremental backup service for your home directory that is run overnight. Files that have been backed up are retained in the backup repository for three weeks after they have been deleted from the file system. So we can restore files and/or directories when you accidentally remove them or overwrite them with the wrong contents, or when they are corrupted because of some storage hardware failure – provided of course that a version already existed and was successfully backed up. Note that no consistent backup can be created of files that are removed, being changed, truncated, or appended to, while the backup process is running. The backup process will therefore simply skip files that are opened and in use by other processes.
To have a file successfully backed up, make sure it is not open or being modified while the overnight backup process runs.
The Cartesius scratch and project spaces are both created on high volume, high data throughput file systems that support parallel I/O. They are first and foremost designed for performance, to meet the I/O demands of massive parallel jobs.
YOUR DEFAULT SCRATCH SPACE CAPACITY QUOTA IS 8 TiB (I.E. 8 x 2^40 BYTES). The inode quota (number of files and directories per user) is set at a soft limit of 3 million files per user, and a hard limit that is set substantially higher. Most of our users will never hit the soft limit ceiling, as there is a programmed cleanup of files that are older than 6 days (on scratch-local) or 14 days (on scratch-shared). Users that produce enormous numbers of files per job may have to clean up themselves after the job, as they could reach their quota before the automatic cleanup is invoked.
If the soft limit is reached, a grace period of 7 days starts counting down. If you clean up within the grace period, and do not grow to reach the hard limit, you will not notice the limit at all. If the hard limit is reached, or if you fail to get your usage below the soft limit in due time, the file system refuses to create new files and directories for you.
Scratch directories can be created as seen fit in two locations:
/scratch-local/*
/scratch-shared/*
"/scratch-local/
" behaves like it is local to each node, whereas "/scratch-shared/
" denotes the same location on every node. But in fact not even the /scratch-local/
directories are truly (physically) local. This implies that ALL SCRATCH DIRECTORIES FALL UNDER THE SAME SINGLE PER USER QUOTA REGIME.
It also implies that in fact all /scratch-local/
directories are in fact visible from all nodes, if you know the canonized fully qualified file names. This can be seen with:
[donners@int2 ~]$ readlink -f $TMPDIR
/scratch/nodespecific/int2/donners
Note that the TMPDIR environment variable is set to a default value of /scratch-local/<loginname>, and that the corresponding directory either already exists or is created when you log in or when a batch job starts.
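As an illustration, a batch job can stage its data through $TMPDIR and copy the results to a safe location before it ends. The sketch below assumes the SLURM batch scheduler used on Cartesius; the program and file names (my_program, input.dat, output.dat) are placeholders:

```
#!/bin/bash
#SBATCH -N 1
#SBATCH -t 01:00:00

# $TMPDIR points to a per-job directory under /scratch-local on this node
cp $HOME/input.dat "$TMPDIR"/          # stage input to fast scratch (placeholder file name)
cd "$TMPDIR"

srun $HOME/bin/my_program input.dat    # hypothetical executable that writes output.dat

# scratch is cleaned automatically and has no backup, so copy results back in time
cp output.dat $HOME/results/
```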
/scratch-shared/ behaves like scratch space that is shared by all nodes. Please create your own subdirectory under /scratch-shared/, e.g. with the command mktemp -d -p /scratch-shared.
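For example, a job could create its own shared scratch directory, work there, and clean up afterwards. This is only a minimal sketch; the file names are placeholders:

```
# create a private working directory on the shared scratch file system
WORKDIR=$(mktemp -d -p /scratch-shared)
cd "$WORKDIR"

# ... run the job with its working files in $WORKDIR ...

# copy anything worth keeping to a location with backup or to the archive,
# then remove the directory (scratch data expires and is not backed up)
cp results.tar $HOME/
cd && rm -rf "$WORKDIR"
```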
FOR SCRATCH SPACE THERE IS AN AUTOMATED EXPIRATION POLICY OF 6/14 DAYS (FOR SCRATCH-LOCAL / SCRATCH-SHARED). FILES AND DIRECTORIES THAT ARE OLDER, I.E. WHOSE CONTENTS HAVE NOT CHANGED FOR THAT LONG, ARE REMOVED AUTOMATICALLY. There is no guarantee, however, that files are actually retained for at least 6/14 days: serious hardware failure could, for example, cause the loss of files that have not yet reached that age.
SURFSARA PROVIDES NO BACKUP SERVICE ON SCRATCH SPACE. Job end results, or any other precious job output that you want to keep, must be copied in time to your home directory, to the SURFsara archive, or to any off site storage of your choice.
TRULY LOCAL DIRECTORIES, SUCH AS /tmp AND /var/tmp, SHOULD BE REGARDED AS "OFF LIMITS" FOR USERS. THEY ARE TOO SMALL AND TOO SLOW TO BE USED FOR JOB OUTPUT. FURTHERMORE, THEY ARE NEEDED BY THE OPERATING SYSTEM ITSELF, AND THEY CAN BE EMPTIED WITHOUT FURTHER NOTICE AT NODE REBOOT, AT NODE RE-INSTALLATION, AND ON SEVERAL OTHER OCCASIONS.
If you (accidentally) fill up /tmp or /var/tmp on a node, the operating system will run into problems, and ultimately your job will too; on an interactive node, you and other users will be affected as well.
The $TMPDIR environment variable points to a per-job-step unique directory in /scratch-local (i.e. it is also unique per node).
For projects with a high on-line storage demand we can create project directories on request, for an agreed-upon limited period of time. Please use the form below, or contact the SURFsara helpdesk for further advice on this service.
Projects are accessible under:
/projects/0/<projectname>
Project quota are implemented as group quota, not as user quota. The project name defaults to the name of the group used for quota administration, but it can be arranged to be something else.
WE ADMINISTER PROJECT SPACE CAPACITY QUOTA, AS WELL AS QUOTA FOR THE NUMBER OF FILES. The exact capacity is project dependent. The number-of-files quota are derived from the capacity quota: each project is allocated a basic quota of 1 million files and, on top of that, a surplus that is a non-linear function of the capacity quota.
For large projects the average file size must therefore be larger than for smaller projects. The following table contains some reference values for the resulting number-of-files quota and the resulting average file size.
Capacity | Number of files | Avg. file size |
---|---|---|
1 TiB | 1,000,000 | 1.05 MiB |
5 TiB | 1,359,881 | 3.86 MiB |
10 TiB | 1,728,141 | 6.07 MiB |
50 TiB | 3,766,218 | 13.92 MiB |
100 TiB | 5,605,170 | 18.71 MiB |
200 TiB | 8,492,952 | 24.50 MiB |
300 TiB | 10,879,241 | 28.91 MiB |
Project space can be considered user-managed scratch space. THE PROJECT SPACE ITSELF HAS AN AGREED-UPON END DATE, BUT THERE IS NO EXPIRATION POLICY FOR THE AGE OF INDIVIDUAL FILES AND DIRECTORIES IN YOUR PROJECT SPACE. PROJECT USERS THEMSELVES MUST TAKE CARE NOT TO RUN INTO THEIR QUOTA LIMITS, BY DELETING AND/OR COMPACTING AND ARCHIVING DATA THAT ARE NO LONGER NEEDED. Note that SURFSARA PROVIDES NO BACKUP SERVICE ON PROJECT SPACE. Because of the high volume and the data mutation and/or growth rate, an integral and incremental file system backup service for project spaces is infeasible and unaffordable. You, the user, must decide which data are precious and should be archived.
This implies that data you have created on project space - probably spending a quite substantial amount of your compute budget doing so - are at risk. If you have not arranged for a backup or some other restore possibility, your data will be irrevocably lost in case serious damage is done to your files, or to the file system at large, by failing hardware or human error. SURFsara provides the archive facility for long-term data storage, but you may of course also use off-site storage of your choice. IT IS YOUR OWN RESPONSIBILITY TO ARCHIVE YOUR DATA AND TO KEEP TRACK OF WHAT YOU ARCHIVED, WHEN, AND WHERE.
The purpose of project space is to enable fast and bulky reading and/or writing of files by large and/or many jobs, not long term storage of data. WHEN THE AGREED UPON PERIOD OF TIME OF YOUR CARTESIUS PROJECT SPACE EXPIRES, THE PROJECT SPACE WILL BE MADE INACCESSIBLE. IF NO FURTHER NOTICE FROM THE PROJECT SPACE USERS IS RECEIVED, THE FILES AND DIRECTORIES IN YOUR PROJECT SPACE WILL EVENTUALLY BE DELETED AFTER A GRACE PERIOD OF AN ADDITIONAL FOUR WEEKS.
ALL MEMBERS OF THE GROUP USED FOR QUOTA ADMINISTRATION WILL BE NOTIFIED BY AN E-MAIL TO THEIR E-MAIL ADDRESSES REGISTERED IN THE SURFSARA USER ADMINISTRATION 30 DAYS IN ADVANCE OF THE EXPIRATION DATE. A SECOND NOTIFICATION MAIL WILL BE SENT OUT THE DAY AFTER EXPIRATION.
Quota on project file systems are per group rather than per user. USERS OF THE PROJECT SPACE MUST BE A MEMBER OF THE GROUP USED FOR QUOTA ADMINISTRATION FOR THE PROJECT, AND THEY MUST WRITE FILES AND DIRECTORIES WITH THIS GROUP OWNERSHIP. In most cases this works correctly by default, but some commands that try to preserve the original group ownership (e.g. "rsync -a" or "cp -p") will fail without the extra options described below.
A few tips about the usage of different commands in this context (a combined example follows below):
- Change the group ownership of an existing directory with chgrp [group] [dir] and change permissions with chmod g+s [dir].
- The rsync -a command can be used to transfer data to a project directory by adding the options --no-g --chmod=Dg+s AFTER the option -a. With --no-g in place, rsync does not try to preserve group ownership. The option --chmod=Dg+s sets the setgid bit on all newly created directories, to ensure that all files are written with the correct project group. To also enable write permission for the whole project group, use --chmod=Dg+s,g+w.
- The cp -p command can be used to copy files to a project directory by adding the option --no-preserve=ownership, to make sure that the file is written with the correct project group.
- Set umask 007 before creating any files on the project space. This gives full permissions to the user and the group, and no permissions to anyone else.
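As a combined sketch, transferring data into a hypothetical project directory /projects/0/myproject administered under group myproject could look as follows; the project name, group name, and file names are placeholders:

```
# work with permissions for the user and the project group only
umask 007

# fix group ownership and the setgid bit on an existing directory
chgrp myproject /projects/0/myproject/data
chmod g+s /projects/0/myproject/data

# rsync: do not preserve the source group, set the setgid bit (and group write) on new directories
rsync -a --no-g --chmod=Dg+s,g+w ./results/ /projects/0/myproject/data/

# cp: preserve timestamps etc., but not the original ownership
cp -p --no-preserve=ownership bigfile.nc /projects/0/myproject/data/
```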
For users involved in more than one data project it is theoretically possible to store data in multiple project directories using, quasi-randomly, any quota group that they are a member of. This is unwanted behaviour: FILES AND DIRECTORIES WITH A GROUP OWNERSHIP USED FOR THE QUOTA ADMINISTRATION OF A PARTICULAR DATA PROJECT MUST ALL BE PLACED UNDER THEIR RESPECTIVE PROJECT ROOT DIRECTORY. CONVERSELY, ONLY SUBDIRECTORIES AND FILES BELONGING TO THE PROJECT SHOULD BE PLACED UNDER THAT DIRECTORY. SURFSARA WILL ENFORCE THESE RULES, IF NEED BE WITH PERIODIC CORRECTIVE ACTIONS THAT CHANGE GROUP OWNERSHIP WITHOUT FURTHER NOTICE.
In principle the lifetime of a project directory is not extended beyond the lifetime of the associated compute project. After all, project spaces for projects that are no longer active waste high-performance storage resources. In some cases, however, a follow-up project could make efficient use of the same data without first having to stage them from an archive into a new project space. This may be a valid reason for retaining a Cartesius project space "in between projects". In that case it is mandatory to demonstrate, before the grace period has ended, that the project proposal for the follow-up project - the destined "heir" of the project space - has actually been submitted. New limits and expiration dates will have to be re-established and motivated by the needs of the follow-up project.
The SURFsara archive is not a Cartesius file system. It is an independent facility for long-term storage of data with a tape storage backend. It will be around when Cartesius is long gone, and it is accessible from other systems too. You have other options for archiving data, e.g. at your home site, but SURFsara provides the SURFsara archive as an option to all users who additionally request it. For small E-INFRA grants (including usage of Cartesius), a maximum of 50 TiB can be granted. Larger requests can be granted via NWO or via a separate "pay-per-block" contract.
On several Cartesius nodes the SURFsara archive is available as an NFS mounted file system.
Every user has their own archive directory: /archive/<loginname>.
It is also possible to transfer files using network services such as scp, sftp or grid-ftp.
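As a rough sketch, copying a file into your archive directory from another system over scp could look like this; <archive-host> is a placeholder, and the actual server address should be taken from the separate archive documentation:

```
# transfer a file into your archive directory over the network
# (<archive-host> and the file name are placeholders)
scp results-2020.tar <loginname>@<archive-host>:/archive/<loginname>/
```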
The /archive file system can be relatively slow, and not just because the file system is NFS mounted: data that resides only on tape must first be retrieved before it can be read. Use the /archive file system as an archive, not as a working file system.
A few notes on the dmftar tool (a usage sketch follows below):
- dmftar is aware of the tape-based storage system and can therefore store and retrieve your data from the archive efficiently.
- dmftar can also work remotely from any system, without using the file system mount /archive.
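The following is only a rough sketch of dmftar usage, assuming its tar-like options (-c to create an archive, -x to extract, -f to name the archive file); please consult the separate archive page for the authoritative syntax and options:

```
# pack a results directory into a single archive volume on /archive
# (all directory and file names are placeholders)
dmftar -c -f /archive/<loginname>/run42.dmftar /projects/0/myproject/run42/

# later, extract it again, e.g. into scratch space
cd /scratch-shared/<loginname>
dmftar -x -f /archive/<loginname>/run42.dmftar
```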
Use dmget to improve read performance: the dmget -a command changes the DMF status of a file from OFL (offline, i.e. on tape only) to DUL (dual state: both online and on tape), which speeds up subsequent copying (access time). You can check the DMF status of your files with the dmls -l command.
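As a concrete illustration of the dmget workflow described above (directory and file names are placeholders):

```
cd /archive/<loginname>/myproject      # hypothetical archive subdirectory

dmls -l                                # shows the DMF state of each file (e.g. OFL, DUL, REG)
dmget -a *.tar                         # bring the tape-resident files back online

# once the files are online (DUL), copying them is much faster
cp *.tar /scratch-shared/<loginname>/
```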
For more information, see the separate page about the archive.
You can check your disk quota with the command myquota. Below is an example of its output.
[xyzabc@int1 ~]$ myquota
HOME file system "/nfs/home4", disk quotas for USER xyzabc:
    blocks   quota   limit   grace    files   quota   limit   grace
      196G    200G    240G             533k       0       0
SCRATCH file system "/scratch", disk quotas for USER xyzabc:
    capacity      quota      limit    files    quota    limit  [status]
     0.0132T    8.0000T    8.0000T    11067        0        0  [OK]
PROJECT file system "/lustre1", disk quotas for GROUPS involving user xyzabc:
 (project-)groupname   capacity      quota      limit    files     quota     limit  [status]
        xybiggestprj   18.8006T   25.0000T   25.0000T   271890   1000000   1000000  [OK]
          xyotherprj    8.4624T   16.0000T   16.0000T   829456   2109035   2109035  [OK]
PROJECT file system "/lustre4", disk quotas for GROUPS involving user xyzabc:
 (project-)groupname   capacity      quota      limit    files     quota     limit  [status]
          xyz_lglprj    2.4624T   16.0000T   16.0000T   829456   2109035   2109035  [OK]
[xyzabc@int1 ~]$
The tool lists home directory quota, scratch quota, and project space quota per file system. The file system names mentioned, e.g. "/nfs/home4" for a home directory, may look unfamiliar. Isn't the home directory directly under "/home"? Yes and no. In reality there are a number of home file systems as well as a number of project space file systems. With the help of symbolic links - "redirection pointers" in the file system - the name space for users is kept uniform and simple. The quota tooling, however, really operates on a per-file-system basis, and it immediately translates all symbolic links to the actual underlying target path names.
The three project spaces of user xyzabc in the example above correspond to what the user would normally see as /projects/0/xybiggestprj, /projects/0/xyotherprj, and /projects/0/xyz_lglprj.
The /scratch and /projects directories are Lustre parallel file systems. It is possible to read or write the same file in parallel from multiple nodes with specific software libraries, e.g. MPI-IO, HDF5, NetCDF4 or SIONlib. The Lustre file system uses 48 OSTs (Object Storage Targets), each with multiple disks, to store all the data in parallel. The OSTs are connected to the compute nodes of Cartesius with InfiniBand. By default, the Lustre file system stores each file on a single OST, which works quite well for most situations, where files are accessed from a single process and are relatively small (< 10 GB). However, when very large files need to be read or written in parallel from multiple nodes, Lustre striping is needed to reach good performance. Lustre striping is the use of multiple OSTs to store a single file, which increases the maximum bandwidth to the file. To set the striping for the file 'large_model_output' to use all available OSTs with a stripe size of 2 MB, use the command:
lfs setstripe -s 2M -c -1 large_model_output
The default stripe size is 1 MB, and it is recommended not to choose a smaller value.
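As an additional sketch, striping can also be set on a directory so that files created in it afterwards inherit the layout; the directory name below is a placeholder, and the resulting layout can be checked with lfs getstripe:

```
# stripe new files in this (hypothetical) output directory over all OSTs with 2 MB stripes
lfs setstripe -s 2M -c -1 /scratch-shared/<loginname>/run42/

# verify the stripe count and stripe size of a file or directory
lfs getstripe /scratch-shared/<loginname>/run42/large_model_output
```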
The I/O Benchmarks section of this document provides more information about choosing an adequate stripe size and count.
With ACLs (access control lists) you can define, on a per-user or per-group basis, who is allowed to access your files.