This page describes the process for requesting resources for SURF’s Data Processing (DP) services through the NWO proposal call for Computing Time on National Computer Facilities, available here. This page provides information on:
The DP services cover two platforms, Spider and Grid, for more information see:
For additional information reach out to helpdesk@surfsara.nl
There are two NWO grant routes you can take to request compute time and resources. Which route you take depends on the size of your project:
The application process for either route is outlined below. To determine whether you are eligible for these grants, please see the call for proposals page here.
For more information about which application route applies to your project please reach out to helpdesk@surfsara.nl
For an NWO small project you must apply through the SURF request portal (previously e-infra).
The maximum that can be requested in each category for small DP projects is outlined below. If your project exceeds the maximum in any category, you should apply through the large project route.
| Resource | Maximum |
| --- | --- |
| CPU | 500.000 core hours |
| Storage | 200 TB |
| Tape | 300 TB |
| Grant length | 1 year |
Access to both the Spider and the Grid is requested through the same fields in the SURF request form. Therefore, you must indicate in either your (i) technical project requirements or (ii) service preference which platform you intend to use.
This portal is new as of Jan 2020, so for your first request you will need to create an account. This is a new login that is not linked to your existing SURF accounts.
CPU time in the unit of CPU core hours can be calculated by following the formula below:
CPU core hours = wall clock time hours (per job) x #cores x #jobs
For more information about calculating CPU core hours please see the Grid FAQ
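The formula above can be sketched as a small helper; the job profile in the example (job duration, core count, number of jobs) is a hypothetical placeholder, not a recommendation.

```python
# Minimal sketch of the CPU core-hour formula above.
# The example numbers are hypothetical placeholders.

def cpu_core_hours(wall_clock_hours_per_job, cores_per_job, num_jobs):
    """CPU core hours = wall clock time hours (per job) x #cores x #jobs."""
    return wall_clock_hours_per_job * cores_per_job * num_jobs

# Example: 2000 jobs, each running 10 hours on 8 cores.
total = cpu_core_hours(10, 8, 2000)
print(total)  # 160000 core hours, which fits within the small-project limit
```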
In addition to processing your data, you will also require a place to store the data you wish to process. This field is split into storage space and archive storage.
Consider the example storage requests below for how to interpret these fields:
Storage Grid
This user is requesting:
More information about Grid storage (dCache) can be found in the documentation here.
Storage Spider
This user is requesting:
More information about Spider storage in the documentation here.
If you require a more custom storage solution than either outlined above, for example:
Then please contact helpdesk@surfsara.nl for help completing your request.
Optimizing your project’s pipeline on Spider and the Grid platforms can be a challenge. This is one of the reasons why DP at SURF recommends that you request support with your NWO pilot request.
If you have any questions about how SURF advisors can support your data processing project, please reach out to helpdesk@surfsara.nl
Read through your forms, and confirm your:
Select Create
Large grants are handled via the NWO ISAAC portal and are reviewed by an external committee.
NWO large grants are for any project that exceeds the limit in any category below: if you require more than 500.000 core hours, more than 200 TB of storage, or more than 300 TB of tape, then you must submit your request through NWO’s large grant process.
| Resource | Maximum |
| --- | --- |
| CPU | 500.000 core hours |
| Storage | 200 TB |
| Tape | 300 TB |
| Grant length | 2 years |
In order to complete the NWO form for large projects, there are multiple places where you are required to identify whether you require resources from Spider or the Grid. Therefore, it is important to know which platform(s) you plan to use and how you plan to distribute your compute between them.
For help on determining which platform to use, please see the service description for DP or contact helpdesk@surfsara.nl.
Complete tables 1-9 with project information
Navigate to technical appendix B. Data Processing. Complete all of the Section B tables.
The first table is for CPU core hours. Through NWO there are three processing platforms that can be requested, and that time can be distributed between Year 1 and Year 2.
The NWO compute call accepts requests for two Grid sites, SURF and NIKHEF. These two sites both have worker nodes that you can submit jobs to. If you have a preference for only one site (e.g. SURF) you can distribute 100% of your request there, or split your request 50/50 between the two sites. In this use case, the request is asking for 1 million core hours for the first year of their project and 200.000 core hours for the second, split evenly between the SURF and NIKHEF sites.
If you are unsure how to distribute your resources across SURF and NIKHEF please reach out to helpdesk@surfsara.nl.
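The worked example above can be sketched as a small allocation helper. The 50/50 split and the per-year totals come from the example; the function name and structure are illustrative, not part of the NWO form.

```python
# Hedged sketch of the example above: 1.000.000 core hours in year 1 and
# 200.000 in year 2, split evenly between the SURF and NIKHEF Grid sites.

def split_request(core_hours_per_year, surf_fraction=0.5):
    """Return a per-year {site: core hours} breakdown of a Grid request."""
    return [
        {"SURF": year * surf_fraction, "NIKHEF": year * (1 - surf_fraction)}
        for year in core_hours_per_year
    ]

plan = split_request([1_000_000, 200_000])
print(plan)
# Year 1: 500.000 core hours per site; year 2: 100.000 core hours per site.
```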
Spider is a single cluster hosted at SURF; therefore, compute can only be distributed across the two years you request to use it for.
CPU time in the unit of CPU core hours can be calculated by following the formula below:
CPU core hours = wall clock time hours (per job) x #cores x #jobs
For more information about calculating CPU core hours please see the Grid FAQ
Usage definitions
Grid – CPU core hours are consumed under a continuous usage model, meaning that your allocated resources will be depleted by the end of your grant period. Should your project require more processing at any given time, the service is available on a best-effort basis, or you can submit another request for additional or reserved resources.
Spider – CPU core hours are guaranteed under a continuous usage model. Should you have remaining allocation at the end of an allocation period, you can request either an extension or a grant renewal.
There are several different storage components available for the Grid and Spider. Depending on your project(s) your data may live in multiple locations. The following are examples of storage requests for the Grid and Spider.
The following request is completed by a user who wishes to store their data on SURF’s Grid storage (dCache). They expect to collect 1 PB/year of data and to have 400 TB of files active and accessible at a time.
Usage definitions:
Grid storage – disk: the requested amount is reserved disk space, so usage is defined as the reservation and availability of the disk, not necessarily the amount of data stored.
Grid storage – tape: the requested amount is the maximum amount of data that can be written; should you overshoot your expected data volume, you will need to make an amendment to your request.
For more information about Grid storage, please see the documentation.
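The dCache example above (1 PB of new data per year, 400 TB active on disk) can be sketched as a sizing calculation. This is an illustrative assumption, not an official calculator: it assumes tape must hold everything written over the grant and disk covers the active working set.

```python
# Illustrative sketch (not an official SURF calculator) of sizing the Grid
# storage request above: 1 PB/year of new data, 400 TB active at a time.

TB_PER_PB = 1000  # assuming decimal units, as quoted in the request tables

def grid_storage_request(pb_per_year, years, active_tb):
    """Tape holds all data written over the grant; disk is the reserved
    space for the active working set."""
    tape_tb = pb_per_year * years * TB_PER_PB
    disk_tb = active_tb
    return {"disk_tb": disk_tb, "tape_tb": tape_tb}

print(grid_storage_request(pb_per_year=1, years=2, active_tb=400))
# {'disk_tb': 400, 'tape_tb': 2000}
```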
The following example is for a project that only wishes to process their data on the Spider cluster. They expect a static data volume of 1 PB that does not increase over the course of their project. This request has four components:
Please see the documentation for each of the storage components for more details:
If you have any questions about building your use case please reach out to helpdesk@surfsara.nl
Usage definitions
Spider shared disk storage: the requested amount is the maximum amount of data that can be written; should you overshoot your expected data volume, you will need to submit a request for an extension or create an additional request.
Starting a new compute project on the national infrastructure can be a challenge. This is one of the reasons why DP at SURF recommends that you request support with your NWO request. Please outline the areas in which you would like support.
In the table below, the user is requesting a total of 140 hours of support over 2 years, or approximately 1 hour/week.
Examples of expertise are outlined below:
Architectural advice
Advice from SURF advisors on how to set up your workflows and pipelines on the national e-infrastructure, how to organize your data and software to optimize your core-hour usage, and how to make use of the platform for collaboration and/or role-based access.
Data flow and management advice (platform specific)
Support on how your data can enter the system, move into and out of your jobs, and how to eventually export your output data for visualization or distribution. Which tools may work best for your project to handle your data. Data staging troubleshooting and support.
Onboarding
Training for your users on how to use Spider or the Grid. Support in configuring and troubleshooting your credentials. Troubleshooting support for your initial pipeline set up.
Project Coordination
If you would like a SURF advisor to sit in on regular meetings to keep the data processing team at SURF in close communication concerning your project’s status and needs.
SURF advisors can provide assistance with the form, including (if desired) a preview for completeness. Please reach out to helpdesk@surfsara.nl if you are not currently in touch with a SURF advisor.
NWO forms can be submitted through the ISAAC portal
It is still possible to use DP services at SURF if you are not eligible for an NWO grant or if your project falls outside the scope of NWO.
In these cases, please reach out to info@surfsara.nl with the following information:
After submitting your request, you will receive a notification when your application is approved and information on how to request access for your project team.
For questions about the status of a submitted application:
In order to get familiar with your DP platform please refer to the Spider and/or Grid documentation.
In case of any questions please feel free to contact helpdesk@surfsara.nl
Continuous usage model: a way of calculating CPU usage in which usage is based on an expected usage per day, not necessarily the actual usage. For example:
Daily usage = allocated core hours / 365 days / # years
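As a sketch, the daily usage formula above can be applied to the small-project CPU maximum; the allocation figure is taken from the limits table and the function name is illustrative.

```python
# Sketch of the continuous usage model: the expected daily draw-down of an
# allocation so that it is fully depleted at the end of the grant period.

def expected_daily_usage(allocated_core_hours, grant_years):
    """Daily usage = allocated core hours / 365 days / # years."""
    return allocated_core_hours / 365 / grant_years

# Example: a 500.000 core-hour allocation over a 1-year grant.
print(round(expected_daily_usage(500_000, 1)))  # ~1370 core hours per day
```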
Best effort: describes a service for which there is no guarantee that the service will be delivered or will meet a guaranteed quality, meaning that users obtain an unspecified or variable level of service.
CPU core hours: the unit used to measure CPU time, as defined by the formula below:
CPU core hours = wall clock time hours (per job) x #cores x #jobs
For more information about calculating CPU core hours please see the Grid FAQ
For small projects, you can log in to the SURF request portal to check the status of your project. If you have any questions, please reach out to helpdesk@surfsara.nl.
For large projects, you can log in to the NWO ISAAC portal to check the status of your project. If you have any questions, please reach out to rekentijd@nwo.nl.
The SURFsara Data Archive allows the user to safely archive up to petabytes of valuable research data.
Persistent identifiers (PIDs) ensure the findability of your data. SURFsara offers a PID provisioning service in cooperation with the European Persistent Identifier Consortium (EPIC).
B2SAFE is a robust, secure and accessible data management service. It allows common repositories to reliably implement data management policies, even in multiple administrative domains.
The grid is a transnational distributed infrastructure of compute clusters and storage systems. SURFsara is active as a partner in various...
Spider is a dynamic, flexible, and customizable platform locally hosted at SURF. Optimized for collaboration, it is supported by an ecosystem of tools to allow for data-intensive projects that you can start up quickly and easily.
The Data Ingest Service is a service provided by SURFsara for users who want to upload a large amount of data to SURFsara and who do not have the sufficient amount...
The Collaboratorium is a visualization and presentation space for science and industry. The facility is of great use for researchers that are faced with...
Data visualization can play an important role in research, specifically in data analysis to complement other analysis methods, such as statistical analysis.