This guide is copied and adapted from the UK Data Service (https://dam.ukdataservice.ac.uk/media/622368/costingtool.pdf) under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported Licence (http://creativecommons.org/licenses/by-nc-sa/3.0/). Please cite it as follows:
UK Data Service (2013). Data management costing tool. UK Data Archive, University of Essex, available at https://dam.ukdataservice.ac.uk/media/622368/costingtool.pdf
The text has been slightly adapted by the team for Research Data Management and Digital Curation at the ETH Library, and brackets indicate [where the original has been changed]. Not all formulations are adapted to the situation and services available at ETH Zurich, some of which are provided free of charge to ETH Zurich members. Rather, the list indicates which questions are worth considering when planning costs for research data management. Whether these steps incur additional costs at ETH Zurich can be determined by contacting the responsible service provider (e.g., ETH Library, IT Services, Ethics Commission, or Legal Department), depending on the service under consideration.
The UK Data Service has prepared this costing tool and checklist to help formulate research data management costs in advance of research starting, for example for inclusion in a data management plan or in preparation for a funding application.
This tool considers the additional costs – above standard planned research procedures and practices – that are needed to preserve research data and make them shareable beyond the primary research team. The checklist indicates which activities to consider and cost in order to enable good data management. Such additional activities may require extra time from researchers or administrative staff, or extra equipment, software, infrastructure or tools.
There are no hard and fast rules for costing data sharing requirements, as some research projects will pay more attention to detailed data documentation, organisation and formatting than others as part of routine fieldwork or preparation before analysis. Much also depends on the long-term storage, preservation and publication plans beyond the duration of the research itself. When data are deposited with a professional data centre or repository […], data preservation and dissemination activities are covered by the data centre/repository.
Check the data management activities in the table and tick those that may apply to your proposed research.
For each selected activity, estimate the additional time and/or other resources needed and cost this, e.g. people’s time or physical resources needed such as hardware or software. Find out which resources, e.g. for data storage and backup, are available to you from your institution. Consider whether you need a dedicated data manager.
Add these data management costs to your research application. Coordinate resourcing and costing with your institution, research office and institutional IT services.
Plan the data management activities in advance to avoid them competing with the need to focus on research excellence.
Remember that when your research project nears the end, you do not want these additional data management activities to compete with delivering your planned outputs, writing publications and completing your project on time. At this later stage, the costs of preparing data for sharing may be significantly higher.
The table below can be downloaded here:
|Activity|Comments and suggestions|Check|Cost|
- Are data in a spreadsheet or database clearly marked with variable and value labels, code descriptions, missing value descriptions, etc.?
- Are labels consistent?
- Do textual data like interview transcripts need a description of their context, e.g. included as a header page?
- if data description is carried out as part of data creation, data input or data transcription – low or no additional cost
- if it needs to be added afterwards – higher cost
- codebooks for datasets can often be easily exported from software packages
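Codebooks are usually exported from the statistical package itself. As a minimal sketch of the information such a codebook records, assuming the variable definitions are kept in a Python dictionary, the following writes variable labels, value labels and missing-value codes to CSV. The variables `age` and `sex`, their labels, the missing-value code `-9` and the helper names `codebook_rows` and `write_codebook` are all hypothetical examples.

```python
import csv
import io

# Hypothetical codebook: variable name -> variable label, value labels, missing-value codes.
CODEBOOK = {
    "age": {"label": "Age of respondent in years", "values": {}, "missing": {-9: "Refused"}},
    "sex": {
        "label": "Sex of respondent",
        "values": {1: "Male", 2: "Female"},
        "missing": {-9: "Refused"},
    },
}


def codebook_rows(codebook):
    """Flatten the codebook into one row per variable and one row per labelled value."""
    rows = []
    for name, meta in codebook.items():
        rows.append([name, meta["label"], "", "", ""])  # the variable itself
        for code, text in meta["values"].items():
            rows.append([name, "", code, text, "no"])  # valid value labels
        for code, text in meta["missing"].items():
            rows.append([name, "", code, text, "yes"])  # missing-value descriptions
    return rows


def write_codebook(codebook, handle):
    """Write the codebook as CSV to an open text handle (file or StringIO)."""
    writer = csv.writer(handle)
    writer.writerow(["variable", "variable_label", "value", "value_label", "is_missing"])
    writer.writerows(codebook_rows(codebook))


if __name__ == "__main__":
    buffer = io.StringIO()
    write_codebook(CODEBOOK, buffer)
    print(buffer.getvalue())
```

Keeping the labels in one structure like this also makes it easier to check that labels are used consistently across files.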
- Do quantitative data need to be cleaned, checked or verified before sharing, e.g. by checking the validity of codes used or checking for anomalous values?
- Will data match documentation, e.g. same number of variables, cases, records, files?
- Does textual information in data need to be spell-checked?
- if carried out as part of data entry and preparation before data analysis – low or no additional cost
- if needed afterwards – higher cost
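Checks of this kind are cheapest when they are scripted and run during data entry. The Python sketch below compares records with what the documentation promises: the number of cases, the set of variables, valid codes and plausible value ranges. All variable names, codes and ranges (`EXPECTED_VARIABLES`, `VALID_CODES`, `PLAUSIBLE_RANGES`, `MISSING_CODES`) are hypothetical and would come from your own codebook.

```python
# Hypothetical documentation for a small survey file: replace with your own codebook.
EXPECTED_VARIABLES = ["id", "age", "sex"]
VALID_CODES = {"sex": {1, 2, -9}}     # 1 = male, 2 = female, -9 = refused
PLAUSIBLE_RANGES = {"age": (16, 99)}  # values outside this range are flagged
MISSING_CODES = {-9}                  # codes exempt from the range check


def check_records(records, expected_cases):
    """Return human-readable problems found when comparing records with the documentation."""
    problems = []
    # Does the number of cases match the documentation?
    if len(records) != expected_cases:
        problems.append(f"expected {expected_cases} cases, found {len(records)}")
    for i, record in enumerate(records):
        # Does each record contain exactly the documented variables?
        if sorted(record) != sorted(EXPECTED_VARIABLES):
            problems.append(f"record {i}: variables {sorted(record)} do not match documentation")
            continue
        # Are all coded values valid?
        for var, codes in VALID_CODES.items():
            if record[var] not in codes:
                problems.append(f"record {i}: invalid code {record[var]!r} for {var}")
        # Are there anomalous values outside the plausible range?
        for var, (low, high) in PLAUSIBLE_RANGES.items():
            value = record[var]
            if value not in MISSING_CODES and not low <= value <= high:
                problems.append(f"record {i}: anomalous value {value!r} for {var}")
    return problems


if __name__ == "__main__":
    records = [
        {"id": 1, "age": 34, "sex": 1},
        {"id": 2, "age": 240, "sex": 3},  # entry errors the checks should catch
    ]
    for problem in check_records(records, expected_cases=2):
        print(problem)
```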
- Do you have documentation for the data that describes the context and methodology of how data were gathered, created, processed and quality controlled?
- often essential contextual and methods documentation will be written up in publications and reports
- if all data creation steps are well documented and documentation is kept well organised during research – low or no additional cost
- if documentation needs to be written or compiled specifically afterwards – higher cost
- Do structured metadata need to be created when data are shared via a data centre or archive, e.g. completing [metadata entries] for the [ETH Research Collection]?
- completing a UK Data Archive deposit form may take one to two hours
- other data [archives or repositories] will have their own metadata forms
|Formatting and organising|
- Are your data files, spreadsheets, interview transcripts and records all in a uniform format or style?
- Are files, records and items in the collection clearly named with unique file names and well organised?
- if planned beforehand by developing templates and data entry forms for individual data files (transcripts, spreadsheets, databases) and by constructing clear file structures – low or no additional cost
- if needed afterwards – higher cost
- free software exists for batch file renaming to harmonise file names
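Batch renaming can also be scripted. The following Python sketch applies one possible naming convention (lower case, underscores instead of spaces and hyphens, no other punctuation); the convention and the function names `harmonised_name` and `rename_all` are assumptions for illustration, not a recommended standard. It defaults to a dry run that only reports the planned renames, and it stops before renaming anything if two files would end up with the same name.

```python
import re
from pathlib import Path


def harmonised_name(name):
    """Lower-case a file name, replace runs of spaces/hyphens with "_", drop other punctuation."""
    path = Path(name)
    stem = re.sub(r"[\s\-]+", "_", path.stem.strip().lower())
    stem = re.sub(r"[^a-z0-9_]", "", stem)
    if not stem:
        raise ValueError(f"cannot harmonise {name!r}")
    return stem + path.suffix.lower()


def rename_all(folder, dry_run=True):
    """Plan (and, unless dry_run, perform) renames for every visible file in folder.

    Returns the list of (old_name, new_name) pairs. Raises ValueError before renaming
    anything if two files would receive the same new name.
    """
    folder = Path(folder)
    plan, taken = [], set()
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip sub-folders and hidden files
        new_name = harmonised_name(path.name)
        if new_name in taken:
            raise ValueError(f"name clash: {new_name}")
        taken.add(new_name)
        plan.append((path.name, new_name))
    if not dry_run:
        for old, new in plan:
            if old != new:
                (folder / old).rename(folder / new)
    return plan
```

For a hypothetical folder `transcripts`, calling `rename_all("transcripts")` first shows the plan, and `rename_all("transcripts", dry_run=False)` then applies it.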
- Will you transcribe qualitative data (e.g. recorded interviews or focus group sessions) as part of your research, or will you need to do this specifically so that data can be more easily shared and reused?
- Is full or partial transcription needed?
- Is translation needed?
- Will you need to develop a standard transcription template or transcription guidelines, to ensure consistent formatting?
- if part of research practice – very low or no additional cost
- if not planned as part of research practice – potentially high additional cost
- is additional hardware/software needed?
- consider cost of (time needed for) developing procedures, templates and guidance for transcribers
- calculate the time needed for transcription – four to eight hours per hour of recording […]
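The rule of thumb of four to eight hours of transcription per hour of recording turns directly into a planning estimate. This Python sketch multiplies the total recording time by that range; the hourly rate is a hypothetical input in whatever currency the budget uses, and the function names are illustrative.

```python
# Rule of thumb from the checklist: 4 to 8 hours of transcription per hour of recording.
HOURS_PER_RECORDED_HOUR = (4, 8)


def transcription_hours(recording_minutes, ratio=HOURS_PER_RECORDED_HOUR):
    """Return (low, high) transcription effort in hours for a list of recording lengths."""
    recorded_hours = sum(recording_minutes) / 60
    low, high = ratio
    return recorded_hours * low, recorded_hours * high


def transcription_cost(recording_minutes, hourly_rate, ratio=HOURS_PER_RECORDED_HOUR):
    """Return (low, high) cost, assuming transcribers are paid a flat hourly_rate."""
    low, high = transcription_hours(recording_minutes, ratio)
    return low * hourly_rate, high * hourly_rate


if __name__ == "__main__":
    interviews = [90] * 20  # hypothetical: 20 interviews of 90 minutes each
    print(transcription_hours(interviews))  # (120.0, 240.0)
    print(transcription_cost(interviews, hourly_rate=50))  # hypothetical hourly rate
```

For example, 20 interviews of 90 minutes each amount to 30 hours of recording, and therefore roughly 120 to 240 hours of transcription, before any time for checking or anonymising the transcripts.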
- Do analogue or paper-based research data (maps, newspaper clippings, photographs, images, text) need to be digitised to increase their potential for sharing?
- is additional equipment or software needed for scanning or conversion?
- if simply image scanning of text – relatively low cost
- if Optical Character Recognition required, with manual checking for accuracy (revising entire scanned text) – may be high cost
- if manual data entry or typing needed, e.g. to digitise tabular data – may be high cost
- Do data need to be converted to a standard or open format with long-term validity for long-term preservation?
- is additional software or hardware needed for conversion?
- for audio-visual data, converting to open digital formats can be time-consuming or require special equipment and/or software
- for databases, conversions may require checking for truncation, loss of metadata or annotation, loss of relationships, etc.
- How much data storage space is needed for the entire duration of the project?
- For long-term storage, decide which data will be kept long-term, which storage volume this represents and how long data will be stored and preserved.
- if storage is provided by the institution – cost is included in standard indirect costs or overheads
- if additional storage is needed – cost server/disk space, as well as setting it up and maintaining it
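A simple inventory makes both storage questions explicit: the volume needed during the project and the volume selected for long-term preservation. In this Python sketch the data categories, counts and file sizes are hypothetical planning figures to be replaced with your own estimates; backups are not included and are costed separately.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    """One category of research data; all figures are planning estimates."""
    name: str
    count: int            # number of files, recordings, etc.
    size_gb_each: float   # estimated size of one item in gigabytes
    keep_long_term: bool  # selected for long-term preservation?

    @property
    def total_gb(self):
        return self.count * self.size_gb_each


def storage_needs(items):
    """Return (GB needed during the project, GB selected for long-term preservation)."""
    during_project = sum(item.total_gb for item in items)
    long_term = sum(item.total_gb for item in items if item.keep_long_term)
    return during_project, long_term


if __name__ == "__main__":
    inventory = [  # hypothetical inventory
        DataItem("raw video", count=10, size_gb_each=4.0, keep_long_term=False),
        DataItem("audio recordings", count=20, size_gb_each=0.5, keep_long_term=True),
        DataItem("survey data", count=2, size_gb_each=0.25, keep_long_term=True),
    ]
    print(storage_needs(inventory))  # (50.5, 10.5)
```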
|Data transfer and access|
- Are special measures needed to transfer data from mobile devices, from fieldwork sites or from home equipment to a central work server?
- Do external people require access to research data?
- is software or hardware needed for data transfer, for encryption of confidential data before transfer, or for synchronisation of data files across sites?
- does remote access via VPN or secure FTP need to be arranged for external people?
- Does the institution provide regular backup or not?
- Consider how frequently backups should be made and how many backup copies should be kept.
- institutional backup – included in standard indirect cost/overheads
- additional backup needed – cost according to number of copies to be kept, frequency of backup and storage media needed
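Backup volume grows with the number of copies kept, so a rough upper bound can be computed from the backup interval, the retention period and the number of storage locations. This Python sketch assumes every backup is a full copy (no incremental backups or deduplication) and a flat price per gigabyte per month; the parameter names and the example price are hypothetical.

```python
import math


def backup_storage_gb(data_gb, interval_days, retention_days, locations=1):
    """Upper-bound storage for backups, assuming every backup is a full copy.

    Retained backups = ceil(retention period / backup interval), kept in each location.
    """
    retained_backups = math.ceil(retention_days / interval_days)
    return data_gb * retained_backups * locations


def backup_cost(data_gb, interval_days, retention_days, price_per_gb_month, months, locations=1):
    """Rough backup storage cost over the project at a flat price per GB per month."""
    storage_gb = backup_storage_gb(data_gb, interval_days, retention_days, locations)
    return storage_gb * price_per_gb_month * months


if __name__ == "__main__":
    # Hypothetical: 50 GB of data, weekly backups kept for 28 days, in 2 locations, for 36 months.
    print(backup_storage_gb(50, interval_days=7, retention_days=28, locations=2))  # 400
    print(backup_cost(50, 7, 28, price_per_gb_month=0.02, months=36, locations=2))
```

Incremental backups would usually need far less space, so this figure is best read as a ceiling for budgeting rather than an expected cost.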
- Protect data from unauthorised access, use or disclosure
- for confidential or sensitive data, determining conditions for controlling access to shared data may require extra time and discussion
- can security be arranged by institutional IT services or is extra software/hardware needed?
- data files may need encrypting before storage or transfers
|Consent for data sharing|
- Do you need to ask participants for their consent for data to be shared?
- does this require extra preparation of information sheets and consent forms; extra time for consent discussions; or training of interviewers?
- Do you need to remove identifying information […] before data can be shared?
- Anonymisation needs to be consistent throughout a data collection.
- for quantitative data (e.g. survey data) – low cost if identifiers are a priori excluded from data files, are easy to remove, or identifiable variables are coded to avoid disclosure; cost may be higher if variables need recoding afterwards to avoid disclosure
- for qualitative textual data (e.g. interview transcripts) – may be high cost as entire texts will need to be read and checked for identifying information; costs can be reduced if anonymisation is carried out during transcription (or at least highlighted during transcription)
- for audio-visual data – anonymising/editing voices or faces can be very costly and reduces the usefulness of data
- cost depends on how sensitive or complex data are and how much identifying information is recorded in the data – if only removal of names is required, cost is low; pseudonymisation will require more time
- if anonymisation is planned before data collection or transcription/digitisation – cost can be lowered
- for all files, check file properties and edit to remove disclosive information such as editor/author name
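Consistency across a collection is easier to achieve with a fixed replacement list than by editing each file by hand. The Python sketch below replaces names listed in advance with the same pseudonym in every transcript it processes. The class name `Pseudonymiser`, the `[Person n]` label format and the example names are hypothetical, and a script like this only catches the names it is given: transcripts still need to be read for other identifying details.

```python
import re


class Pseudonymiser:
    """Replace a fixed list of names with the same pseudonym in every text it processes."""

    def __init__(self, names, label="Person"):
        if not names:
            raise ValueError("provide at least one name to replace")
        # The mapping is fixed once, so the same person gets the same pseudonym in every file.
        self.mapping = {name: f"[{label} {i}]" for i, name in enumerate(names, start=1)}
        # Longest names first, so "Anna Meier" is matched before a shorter name such as "Anna".
        alternatives = sorted(self.mapping, key=len, reverse=True)
        # \b word boundaries stop "Tom" from matching inside "Tomas".
        self.pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")

    def apply(self, text):
        """Return text with every listed name replaced by its pseudonym."""
        return self.pattern.sub(lambda match: self.mapping[match.group(1)], text)


if __name__ == "__main__":
    pseudonymiser = Pseudonymiser(["Anna Meier", "Tom"])  # hypothetical participants
    print(pseudonymiser.apply("Anna Meier said Tom was late."))  # [Person 1] said [Person 2] was late.
    print(pseudonymiser.apply("Tom agreed."))  # [Person 2] agreed.
```

The mapping from real names to pseudonyms is a key that links the shared data back to participants, so it would normally be stored securely and separately from the shared files.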
- Do other parties hold copyright in the data?
- Do you need to seek copyright clearance before sharing data?
- is time required to seek copyright clearance?
- is legal advice required?
- Will your data be deposited with a data centre or institutional repository?
- Are there requirements to prepare data to particular standards, e.g. regarding documentation or format?
- Will journal publishers require deposit of data supporting article findings?
- consider the cost of data deposit and/or longer-term storage – find out from data centre/repository/journal whether charges apply [ETH Library recommends non-commercial repositories and funders like the SNSF partly require this]
- cost in time and effort needed to prepare the data for sharing and preservation
|Roles and responsibilities|
- Do you need to allocate roles and responsibilities for various data management activities?
- if multiple partner institutions, researchers or funders are involved in research – consider cost of data management planning meetings or discussions
|Operationalising data management|
- What measures are needed to implement and operationalise data management throughout the research lifecycle?
- do you need extra time and resources to implement data management throughout your research, e.g. regular team meetings or setting up a collaborative research environment?
- if staff training is required - higher cost
- do you need a dedicated data manager [or data steward]?