|Activity||Comments and suggestions||Check||Cost|
- Are data in a spreadsheet or database clearly marked with variable and value labels, code descriptions, missing value descriptions, etc.?
- Are labels consistent?
- Do textual data like interview transcripts need description of context, e.g. included as a heading page?
- if data description is carried out as part of data creation, data input or data transcription – low or no additional cost
- if needed to be added afterwards – higher cost
- codebooks for datasets can often be easily exported from software packages
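Where the software in use has no codebook export, a rough codebook can be drafted from the data file itself. The following is a minimal sketch, assuming the data are available as CSV text; the function name `build_codebook` and the default missing-value codes (`""`, `NA`, `-99`) are illustrative assumptions, not a standard. It lists the observed values and counts missing entries per variable; the variable and value labels still have to be written by hand.

```python
import csv
import io


def build_codebook(csv_text, missing_codes=("", "NA", "-99")):
    """Summarise each column: observed values, missing count, number of rows.

    `missing_codes` is an assumption; replace it with the codes your
    project actually uses for missing values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    codebook = {
        name: {"values": set(), "missing": 0, "n": 0}
        for name in reader.fieldnames
    }
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            entry = codebook[name]
            entry["n"] += 1
            # Short rows yield None for absent cells; count those as missing.
            if value is None or value.strip() in missing_codes:
                entry["missing"] += 1
            else:
                entry["values"].add(value.strip())
    return codebook


if __name__ == "__main__":
    sample = "id,sex,age\n1,1,34\n2,2,NA\n3,1,-99\n"
    for variable, summary in build_codebook(sample).items():
        print(variable, sorted(summary["values"]), "missing:", summary["missing"])
```

A draft like this is only a starting point for description; it shows which codes occur, not what they mean.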
- Do quantitative data need to be cleaned, checked or verified before sharing, e.g. check validity of codes used, check for anomalous values?
- Will data match documentation, e.g. same number of variables, cases, records, files?
- Does textual information in data need to be spell-checked?
- if carried out as part of data entry and preparation before data analysis – low or no additional cost
- if needed afterwards – higher cost
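The checks named above (matching the documentation, valid codes, anomalous values) can be partly automated. The sketch below is one possible approach, assuming each case is already loaded as a dictionary of parsed values; `check_dataset` and its parameters are hypothetical names, and the expected variables, codes and ranges would come from your own documentation.

```python
def check_dataset(rows, expected_variables, expected_cases, valid_codes, ranges):
    """Return a list of human-readable problems found in `rows`.

    rows               -- list of dicts, one per case, with parsed values
    expected_variables -- variable names listed in the documentation
    expected_cases     -- number of cases stated in the documentation
    valid_codes        -- {variable: set of documented codes}
    ranges             -- {variable: (lowest, highest) plausible value}
    """
    problems = []
    if len(rows) != expected_cases:
        problems.append(f"expected {expected_cases} cases, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        missing = [v for v in expected_variables if v not in row]
        extra = [v for v in row if v not in expected_variables]
        if missing:
            problems.append(f"case {i}: missing variables {missing}")
        if extra:
            problems.append(f"case {i}: undocumented variables {extra}")
        for var, codes in valid_codes.items():
            if var in row and row[var] not in codes:
                problems.append(f"case {i}: {var}={row[var]!r} is not a documented code")
        for var, (low, high) in ranges.items():
            if var in row and not (low <= row[var] <= high):
                problems.append(f"case {i}: {var}={row[var]!r} is outside {low}-{high}")
    return problems


if __name__ == "__main__":
    data = [
        {"id": 1, "sex": 1, "age": 34},
        {"id": 2, "sex": 3, "age": 210},
    ]
    for problem in check_dataset(
        data,
        expected_variables=["id", "sex", "age"],
        expected_cases=2,
        valid_codes={"sex": {1, 2}},
        ranges={"age": (0, 120)},
    ):
        print(problem)
```

Flagged values still need a human decision: an out-of-range value may be a typing error or a genuine outlier.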
- Do you have documentation for the data that describes the context and methodology of how data were gathered, created, processed and quality controlled?
- often essential contextual and methods documentation will be written up in publications and reports
- if all data creation steps are well documented and documentation is kept well organised during research – low or no additional cost
- if documentation to be written or compiled specifically afterwards – higher cost
- Do structured metadata need to be created when data are shared via a data centre or archive, e.g. completing [metadata entries] for the [ETH Research Collection]?
- completing a UK Data Archive deposit form may take one to two hours
- other data [archives or repositories] will have their own metadata forms
|Formatting and organising|
- Are your data files, spreadsheets, interview transcripts, records all in a uniform format or style?
- Are files, records and items in the collection clearly named with unique file names and well organised?
- if planned beforehand by developing templates and data entry forms for individual data files (transcripts, spreadsheets, databases) and by constructing clear file structures – low or no additional cost
- if needed afterwards – higher cost
- free software exists for batch file renaming to harmonise file names
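As an alternative to a dedicated renaming tool, file names can be harmonised with a short script. This is a minimal sketch under an assumed convention (lower case, underscores instead of spaces and punctuation); `harmonised_name` and `batch_rename` are illustrative names, and the convention itself should follow whatever your project has agreed on. The default is a dry run that only reports what would change.

```python
import re
from pathlib import Path


def harmonised_name(name):
    """Lower-case a file name and replace runs of other characters with '_'."""
    path = Path(name)
    stem = re.sub(r"[^a-z0-9]+", "_", path.stem.lower()).strip("_")
    return (stem or "unnamed") + path.suffix.lower()


def batch_rename(folder, dry_run=True):
    """Plan (and optionally apply) harmonised names for files in `folder`.

    Returns the list of (old_path, new_path) pairs. Raises FileExistsError
    if two files would end up with the same name, so nothing is overwritten.
    """
    planned = []
    targets = set()
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(harmonised_name(path.name))
        # samefile() covers case-insensitive file systems, where the
        # target "exists" only because it is the file being renamed.
        clash_on_disk = (
            target != path and target.exists() and not target.samefile(path)
        )
        if target in targets or clash_on_disk:
            raise FileExistsError(f"{path.name} would collide with {target.name}")
        targets.add(target)
        if target != path:
            planned.append((path, target))
    if not dry_run:
        for path, target in planned:
            path.rename(target)
    return planned


if __name__ == "__main__":
    for old, new in batch_rename(".", dry_run=True):
        print(f"{old.name} -> {new.name}")
```

Running the dry run first and reviewing the planned names is advisable, since renaming can break references to files in documentation or analysis scripts.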
- Will you transcribe qualitative data (e.g. recorded interviews or focus group sessions) as part of your research; or will you need to do this specifically so data can be more easily shared and reused?
- Is full or partial transcription needed?
- Is translation needed?
- Will you need to develop a standard transcription template or transcription guidelines, to ensure consistent formatting?
- if part of research practice – very low or no additional cost
- if not planned as part of research practice – potentially high additional cost
- is additional hardware/software needed?
- consider cost of (time needed for) developing procedures, templates and guidance for transcribers
- calculate time needed for transcription – four to eight hours per hour of recording […]
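The transcription estimate is simple arithmetic and can be worked out per project. The sketch below applies the four-to-eight-hour rule of thumb quoted above; the function name and the example figure of 25 recorded hours are illustrative assumptions.

```python
def transcription_hours(recording_hours, low=4.0, high=8.0):
    """Return the (lowest, highest) estimated transcription time in hours.

    `low` and `high` are the hours of work per hour of recording, taken
    from the four-to-eight-hour rule of thumb.
    """
    return recording_hours * low, recording_hours * high


if __name__ == "__main__":
    # Hypothetical project: 25 interviews of about one hour each.
    lowest, highest = transcription_hours(25)
    print(f"Estimated transcription time: {lowest:.0f} to {highest:.0f} hours")
```

Multiplying the result by a local hourly rate gives a cost range; the rate is deliberately left out here because it differs between institutions.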
- Do analogue or paper-based research data (maps, newspaper clippings, photographs, images, text) need to be digitised to increase their potential for sharing?
- is additional equipment or software needed for scanning or conversion?
- if simply image scanning of text – relatively low cost
- if Optical Character Recognition required, with manual checking for accuracy (revising entire scanned text) – may be high cost
- if manual data entry or typing needed, e.g. to digitise tabular data – may be high cost
- Do data need to be converted to a standard or open format with long-term validity for long-term preservation?
- is additional software or hardware needed for conversion?
- for audio-visual data, converting to open digital formats can be time-consuming or require special equipment and/or software
- for databases, conversions may require checking for truncation, loss of metadata or annotation, loss of relationships, etc.
- How much data storage space is needed for the entire duration of the project?
- For long-term storage, decide which data will be kept long-term, which storage volume this represents and how long data will be stored and preserved.
- if storage is provided by the institution – cost is included in standard indirect costs or overheads
- if additional storage needed – cost of server/disk space, as well as the cost of set-up and maintenance
|Data transfer and access|
- Are special measures needed to transfer data from mobile devices, from fieldwork sites or from home equipment to a central work server?
- Do external people require access to research data?
- is software or hardware needed for data transfer, for encryption of confidential data before transfer, or for synchronisation of data files across sites?
- does remote access via VPN or secure FTP need to be arranged for external people?
- Does the institution provide regular backup or not?
- Consider how frequently backups should be done, how many backups should be stored.
- institutional backup – included in standard indirect cost/overheads
- additional backup needed – cost according to number of copies to be kept, frequency of backup and storage media needed
- Protect data from unauthorised access or use or from disclosure
- for confidential or sensitive data, determining conditions for controlling access to shared data may require extra time and discussion
- can security be arranged by institutional IT services or is extra software/hardware needed?
- data files may need encrypting before storage or transfers
|Consent for data sharing|
- Do you need to ask participants for their consent for data to be shared?
- does this require extra preparation of information sheets and consent forms; extra time for consent discussions; or training of interviewers?
- Do you need to remove identifying information […] before data can be shared?
- Anonymisation needs to be consistent throughout a data collection.
- for quantitative data (e.g. survey data) – low cost if identifiers are a priori excluded from data files, are easy to remove, or identifiable variables are coded to avoid disclosure; cost may be higher if variables need recoding afterwards to avoid disclosure
- for qualitative textual data (e.g. interview transcripts) – may be high cost as entire texts will need to be read and checked for identifying information; costs can be reduced if anonymisation is carried out during transcription (or at least highlighted during transcription)
- for audio-visual data – anonymising/editing voices or faces can be very costly and reduces the usefulness of data
- cost depends on how sensitive or complex data are and how much identifying information is recorded in the data – if only removal of names is required, cost is low; pseudonymisation will require more time
- if anonymisation is planned before data collection or transcription/digitisation – cost can be lowered
- for all files, check file properties and edit to remove disclosive information such as editor/author name
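Consistency across a data collection means that the same person always receives the same replacement. The sketch below illustrates one way to do this for textual data, assuming the names to replace have already been identified (for example, highlighted during transcription); the class `Pseudonymiser` and the `Person_001` label format are hypothetical choices. It does not detect names by itself.

```python
import re


class Pseudonymiser:
    """Assign one stable pseudonym per name across all texts in a collection."""

    def __init__(self, prefix="Person"):
        self.prefix = prefix
        self.mapping = {}  # normalised name -> pseudonym

    def pseudonym(self, name):
        key = name.strip().lower()
        if key not in self.mapping:
            self.mapping[key] = f"{self.prefix}_{len(self.mapping) + 1:03d}"
        return self.mapping[key]

    def replace_in_text(self, text, names):
        # Replace longer names first so that "Anna Meier" is handled
        # before "Anna" if both appear in the list.
        for name in sorted(names, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            text = pattern.sub(self.pseudonym(name), text)
        return text


if __name__ == "__main__":
    p = Pseudonymiser()
    print(p.replace_in_text("Anna met Ben. anna left.", ["Anna", "Ben"]))
    print(p.replace_in_text("Ben called.", ["Ben"]))
```

The mapping from names to pseudonyms is itself identifying information, so it needs to be stored securely and separately from the shared data, or destroyed if re-identification must not be possible.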
- Do other parties hold copyright in the data?
- Do you need to seek copyright clearance before sharing data?
- is time required to seek copyright clearance?
- is legal advice required?
- Will your data be deposited with a data centre or institutional repository?
- Which requirements exist to prepare data to particular standards e.g. regarding documentation or format?
- Will journal publishers require deposit of data supporting article findings?
- consider the cost of data deposit and/or longer-term storage – find out from data centre/repository/journal whether charges apply [ETH Library recommends non-commercial repositories and funders like the SNSF partly require this]
- cost in time and effort needed to prepare the data for sharing and preservation
|Roles and responsibilities|
- Do you need to allocate roles and responsibilities for various data management activities?
- if multiple partner institutions, researchers or funders are involved in research – consider cost of data management planning meetings or discussions
|Operationalising data management|
- What measures are needed to implement and operationalise data management throughout the research lifecycle?
- do you need extra time and resources to implement data management throughout your research, e.g. regular team meetings, setting up a collaborative research environment?
- if staff training is required – higher cost
- do you need a dedicated data manager [or data steward]?