Preservation Levels and digital preservation measures in the ETH Data Archive
To keep data accessible and usable over the long term, the use of open, documented and widely used standard file formats is an essential prerequisite. Our assessment of various file formats is available on the page “File formats for archiving”. However, the suggestions made there are often not applicable in the context of research data. The ETH Data Archive therefore accepts data in any file format as a matter of principle. As a consequence, in many cases only bitstream preservation, i.e. storage of the data in its original form, can be guaranteed. Maintaining the usability of data in future IT environments is an ambitious task that can only be achieved for a limited number of well-documented, non-proprietary file formats.
The services we provide for long-term preservation of the data stored in the ETH Data Archive therefore depend on the suitability of the file format for long-term preservation rather than on the retention period chosen.
1. Format classifications
Not suitable for archiving – Bitstream Preservation only
For data in proprietary or not clearly identifiable formats (for example experimental data in self-defined formats or the output of a measuring instrument in a vendor’s proprietary format), we guarantee undamaged storage in their original form (bitstream preservation). We cannot carry out any active preservation measures beyond this. As a rule, such data are suitable only for a limited retention period. The same is true for encrypted or otherwise protected data, even if their file format would otherwise allow active preservation measures.
Suitable to only a limited extent – Limited Preservation Support
For some formats, we monitor their development and possible risks and may be able to carry out file format migrations in some cases. Nevertheless, it is difficult to estimate and assess the consequences that may arise from such file format migrations. This level usually applies to proprietary formats that are widely used (e.g. Microsoft Office Open XML). The opportunities for action are therefore limited and extensive preservation measures cannot be guaranteed.
Unchanged storage (Bitstream Preservation) is guaranteed in any case.
Recommended – Full Preservation Support
On this level, reasonable precautions will be taken in order to maintain the data’s usability over the long term.
Such preservation measures can only be taken for formats that are widely used and follow open, documented standards.
Whenever possible, we recommend choosing a standard format for archiving valuable, non-reproducible data such as long-term observational data and series of measurements.
Again, unchanged storage (Bitstream Preservation) is an essential prerequisite for the maintenance of usability.
For the assignment of particular file formats or groups of formats to the three classes, please refer to our format recommendations. For formats that are not listed, we will not conduct any active preservation measures. The format recommendations are updated regularly. The current format recommendation also applies to files that are already stored in the ETH Data Archive, regardless of the format recommendation that was in effect at the time of their upload.
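The assignment rules described above can be sketched as a small lookup. This is only an illustrative sketch, not the archive's actual implementation: the function name, the `protected` flag and the contents of `RECOMMENDATIONS` are assumptions. The only format taken from the text is Office Open XML as an example of limited support; the other entry is a placeholder, and the real assignments are found in the format recommendations.

```python
from enum import Enum


class Level(Enum):
    FULL = "Full Preservation Support"
    LIMITED = "Limited Preservation Support"
    BITSTREAM = "Bitstream Preservation only"


# Hypothetical excerpt of a recommendation list (assumption for illustration).
RECOMMENDATIONS = {
    "some-open-standard-format": Level.FULL,  # placeholder, not a real entry
    "office-open-xml": Level.LIMITED,         # example named in the text
}


def preservation_level(format_id, protected=False):
    """Return the preservation level for an identified format.

    format_id is None when the format could not be identified.
    protected marks encrypted or otherwise protected data.
    """
    # Unidentified or protected data can only be stored as is.
    if format_id is None or protected:
        return Level.BITSTREAM
    # Formats that are not listed receive no active preservation measures.
    return RECOMMENDATIONS.get(format_id, Level.BITSTREAM)


print(preservation_level("office-open-xml").value)
print(preservation_level("unlisted-vendor-format").value)
```

Because the lookup always consults the current list, a revised recommendation automatically applies to files that were uploaded earlier, which mirrors the rule stated above.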
2. Preservation measures
Measure | Full Preservation Support | Limited Preservation Support | Bitstream Preservation only |
Format identification: An unambiguous identification of the file format is an indispensable prerequisite for all active preservation measures. If a format cannot be identified, it can only be preserved as is (Bitstream Preservation). | x | x | x |
Extraction and storage of technical metadata: Depending on the availability of tools for each format. | (x) | (x) | (x) |
Validation against established schemes or specifications: Depending on the availability of tools for each format. If needed, errors can be corrected before the ingest. | (x) | (x) | (x) |
Format specific risk analysis: Risks can be inherent to the file format or can be gathered from specific properties in the technical metadata. | x | x | |
Conversion before ingest: Will be offered in the framework of defined archiving workflows and can (partly) be automated. As a rule, the original format as well as the result of the conversion will be archived. | x | | |
Periodic reanalysis of archived data: Can new formats be identified? Can additional technical metadata be analyzed with new technical extractors? | (x) | x | x |
Strategic Monitoring of identified formats: Has the risk situation changed, is obsolescence to be feared or does any successor format exist that should be evaluated, in order to potentially convert data? | x | x | |
Migration: Conversion of the original format into a newer format and archiving as a new version of the object. The former version of the object can be reconstructed anytime. | x | (x) | |
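For readers who want to work with the table programmatically, the matrix above can be written down as a simple data structure. The following is a minimal sketch under stated assumptions: the names `LEVELS`, `MATRIX` and `measures_for` are hypothetical, empty cells are read as "not offered", and "(x)" is interpreted as "conditional" (for example, depending on tool availability), which the table itself does not define explicitly.

```python
LEVELS = (
    "Full Preservation Support",
    "Limited Preservation Support",
    "Bitstream Preservation only",
)

# One row per measure; marks follow the column order of LEVELS.
MATRIX = {
    "Format identification": ("x", "x", "x"),
    "Extraction and storage of technical metadata": ("(x)", "(x)", "(x)"),
    "Validation against established schemes or specifications": ("(x)", "(x)", "(x)"),
    "Format specific risk analysis": ("x", "x", ""),
    "Conversion before ingest": ("x", "", ""),
    "Periodic reanalysis of archived data": ("(x)", "x", "x"),
    "Strategic Monitoring of identified formats": ("x", "x", ""),
    "Migration": ("x", "(x)", ""),
}

# Assumed reading of the marks used in the table.
STATUS = {"x": "yes", "(x)": "conditional"}


def measures_for(level):
    """Return {measure: status} for every measure offered at the given level."""
    col = LEVELS.index(level)
    return {
        measure: STATUS[marks[col]]
        for measure, marks in MATRIX.items()
        if marks[col]  # skip empty cells
    }


for name, status in measures_for("Limited Preservation Support").items():
    print(f"{name}: {status}")
```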