We recommend carefully selecting the data so that the archived data is of scientific relevance and worth archiving over the long term. Please remove unneeded data, such as temporary files, and avoid storing identical data in several places, for example a ZIP file together with its unzipped contents, or multiple backups. Private information does not belong in the ETH Data Archive.
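One way to find identical files before archiving is to compare checksums. The following Python sketch is only an illustration, not a tool provided by the ETH Data Archive, and the function names are hypothetical. It first groups files by size and then by SHA-256 hash, so only files that share a size are actually hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> List[List[Path]]:
    """Group files below `root` whose contents are byte-for-byte identical."""
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups: List[List[Path]] = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a file with a unique size cannot have an identical copy
        by_hash: Dict[str, List[Path]] = defaultdict(list)
        for path in candidates:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

For example, `find_duplicates("my_dataset")` (a placeholder folder name) returns a list of groups, each holding the paths of files with identical content; deciding which copy to keep remains a manual step.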
Choose open formats
To keep your files readable over the long term, prefer non-proprietary file formats that follow open, well-documented standards. If you plan to archive your data for more than 10 years, we recommend converting unusual file formats into more widely used ones. Please consult the fact sheet File Formats for Archiving for further information on this topic.
Avoid special characters
Avoid special characters in the names of files and folders. They hamper compatibility because how they are handled depends on the operating system, which can lead to undesired effects.
Avoid the following characters:
• \ / ? : * " > < |
These characters are not allowed in Windows file names. When a folder is unpacked with WinZip, these characters are usually replaced by underscores.
If files with such characters in their names are packed with WinZip, they may be moved to locations outside of their original folder due to a flaw in Linux.
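To catch such names before packing, a small script can scan a folder for the characters listed above. This is a minimal Python sketch under the assumption that only those nine characters need to be checked; the helper names are hypothetical, and `sanitize_name` simply mimics the underscore replacement described for WinZip.

```python
from pathlib import Path
from typing import List

# The characters listed above, which Windows does not allow in file or folder names.
FORBIDDEN_CHARS = frozenset('\\/?:*"><|')


def has_forbidden_chars(name: str) -> bool:
    """True if a single file or folder name contains any forbidden character."""
    return not FORBIDDEN_CHARS.isdisjoint(name)


def sanitize_name(name: str, replacement: str = "_") -> str:
    """Replace each forbidden character with `replacement` (an underscore by default)."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name)


def find_offending_names(root: str) -> List[Path]:
    """List all files and folders below `root` whose own name contains a forbidden character."""
    return sorted(p for p in Path(root).rglob("*") if has_forbidden_chars(p.name))
```

Renaming is left to the user on purpose: an automatic rename could make two different files collide on the same sanitized name.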
The following ASCII characters are permitted:
File extensions (such as .txt, .pdf) should be consistent with the file format. Avoid saving files without file extensions or using special characters in the file extension.
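Both points can be checked roughly in code. The Python sketch below is illustrative only: it assumes that a "clean" extension consists of ASCII letters and digits, and it compares the leading bytes of a file with a small, hypothetical table of signatures for PDF, PNG and ZIP. Formats not in the table are reported as undecided rather than wrong.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: a clean extension is a dot followed only by ASCII letters and digits.
CLEAN_EXTENSION = re.compile(r"\.[A-Za-z0-9]+")

# Leading bytes ("magic numbers") of a few common formats; extend as needed.
MAGIC_BYTES = {
    ".pdf": (b"%PDF-",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".zip": (b"PK\x03\x04", b"PK\x05\x06"),  # regular and empty ZIP archives
}


def extension_problem(name: str) -> Optional[str]:
    """Describe what is wrong with the extension of `name`, or return None if it looks fine."""
    suffix = Path(name).suffix
    if not suffix:
        return "missing file extension"
    if not CLEAN_EXTENSION.fullmatch(suffix):
        return f"special characters in extension {suffix!r}"
    return None


def extension_matches_content(path: Path) -> Optional[bool]:
    """Compare a file's leading bytes with the signature expected for its extension.

    Returns None when the extension is not in MAGIC_BYTES, i.e. the check cannot decide.
    """
    signatures = MAGIC_BYTES.get(path.suffix.lower())
    if signatures is None:
        return None
    with path.open("rb") as handle:
        head = handle.read(max(len(s) for s in signatures))
    return head.startswith(signatures)
```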
Limit the lengths of file and folder names
We currently recommend packing the data into ZIP or tar container files when archiving large collections of heterogeneous research data sets in the ETH Data Archive over a limited time period (without active validation and preservation measures). Using container files has the advantage that all files in an archival package are uploaded (and downloaded) in a single batch, and the folder structure is preserved.
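As an illustration, Python's standard `tarfile` and `zipfile` modules can create such containers. This is a sketch only, not a procedure prescribed by the ETH Data Archive; the function names are hypothetical. Both functions store every entry under the top-level folder name, so the original folder structure reappears when the container is unpacked. The tar file is written uncompressed and the ZIP file uses standard deflate compression, both of which are assumptions you may change.

```python
import tarfile
import zipfile
from pathlib import Path


def make_tar(source_dir: str, archive_path: str) -> None:
    """Pack `source_dir` into an uncompressed tar file, keeping its folder structure."""
    src = Path(source_dir)
    with tarfile.open(archive_path, "w") as tar:
        # arcname makes all entries start with the folder's own name, e.g. "dataset/raw/a.csv"
        tar.add(str(src), arcname=src.name)


def make_zip(source_dir: str, archive_path: str) -> None:
    """Pack `source_dir` into a ZIP file, keeping its folder structure (including empty folders)."""
    src = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Store each entry as <folder name>/<relative path>, using forward slashes
            zf.write(path, arcname=(Path(src.name) / path.relative_to(src)).as_posix())
```

Write the container outside the folder being packed; otherwise the archive could end up trying to include itself.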
Even when using container files, we strongly recommend preparing the data as described in the first section of this document: the data should be carefully selected, its contents should be documented, and the file formats used should still be readable in 10 or 15 years.
Limit the length of file and folder names
Please consider that the original folder structure may need to be recovered from the container files on various operating systems. Therefore, avoid overly long paths when organizing your data. Paths longer than 256 characters hamper further processing on Windows, and WinZip fails to unpack such containers. See also the recommendations described in section 1.
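Because the final path also includes the folder into which a container is unpacked, it helps to leave some margin below the 256-character limit. The following Python sketch lists all paths inside a data folder, measured from its top-level folder name, that exceed the limit minus a reserve. The `reserve` parameter and the function name are hypothetical; choose a reserve that matches the unpacking locations you expect.

```python
from pathlib import Path
from typing import List, Tuple

MAX_PATH_LENGTH = 256  # limit quoted above for Windows


def long_paths(root: str, reserve: int = 0, limit: int = MAX_PATH_LENGTH) -> List[Tuple[int, str]]:
    """Return (length, path) pairs, longest first, for paths exceeding `limit - reserve`.

    Paths are measured as they would appear in a container: "<folder name>/<relative path>".
    """
    src = Path(root)
    budget = limit - reserve
    hits = []
    for path in src.rglob("*"):
        relative = (Path(src.name) / path.relative_to(src)).as_posix()
        if len(relative) > budget:
            hits.append((len(relative), relative))
    return sorted(hits, reverse=True)
```

For instance, `long_paths("my_dataset", reserve=60)` (placeholder values) would flag every path that would become too long when unpacked into a folder whose own path is about 60 characters long.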
Split large data packages
When splitting large data packages, please do not use the split feature of WinZip!
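One possible alternative, stated here as an assumption rather than an official recommendation, is to distribute the files over several independent tar containers, each of which can be unpacked on its own. The Python sketch below fills containers in sorted path order until adding the next file would exceed a size limit; the naming scheme `<folder>.partNNN.tar` and the function names are hypothetical. The limit is approximate, because tar adds headers and padding to each file, and empty folders are not included.

```python
import tarfile
from pathlib import Path
from typing import List


def plan_parts(source_dir: str, max_bytes: int) -> List[List[Path]]:
    """Group files into consecutive batches whose summed file sizes stay within max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    parts: List[List[Path]] = [[]]
    current = 0
    for path in sorted(p for p in Path(source_dir).rglob("*") if p.is_file()):
        size = path.stat().st_size
        if parts[-1] and current + size > max_bytes:
            parts.append([])  # start a new, independent container
            current = 0
        parts[-1].append(path)
        current += size
    return [batch for batch in parts if batch]


def write_parts(source_dir: str, max_bytes: int, out_dir: str) -> List[Path]:
    """Write each batch to its own tar file, keeping paths relative to the top-level folder."""
    src = Path(source_dir)
    written = []
    for index, files in enumerate(plan_parts(source_dir, max_bytes), start=1):
        archive = Path(out_dir) / f"{src.name}.part{index:03d}.tar"  # hypothetical naming scheme
        with tarfile.open(str(archive), "w") as tar:
            for path in files:
                tar.add(str(path), arcname=(Path(src.name) / path.relative_to(src)).as_posix())
        written.append(archive)
    return written
```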
General comments on creating container files
• Avoid encrypting your files.
Container formats and suitable software tools