...

To keep your files readable in the long term, prefer non-proprietary file formats that follow open and properly documented standards. If you plan to archive your data for more than 10 years, we recommend converting unusual file formats into more widely used formats. Please consult the fact sheet File Formats for Archiving for further information on this topic.

 

Avoid special characters

Avoid special characters in names of files and folders. These characters hamper compatibility because they lead to undesired effects that depend on the operating system.

 

Avoid the following characters:

\ / ? : * " > < |

These characters are not allowed in Windows file names. If a folder is unpacked by WinZip, these characters are usually replaced by underscores.
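The check for these characters can be automated before data is packed. The following is a minimal sketch in Python, assuming the nine characters listed above are the only ones to look for; the function name find_forbidden is made up for this illustration and is not part of any archive tool.

```python
# Characters listed above as not allowed in Windows file names.
FORBIDDEN = set('\\/?:*"><|')


def find_forbidden(name: str) -> list[str]:
    """Return the forbidden characters found in one file or folder name.

    `name` is a single path component (not a full path), because the
    path separators themselves are on the forbidden list.
    """
    return sorted({ch for ch in name if ch in FORBIDDEN})


if __name__ == "__main__":
    for candidate in ["report_2015.txt", "what?:draft.txt"]:
        print(candidate, find_forbidden(candidate))
```

An empty list means the name contains none of the listed characters.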

...

If files are packed with WinZip, these files are moved to locations outside of their original folder due to a flaw in Linux. 

The following ASCII characters are permitted:

!#$%&'()+,-.0123456789;=@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_` abcdefghijklmnopqrstuvwxyz{}~

 

Proper use of file extensions

File extensions (such as .txt or .pdf) should be consistent with the file format. Avoid saving files without a file extension, and avoid special characters in the file extension. 
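Two of these points, missing extensions and special characters in the extension, can be checked mechanically. The sketch below assumes that "special characters" means anything other than ASCII letters and digits, which is an interpretation and not a rule stated in this document; the function name extension_issue is hypothetical. It does not verify that the extension matches the actual file format.

```python
from pathlib import PurePath
from typing import Optional


def extension_issue(name: str) -> Optional[str]:
    """Describe the problem with a file name's extension, or return None."""
    suffix = PurePath(name).suffix  # e.g. ".txt"; empty string if there is none
    if not suffix:
        return "no file extension"
    ext = suffix[1:]  # drop the leading dot
    # Assumption: only ASCII letters and digits count as non-special.
    if not (ext.isascii() and ext.isalnum()):
        return "special characters in file extension"
    return None
```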

Limit the lengths of file and folder names

...

We currently recommend packing the data into ZIP or tar container files in order to archive large collections of heterogeneous research data sets in the ETH Data Archive (without active validation and preservation measures) over a limited time period. Using container files has the advantage that all files in an archival package are uploaded (and downloaded) in a single batch. Furthermore, the folder structure remains unchanged. 

Data preparation

Even when using file containers, we strongly recommend preparing the data as described in the first section of this document. The data should be carefully selected and the contents should be documented. Furthermore, the file formats used should still be readable in 10 or 15 years.

 

Limit the length of file and folder names

Please consider that the original folder structure may need to be recovered from the container files on various operating systems. Therefore, avoid overly long paths when organizing your data. Paths exceeding 256 characters hamper further processing in Windows, and WinZip fails to unpack such containers. See also the recommendations described in section 1. 
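Overly long paths can be located before packing. The following Python sketch walks a folder and lists every entry whose path, measured relative to the folder being packed, exceeds a limit. The function name overlong_paths and the choice to measure relative paths are assumptions made for this illustration; the absolute path after unpacking will be longer, depending on where the container is extracted.

```python
import os


def overlong_paths(root: str, limit: int = 256) -> list[str]:
    """Return relative paths below `root` that are longer than `limit` characters."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if len(rel) > limit:
                hits.append(rel)
    return sorted(hits)
```

Any path returned by overlong_paths("my_dataset") is a candidate for shortening, where "my_dataset" stands for the folder to be archived.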

Split large data packages

...

Please do not use the split feature of WinZip when splitting your data!

 

General comments on creating container files

...

• Avoid encrypting your files. 

Container formats and suitable software tools

...

The tar format is preferred for long-term archiving because it is an openly documented format that does not depend on a single producer.
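Where a scripted alternative to the desktop tools is useful, an uncompressed tar container can also be written with Python's standard tarfile module. This is only a sketch: the function name make_tar and the folder and file names in the usage note are placeholders, and it is not a procedure prescribed by the ETH Data Archive.

```python
import tarfile
from pathlib import Path


def make_tar(source_dir: str, tar_path: str) -> list[str]:
    """Pack `source_dir` into an uncompressed tar file and return the stored names."""
    source = Path(source_dir)
    # Mode "w" writes a plain tar container without compression.
    with tarfile.open(tar_path, "w") as tar:
        # arcname keeps the folder name as the top-level entry in the
        # container, so the folder structure is preserved on unpacking.
        tar.add(str(source), arcname=source.name)
    # Re-open the container to confirm what was written.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

For example, make_tar("my_dataset", "my_dataset.tar") packs the folder my_dataset and returns the list of entries stored in the container. Write the tar file to a location outside the folder being packed.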

 

                    Windows               Mac
Format              .zip, uncompressed    .tar
Recommended tool    7-Zip²                Keka³; or use the "tar" function on the command line
1 Most operating systems limit the length of file names to fewer than 256 characters.
2 Download is free of charge at http://www.7-zip.de/ (accessed 03.03.2015). Please contact your IT support.
3 Download is free of charge at http://www.kekaosx.com/de/ (accessed 03.03.2015). Please contact your IT support.