File formats should be chosen to support sharing, long-term access, and preservation of your data. Choose open standards and formats that are easy to reuse. If you use a different format during the collection and analysis phases of your research, document any features that may be lost when the files are migrated to their preservation format, as well as any specific software needed to view or work with the data.
Best practice for file format selection starts with your source data:
Retain your original, unedited raw data in its native format as your source data. Do not alter or edit it. Document the tools, instruments, or software used to create it. Make a copy before any analysis or data manipulation, and work only on the copy.
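The advice above can be partly automated. The following is a minimal sketch in Python (standard library only), not a prescribed workflow: it copies a raw file into a working folder, checks that the copy is byte-identical using a SHA-256 checksum, and then removes write permission from the original. The function names `sha256_of` and `make_working_copy` are hypothetical names chosen for this illustration.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(raw_file: Path, work_dir: Path) -> Path:
    """Copy raw_file into work_dir, verify it, and mark the original read-only.

    Hypothetical helper for illustration; adapt paths and policy to your project.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / raw_file.name
    shutil.copy2(raw_file, working_copy)  # copy2 also preserves timestamps
    # Confirm the copy is byte-identical before relying on it.
    if sha256_of(raw_file) != sha256_of(working_copy):
        raise IOError(f"Copy of {raw_file} does not match the original")
    # Remove write permission from the original to discourage accidental edits.
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working_copy
```

A call such as `make_working_copy(Path("raw/readings.dat"), Path("work"))` (example paths only) returns the path of the copy to analyze. Recording the checksum in your documentation gives you a way to confirm later that the raw file has not changed.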
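Different disciplines produce different kinds of data, and the formats recommended for long-term storage vary by data type. The table below lists common original formats alongside their preservation-friendly equivalents.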
Data Type | Original Data Format | Preservation-Friendly Formats (Open Standard, Uncompressed) |
Text | hand-written, docx, wpd, odt, rtf, txt, html, xml, pdf | xml, PDF/A, txt |
Tabular Simple (minimal metadata) | csv, tsv, pipe-delimited, xls(x), ods, dif, xps | csv |
Tabular Extensive | sav (SPSS), sas7bdat or xpt (SAS), dta (Stata) | csv, txt with setup file or associated script (r or m) |
Database | mdb, dbf, sql, sqlite, db, db3, xml | xml, sqlite |
Visual | static: pdf, jpeg, tiff, png, gif, bmp; moving: mpeg, mov, avi, mxf | static: PDF/A, tiff, JPEG2000; moving: MPEG-4 |
Audio | wav(e), mp3, mp2, aiff, wma, aac, dct, flac, ogg | wave, aiff |
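As one illustration of moving data into a preservation-friendly format, the sketch below exports a single table from a SQLite database to csv and writes a small plain-text data dictionary beside it, in the spirit of the "setup file" mentioned in the table. It uses only the Python standard library. The function name `export_table` and the `_dictionary.txt` naming are assumptions made for this example, not an established convention. Note that this kind of migration loses information: NULL values become empty fields in the csv, and column types survive only in the dictionary file, so record such losses in your documentation.

```python
import csv
import sqlite3
from pathlib import Path


def export_table(db_path: Path, table: str, out_dir: Path) -> Path:
    """Write one SQLite table to csv plus a plain-text data dictionary.

    Hypothetical helper for illustration only.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{table}.csv"
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        columns = [col[0] for col in cursor.description]
        with csv_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(columns)   # header row
            writer.writerows(cursor)   # NULLs are written as empty fields
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    finally:
        conn.close()
    # Keep the declared column types, which the csv itself cannot carry.
    lines = [f"{name}\t{col_type or 'unspecified'}" for _, name, col_type, *_ in info]
    dictionary = out_dir / f"{table}_dictionary.txt"
    dictionary.write_text(
        "column\tdeclared_type\n" + "\n".join(lines) + "\n", encoding="utf-8"
    )
    return csv_path
```

Plain csv and tab-separated text are used here because both can be opened without specialized software, which is the main reason the table above recommends them.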
For more information, see the UK Data Service Recommended Formats or the Recommended Formats Statement of the Library of Congress.