Archiving Data

From Deskins Group Resources

General Backup/Archiving

You should back up your computer often (daily to weekly). External hard drives are OK, but you should definitely also back up to a secure network location. WPI has archival services available for this. Talk to lab mates or Academic and Research Computing to get started. We currently use Globus to back up files to WPI's servers.

Data Organization

Keep your files on the Linux machines organized so that you and others can readily find them. Organizing results by project is one easy way to do this; subfolders could be by type of calculation. For instance, the following folder hierarchy organizes the calculations for one project.

  • Project-TiO2-Defects
    • Bulk TiO2
      • Bulk TiO2 and Defects
    • 101 Surface
      • Carbon Adsorbates
      • Nitrogen Adsorbates
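
A hierarchy like the one above can also be created with a short script. The following is only a minimal sketch: the `LAYOUT` dictionary and the `make_tree` helper are hypothetical names, and the spaces in the folder names are replaced with hyphens on the assumption that this is easier to handle on the Linux command line.

```python
from pathlib import Path

# Hypothetical layout mirroring the example hierarchy; names are illustrative.
# Spaces are replaced with hyphens (an assumption, for easier shell use).
LAYOUT = {
    "Project-TiO2-Defects": {
        "Bulk-TiO2": {"Bulk-TiO2-and-Defects": {}},
        "101-Surface": {"Carbon-Adsorbates": {}, "Nitrogen-Adsorbates": {}},
    }
}


def make_tree(root, layout):
    """Create the nested directories under root and return the created paths."""
    created = []
    for name, children in layout.items():
        path = Path(root) / name
        # exist_ok=True makes the script safe to re-run on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
        created.extend(make_tree(path, children))
    return created
```

Calling `make_tree(Path.home() / "calculations", LAYOUT)` would create the tree under a `calculations` folder in your home directory; that location is only an example.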

Product Backup

A product could be a journal article, poster presentation, oral presentation, or something similar. For every final product, you should provide Prof. Deskins a tar.gz file with all the files and details necessary to recreate that work. This includes the main product files (LaTeX, Word, PowerPoint, etc.), figure files (png, tiff), simulation files, and other relevant files. Simulation files should include the inputs and outputs of the simulations, but typically not every output file. For instance, you don't need to include the large wavefunction (e.g. WAVECAR) or charge density (e.g. CHGCAR) files for all calculations, since we can easily regenerate them as necessary. If, however, your analysis does need these large files (like CHGCAR for electron density analysis), then include them in the archive.
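
One possible way to build such a tar.gz file while leaving out the large files is sketched below with Python's standard `tarfile` module. The `archive_product` function, the `LARGE_FILES` set, and the `keep_large` switch are hypothetical names chosen for illustration; adjust the list of skipped files to your own calculations.

```python
import tarfile
from pathlib import Path

# Assumption: these are the large VASP outputs to skip by default.
LARGE_FILES = {"WAVECAR", "CHGCAR"}


def archive_product(src_dir, out_path, keep_large=False):
    """Write src_dir to a gzip-compressed tar file, optionally skipping large files."""
    src = Path(src_dir)

    def skip_large(member):
        # Returning None tells TarFile.add to leave this member out.
        if not keep_large and Path(member.name).name in LARGE_FILES:
            return None
        return member

    with tarfile.open(str(out_path), "w:gz") as tar:
        # arcname keeps paths in the archive relative to the top-level folder.
        tar.add(str(src), arcname=src.name, filter=skip_large)
    return Path(out_path)
```

Write the output file outside of `src_dir` so the archive does not try to include itself. Pass `keep_large=True` when the analysis needs files such as CHGCAR.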

Organize your tar.gz file in directories. Put each figure in a separate directory. Create a directory for the Supporting Information. Create other directories as necessary. Don't just put all relevant files in one big directory and tar it up! Be organized! If you have questions on this, ask Prof. Deskins.

Each figure's directory should include the following.

  • Figure file (png or tiff)
  • File used to generate the figure (e.g. MagicPlot, matplotlib, or GIMP file)
  • A tar file for each data point, containing its simulation input/output files (discussed above)
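
The three items above could be assembled with a small helper such as the sketch below. The `build_figure_dir` function and its arguments are hypothetical; it assumes each data point's simulation files already sit in their own directory, and it does not filter out large files.

```python
import shutil
import tarfile
from pathlib import Path


def build_figure_dir(archive_root, figure_name, figure_file, plot_source, data_points):
    """Assemble one figure directory: image, plot source, one tar.gz per data point.

    data_points maps a label (hypothetical, chosen by you) to the directory
    holding that calculation's input/output files.
    """
    fig_dir = Path(archive_root) / figure_name
    fig_dir.mkdir(parents=True, exist_ok=True)
    # Copy the figure image and the file used to generate it.
    shutil.copy2(str(figure_file), str(fig_dir))
    shutil.copy2(str(plot_source), str(fig_dir))
    # One compressed tar file per data point, named after its label.
    for label, calc_dir in data_points.items():
        with tarfile.open(str(fig_dir / (label + ".tar.gz")), "w:gz") as tar:
            tar.add(str(calc_dir), arcname=label)
    return fig_dir
```

For example, `build_figure_dir("archive", "Figure-1", "fig1.png", "fig1.py", {"point-A": "runs/point-A"})` would produce `archive/Figure-1/` holding the image, the plotting script, and `point-A.tar.gz`; all of these names are placeholders.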

Put calculation files that are mentioned in the paper but did not make it into a graph in one or more separate directories. As appropriate, also include calculations that did not make it into the final paper but still helped you reach your final results. For instance, if you are modeling adsorption of a molecule, you may model many structures before you find the most stable one. Include these "failed" simulations in the archive in an appropriately labeled directory.