Guide: How to publish your research data

Choosing a data repository: where to publish the data

There are generally two different kinds of data repositories:

General-purpose repositories allow you to upload and publish a folder with, in principle, arbitrary structure and content. They have a lower barrier to entry and are by far the most common type. On the downside, their content is usually not machine-actionable.
Domain-specific repositories parse and normalize uploaded data to ensure a common standard and rich metadata, which makes the data machine-actionable. Our chosen repository for materials science and condensed matter physics is NOMAD, developed and maintained by the NFDI consortium FAIRmat. ct.qmat is a member of FAIRmat, and we are adapting our workflows so that data can easily be published in NOMAD. To this end, we run a ct.qmat NOMAD Oasis that combines an electronic lab notebook, analysis frameworks powered by JupyterHub, and a data repository with rich metadata.

Your data is suitable for NOMAD out of the box if it was produced with simulation software supported by NOMAD¹ or is experimental data stored in NeXus² files. In this case, we recommend publishing through NOMAD. In other cases, it may be more practical to fall back to a general-purpose data repository. Suitable choices, in order of decreasing preference, are:

Many journals have their own repository in which they require authors to publish the associated research data.
Institutional repositories
- Members of TU Dresden can use OpARA
- Members of JMU Würzburg can use WueData. Examples of published data can be accessed here.
If none of the above repositories suits your needs, you can use a public data repository such as Zenodo, which is operated by CERN.

Three simple steps

To publish research data, we recommend following the three steps below before submitting a paper to arXiv or a journal. If you publish through NOMAD, you can jump directly to step 2, "Upload to Repository".

1. Data preparation

Check whether the journal requires the electronic research data to be published with the article, or offers this option. If so, follow the journal's guidelines; if not, proceed with preparing your data for WueData. Even in the first case, you may find the following text useful.
Prepare a folder on your computer containing all relevant data; this folder will later be uploaded to the repository. Organize it in a logical and understandable manner, avoiding deep folder hierarchies. In most cases, you can use the structure of the paper to organize the data, e.g., by creating a subfolder for each figure (including those in the appendices). In the root folder, create a human-readable plain-text README file (e.g., README.md or README.txt) that describes the relevant data, codes, and configuration files, where to find them, and (if applicable) what needs to be done to reproduce the results.
Include at least the data that was used directly in the publication. This means that every figure in your publication should be accompanied by the underlying data in a format that others can read. For example, for a color plot, publish the underlying data array. If possible, export your data in a common format such as CSV or HDF5. Avoid data formats that require proprietary software to view. Other formats are acceptable if the data files are accompanied by instructions on how to load them.
Make sure that the uploaded data is clearly arranged for external users. For example, use meaningful filenames such as Figure_1_Panel_c.
In addition to the data described above, discuss with your coauthors which other data is useful to share. Best practice is to publish all raw data, all custom-made codes, and all relevant scripts and configuration files of instruments and codes, together with a description of how the data was processed (e.g., in the README file). Record the software packages you used, including their versions, and include the source codes and/or scripts you used to process the data. The goal is that others can reproduce the published results from the measured raw data using the published codes.
When publishing code and scripts, make them portable and usable by others. For example, do not load data using absolute paths (e.g., C:/my_name/PhD/project/raw_data/measurement.hd5); use relative paths instead (e.g., raw_data/measurement.hd5).
Double-check everything. Make sure that all coauthors and other relevant persons (e.g., authors of codes you want to publish) have agreed to the publication of the data, scripts, and codes. Remove all unnecessary files, non-shareable data objects (raw and processed!), passwords hardcoded in your scripts, comments containing private information, and so on.
Create a single archive file from your data folder. We recommend zip, as it is natively supported by practically every operating system. Your data is now ready to be published. You can find an example data package here.
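To illustrate the advice on exporting figure data, meaningful file names, and relative paths, here is a minimal Python sketch that uses only the standard library. It writes the two-dimensional array behind a color plot to a CSV file in a per-figure subfolder, with the axis values in the first row and first column. The function name export_color_plot, the folder name data_package, and the synthetic data are hypothetical placeholders; for large arrays, HDF5 (e.g., via the h5py package) may be the better choice.

```python
import csv
import math
from pathlib import Path


def export_color_plot(x, y, z, figure, panel, base_dir=Path("data_package")):
    """Write the array behind a color plot, z[i][j] = f(x[j], y[i]), to CSV.

    The first row holds the x values and the first column the y values, so the
    file can be read with any text editor or spreadsheet program.
    """
    # One subfolder per figure with a meaningful file name, e.g.
    # Figure_1/Figure_1_Panel_c.csv. base_dir (hypothetical name) is a
    # *relative* path, so the script also works on other computers.
    out_dir = Path(base_dir) / f"Figure_{figure}"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"Figure_{figure}_Panel_{panel}.csv"
    with out_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["y/x", *x])  # header row: x-axis values
        for y_val, row in zip(y, z):
            writer.writerow([y_val, *row])  # one row per y value
    return out_file


if __name__ == "__main__":
    # Synthetic placeholder data standing in for a measured or computed color plot.
    xs = [0.25 * j for j in range(5)]
    ys = [0.5 * i for i in range(3)]
    zs = [[math.sin(xv) * math.cos(yv) for xv in xs] for yv in ys]
    print(export_color_plot(xs, ys, zs, figure=1, panel="c"))
```

Because the axis values are stored in the file itself, no additional loading instructions are needed beyond a short note in the README.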
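Recording software versions can be partly automated. The following Python sketch, an illustration rather than a prescribed tool, collects the versions of the Python interpreter and of a list of installed packages via the standard importlib.metadata module and writes them to a plain-text file that can be referenced from the README. The package list and the output file name software_versions.txt are assumptions; replace them with what your scripts actually use.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def collect_versions(packages):
    """Return {name: version} for the Python interpreter and the given packages.

    Packages that are not installed are reported as "not installed" instead of
    raising an error, so the record always lists every requested package.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def write_versions(versions, out_file):
    """Write one 'name: version' line per entry, sorted by name."""
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}: {ver}" for name, ver in sorted(versions.items())]
    out_file.write_text("\n".join(lines) + "\n")
    return out_file


if __name__ == "__main__":
    # Hypothetical package list and output path; adapt them to your project.
    record = collect_versions(["numpy", "scipy", "matplotlib", "h5py"])
    print(write_versions(record, Path("data_package") / "software_versions.txt"))
```

Software outside the Python ecosystem (e.g., simulation codes or instrument control software) still has to be recorded by hand.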
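Before the final check, a simple text search can help catch absolute paths and hardcoded credentials in scripts. The Python sketch below is a heuristic with hypothetical patterns and file suffixes; it only flags lines for manual review, will miss some cases and produce false positives, and does not replace reading through the code yourself.

```python
import re
from pathlib import Path

# Illustrative, non-exhaustive heuristics: Windows drive paths (e.g. C:/...),
# typical absolute home-directory paths, and assignments that look like secrets.
SUSPICIOUS = {
    "absolute path": re.compile(r"\b[A-Za-z]:[/\\]|/home/|/Users/"),
    "possible secret": re.compile(r"(password|passwd|token|api_key)\s*=", re.IGNORECASE),
}


def scan_scripts(folder, suffixes=(".py", ".m", ".jl", ".sh", ".ipynb")):
    """Return (file, line number, label, line) for every suspicious line found."""
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in SUSPICIOUS.items():
                if pattern.search(line):
                    hits.append((path, lineno, label, line.strip()))
    return hits


if __name__ == "__main__":
    # "data_package" is a hypothetical folder name; flagged lines need manual review.
    for path, lineno, label, line in scan_scripts("data_package"):
        print(f"{path}:{lineno}: {label}: {line}")
```

The word boundary before the drive letter keeps URLs such as https://... from being flagged as absolute paths.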
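Most operating systems can create the zip archive directly from the file manager. If you prefer a script, the following Python sketch, based on the standard zipfile module, packs the data folder into one archive and skips some typical clutter files. The exclusion list and the function name zip_data_folder are illustrative assumptions.

```python
import zipfile
from pathlib import Path

# Typical clutter that should not end up in a data package
# (illustrative, non-exhaustive list; adapt it to your project).
EXCLUDED_NAMES = {".DS_Store", "Thumbs.db", "__pycache__", ".ipynb_checkpoints", ".git"}


def zip_data_folder(folder, archive=None):
    """Pack `folder` into a single .zip archive and return the archive path.

    Entries are stored relative to the folder's parent, so the archive unpacks
    into one top-level folder (e.g. data_package/README.md).
    """
    folder = Path(folder).resolve()
    if archive is None:
        # Place the archive next to the folder, not inside it.
        archive = folder.parent / f"{folder.name}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            rel = path.relative_to(folder.parent)
            # Skip excluded files as well as everything inside excluded folders.
            if any(part in EXCLUDED_NAMES for part in rel.parts):
                continue
            if path.is_file():
                zf.write(path, arcname=rel.as_posix())
    return Path(archive)


if __name__ == "__main__":
    # "data_package" is a hypothetical folder name.
    print(zip_data_folder("data_package"))
```

If nothing needs to be excluded, shutil.make_archive("data_package", "zip", root_dir=".", base_dir="data_package") from the standard library produces a comparable archive in one call. Both NOMAD and WueData can unpack zip archives during upload.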

2. Upload to Repository

Instructions for NOMAD

For more details, see the instructions on the NOMAD website here, here and here. Apart from account registration and DOI assignment, the instructions are identical for our ct.qmat NOMAD Oasis.

If you do not yet have an account on the central NOMAD server at https://nomad-lab.eu/prod/v1/, registering is straightforward and requires your email address.
Create a new upload via the menu item PUBLISH -> Uploads at the top left.
- You can create a new empty upload or add an example upload that demonstrates NOMAD's capabilities.
- You can change the name of the upload. This is for your convenience and has no functional impact.
- An upload is a folder structure, akin to a project folder. However, unlike in the other repositories discussed on this site, a publication in NOMAD consists primarily not of this folder with its files and subfolders, but of a set of entries in the NOMAD archive in a standardized format. The raw files underlying the entries nonetheless remain accessible.
- Files can be added via drag and drop, by clicking, or via the API. NOMAD automatically extracts .zip and .tar.gz files.
- NOMAD automatically scans for supported file formats. Recognized files will be parsed and normalized, meaning that the data will be extracted and transformed into a standardized format, automatically generating an entry for each mainfile.
Add coauthors and reviewers: Through the button with the tooltip "Manage upload members" to the right of the upload name, you can add coauthors and reviewers. Coauthors can edit the upload, while reviewers only have read access. Note that you have to list the coauthors before publishing.
Edit author metadata: Through the button "Edit author metadata", you can add further information, namely comments and references (e.g., links to publications), to some or all of the entries created during parsing. Furthermore, you can create a dataset containing the entries or add the entries to an existing dataset, thereby combining entries from multiple uploads. Datasets are the objects that can be assigned a DOI in NOMAD.
Publish Upload:
- This will publish the upload and move it out of your private staging area into the public NOMAD. This step is final. All public data will be made available under the Creative Commons Attribution license (CC BY 4.0).
- If you wish, you can put an embargo on your data. This makes some metadata (e.g., chemical formula, system type, space group) public, but the raw files and archive contents remain hidden (except to you and users you explicitly share the data with). You can already create datasets and assign DOIs for embargoed data, e.g., to cite them in a manuscript that has not yet been published. The embargo lasts up to 36 months, after which your data will be made publicly available. You can also lift the embargo sooner if you wish.
Assign DOI: To assign a DOI after publishing, you need a dataset, created as described under "Edit author metadata" above. Through the menu item PUBLISH -> Datasets, you can view your datasets and assign a DOI to each.

Instructions for WueData

If you are registering for the first time, please ask the university-wide RDM team (wuedata@uni-wuerzburg.de) to add you to an existing workspace or to create one for your research unit. You need to provide the following information:
- Your name
- The workspace that is assigned to your research unit, e.g. "Experimentelle Physik 4"
- How many data packages you want to publish and how much storage you need.
- Who will bear the publication costs beyond the free 2 TB. Keep the responsible person (e.g., group leader or chair holder) in the loop.
Log in to WueData. In the overview, you can now select any workspace assigned to you. Workspaces can be assigned to a person or to a group. All users assigned to a workspace have read and write access to the unpublished research data and metadata stored in it.
Create your data packages.
- At the beginning you only need to provide a title that can be changed later.
- Upload your data via drag and drop
- If you upload zipped data, make sure to let the system unpack it during upload (tick the checkbox "Unpack archive when uploading")
- Add a README file
Fill in all Metadata.
- Your group leader (i.e. chair holder) holds the rights.
- Funding organizations and the publisher should be identified by their Research Organization Registry (ROR) IDs.
  - ct.qmat is https://ror.org/00kkpv737
- Add the URI of research grants that funded the research.
- Additionally, mention ct.qmat in the field "Description", since "Funding" is at the moment not searchable.
After double-checking the uploaded data and the metadata, click the button "Bereit zum Publizieren" ("Ready for publication") and contact wuedata@uni-wuerzburg.de. They will then check whether they can open and access the data and will send you a curation protocol.
Address all points in the curation protocol and contact wuedata@uni-wuerzburg.de again. The data will then be published within two days at most. The data package is assigned a DOI, which you can include in the Data Availability section of your manuscript.
After your paper is published, you can add the DOI of the paper to the metadata. The data are stored for a minimum of ten years and can no longer be deleted or modified.

Instructions for OpARA

3. Update preprints and publications

Your preprint, and later the paper, should cite the data repository. For example, Phys. Rev. recommends adding a sentence before the acknowledgments. You can use something like “The supporting data and codes for this article are available from WueData [REF].” Here, [REF] is an entry in your bibliography citing the DOI, in the form AUTHOR NAMES, YEAR, WueData, URL. All preprints and papers published within ct.qmat should also be added to our publication database, which can be found here.

¹ For a list of supported simulation software, see here under "Processing".
² NeXus is a FAIR data format used by NOMAD for experimental data; see here. NOMAD/FAIRmat provide tools for converting data to NeXus. If you are interested in converting your data to NeXus, you can reach out to datamanagement.ct.qmat@listserv.dfn.de or ask directly on the FAIRmat Discord server. Depending on your data format and devices, an easy workflow may already exist. If not, it might make sense to use one of the other repositories for now while you start developing a workflow for converting future data.