Datasets are cited and linked to publications for three reasons: 1) to give due credit for the data, 2) to help readers locate and access the data, and 3) to help track the impact of the data.


A robust citation should uniquely identify the object cited and enable digital access to it. The basic elements that should be present in any citation are author(s), title, date, and location.

Formal identifiers:

While it should be possible to uniquely identify a dataset from these elements, in practice a formal identifier (such as a DOI) is used so that the dataset can be identified unambiguously through a resolver service (such as http://dx.doi.org). Data repositories will generally assign a formal, persistent identifier to your dataset if you request one prior to deposit.
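As a rough illustration of how an identifier and a resolver fit together, the sketch below turns a DOI written in any of several common forms into a resolvable URL. The function names are hypothetical, and the resolver base https://doi.org/ is an assumption (it is the current address of the DOI resolver; the older dx.doi.org address redirects to it).

```python
from urllib.parse import quote

# Assumption: https://doi.org/ is used as the resolver base address.
RESOLVER = "https://doi.org/"

# Prefixes that people commonly write in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (for example 10.3886/ICPSR04248) from a loosely written one."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10." and has a prefix/suffix split.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def resolver_url(raw: str) -> str:
    """Build the URL that the resolver service redirects to the dataset landing page."""
    # Percent-encode unusual characters but keep the slash between prefix and suffix.
    return RESOLVER + quote(normalize_doi(raw), safe="/")


print(resolver_url("doi:10.3886/ICPSR04248"))
```

Opening the resulting URL in a browser should redirect to the dataset's landing page at the repository, which is what makes the identifier useful as the "location" element of a citation.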


Example citations:

Cheng, D. W., Jiang, Y., Shalev, A., Kowluru, R., Crook, E. D., & Singh, L. P. (2012). Transcription profiling by array of mouse MES-13 cells after treatment with glucose or glucosamine [scan; processed Data Matrix]. Array Express – EMBL/EBI. http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-2557

Tennstedt, Sharon, Morris, John, Unverzagt, Frederick, Rebok, George, Willis, Sherry, Ball, Karlene, and Marsiske, Michael. (2005). ACTIVE (Advanced Cognitive Training for Independent and Vital Elderly), 1999-2001 [United States]. ICPSR - Interuniversity Consortium for Political and Social Research. doi:10.3886/ICPSR04248


Cite datasets using the style and elements required by the journal editor or publisher, or adapt a standard data citation style (such as DataCite) to match the style of the journal. Include data citations in the reference list alongside textual sources. Cite the dataset at the level of granularity that best suits your needs.


The DataCite citation formatter accepts a DOI as input and produces a data citation in a number of common styles.
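The same kind of result can be requested from a script. The sketch below is not the formatter's own interface; it assumes the related DOI content negotiation mechanism, in which a request to the resolver with an Accept header of text/x-bibliography asks for a formatted citation. The function names are hypothetical, and the style value ("apa" here) is assumed to be a Citation Style Language style name.

```python
import urllib.request


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare (but do not send) a request for a formatted citation of a DOI."""
    # Assumption: the resolver honours content negotiation for this DOI.
    url = "https://doi.org/" + doi
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling fetch_citation("10.3886/ICPSR04248") would ask for an APA-style citation of the ICPSR dataset cited above. Check the returned text against the journal's requirements before using it, since automatically formatted citations depend on the quality of the registered metadata.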

*Adapted from the Digital Curation Centre, “How to Cite Datasets and Link to Publications”, http://www.dcc.ac.uk/resources/how-guides/cite-datasets