Data Management

Responsible data management begins with planning for data collection and continues after the work is published. It begins with experimental design and protocol approval; it involves record keeping in a way that ensures accuracy and avoids bias; it guides criteria for including and excluding data from statistical analyses; and it entails responsibility for collection, use, and sharing of data. Data collected from human research participants and data of a proprietary or sensitive nature require special consideration regarding secure storage, restricted access, and ultimate disposal.

Data can include measurements, observations, survey results, videotaped interviews, or any other primary products of research activity. These provide a factual basis for inference, conclusions, and publication. If data are defined in this way, as the research products necessary to validate the integrity of published or reported work, then 'data' consist of much more than just measurements written in a lab notebook.

Inadequate data management, substandard archival methods, and poor security can all raise doubts about the quality of your work and your integrity. Providing the raw data and analysis, coupled with evidence of responsible data management, is your best defense against an allegation of misconduct. Fabrication and falsification of data are the most serious challenges to the integrity of research. Widely publicized cases have included the reporting of experiments that were never performed (Engler et al., 1987), extensive falsification of data that were used to inform the clinical treatment of people with intellectual disability (Holden, 1987), and even the painting of black patches on white mice to give the appearance of successful transplant of skin patches (Hixson, 1976). Although examples like these are dramatic, the responsible conduct of research consists of more than avoiding intentional fabrication or falsification. Responsible data management is a key element in ensuring the integrity of the research record.

UAF Specific Information

Every employee, student, and research collaborator at the University of Alaska Fairbanks is charged with ensuring the integrity of the research record. We are all part of the university, and our collective reputation depends on the behavior of everyone engaged in research, irrespective of discipline or funding source. The principal investigator (PI), who may also be a graduate student supervisor, is the primary university employee who oversees the research design and the collection, analysis, and reporting of research data and results. Graduate students, post-doctoral fellows, or collaborators may be allowed to keep copies of data at the discretion of the PI (or their supervisor). All data collected as part of sponsored research must be archived for a minimum of 5 years after publication and/or final report submission. We recommend that all other research use this as a minimum standard for data retention. A PI leaving the University can arrange to transfer data to another institution through the Vice Chancellor for Research. A suitable data management plan is required for the IRB (human subject protections) protocol approval process.

Professional societies, research institutions, and academic journals have guidelines for responsible data management, but only a few considerations are specifically regulated by federal laws. Concern about research misconduct was a primary motivation for a 1990 conference on data management sponsored by the Department of Health and Human Services. One of the outcomes of that conference was a summary of the many ways in which the conduct of research depends on responsible data management.

Records of research are necessary and can have legal standing for a variety of purposes, including, but not limited to, demonstration of priority for claims of intellectual property, ownership or patent rights, and requests under the Freedom of Information Act. In addition, nearly all aspects of misconduct allegations hinge on the extent and quality of documentation of the research.

Legally, data are the property of the institution and not the investigator. Research by investigators within an institution is supported either directly by the institution or by funding awarded to the institution. Typically, products of work by employees of an institution are the property of that institution. Unfortunately, the question "who owns the data?" too often becomes the focus of discussion rather than the more critical question "how can we ensure the integrity of the data?"

Integrity of research depends on the integrity of the data. Because data provide factual basis for scientific work, the integrity of research depends on integrity in all aspects of the collection, use, and sharing of data.

Integrity of the data is a shared responsibility. Everyone with a role in research has a responsibility to ensure the integrity of the data. The ultimate responsibility belongs to the principal investigator, but the central importance of data to all research means that this responsibility extends to anyone who helps in planning the study, collecting the data, analyzing or interpreting the research findings, publishing the results of the study, or maintaining the research records.

Research data should be shared with other scientists. Progress in science will be achieved most readily when information is freely exchanged. A failure to share data can result in the unwitting repetition of failed research strategies. The sharing of research data is also in the best interests of individual researchers. In most cases, an open data policy reflects positively on those who share and increases the likelihood for new insights, collaboration, and reciprocal sharing.

The responsible conduct of research includes considerations that begin even before data collection begins. Carefully designing the study so as to identify what data will be needed helps assure that resources are not wasted and that significant results can be obtained. The time to correct problems in data collection methods is before the data are collected.

A widely accepted way to plan data collection is to develop a Data Management Plan (DMP). In some cases, the research funding agency may require a DMP, or a Data Management and Sharing Plan (DMSP), as part of the grant proposal submission process. These plans outline how the scientific data will be managed and shared. They are used to maximize the appropriate sharing of scientific data while accounting for legal, ethical, or technical issues that may limit data sharing and preservation. Multi-agency and institutional concurrence on a DMP must be obtained before the plan is used.

Through your affiliation with UAF, you have access to a free data management plan development tool, DMPTool.

The DMPTool is a free, open-source, online application that helps researchers create data management plans (DMPs). The DMPTool provides a click-through wizard for creating a DMP that complies with funder requirements. It also has direct links to funder websites, help text for answering questions, and data management best practices resources (DMPTool.org, 2023).

Because data collection can be repetitious, time-consuming, and tedious, its importance can be underestimated. Care should be taken to assure that those responsible for collecting data are adequately trained and motivated, that they employ methods that limit or eliminate the effect of bias, and that they keep records of what was done by whom and when.

The best model for recordkeeping will not be the same for all areas of research. However, nearly all types of research in the physical and natural sciences include records that should be kept in bound lab or field notebooks. At a minimum, such notebooks can provide a listing of the date of research, the investigators, what was done, and where the corresponding research products can be found. The lab or field notebook should be supplemented as needed by specialized methods of recordkeeping such as digital files, audio/video recordings, and gels.
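As an illustration only, the minimum notebook listing described above (date, investigators, what was done, and where the products can be found) could be mirrored in a digital record along the following lines. This is a minimal Python sketch under stated assumptions: the class name `NotebookEntry`, its field names, and the sample values are hypothetical and do not represent a prescribed UAF format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class NotebookEntry:
    """One notebook entry; all field names are hypothetical."""
    entry_date: date            # date of the research activity
    investigators: tuple        # who did the work
    what_was_done: str          # short description of the procedure
    product_locations: tuple    # where the resulting products can be found

    def to_json(self):
        """Serialize the entry as one JSON line for an append-only log."""
        record = asdict(self)
        record["entry_date"] = self.entry_date.isoformat()
        return json.dumps(record, sort_keys=True)


# Hypothetical example entry
entry = NotebookEntry(
    entry_date=date(2024, 3, 1),
    investigators=("A. Researcher",),
    what_was_done="Collected water samples at site 4",
    product_locations=("field notebook 3, p. 12", "samples/site4/2024-03-01/"),
)
line = entry.to_json()
```

Marking the record as frozen means an entry cannot be silently altered after it is created; a correction would be logged as a new entry, much as a correction in a bound notebook is written in rather than erased.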

The use of statistical methods varies widely among research disciplines. Although the successful cloning of a gene may require no statistical analyses, biology is characterized by variation and therefore most inferences will depend on statistical methods to quantify confidence in accepting or rejecting hypotheses. Such testing depends on many experimental and statistical assumptions. Violation of those assumptions, or a misunderstanding of the methods of analysis, can result in significant, even if unintentional, misrepresentation of the results of a study.

Because it is not possible to report everything that has been done, researchers must make decisions about which studies, data points, and methods of analysis to present. Although some of these decisions are easy, many are not. For example, should an anomalous data point be excluded from analysis when there is no known reason for the discrepancy? Most researchers would favor retaining the data point, but some fields have criteria for excluding such outliers. It would be irresponsible to exclude such a data point without clearly reporting the use of the exclusion criteria.

It is a laudable ideal to analyze and report all data; however, in practice some data must be excluded. The selection should be based on objective criteria, preferably ones specified before data collection. Critically evaluate the reasons for inclusion or exclusion of data, the measures taken to avoid bias, and possible ways in which bias may nonetheless influence selection. Clearly document how the data were obtained, selected, and analyzed, especially if the methods are unusual or potentially controversial.
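One way to make an exclusion criterion objective and reportable is to write it down as a rule before data collection and have the analysis record exactly what the rule removed. The Python sketch below uses Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles) purely as an example rule. It is an assumption for illustration, not a criterion endorsed by this guidance, and your field may use a different one. The function name and the sample values are hypothetical.

```python
import statistics


def apply_exclusion_rule(values, k=1.5):
    """Split values into kept and excluded using Tukey's fences.

    Returns the kept values plus a report that can be included with
    the methods so readers can see what was excluded and why.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    kept = [v for v in values if low <= v <= high]
    excluded = [v for v in values if not (low <= v <= high)]

    report = {
        "rule": "Tukey's fences (example rule, specified in advance)",
        "k": k,
        "lower_fence": low,
        "upper_fence": high,
        "n_total": len(values),
        "n_excluded": len(excluded),
        "excluded_values": excluded,
    }
    return kept, report


# Hypothetical measurements with one anomalous reading
measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
kept, report = apply_exclusion_rule(measurements)
```

Because the report lists the rule, its parameter, and every excluded value, the exclusion can be disclosed in full rather than applied silently.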

Because the products of research involve creative contributions to new knowledge, it is easy to assume that the resulting data are different from the routine products of employees in any other private or public institution. Although the language and practice of science seem to suggest otherwise, the equipment, materials and reagents, and even resulting data all belong to the institution in which they are purchased or produced. The issue becomes especially salient if a marketable product is produced, but it is also an issue when someone moves from one institution to another. If the principal investigator is moving, she or he can normally expect to take the data. However, exceptions do occur, and equipment transfer is nearly always a matter for negotiation. Absent some explicit agreement or ruling to the contrary, the principal investigator has primary responsibility for decisions about the collection, use, and sharing of data. Student or postdoctoral researchers should assume that their original data will stay with the principal investigator. However, most institutions have the expectation that graduating students may take copies of their research records. If regulations preclude researchers from taking such copies, then the principal investigator is responsible for making this clear to research group members before work begins.

The quality of data supporting published work becomes moot if the data are lost. This raises issues of what should be retained, how it should be stored, and for how long.

What should be retained?
This depends in part on the nature of the products of research. Some materials, such as thin sections for electron microscopy, cannot be kept indefinitely because of degradation. It is also impractical to store extraordinarily large volumes of primary data. At minimum, enough data should be retained to reconstruct what was done.

How should it be stored?
Any stored data will be rendered useless if there are insufficient records to locate and identify the material in question. Ease of access must be balanced against security, for instance if the study involved human subjects with a reasonable expectation of confidentiality. Although the institution is the legal owner of the data, it is usually the responsibility of the principal investigator to ensure that records are stored in a secure, accessible fashion.
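For digital files, one possible way to keep stored material locatable and identifiable is to store a manifest alongside the data. The Python sketch below is illustrative only: the function name `build_manifest` and the record fields are hypothetical, and it does not address access control or confidentiality, which must be handled separately.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir):
    """Return one record per file under data_dir.

    Each record holds the relative path (to locate the file), its size,
    and a SHA-256 checksum (to confirm later that it is unchanged).
    """
    root = Path(data_dir)
    manifest = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reading the whole file is fine for a sketch;
            # very large files would be hashed in chunks instead.
            content = path.read_bytes()
            manifest.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": len(content),
                "sha256": hashlib.sha256(content).hexdigest(),
            })
    return manifest
```

Rebuilding the manifest at a later date and comparing checksums gives a simple check that archived files are still present and unaltered.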

How long should it be kept?
Under current Health and Human Services requirements, research records must be maintained for at least three years after the last expenditure report. Federal regulations or institutional guidelines may require that data be retained for longer periods. These formal requirements are minimal constraints. Decisions about retention of records should take into account the extent to which a line of research is still being pursued, the likelihood of ongoing interest in the research, continued assurances of confidentiality for any human subjects, and the space and expense necessary for storage.
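The two minimum periods mentioned on this page (five years after publication or final report for UAF sponsored research, and three years after the last expenditure report under Health and Human Services requirements) can be turned into a simple date check. The Python sketch below assumes that both minimums apply and that the later date governs. That is an assumption for illustration: other regulations, sponsors, or institutional guidelines may require longer retention, and the function names are hypothetical.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day `years` later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 with a non-leap target year: fall back to Feb 28
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(publication_or_final_report,
                           last_expenditure_report=None,
                           years_after_publication=5,
                           years_after_expenditure_report=3):
    """Latest of the minimum retention dates described in the text.

    This is a floor, not a recommendation to dispose of data on that date.
    """
    candidates = [add_years(publication_or_final_report,
                            years_after_publication)]
    if last_expenditure_report is not None:
        candidates.append(add_years(last_expenditure_report,
                                    years_after_expenditure_report))
    return max(candidates)


# Hypothetical dates: published mid-2020, last expenditure report early 2023
floor = earliest_disposal_date(date(2020, 6, 15), date(2023, 1, 10))
```

The result is only the earliest date on which disposal could be considered. The factors listed above, such as ongoing interest in the research and confidentiality assurances, may justify keeping records well beyond it.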

 

Although sharing of data is generally in the best interests of science and the individual, it is clear that such sharing can place an individual scientist at risk. It is reasonable to fear that sharing data before publication can result in loss of credit or opportunity. Other concerns include exposure of data to the prejudiced scrutiny of competitors or detractors, risk of compromising confidentiality of human subjects, and expense of time and resources to meet requests for sharing of data. However, reasonable strategies to minimize potential problems should make it possible to choose sharing over secrecy. Before publication, it is best to maintain an open data policy with appropriate caution. After publication, be prepared to grant reasonable access to the raw data; that is, honor requests that are in the interest of scientific inquiry and can be accomplished without inordinate expense or delay.

  • Engler RL, Covell JW, Friedman PJ, Kitcher PS, Peters RM (1987). Misrepresentation and responsibility in medical research. New England Journal of Medicine 317:1383-1389.
  • Holden C (1987). NIMH finds a case of 'serious misconduct'. Science 235:1566-1567.
  • Hixson J (1976). The Patchwork Mouse. Doubleday, New York.
  • Department of Health and Human Services (1990). Data Management in Biomedical Research, Report of a Workshop, April 1990, Chevy Chase, Maryland.
    Results of a 1990 conference on data management, including a summary of the many ways in which the conduct of research depends on responsible management of data.