- Higher education institutions that create an internal consulting service to develop data management plans can assist researchers in crafting such plans for their grant proposals — as required by a number of federal funding agencies.
- In creating a DMP consulting group on campus, the institution should first identify a model for the service that best fits its institutional character.
- To foster the service's success, the creators must provide sufficient resources to make it an effective and efficient asset for the campus research community.
- Additionally, the institution must dedicate staff to the service who can adequately cover the skill sets required for DMP consulting.
Institutions and researchers have worked together for years on streamlining and improving campus processes to sustain institutional research at all levels. This support has ranged from assistance in initial planning and submission of grant proposals to implementation of and follow-through on those same grant activities. As the data generated by research has evolved to primarily electronic format, so have the requirements for how that data is ingested, managed, and archived by each researcher and by each institution. In addition, a critical compliance effort now faces higher education institutions: a growing number of federal funding agencies require that a data management plan (DMP) accompany each grant proposal submission. Identified by the federal government as a way to ensure that research activities and results funded with public dollars are available to the public for future analysis and discovery, a DMP with each grant proposal is evidently the first step in establishing a more comprehensive approach to research data policy.
Information technology departments at higher education institutions are keenly aware of their roles in strengthening campus services to adequately support the various stages of research activities and, in particular, how the resulting research data is managed throughout its life. Tightly woven into any discussion is the importance of identifying the predicted impact on campus cyberinfrastructure by future research activities. To successfully address these evolving needs, a formal research DMP service can be part of a larger data life cycle management process. A dedicated service allows for timely communication and planning on how best to meet the needs of each research project while also fulfilling requirements imposed by external stakeholders, such as the federal agency DMP criteria.
To strengthen support for institutional research activity and streamline the processes involved, the institution must create and manage the process of DMP development as part of both the funding proposal and the overall research data life cycle management process. This service will strengthen the institution's ability to address the needs of current and planned research activities, while simultaneously addressing the needs for efficient and timely support of data management. Such an endeavor requires the coordinated efforts, knowledge, and experience of many constituents of the institution.
What to Include in the Data Management Plan
In order to provide recommendations for strengthening institutional data management infrastructure and also define research data services an institution might pursue, it is essential to understand the data life cycle and how a DMP provides value to this process.
The model illustrated in figure 1 identifies data life cycle stages common to a number of research data life cycle models that have emerged recently. The models studied present common data life cycle stages from which data management needs can be determined. The stages could include data creation (conceptualization), data collection and description, data storage, archiving and preservation, data access, discovery and analysis, and data reuse and transformation.
Figure 1. Common data life cycle stages
Developing an Institutional DMP Service
The organization of a DMP service will vary by institution and be shaped by a number of factors, including institutional culture, organizational makeup, and geographic dispersion.
- Culture will provide a perspective that takes into account centralized and decentralized units.
- The organizational makeup will help identify the appropriate groups that would influence the planning process.
- Geographic dispersion will require a model that allows for multi-campus institutions.
Funding for the service will be driven by the organizational model insofar as budgets are most often linked to academic and administrative areas. Areas that contribute resources to the service will need to budget accordingly. An alternative to this approach is for the institution to provide a central funding model for the resources required, wherever they reside.
Delivery is guided by the stakeholders who determine the workflow of the review, assessment, and approval of the plan and by software tools used to gather the required documents for submission to the funding agency.
A particularly challenging decision embedded in this process is to identify the best point within the proposal preparation sequence to interject attention to and development of the DMP itself. The days and hours leading up to grant submission are filled with all the final edits, fact-checking, and various other tasks of high priority. Intervening with yet one more item to add to the list will not be met with broad support, at least in the early stages. At the time of this writing, we see indications that campuses are beginning to request deadlines for DMP consulting and vetting within a set window prior to grant submission. What will ultimately emerge as a widely accepted and cogent practice for DMP development has yet to be determined.
Many of those in research and campus technology leadership roles will endorse the practice of planning and communicating as early and as frequently in the proposal process as possible. Specific to campus cyberinfrastructure, this deliberate attention to data management within the overall research life cycle opens the door to clarifying how any proposed investment in research will affect the infrastructure that supports it. This analysis of the impact on campus cyberinfrastructure can be incorporated into many common decision workflows, including creation of new programs, construction of new facilities, establishment of new academic and research centers, or acceptance of major grants. The earlier this takes place, the better the opportunities for the institution to control costs and leverage existing resources. For example, by routinely conducting cyberinfrastructure impact analysis on grant proposals, a campus may be able to identify lower-impact ways of accomplishing a researcher's goal. At a minimum, this form of analysis allows greater predictability and financial preparation.
Model for Local Administration of the DMP Service
Identifying a model for local administration of DMP services is quickly becoming a primary objective for many institutions. Developing the model that best meets the needs of the local campus research environment requires the engagement of departments across campus. From an institutional standpoint, this service can be viewed as a core infrastructural activity in support of researchers. For the researchers, it can be the assurance that their DMP has been prepared under the guidance of those with valuable knowledge of campus resources and protocols. This form of critical insight into existing and planned institutional resources can be leveraged to make the most of grant dollars received.
As indicated earlier, best practices in administration of this process are still emerging, so few examples of widely accepted and well-tested models exist. In this early stage, investigation by the EDUCAUSE Advanced Core Technologies Initiative–Data Management Group (ACTI-DM) has found that campuses across the United States appear to be gravitating toward a couple of preferred models for administration of research DMP services:
- Embedded Small Group: A small group of designated staff is embedded in an existing department whose activities are closely aligned with and/or affected by these efforts (e.g., library, research, or IT, or in some cases an independent office supported by all three of these and possibly others). Additionally, an advisory committee may be named whose charge is to provide oversight and guidance to the staff in developing protocols that best serve a large spectrum of departments and disciplines, as well as provide guidance on discipline-specific topics.
- Advisory Committee: An advisory committee is selected and charged with the responsibility of providing guidance and assistance to individual faculty and departments for a collection of tasks related to data management (e.g., developing DMPs for grant proposals including discipline-specific guidance, providing DMP review services in the pre-proposal phase, and identifying IT costs related to proposed research projects). These committees typically consist of cross-departmental representatives including but not limited to library, research, IT, faculty, and sponsored programs.
Development of DMPs is an essential function of the federal grant proposal process. A funding source should be identified to cover staffing and resources needed for this process to function efficiently and effectively, remembering that roles and services identified in the various administrative models impact budget needs. Institutions might want to begin by building support and refining the details of this process, addressing only the needs of federally funded grant proposals initially and adding processes for other state and local funding agency requirements as appropriate. Failure to develop plans that adequately address the data needs of research activities will, at a minimum, mean that the data will not be archived appropriately or be available to the research community. At worst, lack of a DMP may threaten the integrity and security of data and possibly cause the funding agency to reject the proposal.
Proper funding of the consulting activities that develop the DMPs must become a foundational commitment of the institution. If these activities are simply absorbed into the responsibilities of existing staff and resources, other tasks will likely take precedence and the DMPs will suffer. The consulting activities and related resources must be considered part of the institution's infrastructure, much like any other basic service that provides support to the institution's research activities.
Institutional Vetting of the DMP
The process for vetting and/or approval should be considered as DMPs are created. Both of the models described earlier speak to the logical selection of those involved in this process. Since these plans represent a commitment of current and future resources, budget, and risk, it is in the institution's best interest to provide some method by which plans are reviewed and verified for completeness and validity. The institutional vetting provides at least three valuable results:
- Improved quality in grant applications, leading to more grants awarded.
- Clarification of roles, responsibilities, and risks of the institution and the researcher.
- Commitment to and planning for the resources for which the institution and the researcher are responsible.
Universities continue to gather substantial collections of information aimed directly at principal investigators (PIs) and other researchers to assist them in developing their required DMPs. Posted online, this information provides a convenient, rich collection of tools and resources that support various components of the DMP. PIs can develop a DMP in step with the development of the grant proposal itself, tapping resources as needed. In most cases, these resources are compiled in a campus DMP website that often resides in the library or research domain, with links included to numerous other support services and resources both on and off campus.
Institutions are developing a growing list of complementary resources, essentially a "researcher's toolkit" that includes types of storage services available. Resources can be delineated by stages in the research data life cycle and might be available internally or externally to the institution. As data moves through the various stages of its life cycle, researchers can be assured it will be maintained in storage environments aligned to the varying levels of accessibility needed by the appropriate users. Initially, the data may be contained in a private state, with only the research team having access. At some point, the data may be moved to a state that allows it to be shared inter- or intra-institutionally. Finally, the data may be moved to an archive state either directly from the private or the shared environment.
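The movement of data between access states described above can be sketched as a small state model. This is only an illustration: the names `DataState`, `ALLOWED_TRANSITIONS`, `can_move`, and `move` are hypothetical, and treating the archive state as final is an assumption rather than something the text prescribes.

```python
from enum import Enum


class DataState(Enum):
    PRIVATE = "private"  # only the research team has access
    SHARED = "shared"    # shared within or across institutions
    ARCHIVE = "archive"  # long-term archive


# Transitions named in the text: private -> shared, private -> archive,
# shared -> archive. Making ARCHIVE terminal is an assumption.
ALLOWED_TRANSITIONS = {
    DataState.PRIVATE: {DataState.SHARED, DataState.ARCHIVE},
    DataState.SHARED: {DataState.ARCHIVE},
    DataState.ARCHIVE: set(),
}


def can_move(current: DataState, target: DataState) -> bool:
    """Return True if data may move directly from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def move(current: DataState, target: DataState) -> DataState:
    """Return the new state, or raise ValueError for a disallowed move."""
    if not can_move(current, target):
        raise ValueError(f"cannot move data from {current.value} to {target.value}")
    return target
```

A DMP consultant might use a model like this to confirm that each planned move has a storage environment, and an access policy, waiting at the other end.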
Examples of storage environments that may be offered to house the data can consist of any combination of the following:
- Central IT storage, in the form of a managed storage area network, houses data for short- and long-term use.
- Decentralized research storage, such as that typically provided as part of a high-performance computing environment, provides short-term storage for use by the researcher during the heavy computation portion of the research. This storage comes at a premium and should not be considered available to the researcher for long-term archival use.
- Divisional/departmental storage environments will be used depending on the stage of research and the level of access needed during that stage. Appropriate user access will necessitate some type of federated architecture with other types of storage environments. This will require distributed access and control mechanisms, keeping in mind that the local storage may have to interact with authentication and authorization mechanisms provided by the institution or other consortia.
- Cloud-based environments, either commercial or public, have some limited offerings that researchers might find useful.
- To ensure timely backup and redundancy for all types of storage, researchers and their institutions will want to provide multiple options.
Use of External DMP Tools vs. In-House Templates
Many institutions have already created templates and examples that provide a general framework for what should be submitted to the NSF. These templates can be copied and used by their researchers as a basis for developing and submitting the required DMPs. In addition, a resource such as the online tool created by the California Digital Library can assist with the development of the DMP. Called the DMPTool, it provides a web-based environment to guide a researcher through the steps of creating a DMP, offering guidance for specific directorates within the NSF as well as the NIH and several other granting foundations and services. At the conclusion of the plan's development, a plain-text or rich-text file can be exported. This resource is an evolving tool, with development work continuing.
Use of National, International, or Discipline-Specific Repositories
The DMP will also need to identify a final destination for data created as a result of the research activity. NSF has indicated that "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants." An obvious question then arises as to where a suitable place to store the data exists. In addition to institutional repositories, data may be housed in national or international repositories, which typically host large data sets. Finally, some higher education disciplines have repositories dedicated to gathering publications and data from their subject area. They also may have specific guidelines for storing the data, what metadata to use, and search criteria. When considering the long-term archive of data, these types of repositories might be a possible landing spot.[1]
Skill Sets Required for a DMP Consulting Service
Ultimately a campus community will benefit most by ensuring that resources available through the DMP service provide a strong combination of both technical guidance and skills, as well as a solid knowledge base of the discipline in which the research is grounded.
Skills Relevant to All Types of Data
When considering the makeup of the consulting group, various types of skill sets should be taken into account. Individuals with knowledge of the following areas may need to be consulted when creating the data management plan:
- Storage: Research data will proceed through a life cycle from inception of the project to active research, publishing, and archiving. At the various stages in its life, different types of storage should be considered. Given the available storage options associated with high-performance computing, local and shared storage for intermediate results, publishing, and archiving, expertise on where and what storage is available and how to use it would be valuable to the researcher.
- Data Migration: As the research proceeds, the data may have to move from one environment to another. Depending on the type, size, and context of the data, this might involve many types of technology. If the data resides in some type of database or other structured technology, programming or utilities may be required to extract the data and put it into a more suitable format for archiving.
- Networking: Moving large volumes of data within and across multiple research environments might require significant network bandwidth. Expertise in the tools and techniques that use the network pipes would be critical to the researcher.
- Legal: Knowledge of governmental laws and regulations that may specify access, use, and management of the data is vital. For example, HIPAA, FERPA, and personal privacy laws could have a direct bearing on data generated by human research subjects.
- Financial: These issues might be particular to the grant itself and handled by the grant budgeting office, but will also need to be considered beyond the scope of the grant. Expertise on costs may come from various sources on campus, including IT (e.g., for data storage, networking, and migration costs) and others. In the context of the DMP service, there should be coordination between what is needed for the grant itself and any considerations beyond the grant (e.g., if data is retained at the institution beyond the scope of the grant).
- Security: Access to the data may need to be controlled at one or more stages of research and archiving. In these cases, authentication and authorization mechanisms might need to be employed to ensure that the data is accessed properly. Experts are required to implement security controls including those specific to cloud-based services.[2]
- Metadata creation and assignment: Basic metadata for discovery, interoperability, and provenance will be required as a minimum for deposit in institutional or disciplinary repositories.
- Scholarly data communications: There will undoubtedly be questions related to copyright, open access, policy, and a myriad of other data areas. The expertise to answer these questions, or at least point the researchers to a knowledgeable source, will be important to ensure that the DMPs are created with assurance that data issues have been addressed.
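For the networking and data migration items above, a rough transfer-time estimate can help a consultant judge whether a planned move is realistic. The sketch below is a back-of-the-envelope calculation only; the function name `transfer_hours`, the decimal terabyte (10^12 bytes), and the default 70 percent link efficiency are assumptions chosen for illustration, not measured values.

```python
def transfer_hours(size_terabytes: float, link_gbps: float,
                   efficiency: float = 0.7) -> float:
    """Estimate hours needed to move a data set over a network link.

    size_terabytes: data volume in decimal terabytes (assumed 10**12 bytes)
    link_gbps:      nominal link speed in gigabits per second
    efficiency:     assumed fraction of nominal speed actually achieved
    """
    if size_terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link > 0, and 0 < efficiency <= 1")
    total_bits = size_terabytes * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / effective_bits_per_second / 3600


# Example: 10 TB over a fully used 1 Gbps link takes about 22 hours.
hours = transfer_hours(10, 1, efficiency=1.0)
```

Real throughput depends on the transfer tools, storage speed at both ends, and competing traffic, so an estimate like this is a starting point for the conversation rather than a commitment.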
Special Skills for Different Disciplines
Different disciplines within the university will inherently produce different types of data, different volumes of data, and different contextually identifying information about the data. From the types of files being created to the metadata used to provide searchability and context, there will be a need to identify and use various digital preservation protocols. In order to provide a suitable environment for ongoing management of the research data, those skills will need to be tapped.
One challenge will be to identify individuals and/or job roles that provide these specific skill sets. This expertise could reside in the library environs, in the form of subject specialists or liaison librarians. It might also exist in the graduate and postgraduate realm of research. And it may have to be cultivated in new roles that do not yet exist in some institutions.
As data management planning is considered, so too must the skills that are unique to certain disciplines. Faculty and staff with these skill sets can be engaged for consultation in order to create the most viable data management plans. Skills should include those related to metadata creation and assignment. Different disciplines often have unique standards for metadata creation, particularly when archiving is being considered. Knowledge of discipline-specific metadata should be a factor when assisting researchers with their DMP proposals, and also when they are actually creating the metadata for deposit in a repository. The metadata that is created may be related to publications or other publishable works. Additionally, metadata may be created for data or data sets that are being placed into repositories or other types of archives.
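One way to picture the pairing of baseline and discipline-specific metadata is a simple completeness check run before deposit. The sketch below is illustrative only: the field names in `BASELINE_FIELDS` and `DISCIPLINE_FIELDS` are hypothetical placeholders, not drawn from any particular metadata standard or repository, and a real service would substitute the fields its target repositories require.

```python
from typing import Dict, List, Optional

# Hypothetical baseline fields for discovery and provenance (assumption).
BASELINE_FIELDS = ("title", "creator", "date_created", "identifier", "description")

# Hypothetical discipline-specific additions (assumption, for illustration).
DISCIPLINE_FIELDS = {
    "ecology": ("geographic_coverage", "taxonomic_coverage"),
    "social_science": ("sampling_method", "population"),
}


def missing_fields(record: Dict[str, str],
                   discipline: Optional[str] = None) -> List[str]:
    """List required fields that are absent or blank in a metadata record."""
    required = list(BASELINE_FIELDS)
    if discipline is not None:
        # Unknown disciplines fall back to the baseline fields only.
        required.extend(DISCIPLINE_FIELDS.get(discipline, ()))
    return [name for name in required
            if not str(record.get(name, "")).strip()]


record = {"title": "Stream temperature survey", "creator": "Example Lab"}
gaps = missing_fields(record, discipline="ecology")
```

Here `gaps` lists the fields a consultant would ask the researcher to supply before the data set is deposited.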
To meet increasing pressures for efficiency and compliance, institutions are encouraged to create a data management planning service that addresses their unique research environment. Early on, the objective of this service might focus specifically on supporting the DMP as required for inclusion in a research grant proposal. However, institutional leadership should remember that indicators point toward a larger task ahead: making DMPs part of the full research data life cycle, with implications for the movement toward open access to published research. Thus any and all guidelines provided here are time-sensitive, and careful monitoring of current trends across peer institutions will prove valuable as this work evolves.
Based on research focusing on trends at institutions in the United States, a few key tasks have emerged as central to development of a successful DMP service. From this list, the ACTI Data Management Working Group recommends that institutions consider the following as they plan how to best support researchers in preparation of their proposals' DMPs:
- Identify a model for local administration of research DMP services.
- Provide resources for convenient access at any time during the proposal development process.
- Designate at least one dedicated staff member to be available for a range of consulting needs, which further reinforces the value of this support system for the researcher.
IT departments play a critical role in the development of such a service. The technical infrastructure they maintain, together with the related services available within that environment, forms the transparent foundation on which to build a set of services aimed at ingesting, maintaining, manipulating, and preserving research data. Ongoing planning and collaboration with various stakeholder groups across the institution are critical to the development of a service that best addresses the unique needs of research across disciplines and across campus.
1. "Developing an Institutional Research Data Management Plan Service," January 2013. See bottom of page 17 and top of page 18.
2. Cloud strategy was identified as a Top-Ten IT issue for 2012 by EDUCAUSE members, and a number of recent reports have been published about the particular security concerns around data stored in the cloud. See, for example, Cloud Computing Synopsis and Recommendations (National Institute of Standards and Technology, Special Publication 800-146, May 2012) and NSTAC Report to the President on Cloud Computing (President's National Security Telecommunications Advisory Committee, May 2012). In addition, a useful overview of the issues in this area is "7 Things You Should Know About Cloud Security" (EDUCAUSE, August 2010).