Cyberinfrastructure: Changing a Cottage Industry

© 2008 EDUCAUSE. The text of this article is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License (http://creativecommons.org/licenses/by-nc-nd/3.0/).

EDUCAUSE Review, vol. 43, no. 4 (July/August 2008)

Mark C. Sheehan

Mark C. Sheehan is a Fellow with the EDUCAUSE Center for Analysis and Research (ECAR). This article is drawn from “Higher Education IT and Cyberinfrastructure: Integrating Technologies for Scholarship,” ECAR Research Study, vol. 3 (2008).

The term cyberinfrastructure (CI) means everything, and it means nothing. The National Science Foundation (NSF) was clear enough when it coined the term in 2001,[1] but subsequent usage of the term by the higher education community has been inconsistent, sometimes broadening and sometimes restricting the original scope. In addition, many practitioners find the term to be ambiguous when the time comes to put cyberinfrastructure into place or to support it. For CIOs, the dividing line between IT infrastructure for research and scholarship and infrastructure for everyday computing is blurred by variability in the level of need of researchers and scholars; by the rapid progress of technology, in which today’s supercomputer is tomorrow’s white elephant; and by higher education’s pervasive decentralized, “cottage industry” style of conducting research and scholarship.

Until recently, CIOs looking for strategic guidance about cyberinfrastructure have had relatively few resources to turn to and have had to, for the most part, make it up as they go along. For these reasons, and spurred by a request from the EDUCAUSE Net@EDU Campus Cyberinfrastructure Working Group, the EDUCAUSE Center for Analysis and Research (ECAR) recently conducted a short study of CI resources and practices among the EDUCAUSE membership. We applied a multipart research approach to this study, involving the following: a literature review to identify issues and establish the research questions; consultation with a select group of CI experts to identify and validate research questions; a quantitative web-based survey of IT administrators (mostly CIOs) at EDUCAUSE member institutions in the United States and Canada (369 responses out of 1,688 invitations, for a 21.9% response rate); and postsurvey qualitative interviews with twelve executives and staff members involved with CI resources and practices at eleven institutions.
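The response rate quoted above follows directly from the two counts given in the text. A minimal sketch of that arithmetic (the variable names are illustrative only, not drawn from the study):

```python
# Counts quoted in the paragraph above.
responses = 369
invitations = 1688

# Response rate as a percentage of invitations sent.
response_rate = responses / invitations * 100

print(f"Response rate: {response_rate:.1f}%")
```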

As we prepared the survey regarding CI resources and practices, the consensus of our advisors was that the following five technologies lie at the core of campus cyberinfrastructure:

  • High-performance computing resources: Supercomputers and clusters of computers or other computational devices integrated in such a way as to provide supercomputer-like performance to individual applications[2]
  • CI applications and tools: General CI applications and tools that support research but are not specific to a particular discipline, including software for simulation, parallelization, visualization, job scheduling, data mining, statistical analysis, and so forth but not specific sequencing, chemical analysis, or other disciplinary applications
  • Data storage and management resources: Large-scale research data storage systems for real-time use and for archival purposes, as well as facilities, software, and procedures for periodic backup of research data sets
  • Advanced network infrastructure resources: The institution’s high-performance networks on campus and their connections to off-campus high-performance networks[3] that support such capabilities as massive data transfers to and from clusters, real-time visualization, and use of remote instrumentation
  • Resources for collaboration within virtual communities: Facilities and support for teleconferencing, for hosting collaborations with off-campus researchers, and for operating remotely located research instrumentation and related devices; support for identity management and associated middleware in collaborative research activities

In addition to the obvious questions about who uses, who provides, and who funds CI resources, we asked how much is known about each of the technologies and about their use on campus, whom the CEO holds accountable for various research-related activities, and whether the CIO has sufficient authority and resources to meet his or her responsibilities for those activities. We also asked about the use of collaborative practices, about institutional incentives to use those practices, and about the central IT organization’s effectiveness at integrating CI technologies to provide seamless support for research. A final category of questions forms the basis for this article: we asked how important each of the five technologies was to various academic areas in research and in teaching and learning at present and how respondents thought the importance of these technologies might change in the near future.[4]

Importance of Cyberinfrastructure Technologies

To get a sense of how relevant our five CI technologies were to the survey population, we asked about their importance in four academic areas: research in science and engineering; research in other disciplines; creative activities such as arts and music; and teaching and learning. As one might expect, we found that importance in science and engineering was dominant.

Because applications of CI technologies are evolving rapidly and CIOs need to plan for their deployment and support, we also asked survey respondents to tell us how they thought the overall importance of each technology would change over the next three years with regard to its use in research and its use in teaching and learning. With some interesting exceptions, we found that anticipated future importance tracked closely to current importance.

Current Importance

Figure 1 presents mean reported importance values for each of our five CI technologies in each of the four academic areas we asked about. For all technologies except resources for collaboration within virtual communities, importance to science and engineering was relatively high. For high-performance computing, data storage and management, and advanced network infrastructure, importance was above the midpoint (3.50) between “moderate” and “high.” For CI applications and tools, it was just below the midpoint. At 3.06, the mean reported importance to science and engineering of resources for collaboration within virtual communities barely exceeded “moderate” and was nearly identical to the mean importance reported for those resources in teaching and learning (3.05). Thus it appears that the first four technologies are the stuff of science and engineering research; resources for collaboration within virtual communities obviously have applications to science and engineering research, but those are not substantially more important to that area than to the other three academic areas.

Figure 1: Importance of Cyberinfrastructure Technologies to Academic Areas

Scale: 1=no importance; 2=minor importance; 3=moderate importance; 4=high importance; 5=very high importance
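Mean scores on this scale are read against the labels on either side of them, as when the text places 3.50 at the midpoint between “moderate” and “high.” The following is a minimal sketch of that reading; the `describe` helper and its wording are hypothetical and are not part of the ECAR study.

```python
import math

# Labels of the five-point importance scale given above.
SCALE = {
    1: "no importance",
    2: "minor importance",
    3: "moderate importance",
    4: "high importance",
    5: "very high importance",
}


def describe(mean: float) -> str:
    """Describe where a mean score falls relative to the scale labels."""
    if not 1 <= mean <= 5:
        raise ValueError("mean must lie between 1 and 5")
    lower = math.floor(mean)
    if lower == mean:
        return f'exactly "{SCALE[lower]}"'
    midpoint = lower + 0.5
    side = "below" if mean < midpoint else "at or above"
    return (
        f'between "{SCALE[lower]}" and "{SCALE[lower + 1]}", '
        f"{side} the {midpoint:.2f} midpoint"
    )


# 3.06 is the mean the text reports for collaboration resources
# in science and engineering research.
print(describe(3.06))
```

A score of 3.06 is therefore reported as sitting just above “moderate” and well below the 3.50 midpoint, matching the wording in the text.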

While high-performance computing and advanced network infrastructure are of the greatest importance to science and engineering research, data storage and management and advanced network infrastructure appear to be the most important to all other academic areas. Between the areas of research in science and engineering and research in other disciplines, the greatest differences are in the areas of high-performance computing (1.08 points) and CI applications and tools (0.72 points), presumably reflecting the fact that researchers in other disciplines—the humanities and social sciences, predominantly—have not broadly adopted the computation-intensive modeling and simulation applications that have become so important to science and engineering. Brian Stewart, CIO at Athabasca University, explains it this way: “For researchers in certain disciplines, there doesn’t seem to be a peer group they can look to that has taken up advanced uses of technology. The resources and expertise are within their reach here, but their level of interest just isn’t high enough to get them engaged. They don’t see yet what benefit those technologies would bring.”

For advanced network infrastructure resources as well, the difference in mean importance between these two academic areas is relatively large (0.65 points), likely reflecting scientists’ and engineers’ more frequent use of high-bandwidth connections to remote instruments as well as those researchers’ more frequent need to transport large data sets across the network, often in or near real time.

Among the four academic areas, the importance of CI technologies to creative activities is uniformly lowest. The pattern is the same for this area as for research in other disciplines: advanced network infrastructure resources and data storage and management resources are most important, and high-performance computing resources and CI applications and tools resources are least important. The mean importance of high-performance computing resources to creative activities barely exceeds “minor” (2.07).

Excepting only high-performance computing, the importance of all CI technologies to teaching and learning is ranked about as high as their importance to research in fields other than science and engineering. Resources for data storage and management, advanced network infrastructure, and collaboration within virtual communities are the technologies most important to teaching and learning. This high relative importance for collaboration resources probably reflects their value in creating instructional communities among remotely located faculty and students and in supporting remote access to expensive hardware, software, and data resources.

Not surprisingly, the importance of CI technologies in most academic areas varies significantly by institutional mission. The broad exception is in their importance to the teaching and learning area, where the two types of institution—research and teaching—are statistically indistinguishable in terms of the importance of CI technologies. We speculate that this is because the teaching and learning academic area is common to both types of institution; it is the presence or absence of a research focus that distinguishes them, not the presence or absence of teaching and learning activities.

Similarly, for creative activities such as art and music, research institutions are statistically indistinguishable from teaching institutions in terms of the importance of high-performance computing and CI applications and tools, the two CI technologies that seem most closely associated with research in science and engineering. Here, we speculate, it is the comparative irrelevance of the technologies to creative activities that is similar regardless of institutional mission.

Figure 2 depicts the distinctions between research and teaching institutions with respect to the importance of the five CI technologies in the various academic areas (only the academic areas for which significant differences emerged are reported here). For each technology, the greatest difference between the two types of institution is in the area of science and engineering research. For all technologies, the difference between means in importance to science and engineering research for the two institutional types exceeds a full point on our five-point scale. For advanced network infrastructure resources, the difference exceeds 1.5 points. Variations between the two types of institution in the importance of CI technologies to research in other disciplines follow a similar pattern, although the magnitude of the spread exceeds a full point only for data storage and management and advanced network infrastructure resources.

Figure 2: Importance of Cyberinfrastructure Technologies to Various Academic Areas, by Institutional Mission

Scale: 1=no importance; 2=minor importance; 3=moderate importance; 4=high importance; 5=very high importance

Among those technologies for which teaching and research institutions report statistically different levels of importance for use in creative activities (data storage and management, advanced network infrastructure, and collaboration within virtual communities resources), the differences in mean importance are relatively small, between 0.5 and 0.8 points. This probably reflects the fact that the level of advanced use of these technologies in the creative disciplines is somewhat more uniform across research institutions and teaching institutions than in the other academic areas.

Figure 2 also reveals the relatively high importance of all CI technologies to research institutions. Whereas Figure 1 shows no technology for which mean importance reaches a value of 4.0 (“high importance”), Figure 2 shows that high-performance computing, CI applications and tools, data storage and management, and advanced network infrastructure resources are all of “high” mean importance for research in science and engineering at research institutions, exceeding 4.0. As mentioned above, these four appear to be the current core technologies for research in science and engineering at research institutions. Yet NSF’s Cyberinfrastructure Vision for the future of research gives our fifth CI technology (resources for collaboration within virtual communities) much more than lip service, dedicating a full chapter to “Virtual Organizations for Distributed Communities.”[5] Did our survey respondents envision collaboration resources rising in importance to the level of the other CI technologies in the future? We will see below.

Future Importance

According to NSF’s Cyberinfrastructure Vision, “The rapidly evolving nature of cyberinfrastructure requires ongoing assessment of current and future user requirements.”[6] In this spirit, we asked our survey respondents to predict how the overall importance of each CI technology would change over the next three years. We asked them to consider this question in the context of research activities as well as teaching and learning activities (but we did not ask them to distinguish between future importance to the two types of research or to consider future importance to creative activities such as art and music).

The results, presented in Table 1, exclude reports of anticipated decreases in importance, which represented only a fraction of a percent of respondents for each technology. In all cases, the extent of anticipated increase in importance was greater for institutions with a research mission than for those with a teaching mission. Research institutions consistently anticipated greater increase in importance for research than for teaching activities, whereas teaching institutions foresaw greater increased importance for teaching than for research activities.

Table 1: Respondents Anticipating Moderate or Great Increase in Importance of Cyberinfrastructure Technologies

Importance to Research

  Research Institutions (percentage of institutions)   Teaching Institutions (percentage of institutions)
  Data Storage and Management                  92.8%   Data Storage and Management                  35.5%
  Advanced Network Infrastructure              88.4%   Collaboration within Virtual Communities     25.8%
  Cyberinfrastructure Applications and Tools   87.0%   Advanced Network Infrastructure              23.9%
  Collaboration within Virtual Communities     85.0%   Cyberinfrastructure Applications and Tools   18.4%
  High-Performance Computing                   82.6%   High-Performance Computing                   16.9%

Importance to Teaching

  Research Institutions (percentage of institutions)   Teaching Institutions (percentage of institutions)
  Data Storage and Management                  80.6%   Data Storage and Management                  51.8%
  Collaboration within Virtual Communities     75.7%   Collaboration within Virtual Communities     45.5%
  Advanced Network Infrastructure              71.3%   High-Performance Computing                   31.5%
  Cyberinfrastructure Applications and Tools   59.0%   Cyberinfrastructure Applications and Tools   31.5%
  High-Performance Computing                   48.1%   Advanced Network Infrastructure              31.1%

For research institutions, the greatest extent of anticipated increase (measured as the total of “moderate increase” and “great increase”) was in the importance of data storage and management resources to research activities. Here, uniquely, more than three-quarters of respondents at research institutions said they anticipated a “great increase” in importance. The second-highest growth area for research activities was advanced network infrastructure. Following closely, each within about two and a half points of the next, were resources for CI applications and tools, collaboration within virtual communities, and high-performance computing.

In first place for future growth in importance to research institutions’ teaching and learning activities was data storage and management resources, followed by resources for collaboration within virtual communities, advanced network infrastructure, CI applications and tools, and high-performance computing.

Teaching institutions agreed with research institutions that data storage and management resources would grow the most in importance to research activities. Nearly tied for second place for research activities were resources for collaboration within virtual communities and for advanced network infrastructure, followed by CI applications and tools and high-performance computing resources.

Respondents at teaching institutions felt that data storage and management would also be the top growth area in teaching activities at their institutions. Athabasca’s Stewart provided one insight into this by observing: “We’re accumulating huge log files within our learning management system. Our intention is to begin mining these data and applying our findings to improving the pedagogic process. I can see that once success emerges, it is going to result in exponentially increasing demand for data resources in teaching and learning. We’re preparing for that now as we plan enhancements to our storage arrays.” In second place for teaching institutions’ teaching and learning activities were resources for collaboration within virtual communities, followed by the nearly tied group of high-performance computing, CI applications and tools, and advanced network infrastructure resources.
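The rankings and comparisons described above can be re-derived from the Table 1 percentages. The sketch below transcribes those percentages into a small Python structure; the dictionary layout and the `ranked` and `gap` helpers are illustrative assumptions, not part of the ECAR study.

```python
# Percentages transcribed from Table 1: share of respondents anticipating a
# moderate or great increase in importance over the next three years.
DSM = "Data Storage and Management"
ANI = "Advanced Network Infrastructure"
APPS = "Cyberinfrastructure Applications and Tools"
COLLAB = "Collaboration within Virtual Communities"
HPC = "High-Performance Computing"
TECHNOLOGIES = (DSM, ANI, APPS, COLLAB, HPC)

TABLE_1 = {
    "research institutions": {
        "research": {DSM: 92.8, ANI: 88.4, APPS: 87.0, COLLAB: 85.0, HPC: 82.6},
        "teaching": {DSM: 80.6, COLLAB: 75.7, ANI: 71.3, APPS: 59.0, HPC: 48.1},
    },
    "teaching institutions": {
        "research": {DSM: 35.5, COLLAB: 25.8, ANI: 23.9, APPS: 18.4, HPC: 16.9},
        "teaching": {DSM: 51.8, COLLAB: 45.5, HPC: 31.5, APPS: 31.5, ANI: 31.1},
    },
}


def ranked(institution, activity):
    """Return (technology, percentage) pairs, highest percentage first."""
    cells = TABLE_1[institution][activity]
    return sorted(cells.items(), key=lambda item: item[1], reverse=True)


def gap(institution, technology):
    """Research-minus-teaching gap, in percentage points, for one technology."""
    row = TABLE_1[institution]
    return round(row["research"][technology] - row["teaching"][technology], 1)


# Top growth area for each institution type and activity.
for institution in TABLE_1:
    for activity in ("research", "teaching"):
        technology, pct = ranked(institution, activity)[0]
        print(f"{institution}, {activity}: {technology} ({pct}%)")

# Sign of the research-minus-teaching gap for each institution type.
for institution in TABLE_1:
    gaps = {tech: gap(institution, tech) for tech in TECHNOLOGIES}
    print(institution, gaps)
```

Run against the transcribed figures, data storage and management ranks first in all four columns, the research-minus-teaching gap is positive for every technology at research institutions, and it is negative for every technology at teaching institutions.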

The future importance of data storage and management resources to research and teaching institutions alike is notable here and may reflect the pain our survey respondents feel on two fronts. First, they anticipate responsibility for meeting increasing data needs in both research and teaching and learning as the sheer volume of digital information in the academic environment explodes. Second, they anticipate an increasing share of responsibility for the short- and long-term storage of research data, a responsibility that historically has fallen more to the individual researcher. Although the cost of storage continues to decrease on a per-terabyte basis, the complexities of backing up and archiving what we perceive (for now) to be massive data collections are increasing. At the same time, funding agencies’ expectations for comprehensive data-management strategies are escalating.[7] Central IT would seem to have a role to play in resolving this tension, and it looks as though our survey respondents see this role coming.

In most cases, the current importance of a technology was, to a statistically meaningful extent, positively associated with its expected future importance. The higher the reported current importance, the higher was the anticipated increase in importance over the next three years. As Table 2 shows, for advanced network infrastructure and collaboration within virtual communities, current importance and anticipated change in importance went hand in hand for all academic areas. This may have to do with the overlap of the research use of these technologies with instructional activities; both technologies are well suited to activities in either area of scholarship.

Table 2: Positive Associations between Current Importance and Anticipated Change in Importance

 
                                          Anticipated Change in Importance
Current Importance of                     Research      Teaching and Learning
Cyberinfrastructure Technology            Activities    Activities

High-Performance Computing
  Research in science and engineering     X             X
  Research in other disciplines           X
  Creative activities                     X*            X
  Teaching and learning                                 X

Cyberinfrastructure Applications and Tools
  Research in science and engineering     X             X
  Research in other disciplines           X             X
  Creative activities                     X             X
  Teaching and learning                                 X

Data Storage and Management
  Research in science and engineering     X
  Research in other disciplines           X             X
  Creative activities                     X             X
  Teaching and learning                   X             X

Advanced Network Infrastructure
  Research in science and engineering     X             X
  Research in other disciplines           X             X
  Creative activities                     X             X
  Teaching and learning                   X             X

Collaboration within Virtual Communities
  Research in science and engineering     X             X
  Research in other disciplines           X             X
  Creative activities                     X             X
  Teaching and learning                   X             X

Key:
  X = Significant positive association
  * = Marginal significance
  (blank) = No significant positive association reported

These findings are interesting mostly for the exceptions. They signal that the growth trajectories for high-performance computing resources especially, but also for CI applications and tools and data storage and management resources, are rather different for research activities than they are for teaching and learning. For high-performance computing and CI applications and tools, this interpretation reinforces what we see by scanning Figure 1: the current importance of high-performance computing and CI applications and tools resources to academic areas other than science and engineering research is visibly different from the importance of the other three CI technologies.

On the other hand, Figure 1 does not help explain the divergence in trajectories for data storage and management resources that we infer from the blank cell in Table 2. The pattern for that technology in Figure 1 is similar to that for advanced network infrastructure and, excluding research in science and engineering, for collaboration within virtual communities. In the end, we are probably safest simply (1) acknowledging that resources for advanced network infrastructure and for collaboration within virtual communities are of more general utility across all academic disciplines than is the triumvirate of high-performance computing, CI applications and tools, and data storage and management resources and (2) speculating that this difference adequately explains the variance in the growth trajectories of the latter technologies by institutional mission.

Implications

Use of the five CI technologies is fairly abundant in higher education. About two-thirds of the survey respondents reported at least some research use of resources for CI applications and tools, data storage and management, and collaboration within virtual communities. Perhaps owing to the price of admission or the steepness of the learning curve, fewer—just under half—were making any research use of resources for high-performance computing or advanced network infrastructure. As we would expect, institutions whose mission focuses on research are far more likely to use CI technologies, and to use all five of them, than are institutions whose mission focuses on teaching.

Our survey respondents told us that all CI technologies except resources for collaboration within virtual communities are, by substantial margins, of greatest importance to research in science and engineering. The importance to research in other disciplines and to teaching and learning activities is generally about equal for the CI technologies, with the exception of high-performance computing, which is more important to research in other disciplines than to teaching and learning. CI technologies seem to have the least importance to creative activities, though the margins of difference between that academic area and the others are substantial only for high-performance computing and CI applications and tools. As CIOs consider which alliances to build in the pursuit of an integrated suite of CI tools and services, researchers in science and engineering would be key candidates, but our data also suggest that digital scholars in all four academic areas are likely to feel they have a stake in the resulting initiatives.

The anticipated future importance of all CI technologies is rated higher among research institutions than among teaching institutions. At research institutions, future importance to teaching activities is lower than future importance to research activities; at teaching institutions, Table 1 shows the reverse. Although these findings appear to express confidence that change is coming in the next three years in the form of increased opportunities to apply CI resources across the board in research and instruction, the mean change that our respondents anticipate substantially exceeds “moderate increase” only for research activities at research institutions. Changes in research activities at these institutions thus may include disruptive, even revolutionary elements, but such changes seem less likely at institutions where research takes a lower priority.

Conclusion

The aim of “the cyberinfrastructure movement,” if such a thing exists, is not to revolutionize the provision of IT infrastructure and support by consolidating it in the hands of central IT. Research and discovery has traditionally been led by faculty in whose disciplines and at whose institutions individual achievement has been the sine qua non of recognition, promotion, and tenure—all elements of professional success. But this has led to a “cottage industry” approach to research and scholarship. The current focus on cyberinfrastructure seems to stem from the recognition that there are inefficiencies in this approach, that these inefficiencies may keep the costs of research higher than necessary, and that there are things central IT can do, in partnership with researchers and scholars, to lower costs without compromising the quality of their work. The basic goal of cyberinfrastructure is thus to integrate critical research technologies with an eye to encouraging and enabling collaboration among researchers and scholars, achieving economies of scale within the institution or the discipline in order to husband scarce resources, and developing a seamless fabric of research support infrastructure and services equally valuable to and usable by novice and expert alike.

Notes

1. Fran Berman, “The Human Side of the Cyberinfrastructure,” EnVision, vol. 17, no. 2 (April-June 2001), p. 1, http://www.sdsc.edu/pub/envision/v17.2/director.html.

2. The network infrastructure on which high-performance computing relies was addressed under the heading of “advanced network infrastructure resources,” and we advised respondents not to consider it in answering questions about high-performance computing.

3. Off-campus networks used for advanced network infrastructure include regional or university consortial networks and such networks as Internet2 and National LambdaRail in the United States and CANARIE in Canada.

4. The full ECAR research study is freely available at http://www.educause.edu/ResearchStudies/1010.

5. National Science Foundation, Cyberinfrastructure Council, Cyberinfrastructure Vision for 21st Century Discovery (Washington, D.C.: National Science Foundation, March 2007), http://www.nsf.gov/pubs/2007/nsf0728/nsf0728.pdf.

6. Ibid., p. 35.

7. See, for example, the NSF program solicitation for the 2010 Project, which aims to determine the functions of all genes in the research organism Arabidopsis thaliana by the year 2010. National Science Foundation, Directorate for Biological Sciences, 2010 Project: Program Solicitation, p. 9, http://www.nsf.gov/pubs/2007/nsf07591/nsf07591.pdf.