Ithaka S+R's Report on Big Data Infrastructure at the Crossroads

The CNI Interviews Podcast | Season 1, Episode 3

Ithaka S+R is a nonprofit research organization that helps academic and cultural communities serve the public good and navigate economic, technological, and demographic change. In this conversation, we discuss a report they released on the research practices and support needs of researchers working with big data. The report draws on interviews with over two hundred researchers from dozens of disciplinary backgrounds about their big data practices, with an emphasis on exploring what library leaders can learn about how to align data support with researchers’ priorities. The report is titled Big Data Infrastructure at the Crossroads.



Gerry Bayne: Welcome to the CNI 2022 Podcast.

These interviews were recorded at the Coalition for Networked Information Spring 2022 Meeting.

On this episode we feature a conversation with Danielle Cooper, Associate Director of Libraries, Scholarly Communication, and Museums for Ithaka S+R and Dylan Ruediger, a Senior Analyst for Ithaka S+R.

In this interview, they share findings and insights from a recent Ithaka S+R study drawing on interviews with over two hundred researchers from dozens of disciplinary backgrounds about their big data practices, with an emphasis on exploring what library leaders can learn about how to align data support with researchers' priorities. I'll link to those materials in the show notes.

Here's our conversation.

Danielle Cooper: So back in 2020, Ithaka conducted an inventory of research data services offered by US colleges and universities. And we did this using a systematic web searching process. This was important because data services in the US are highly decentralized, and particularly in research-intensive universities there's quite a few of them, so we needed to have a unique methodological approach of tracking the services. In contrast, you could do a survey where you would contact an individual at every institution and ask them what the scope of their data services was, and given that there's so many, it would be pretty impossible for an individual to be able to characterize that. In fact, many institutions don't have a handle on the extent to which they have data services. So in contrast, we used a methodology where we developed a representative sample and then went out and figured out a way to catalog services across institutions using that methodology simply by looking at websites.

Gerry Bayne: So what are two or three findings that you think are important for higher ed data folks to know about?

Danielle Cooper: So first of all, there is considerable variation in terms of the capacity or scope of services being offered by institution type. We see that at R1 institutions they're exceeding R2s or small liberal arts colleges by more than double the number of services that are being offered. The average at an R1 is about seven to eight services, where the average R2 is about two to three, and the average at a small liberal arts college is around one to two. So that's the first key takeaway, that there's quite a bit of difference in terms of the firepower that institutions have for their data services.

The second is that libraries are an important provider of data services across institution types. So even though you see a huge variation in the sheer scale or scope of services, across all of the institution types, the library is always at the center of what's being offered. You will see that statistics departments or bioinformatics-type services are offered in a more specialized way, particularly at R1s, but ultimately at the R1, R2, or small liberal arts college level, you have libraries playing a pretty strong role in offering quite a bit of the services that are available.

Gerry Bayne: That's interesting. What would you like higher ed tech professionals in general to know about data and the trends you're seeing? Again, the reason I ask this question is EDUCAUSE is more generalized, more teaching and learning cybersecurity, CIOs, et cetera. So I'm wondering if there's a way we could talk to them and show them that this is important even to these folks that are not in the data world necessarily.

Dylan Ruediger: Yeah. So the first thing that I would say is that data-intensive research is now taking place in virtually every academic field, and it's becoming an increasingly normal way of conducting research, not only in the fields that have long been associated with large amounts of quantitative data, but in fields that have traditionally been more rooted in qualitative analysis and in smaller amounts of data. So big data is really everywhere across the academy, and that's likely to continue to trend in that direction for the foreseeable future.

At the same time, different disciplines have their own unique structures, cultures, and very different levels of access to research funding, which can hinder or encourage them to participate in data-intensive research, and also means that as data-intensive research spreads across disciplines, there's the potential for either equalizing disciplinary access to research funds and incentives or to exacerbating existing academic inequalities in levels of funding and whatnot.

Gerry Bayne: That's great. Can you please tell us about your work surrounding virtual and in-person conferences and meeting strategies for scholarly organizations?

Dylan Ruediger: So as you know, meetings are a really essential part of intellectual exchange and scholarly communication. They are also central gathering points for scholarly communities. And COVID has really opened new possibilities for how conferences can conserve those purposes that they've traditionally served, and it's also created what appears to be an enduring demand for hybrid and virtual meetings. But also, at the same time, it's really helped to clarify what the value of an in-person meeting is. So we have all these competing modalities. They have constituencies of people who prefer them, and they all seem to serve a meaningful purpose in the life of scholarly communities and in the process of scholarly communication.

Societies are really important organizers for many of the most important of these meetings, and they've really deeply embedded the idea of the conference into the core of the mission and into their financial and membership model, so it's a very high stakes issue for scholarly societies and other entities that organize conferences, making the stakes of what happens next in meetings really profoundly large. But the pathway to a sustainable future for meetings really is pretty unclear right now.

And with the support of the Sloan Foundation, Ithaka S+R has assembled a cohort of 17 scholarly societies for a year-long co-learning project. The cohort includes societies from a very wide range of disciplines and an equally wide range of membership numbers and resources, ranging from very large staffs to... Some of the smaller organizations we're working with have either one full-time executive director or in at least one case are staffed by volunteers entirely. Our goal for bringing them together is that they'll learn from each other across their differences and because of their differences, actually, over the course of six meetings we'll be holding this year, which will culminate in a design lab that will be run by our partner in this, JSTOR Labs, who will prototype innovations for what future meetings could look like. Our second meeting of the cohort, which will be focused on financial and membership implications of changing structures to meetings, will take place in April of this year, so just in a few weeks, and we'll be publishing findings from this project later in 2022.

Gerry Bayne: So is it in-person versus virtual meetings and how those can work together? Are you finding that they're trying to meld the two or go one direction or the other? I'm not asking you to give me your findings yet, but just some ideas.

Dylan Ruediger: Yeah. I mean, my early sense is that's really still a live question. I think the trend that you articulated of having what I think of as parallel meetings, a virtual meeting and then an in-person meeting or vice versa, is a model that's appealing to a lot of communities right now. Whether or not it really will work over the long run is something that still remains to be seen. There are concerns that the virtual meeting might cannibalize the in-person meeting, make fewer people willing to show up and undermine the hotel contracts and other things that pay for meetings. But there's a real clear interest in scholarly communities to have some kind of a virtual meeting going forward. I will say that the hybrid model where you have simultaneous virtual and in-person events is one that's particularly challenging for scholarly societies. Financially, it's just very difficult to pull off, and there aren't good models for that in existence right now.

However, the project as we've construed it is pretty agnostic about platform and modality. We're really encouraging societies to think of these as potentially complementary meeting formats or to not be prescriptive about which one we think is better than others. And I anticipate that we're likely to see societies making a wide range of decisions, and that some may go back to in-person pretty permanently, or at least for as long as they can. Others may choose to stay virtual for the foreseeable future. Others may do these parallel meetings. I think we'll see a lot of diversity in that.

Gerry Bayne: That's great. Well, thank you both for your time. Very much appreciate it.

Dylan Ruediger: Absolutely.

Gerry Bayne: Dylan Ruediger and Danielle Cooper from Ithaka S+R. I'm Gerry Bayne for EDUCAUSE. Thanks for listening.

This episode features:

Danielle Cooper
Director of Libraries, Scholarly Communication, and Museums
Ithaka S+R

Dylan Ruediger
Senior Analyst
Ithaka S+R