In this conversation, Executive Director for the Coalition for Networked Information (CNI), Cliff Lynch, talks about issues concerning scholarly communication, digital information technology, and higher education technology.
Gerry Bayne: Welcome to the CNI 2022 Podcast. I'm Gerry Bayne. On this episode, we feature a conversation with CNI's executive director Cliff Lynch. I started our conversation by asking, what are some of the issues around digital libraries, higher education technology, research, and scholarly communication, that Cliff's been keeping his eye on in the past year?
Cliff Lynch: So let me talk about a couple of developments that I'm watching that I don't think are on a lot of other people's radar screens. One that is probably on at least some people's radar screen, and I'm going to sort of skip over a lot of other things that I'm watching because I think that there's quite a good awareness in the Educause and CNI communities. So here are the three that I'm paying close attention to.
The first, and by the way, I would invite people who hear this podcast who have information to share about this to contact me because it's been really hard to find out what's going on in this area, is the development of network-connected experimental apparatus, cloud labs, and similar sorts of developments. I know, Gerry, that you had an opportunity to speak with Keith Webster from Carnegie Mellon, which is, as far as I know, way out in front among universities in the US in terms of actually doing a full-fledged cloud lab.
My interest in this really started during the pandemic when a lot of institutions shut down their labs very abruptly and very hard, and then brought them back. And when they brought them back, they often brought them back in a de-densified kind of a way, which raised a lot of questions about, how should we really be running experimental facilities efficiently? How can we do them in a more automated way? Do we really need to be doing all of this on university premises, or can we contract more of it out? I think there are a lot of questions here. How many shifts should we be running labs, given the value of that space and the cost of that space? Can we really afford to only run it one shift?
Now, universities generally are notorious for inefficient use of their physical plants in terms of not operating 24 hours, not operating all year round, things like that. But I do think that the experience during the pandemic really raised a lot of questions about our just sort of tacit assumptions of how experimental research, lab research should be carried out. So I'm really thinking that what Carnegie Mellon's going to be doing here is quite revolutionary.
I know that when they started resuming research on many campuses, they gave priority to trying to get the so-called core facilities operating again, because those are typically shared resources that are used by a number of principal investigators as opposed to a principal investigator that has his or her own lab. Again, what I found there is very anecdotal. Some of the regional research networks or high performance computing facilities have gotten involved in this. But it's really interesting because some of the skills that you need to really make cloud labs or to really connect up experimental equipment to the net are a little bit different than either the skills you need in IT or in high performance networking. And this is sort of like a mystery area that's fallen through the cracks. I mean, there are people who know how to do this stuff, but they're not really well organized.
It turns out, by the way, there's quite a commercial industry that does this, mostly serving biotech, material science, and pharmaceutical companies. And I learned something about that poking around in this area during the pandemic. Potentially it could be a really high payoff thing in terms of effectiveness and speed of our ability to conduct research, value for investment.
Gerry Bayne: Are these endeavors money saving or are they very expensive? I don't really have a sense of what the ROI is.
Cliff Lynch: Well, that really remains to be seen. I mean, you can measure ROI in a lot of different ways. So one of the things in talking to the folks at the Emerald Cloud Lab, that Carnegie Mellon is working with, that I learned was that, historically, and still today at their facility out in South San Francisco, the vast majority of their customers are commercial pharma, biotech and material science companies. And it's quite common for them to basically queue up piles of work that can run all weekend. So one form of ROI is, can we conduct experiments faster? Can we conduct more of them? Can we get to a scientific insight or a new drug or a new drug candidate or a new material faster? Those kinds of things.
There's also a very complicated set of trade-offs between capital expenses and operating expenses. If you look at how a university will typically onboard a new scientist, a junior faculty member, there's a big startup package for a lab and for lab equipment until the grants start moving. Often there is space that needs to be allocated and then remodeled or conditioned in various ways. That takes a lot of time and is a lot of up-front investment. Imagine if you can onboard one of these faculty members and say, "Here's your account. Start running experiments tomorrow morning." That's a really different set of cash flows and space management and things of that nature.
Gerry Bayne: Are these robots running these experiments?
Cliff Lynch: If you look at the cloud lab thing, they have automated the equipment to the greatest extent possible, and it's all computer connected so that you can essentially set up the parameters and record the parameters and take the data off the machine. There are human lab techs who are part of the Emerald Lab who will do things like specimen preparation-
Gerry Bayne: That's what I was wondering about.
Cliff Lynch: ... send stuff out there, FedEx. They'll set it up and put it on the machine. They'll move it from a machine to another machine. So it's not a completely robotic flow. But from the point of view of the experimenter, it has a robotic quality to it, and that becomes very important by the way, also, when you start seeing some of this fascinating research now that is delegating discovery or optimization of things to machine learning systems, where basically you're trying to find, let's say, a material that has certain properties. So what you do is you've got a bunch of data points already in a database about how various materials act. You've got a machine learning model that's predicted some behavior and now will run a couple of experiments to synthesize and test a couple of materials in that parameter space. And then think a little bit more, compute a little bit more and run another experiment in an iterative way. And you could actually take something like a cloud lab and set it up in a loop essentially, with a machine learning program. Really interesting to me.
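The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Emerald Cloud Lab or Carnegie Mellon has published: `run_experiment` is a hypothetical stand-in for submitting a job to a cloud lab (here it evaluates a made-up property curve), `predict` is a toy distance-weighted model rather than a real machine learning system, and the candidate grid and starting database are invented for the example.

```python
def run_experiment(x):
    """Hypothetical stand-in for one cloud lab experiment.

    Returns a made-up material property that peaks at x = 0.62.
    A real setup would submit a job and wait for instrument data.
    """
    return -(x - 0.62) ** 2


def predict(x, database):
    """Toy surrogate model: distance-weighted average of known results."""
    weighted = [(1.0 / (abs(x - xi) + 1e-6), yi) for xi, yi in database.items()]
    total = sum(w for w, _ in weighted)
    return sum(w * y for w, y in weighted) / total


def closed_loop(candidates, database, rounds=5, batch=2):
    """Alternate between 'thinking' (predicting) and running experiments."""
    for _ in range(rounds):
        untested = [x for x in candidates if x not in database]
        if not untested:
            break
        # Think: rank untested candidates by predicted property.
        ranked = sorted(untested, key=lambda x: predict(x, database), reverse=True)
        # Run: test the most promising few and record the results.
        for x in ranked[:batch]:
            database[x] = run_experiment(x)
    best = max(database, key=database.get)
    return best, database


if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]  # assumed parameter space
    known = {x: run_experiment(x) for x in (0.0, 0.5, 1.0)}  # prior data points
    best, results = closed_loop(grid, known)
    print(best, len(results))
```

Each pass through the loop uses everything measured so far to choose the next couple of experiments, which is the "think a little, compute a little, run another experiment" pattern in miniature.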
Gerry Bayne: Yeah.
Cliff Lynch: I mean, maybe the last thing I'll say too about this whole idea is that, this is a very democratizing thing. It basically allows a researcher at an institution that doesn't have an immense amount of infrastructure to get access to these kinds of facilities and tools.
Another one that I'm tracking closely is, what happens to scholarly communications going forward? I think we've got a confluence of trends here that are taking us to some place that's really hard to predict. The journal system, the sort of traditional journal system has been under pressure for a long time. Funders are pushing it to be open. There is a reproducibility movement that is increasingly arguing that code and data need to be first class parts of the research record, along with the journal articles. You're seeing experiments like Octopus in the UK as new supplements or parallel things to journal publication. So there's a lot of pressure going on in that area. The peer review system, as far as I can tell, is just slowly collapsing because there's just so much stuff being published and nobody has time to do peer review and nobody wants to do peer review. You're starting to see that shift now. On top of all of these things that were happening before the pandemic hit, the pandemic did two things that I think were really interesting and confusing.
One is that it really forced the biomedical and even worse, the public health communities to seriously grapple with preprints, unreviewed material that was circulating. And there have been preprints forever, physicists, mathematicians, people like that have used them for a very long time, even before the digital age. But a lot of the biomedical sciences in particular were very uneasy about preprints prior to COVID. And one of the reasons I believe that they were uneasy about preprints is that, there are a lot of people who rely on the biomedical literature to make decisions about treatment, about best practices, about public health matters, who aren't necessarily in a good place, nor do they have the time to independently make judgments about the correctness of the material on which they're basing their decisions. That's just not what they do. They rely on a well vetted and hopefully accurate body of scientific knowledge.
And all of a sudden, that stopped happening in the pandemic because we couldn't wait in many cases. And that led to all kinds of problems with policy makers trying to figure out what to do with information that was poorly supported in various ways. Maybe it was right, maybe it was wrong. And so, we saw a whole series of kind of ad hoc things grow up to do expedited or informal review of preprints. We moved away from this very formalized peer review system to something that was trying to be more pragmatic. And I really wonder now how much that's going to stay around, how much we're going to continue to try and do traditional peer review, all of that kind of thing, especially in light of the new demands around reproducibility, which tend to make you reassess a little bit what peer review is trying to accomplish.
Another piece of this that I don't know that people have reflected about enough is what happened to scholarly conferences during the pandemic. Basically, they all went virtual, and it's very confusing: they became much more inclusive in a certain sense, and yet much less effective in another sense because of the divided attention, because of time zone issues, because of people's tendency to say, "I'll get to that asynchronously, I'll watch the recording," and then they don't. People were accumulating this enormous amount of video debt, essentially, that they would declare bankruptcy on every few months.
Gerry Bayne: I love the way you put that: video debt.
Cliff Lynch: Yeah, I mean that's really what it is. And at the same time, we didn't know what to do with these as objects that are part of the scholarly record. Because when you take these things online, you get recordings of them and yet, if you look around at the recordings of conferences made during the pandemic, the ongoing disposition and availability of these is the most astoundingly haphazard story I've ever seen. Some people keep them, some don't, some keep them for a little while. It has become a very sore point with me that it's often almost impossible to tell, when you're thinking about whether to register for a conference or a talk happening online, whether it will be recorded and, if so, under what terms the recordings will be available subsequently, which is actually a really important piece of decision making about how to allocate your time often. Now that we're coming out of the pandemic, we see some organizations saying, "Well, we really want to be as inclusive as possible. We thought it was wonderful that so many more people could join into our conferences. And by the way, the carbon footprint is probably better, so we're just going to stay virtual forever. Never mind the other elements of the gatherings that we're completely losing out on." You see some returning to in-person meetings. You see some doing these hybrid abominations. I am very much of the opinion, and I have acted on this consistently in CNI's choices, that hybrid stuff is just a disaster. I mean, it's fine to do an in-person meeting and record everything and make it available later. That CNI has done, and we are doing even more of it now that we're back meeting in person.
But trying to do genuinely hybrid meetings, especially if you're holding them in spaces like hotels, is just a nightmare. I mean, I have seen some hybrid classrooms that work. And if money's no object and you can absolutely control and remodel the space as necessary and deal with the acoustics and the microphones and all of that, you can make a space that will work for a hybrid meeting. Now, I will completely leave aside as out of my wheelhouse the pedagogical issues in doing hybrid classes, which are a whole nother nightmare. But if what you're trying to do is primarily lecture and field questions both in the room and out of the room, you can do it in those rooms. But I don't think that's a good future for most meetings.
I'm really going to be interested to see how people split these things out and what choices they make. I think it's going to be quite different for different communities. It's also quite believable to me that we will see meetings bifurcate so that you will have in a field one scholarly meeting that is always held virtual and another that's always held in person and they have different characters and somewhat different attendees. But people will sometimes choose to put their research presentation and submit to one and other times to another, much as they make different choices of journal venues today for papers they want to submit to.
The next big trend I'm tracking is the whole notion of special collections and how we use them and provide access to them. If you think about what happened during the pandemic when our libraries physically closed, we actually did really good, relatively speaking, with journals which were just online and weren't a problem basically. And also with a lot of monographic material, the HathiTrust Emergency Access Service really helped fill a gap there.
The place where we really ran into problems was all of the researchers predominantly in the humanities and social sciences, but most of all the humanities, who rely on archives and special collections. The model for that has always been you travel to them and you use them physically. This is a terrible model in the sense that it makes them very hard to get to, it creates great inequities between the scholars who can get the travel money to go visit special collections and those that can't. It's one of the things that makes research so slow often in the humanities.
There are some serious efforts that are going on to try and figure out how we make more remote use of special collections without necessarily finding the money for a multi-generational initiative to digitize them all. There's a project that's called Sourcery that's being done at the University of Connecticut and Northeastern University that's doing very interesting work in this area, for instance. But I think just as a broader trend, it's really clear, especially when you factor in the continued friction around particularly international travel as we try and come out of the pandemic, as you factor in issues around equity, carbon footprint, and things of that nature that we're going to have to do better here somehow. So that's one that I'm watching.
One of the other areas I'm watching is the rise of specialist repositories. Now, let me explain what I mean by this. You've heard a whole lot in recent years about research data and the need to preserve research data and make it available and share it. The model we have for this, if I can just characterize it, is that you take your research data, you break it up into data sets accompanied by some documentation, and you stick it in what's essentially an FTP archive. People try and learn about it in the FTP archive either because you've cited it in the paper or through some other discovery mechanism, and they withdraw it out of the FTP archive.
There is some research data that naturally falls into that model. But if you look at what's happening in many of the sciences, and I would say NIH is really leading in this, is you're seeing funders identify these communities of people doing similar kinds of research with similar kinds of data. And so, basically, they are then making funding available for a customized platform where everybody will store that data. And they will generally fund some experts at wherever that platform is being run to normalize and vet the data that's being submitted by the investigators and to develop some custom analysis and management tools.
So what happens all of a sudden is by contributing to these data collectives as a researcher, the sum of the parts becomes more valuable than the individual parts. And as a researcher, you actually get something back for contributing your data because it's being aligned with other related data. You see a lot of this in genomics areas, for example, but you are starting to see it in many other areas as well.
Now, that is a very powerful accelerant of scientific discovery, but we've got a couple of issues with it. One is that it's expensive. Which communities are we going to fund it for, and how long are we going to maintain it? Because these things typically have very different life cycles than the sort of FTP model, where it can just collect virtual dust forever, essentially. Here, the data is really being actively managed and refined. So I think there are really interesting questions about exactly how you shut down these things responsibly when they reach the end of their life cycle.
I had the privilege, just prior to the pandemic, of serving on a committee, a National Academies committee that worked on predicting life cycle costs for biomedical data. Unfortunately, the report came out early in the pandemic, and I think it just hit at a really bad time, essentially. To me, some of the greatest value of what we did in that committee was actually to try and elaborate a data life cycle that took into account both these specialist repositories and the generalist repositories, and looked at transitions among them. We used the terms "dehydrate" and "rehydrate" for the movement between specialist and generalist repositories.
And I think this is a very important development, which has not necessarily been fully absorbed by a lot of the data curation and research data management communities at this point. To me, it's particularly interesting, because traditional research data management and data curation is very much focused on an individual investigator or a lab, whereas specialist repositories tend to be thought through at the level of a program manager at a funding agency, or something even maybe a little higher level than a program manager. And how we move between those two levels of thinking in a helpful way, and factor that into things like the advice we give investigators on data management plans, I think is a really important challenge in the coming years.
Gerry Bayne: So I've got one last question for you.
Cliff Lynch: Sure.
Gerry Bayne: And is there anything that you'd like EDUCAUSE members, CIOs, technology managers, et cetera, to think about for higher ed tech writ large these days?
Cliff Lynch: It's really interesting that you raise this, because I think a lot of organizations have been sort of rethinking exactly what they're focused on, and I think you're very accurate in your characterization of EDUCAUSE as really embracing higher education on an extremely broad basis, really including the community college systems, and I think that there's tremendous needs across that spectrum, and EDUCAUSE is doing a really important job in serving across those needs.
Circling back to your initial question, one of the things that I think is going to be true is that over the next decade, maybe decade and a half, we are going to see research intensive institutions evolve a whole new discipline of research support services, which right now aren't really situated in a very sensible way within their organizations, in many cases are not on a sensible financial footing. It reminds me a lot of instructional technology in universities, circa 1980. And if you look at what happened to instructional technology between 1980 and 2020, I mean, it really got established as a profession, and institutionalized. And I believe we're going to do the same thing with research support services, and I really would like to believe that CNI will be a central and important place to have those discussions. And obviously information technology leadership is going to play an important, but not by any means exclusive place in that discussion.
So to the extent that there are people out there in the broad EDUCAUSE community who are really concerned about those issues, they might want to give a look to some of the things CNI's been doing.
Gerry Bayne: Well, thank you so much for your time, Cliff, and [inaudible 00:29:53].
Cliff Lynch: I always love these conversations.
Gerry Bayne: Yeah. Me too.
Cliff Lynch: We ramble off into lots of interesting areas.
This episode features:
Clifford Lynch
Executive Director
CNI