Dean of Libraries at Carnegie Mellon, Keith Webster, discusses CMU cloud lab strategies, and the Artificial Intelligence for Data Discovery and Reuse Symposium, which aims to find innovative solutions to accelerate the dissemination and reuse of scientific data in the data revolution.
Gerry Bayne: Welcome to the CNI 2022 podcast. These interviews were recorded at the Coalition for Networked Information Spring 2022 meeting. On this episode, we feature Keith Webster. Keith is the Helen and Henry Posner Jr. Dean of the University Libraries at Carnegie Mellon University. In this discussion, we talk about the CMU cloud lab strategies. We also discuss the artificial intelligence for data discovery and reuse symposium. It's a symposium that aims to find innovative solutions to accelerate the dissemination and reuse of scientific data in the data revolution. I started our discussion by asking Keith what he sees as his biggest challenge in providing library services in 2022.
Keith Webster: I've got three answers. The first is figuring out which of the behavior changes we've seen during the two years of the pandemic are going to be sustainable in the long run. For example, we saw a huge shift to digital content when our libraries were closed in the middle of 2020. And what we are trying to understand now is whether the behavioral changes we've seen as a result of that and since then, really have changed to the extent that we need to think about our collecting strategies in different ways. Or are these simply a transitory state from which there will be a reverse to pre-pandemic behaviors? That is something we simply don't know. Interestingly, one side issue that we observed was that a number of faculty were assigning students project work, requiring them to find data sets from elsewhere and run some fresh analysis of these data sets because [inaudible 00:02:01] laboratories were closed.
And that has heightened awareness of the opportunities that exist in finding data and running secondary analyses rather than repeating experiments from the start. I'll be intrigued to see if that continues. The second theme to me is navigating the scholarly publishing landscape. There are a lot of things happening at a macro level that may well change the publishing marketplace over the next few years. Clearly the growth of open access publishing is something that we are paying close attention to. The array of transformative agreements that we and other institutions are executing with publishers will begin to have an impact on the long-term financials of publishing. Many publishers are exposed to changes like Brexit and the policy decisions made by the Chinese government to shift how researchers deposit articles with Western publishers. Those will have, again, longer-term implications. The pandemic and the shift of research funding towards clinical and medical research may well have an impact on other disciplinary fields, with knock-on effects on scientific journals.
The medium term impact of lab and campus closures, faculty with caring responsibilities, conference cancellations, will all have an impact on publishing levels, at least for a couple of years, but we need to keep an eye on the trajectory of careers, of ensuring that the scientific community is supported as they cope with these disruptions to science. And then some of the financial pressures that publishers are exposed to. Many library budgets were reduced, some temporarily, some permanently as a result of COVID budget shocks. Publishers have exhausted their sales of back files. Textbook sales have been hit hard and those, in recent times I would contend, for some publishers have offset some of the softness around journals' revenue. So it's a very messy situation that we certainly want to keep an eye on.
And my third point I'll just mention briefly is the growing interest in the European Union on digital sovereignty or digital autonomy. Concerns about the extent to which big tech can or should make use of data deposited by individuals or institutions. [inaudible 00:05:01], we in the United States are maybe sheltered from some of those debates. I've no doubt that potentially, inadvertently some of the medium size tech businesses, including publishers, repository services, potentially will be impacted if the EU's Digital Services Act really begins to shift behavior in the EU and potentially if the positive side that the EU is looking for, the creation of an economy that encourages smaller tech companies to compete, may well create greater provision for those of us in this country, looking for alternatives to the usual big sources of technology that we have to work with.
Gerry Bayne: What is the CMU cloud lab strategy and how is this changing the role of the library?
Keith Webster: So Carnegie Mellon is about to open the first academic cloud lab, we believe, in the world. We've taken our inspiration from two of our alums, both graduates of the College of Science, who established the Emerald Cloud Lab in San Francisco some years ago. The principle of the cloud lab is that instead of every researcher having their own wet lab or their own bench, they have shared access to a remote facility where there is an extensive array of laboratory instrumentation. The researcher, rather than going into the lab, works from a laptop using the cloud lab operating system to code a job or an experiment, send it to the cloud lab and a few hours later receive their data back in the form of a digital file. The cloud lab is able to operate 24 by 7, 365 days a year. And one of our hypotheses is that we will be able to provide our researchers with much more opportunity to increase their experimental time without having to wait in line for equipment.
The lab is operated remotely, partly by robot control, partly by technicians staffing the laboratory. One of the great attractors to us, as we rethink our science buildings at Carnegie Mellon, is that we can move away from the need to build a laboratory every time we hire a faculty member. No more startup packages where people are ordering bits of equipment that might well be in the laboratory next door; it's all one big facility. So from the library's perspective, we see great opportunities here to help extend the work that we have been doing, both to advance open science and to support our research community in meeting data management best practices and, increasingly, data management mandates. We've been supporting open science at CMU for a number of years now. We've built an end-to-end open science workflow. That has become part and parcel of how our scientists approach their work.
The great question for us just now is that the Emerald Cloud Lab and its underlying technology operated primarily for the commercial sector. And therefore their focus was on keeping things locked down and inaccessible rather than open and free to reuse. So we are working with the cloud lab team to help them work with us, to navigate opportunities for our researchers, to insert protocols from www.protocols.io and, at the other end, export their data into our Figshare data repository. So, exciting time. A lot of training underway at the moment. We are one of the two accredited trainers for the cloud lab in the university libraries. Really looking forward to seeing what this means in terms of the volume of data that we are asked to help curate as well as understanding what it means for the future of automated science.
Gerry Bayne: Will there ever be any need to build more than one lab or do you just schedule them so people don't use them at the same time.
Keith Webster: There's a lot of optimization. Clearly if there was a lot of demand for a particular instrument, we would add a second one. The beauty of the lab is that it is based upon a racking system. So kit can be added or subtracted at any time, but using the cloud lab operating system, jobs can be queued and synchronized to optimize the availability of kit.
Gerry Bayne: That makes a lot of sense. Can you give us some sense of the situation with foreign students, COVID and CMU?
Keith Webster: Sure. In the early days of the pandemic, it was a pretty difficult time. Any university that relies upon international students was very concerned in the summer of 2020 with travel restrictions, really difficult time for international students to arrive. In the current academic year, the one that's just coming to an end, 2021/22, we saw not only a return of our international students, but the arrival of many who had decided to defer enrollment. So we've got an even busier population than normal. We are in a great spot at the moment, in terms of international students coming to campus. The travel bans have gone. The immigration restrictions for coming to the US also have been removed. The only procedural thing people have to worry about just now is the entry requirements related to the virus, proof of vaccines, negative tests, and the like. The one thing we're conscious of is that US embassies and consular services around the world are not fully back to their pre-pandemic operations just yet.
But the State Department has prioritized visa appointments for students and scholars. They have waived the in-person interview requirements for repeat visa applicants. So even there, things are in a fairly healthy place. The one thing that we are conscious of, and it's causing a bit of end-of-semester issues for some of our community, is that traveling from the United States to some countries overseas can be challenging. The big difficulty we are aware of is passengers traveling to China, who have to get samples for various antibody tests collected, not from the city from where they begin their journey, but from the city from which they will travel to China.
So for students from Pittsburgh, in our case, they have to travel to a departure city, and these include places like Dallas, Detroit, Seattle, LA, San Francisco, New York, take their tests and then they have to wait in place for seven days before they travel to China. And we found a bit of travel issue for students planning to return to China at the end of semester, but compared to where we were two years ago, generally we are in a much happier place.
Gerry Bayne: So could you tell us a little bit about the artificial intelligence for data discovery and reuse or AIDR and that's a conference. And do you have any future plans for that event?
Keith Webster: Sure. So let me say a bit about the thinking behind the conference and then answer your question about timing. So what we've seen is that the explosion in the volume of scientific data has made it increasingly challenging to find data scattered across various platforms. Universities around the world today are making researchers' data available for reuse, reproduction, other purposes. And keeping on top of all of that is increasingly challenging. We've also seen at the same time, increasing numbers of new data formats, greater data complexity, lack of consistent data standards across disciplines, metadata or links between data and publications, making it even more challenging to evaluate data quality, reproduce results, and reuse data for new discoveries. So around about three years ago, with colleagues from the Pittsburgh Supercomputing Center, the CMU libraries applied to the National Science Foundation to conduct the first in what is becoming a series of conferences called AIDR, the AI for Data Discovery and Reuse.
We viewed AIDR as providing a platform for AI and machine learning researchers, data professionals, and practicing scientists to come together and benefit from each other's expertise to address the data challenges that I just described and to facilitate the next breakthroughs in science and technology, using the combined powers of AI and scientific data. So we had a three-day, in-person conference in Pittsburgh in 2019, large attendance, a huge amount of conversations underway. We had planned to repeat the conference in 2020, but of course the pandemic interrupted things. We held a one-day event in 2020 online. Again, a big attendance with a bigger international audience because of the format.
We anticipate having another event later this year to maintain momentum and then pivot, hopefully to a bigger event in 2022. Separately, but related, we had for a couple of years before the pandemic run an open science symposium, and that was attracting an increasingly large audience over a couple of days. The 2020 AIDR was held in partnership with Open Science Symposium and that's how we think we will keep things moving forward. We are also looking at relationships with the CMU university lecture series. If any of your listeners have any recommended speakers on open science, reproducibility, AI, and data discovery, do please get in touch. We'd love to hear anyone's suggestions for speakers.
Gerry Bayne: What is one or two things in library data and research that you'd like folks in higher education technology [inaudible 00:16:08] large to know about.
Keith Webster: I would like people to understand that we are serving as a hub for many things related to data and research. For example, as a data provider, we are working with our Department of Statistics and Data Science, helping their undergraduates work with the data that we are curating for some of their projects. And it's a great way of blending the theoretical aspects of data management with practical student projects. We have become very prominent on campus as an educator and trainer in supporting students, researchers, other members of the community. We are increasingly delivering credit bearing courses, again in partnership with our Statistics and Data Science Department. We've been running Carpentries workshops for a few years, and we're beginning to turn these into slightly more specialized offerings. Last week, we ran a Carpentry for Genomics workshop, for example. We are working with the Carnegie Mellon sustainability initiative, which I co-chair, to tag publications and courses that are being presented at CMU so that we can talk about the relationship between scholarship at Carnegie Mellon and the UN Sustainable Development goals.
I'll maybe mention just two or three nuts-and-bolts practical projects that might be interesting. With generous support from the Sloan Foundation, we are working on what started off as the robotics project. It's increasingly moving towards robotics archives and museums, where we are recognizing that Carnegie Mellon is home to a huge amount of foundational work in robotics, in autonomous vehicles, in computer vision. And with 50, 60 years of research behind us, we are trying to bring together artifacts and other records of robotics research. These bring particular curatorial challenges. How do you look after machines designed to go to the moon or into a nuclear power plant? How do you curate the different parts of the record of scholarship, which increasingly is in a very multimedia format: photographs, videos, sound recordings, code, sensors, and other devices? Those are some of the questions we are exploring with this robotics project.
We are working with our colleagues in the Software Engineering Institute, a federally funded R&D center at CMU, to establish an open source programs office. Many universities have [inaudible 00:19:15] in place, but we are looking at particularly the relationship between open source and government interests in that work. And we are very proud of the recent rollout of our Islandora 8 digital archives. Again, Islandora is widely used across the higher education landscape. We are one of the first institutions to roll out the 8th version of Islandora, and we encourage anyone to come to our website and look at our digital collections.
Gerry Bayne: Well, Keith Webster, thank you so much for your time today. Appreciate it.
Keith Webster: It's been a pleasure.
Gerry Bayne: Great. That was Keith Webster, the Helen and Henry Posner Jr. Dean of the University Libraries at Carnegie Mellon University. I'm Gerry Bayne for EDUCAUSE. Thanks for listening.
This episode features:
Keith Webster
Dean of Libraries and Director of Emerging and Integrative Media Initiatives
Carnegie Mellon University