AI and Scholarship

The CNI Interviews Podcast | Season 3, Episode 1

Cliff Lynch, executive director of the Coalition for Networked Information (CNI), discusses the evolving impact of AI on scientific research and education.


You can also watch the episode on YouTube.


Gerry Bayne: Welcome to the Coalition for Networked Information Podcast. Today I'm speaking with CNI's executive director, Cliff Lynch. We're talking about the impact of artificial intelligence on higher education and research. I started by asking him to give some general comments about what he's thinking about in terms of artificial intelligence.

Cliff Lynch: It's really important, I think, for us to keep in mind that the sort of shiny generative AI thing that's captured everybody's imagination is only a very, very small and perhaps not even that important piece of the very extensive toolbox of things that we broadly refer to as AI and machine learning. We also need to be mindful that we're increasingly going to see a lot of those technologies combined with other technologies, notably robotics, remote sensing, things of that nature, in sort of unusual hybrid ways. One of the things I've spent a lot of time on over the past year is looking at how AI is affecting the practice of science. If you look at what's going on with scientific practice, generative AI of the sort that has captured the public imagination doesn't look like it's going to play a very big role. The really big applications look like they're more machine learning driven, and they deal with things like predicting the properties of materials, predicting how proteins will fold or how they will bind with other molecules, these kinds of things, and these are having a really genuine impact in some areas.

Now, there are some very interesting shortcuts showing up between traditional computational science-based approaches and machine learning approaches. To give you just one example, one of the flagship problems that has driven the development of high performance computing over the last 50 years has been weather prediction. If you look at how they traditionally do it, it's very much physics based: you set up a three-dimensional grid, essentially, and simulate physical phenomena in there. And the issue is how fine a grid can you simulate, how much resolution can you go to, and then can you do the predictions quickly enough that they're actually useful? So they've built some machine learning based systems which really are not classic supercomputer-sized applications. These will run on a good laptop, and they give pretty darn good weather predictions, just about as good as the physics-based models.
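To make that contrast concrete, here is a minimal, purely illustrative sketch (an editorial addition, not something from the interview): a toy physics-based model stepped forward on a grid, next to a data-driven surrogate trained to emulate its forecasts. The 1-D diffusion "atmosphere" and the linear least-squares surrogate are stand-ins chosen for brevity; real forecast models and learned emulators are vastly more complex.

```python
# Toy contrast between a grid-based physics simulation and a learned surrogate.
# Purely illustrative; assumes only NumPy.
import numpy as np

rng = np.random.default_rng(0)

def physics_step(state, diffusivity=0.1):
    """One explicit finite-difference step of 1-D diffusion on a periodic grid."""
    return state + diffusivity * (np.roll(state, 1) - 2 * state + np.roll(state, -1))

def simulate(state, n_steps=50):
    """The 'physics-based forecast': step the grid model many times."""
    for _ in range(n_steps):
        state = physics_step(state)
    return state

# Training data for the surrogate: initial fields and the physics model's forecasts.
n_grid, n_samples = 64, 500
initial_fields = rng.normal(size=(n_samples, n_grid))
forecasts = np.array([simulate(f) for f in initial_fields])

# The "surrogate": a single linear map fit by least squares. Because diffusion is
# linear, this emulates the solver almost exactly; learned emulators for real
# weather (neural networks) play the analogous role for nonlinear dynamics.
W, *_ = np.linalg.lstsq(initial_fields, forecasts, rcond=None)

test_field = rng.normal(size=n_grid)
truth = simulate(test_field)     # many physics steps
emulated = test_field @ W        # one matrix multiply
print("surrogate vs. physics RMS error:", np.sqrt(np.mean((emulated - truth) ** 2)))
```

The point of the sketch is the trade described here: the surrogate gives up the explicit physics but produces a comparable forecast at a tiny fraction of the computational cost.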

So now you're starting to see people thinking about how we can use this in tandem with the physics-based predictions, which is a fascinating kind of phenomenon. So I think thinking about how these things can really change the practice of science over the next decade or two is really, really exciting. We're also seeing these fascinating sort of closed loop things with robotics. So you think about something like predicting the properties of new materials, and the results seem pretty good, but they're not perfect. What we can think about those predictions as doing is providing sort of signposts for fruitful experimentation. So you can start thinking about setting up systems of experimental labs, maybe automated, maybe not, maybe something in between, that go and test out these predicted properties, see if you really can synthesize the material and it really behaves as predicted, and then cycle back and make the next round of predictions that much better, because now we have better training data. You really are genuinely, I think, starting to see some changes in the practice of scholarship out of this.
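As an editorial illustration of the closed loop described above, here is a minimal predict-test-retrain sketch. The polynomial property model and the lab_experiment function standing in for an automated lab are assumptions of the example, not anything from the interview.

```python
# Toy closed loop: predict a material property, "test" the most promising
# candidate, and feed the measurement back as new training data.
import numpy as np

rng = np.random.default_rng(42)

def lab_experiment(x):
    """Hypothetical stand-in for synthesis and measurement: true property plus noise."""
    return np.sin(3 * x) + 0.1 * rng.normal()

# A handful of already-measured "materials" (1-D composition for simplicity).
X_train = rng.uniform(0, 2, size=6)
y_train = np.array([lab_experiment(x) for x in X_train])
candidates = np.linspace(0, 2, 200)              # unexplored candidate materials

for round_ in range(5):
    model = np.polyfit(X_train, y_train, deg=3)  # cheap predictive model
    predictions = np.polyval(model, candidates)
    idx = int(np.argmax(predictions))            # signpost for fruitful experimentation
    x_next = candidates[idx]
    y_next = lab_experiment(x_next)              # run the (simulated) experiment
    X_train = np.append(X_train, x_next)         # cycle the result back in:
    y_train = np.append(y_train, y_next)         # better training data next round
    candidates = np.delete(candidates, idx)      # don't re-test the same candidate
    print(f"round {round_}: tested x={x_next:.3f}, measured property={y_next:.3f}")
```

Each pass through the loop spends an experiment where the model is most optimistic and then retrains on the result, which is the "cycle back" step described above.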

Gerry Bayne: How do you see AI developing in the next five to ten years?

Cliff Lynch: When we take a five-year view or a ten-year view, these things will find their place in teaching and learning and research. They'll adapt. The immediate response when we saw, for example, these chatbots was pretty disgraceful in some circles. It was basically, how can we improve our plagiarism detectors to stop people from using this? Rather than thinking about how can we really use this in a constructive way to improve educational processes, which frankly, if they're that vulnerable to a fairly low level generative AI application, have something very wrong with them to begin with. And I think if we look at a five-year period, that will get sorted out. Academia frequently has these patchwork stupid reactions to things. I'm thinking of when Wikipedia, for example, surfaced into the consciousness of many people who were teaching introductory courses, and people were absolutely forbidden from using this instrument of the devil, and now it's just kind of assimilated in and it's another tool. I think that we need to be careful about extrapolating some of the very short-term stuff.

So for example, if you look at the chatbots du jour, they are being driven off of these large language models, and it seems like there is a testosterone-laden competition for who can waste the most cycles building the biggest model with the largest number of parameters, trained on the biggest amount of stuff. But it's not really clear how those curves extrapolate out. There's some evidence that a lot of leveling off is starting to happen, that after a certain point bigger doesn't get you very far. The conversation, as people are getting smarter about these models, is becoming much more nuanced. It's to the point of, well, if I can make a model that I can run on a high-end phone as opposed to a massively expensive computational resource, and it's 90, 95% as good, is that good enough for most of the applications we want from it?

Is that going to work for quick and dirty translation? Is the extra 5% worth it? I think that there are some more visionary kinds of goals, some of which have been around for a really long time now, like building intelligent agents, intelligent assistants. That's really hard. That was hard 30 years ago when it was first being promoted. It's hard now. And look at the mess, for example, that generative AI is making as Google has attempted to put it into search, where it gives you all of these terrible summaries that everybody hates. I certainly don't want to delegate my banking or my travel booking or anything else like that to something like this. I could be wrong, but I don't think that the pathway to that is going to be just training models with more parameters.

I think that the way forward is going to be some new technologies and some new techniques. Let me just spin a possible future here for you, and don't take this as a prediction, but rather as something that conceivably could happen and that maybe gives you an insight into some of the forces at play. One thing that's happening right now, at least in US universities, is that staffing for research has become massively more expensive. The move toward giving postdocs a living wage and graduate students a living wage, and the unionization of graduate students, has basically meant that the people budgets for research grants have gone up a lot. The research grants themselves are not getting any bigger, so that means you're going to have fewer grants or fewer people. So now, can we delegate some of what those people are doing to various kinds of machine learning and AI and robotics kinds of things?

Now, I think we probably can, and I actually think that maybe that's a good thing. Higher ed has a history of being very abusive to its graduate students in the sense that it treats them as cheap labor and doesn't necessarily have them do things that are particularly sensible or even useful to their education. Think of the history of using graduate students as laboratory system administrators for 20, 30 years just because it was cheap and you could make 'em do it. I think a lot of the bench technician work is actually being done in a lot of our labs by rather overqualified people right now, just because historically that labor has been cheap.

Now let's go on from here and say, okay, maybe we can automate some of these people, or some of these roles. How does this play out? What would it be like if we could basically generate as many bench chemists as we needed? How does that change the role of principal investigators? Dan Cohen, for example, has speculated that maybe they become sort of visionary leaders of these armies of both computational and human scientists, in a very different way. How constrained are our various scientific disciplines by the numbers of bench scientists, or scientists more broadly, that we have in them? I don't think anybody understands that terrifically well. It seems to me quite plausible that as we get into this over the next decade or two, we may find that some disciplines or sub-disciplines or sort of research practices are very amenable to this multiplier effect. It looks, for example, like some areas of materials science, as we were discussing, are really amenable to this, and probably some areas of molecular biology are too. I'm much less clear, very much less clear, that much of physics is going to be like this, particularly theoretical physics or astrophysics.

I don't necessarily see how, until you get to the point where computational things can formulate hypotheses and then prove or disprove 'em, we really get that multiplier effect in those areas. And right now, I think we're a very long way away from knowing how to do that computationally, although there are people working on it. That leads me to believe that mathematics, theoretical mathematics outside of a few niche sub-areas, may be quite resistant to this kind of multiplier effect. On the other hand, there is a lot of evidence starting to emerge that we'll see other kinds of mathematical practices get revised. There's been a lot of progress in basically the formalization of theorems and their proofs. You've always had this problem in leading edge mathematics that someone will prove something with a long and difficult proof, and it can take years, literally, for that proof to get socialized into the mathematical community, both through peer review of the formal sort around an article, but also the less formal peer review of just other respected peers in the field, before they really have confidence that that proof is right.

There are some proofs, some bodies of work, that are so complicated that they never get there, or it literally takes 10 years, or the techniques are largely lost. So there are fascinating stories around these kinds of things and how confidence is built in them. Now, they are getting to the point where they can formalize some of these proofs, and they actually did this recently with a very complicated piece of leading edge research mathematics, and then ran it through these theorem provers as a way of verifying that the mathematician got the proof right and didn't miss any major steps. So it's not that they can prove the theorem from scratch, but rather that, given some guidance on the general layout of the proof, you get a formalized verification that can give you a much higher degree of confidence in the correctness of the proof.
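For readers unfamiliar with formalized mathematics, here is a tiny illustrative example in the Lean proof assistant (an editorial addition; it is not the leading edge research mathematics referred to above). Once a statement and its proof are written formally, the kernel checks every step mechanically.

```lean
-- Minimal illustration of machine-checked proof (toy example, not the work
-- discussed in the interview). If any step were wrong or missing, Lean would
-- refuse to accept the proof.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b   -- reuse a library lemma; the kernel verifies the application

-- Definitional facts can be checked outright by the kernel.
example (n : Nat) : n + 0 = n := rfl
```

Formalizing real research mathematics is enormously harder than this, but the payoff is the same: confidence comes from mechanical verification rather than solely from human referees.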

Gerry Bayne: What ethical concerns around AI should higher education and digital scholarship institutions address?

Cliff Lynch: I think that when they start using AI techniques on their students, for whatever reasons (success prediction, identifying people who are having difficulties, placement; there are a million reasons why they might want to do this), I'm not actually all that convinced that AI itself adds a lot of new concerns. Many of the fundamental concerns are about consent, transparency, and ethical treatment of people. AI gives you some tools to be even more horrible than you could be with the preceding set of tools, but the fundamental problems, to me, feel like they're largely conserved. I think we need to be careful about importing tools that have been trained on data that we don't understand, and that because of that may have bias. Bias is a very popular word that I'm not crazy about. They may have been taught to recognize patterns that are not applicable to the target population that our institutions may be using them on, which I think is often a better way to say it. Bias, at least the way I hear it used, often implies some sort of deliberate malign intent, and I think attributing malign intent to machine learning algorithms is just stupid. So that's a piece of it. I think that when we talk about applications to research in the broader world, there is a great challenge in getting our students to think about what they're doing and use these technologies in appropriate ways.

For example, we train a lot of people who go into the law, who become judges and attorneys and things like that. Now, we don't actually exercise a lot of the legal system ourselves in the universities, but we train people to go out and be part of the broader legal system. I think that some of the ways that I have seen machine learning and predictive systems deployed in the judicial and criminal justice system just make my head explode. And certainly I would hope that we are having a conversation with people who will go out and be part of that system about understanding what makes sense and what doesn't in there. It almost goes beyond ethics to just not being stupid when we're doing engineering, for example. I think you've got to get people to think hard about autonomous systems of various kinds and how autonomous they are. The whole experience with self-driving vehicles is very interesting there, and one that's been, I think, oversimplified a bit from an ethical point of view. There are a million niche applications for autonomous vehicles. Just because you don't necessarily want them on crowded city streets in between the school buses doesn't necessarily mean they're not a good idea when you're trying to do logistics support on a battlefield.

Gerry Bayne: That was Cliff Lynch, executive director of the Coalition for Networked Information. I'm Gerry Bayne for EDUCAUSE. Thanks for listening.


This episode features:

Clifford Lynch
Executive Director
CNI