In this conversation, Mark Laufersweiler, Research Data Specialist, and Tyler Pearson, Director of Digital Scholarship and Data Services, both from the University of Oklahoma, discuss the National Research Platform (NRP), the Nautilus project, and the Kubernetes framework. Nautilus is a National Science Foundation-funded Kubernetes environment managed by the NRP. These tools were implemented at the University Libraries to support workshops and research. A common framework for software installations allows researchers and students to focus on pedagogy instead of setting up work environments.
Gerry Bayne: This is Gerry Bayne at the Coalition for Networked Information Spring 2023 Meeting, and I'm here with Mark Laufersweiler, Research Data Specialist at the University of Oklahoma, and Tyler Pearson, Director of Digital Scholarship and Data Services, also at the University of Oklahoma. Thanks for coming, guys.
Mark Laufersweiler: Well, great to be here.
Gerry Bayne: You guys have a lot going on. I'm seeing three different things here you guys are working on. Could you talk a little bit about the National Research Platform, the Nautilus Project and the Kubernetes framework, and how did you learn about the NRP?
Mark Laufersweiler: So James Deaton, the former Executive Director of the Great Plains Network, was instrumental in us adopting this. He brought our attention to Kubernetes back in 2018, and we started playing around with it, and I think we went kind of the more difficult route. James Deaton was instrumental in redirecting us to keep it simple, and mentioned the National Research Platform and their Nautilus environment, which is a Kubernetes environment that is NSF funded. They've had over six NSF grants to date, totaling over $27 million.
Gerry Bayne: Could you talk about each of those things and what they are for people that may not know?
Mark Laufersweiler: Definitely. I'll start out with Kubernetes. It's a framework for orchestrating the lifecycle of containerized applications: the startup, the shutdown, spinning up multiple instances of a pod. Instead of having to do that manually, you can configure the platform to do all of that for you.
Tyler Pearson: Basically, it's a conductor, it runs the pods on the resources that you've designated to it, and it goes out and finds those open resources and fires up your container.
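The "conductor" role Pearson describes can be sketched with a minimal Kubernetes Deployment manifest. This is an illustrative example only, not the team's actual configuration; the names, namespace, and container image below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workshop-env          # placeholder name
  namespace: example-ns       # placeholder namespace
spec:
  replicas: 2                 # Kubernetes keeps two pods of this app running
  selector:
    matchLabels:
      app: workshop-env
  template:
    metadata:
      labels:
        app: workshop-env
    spec:
      containers:
      - name: notebook
        image: quay.io/jupyter/scipy-notebook:latest  # any containerized application
        resources:
          requests:
            cpu: "1"          # the scheduler finds nodes with this much free capacity
            memory: 2Gi
```

Given a manifest like this, the cluster finds nodes with open resources, fires up the containers, and restarts them if they die, which is the lifecycle orchestration described above.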
Gerry Bayne: So you just talked about the Kubernetes framework. What specifically is the Nautilus Project?
Mark Laufersweiler: So the Nautilus project is the running Kubernetes instance that is managed by the National Research Platform. Kubernetes is available through a lot of cloud services, like Amazon, Microsoft Azure, and Google, and you can also self-host it. One of the benefits of going through the National Research Platform is that they manage all of that. So we are users of their Nautilus Kubernetes environment, and that allows us to focus on the application and less on the system administration.
Gerry Bayne: Why did you want to incorporate these tools into the university library's offerings, and how did it happen that this project originated in the library as opposed to other institutional units?
Mark Laufersweiler: That's probably my end of the house. As the research data specialist, I do a lot of consultations with researchers, faculty instructors, and graduate students performing research. As part of that, we offer workshops, particularly in Software Carpentry and Data Carpentry, and both of those workshops require installation of software. When we first got involved with the Carpentries eight-plus years ago, those installations were not necessarily problematic. But as we moved forward in time, with people's laptops not necessarily advancing at the same rate, and with university IT starting to lock down the ability to install software, we were finding that we were losing a lot of our workshop time just trying to get the work environment set up.
Kubernetes and the Nautilus project gave us frameworks where the software required for those workshops is already installed and implemented; it just requires a simple OU authentication ID to log in and have access. Now all of the participants of the workshop, including the instructor, are working from the same platform with the same framework, and in the workshop we can concentrate right away on the pedagogy. We were also hearing from researchers that when they've done software installs in their lab, not all of the machines are necessarily kept up to date with the current software.
And as we've been hearing in a lot of the talks here at this conference, reproducibility is a topic that's become very, very hot. With this common framework, a lab group is all working from the same environment, which means all the libraries are going to be up to date, and any changes that occur in that framework will occur for all the members of the lab. So they can have a written provenance of what they were running at a particular time in the research, up until when they publish, and all of the people in that group are able to do that.
And also, with reproducibility and people outside the university, again, it provides that common framework where co-PIs can work on projects together. Access to that same common work environment means they're doing all their code development, their analytics, and their visualizations in the same framework, so that when they publish, the code that was developed under that framework runs in that framework, and everyone knows that. The other nice feature is that the containers we develop are publicly accessible for free, right? So others can actually take the particular pod we've described, download it, and run it on their own machines. What used to be a barrier is no longer a barrier, right? We have that traceability.
And then finally, for students in education, there is a real gap in what we call compute inequality, especially in non-STEM fields where students are coming in with Chromebooks that can't install this software. There's no way they can make use of some of these analytical tools around data, regardless of domain. If I can't install Python, if I can't install OpenRefine, I can't run those tools. This now gives them a platform where, again, with a valid ID, they log in and have access. It also frees up the faculty members. The faculty member teaching the course isn't having to deal with a class of, say, fifty students, fifty installs, and no TA. They can jump right into the pedagogy of what they're trying to teach rather than the installation and working with the software.
And it allows for a framework for equity in grading, right? If everyone is running on this platform, everyone has access to the same libraries; you can't say, "Well, my laptop has a newer library, that's why my code won't work on your machine." So it starts to create avenues for making the life of the instructor easier, which means they can concentrate more on the pedagogy. We were getting these questions from our faculty and instructors in these consultations. And so Tyler and I started talking about this, and that's when we said, "Well, hey, James Deaton kept talking about this to us. Maybe we should actually take a look at it."
Why not central IT or other groups? We were solving our own problems at first, right? And bringing in other stakeholders at that point didn't seem to make sense. We wanted to do a proof of concept to see if this really had some traction, and we knew it wasn't going to take a lot of our time to test. Now that we've been moving on, we can talk about that in another question, maybe. But at the time it didn't seem pertinent, and it was one of those things where we had the knowledge, we had the time, and it was solving our problem. So we did it.
Tyler Pearson: I will say that we've got some domain knowledge that other groups on campus did not have. In my former informatics world, we heavily utilized Docker containers, and no one else on campus had that experience. So we could bring prior experience in containerizing applications that no other group on campus had at the time.
Gerry Bayne: I'm outside the field, but maybe you can speak on this. In digital scholarship, in applications, in libraries, technology, it sounds like we're really trying to move to like, "Okay, we're all doing something pretty close, but let's actually combine our forces and have something that has common standards, common tools that we can use seamlessly together." Does that resonate with you?
Tyler Pearson: Oh, yeah. Our new associate dean's view is that we provide the technology, but the technology in itself is not the pedagogy of the domain, so the commonality of tools is always present. We saw that when we made the early move into helping with digital scholarship. That led to the digital humanities, and to other groups on campus that maybe had never thought about their data in a quantitative sense, right? They worked with their data qualitatively. And we've got to be careful here: not everything is necessarily considered to be data, but their scholarly output is a binary file of ones and zeros, and there were tools that could help them work with that output in ways they had never thought of before.
But they, in their domain specialties, had never been introduced to it. It wasn't part of their graduate studies, so a lot of researchers don't necessarily have these tools in their toolbox. The library, in trying to serve everyone, considers that no project is too big or small, and domain doesn't matter; this commonality of shared tools really becomes the focal point. Where we can then help faculty is by finding the groups that know the technology and getting them to talk to the groups that don't, and letting the pedagogy be governed by the individuals. We didn't want the technology to be a barrier to any education or research; it should be seamless across these tools. What we can do through consulting is help them with their pedagogical goals using the tools, and let them drive the ship, so to speak, rather than have the technology drive them.
Gerry Bayne: Interesting. So back to the National Research Platform, where does the project stand now, and how do you see it progressing from here?
Mark Laufersweiler: That is a great question, and I know that they continue to have meetings. I haven't been able to attend them. And I'm trying to think when was their last? They received-
Tyler Pearson: February.
Mark Laufersweiler: ... Yeah. So they received some additional NSF funding recently. And the University of-
Tyler Pearson: Nebraska, Lincoln.
Mark Laufersweiler: ... Nebraska-Lincoln has contributed a lot of compute resources to this platform. There are over fifty partnering institutions that have contributed hardware to it, and I don't see it going away anytime soon. We are seeing it utilized more and more when we go and run our instances, and we are actually talking to a faculty member on campus who is going out and buying hardware to attach to this-
Tyler Pearson: Infrastructure.
Mark Laufersweiler: ... the infrastructure. But, anyway. So we've got people on our campus that are looking to invest in hardware to run on this platform.
Gerry Bayne: So, last question. What advice or suggestions could you offer organizations interested in implementing something similar?
Mark Laufersweiler: So the National Research Platform has a website, nationalresearchplatform.org, that has links to their documentation and how to get started. If you want to just test the waters, they have instructions on their site on how to sign up for an account, so you can kick the tires without having to do any configuration or purchase any hardware. Once you want to start experimenting, maybe customizing some environments for your purposes, we share all of our configurations on GitLab, and the rest of the community does too. Those links are also available through the National Research Platform website, so anyone can download them and follow the instructions on spinning up an environment. One of the things that we heavily utilize in our environment is an application called JupyterHub, and that is-
Tyler Pearson: Very well documented and-
Mark Laufersweiler: ... very well documented.
Tyler Pearson: ... implemented in a lot of places.
Mark Laufersweiler: And there's a group that has made it easy to deploy into a Kubernetes environment like Nautilus. So we heavily utilized the community in deploying our instance, and anyone else can follow what we've done. We've put our configurations out on GitLab, and that lowers the barrier of entry even more. The few things you need to do are talk to your local IT to get a couple of needed items, like some DNS configuration, and, if your IT hasn't already set up SSO with CILogon, get that set up. Other than that, running through our configurations, a user can be up and running within a day.
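The group Laufersweiler refers to is presumably the Zero to JupyterHub with Kubernetes project, which packages JupyterHub as a Helm chart. As a hypothetical sketch (the hostname, image, and credentials below are placeholders, not OU's actual configuration), a values file tying together the CILogon SSO and per-user storage mentioned here might look like:

```yaml
# Hypothetical excerpt of a config.yaml for the JupyterHub Helm chart,
# applied with: helm upgrade --install hub jupyterhub/jupyterhub --values config.yaml
hub:
  config:
    JupyterHub:
      authenticator_class: cilogon        # SSO via CILogon
    CILogonOAuthenticator:
      client_id: "<client-id>"            # issued when registering with CILogon
      client_secret: "<client-secret>"
      oauth_callback_url: https://hub.example.edu/hub/oauth_callback
singleuser:
  image:
    name: quay.io/jupyter/scipy-notebook  # workshop software baked into the image
    tag: latest
  storage:
    capacity: 5Gi                         # persistent per-user storage volume
```

With a config along these lines, every workshop participant who logs in through campus SSO lands in an identical, pre-installed environment, which is the common framework described throughout this conversation.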
Gerry Bayne: Wow.
Tyler Pearson: I'll add that once you go into their ecosystem, you get authorized and create your namespace, as they call them, to work in, and they have a chat-style community; Element is what they use. I have found it to be an environment where no question is too trivial, and generally you'll get one of the system people, or someone else from the community who's run into that problem, giving you answers and guidance, usually in the form of a link to either documentation or their own configuration. It's been a really welcoming community overall. People want to see this succeed at all levels, and I think that's part of what's going to help with the longevity issue and where it goes, because the community begins to see what these ideas make possible with even a small amount of shared resources.
And one of the things we did not talk about is that if you do purchase equipment and you work with your local IT group, it sits outside the firewall of the university and actually gets managed by the folks at NRP-
Gerry Bayne: Okay.
Tyler Pearson: ... so the system administration and upkeep of the software is managed by another group. You're just responsible for making sure there's power and cooling to that system. And the specifications for the equipment have all been thought out, so there's no guessing game about what you need to buy in order to participate. They've really streamlined the entry points: you contribute what you can or what you feel is worthwhile, and join the family, so to speak. And it's been, like I said, a very welcoming environment overall.
Gerry Bayne: That's exciting. Is there anything about this that we haven't touched on that you'd like to talk about?
Mark Laufersweiler: I'll mention that people are free to reach out to us; we can point them to our configurations if they run into any problems. Well, I'll hold on to that thought. Yeah, yeah.
Gerry Bayne: Bite off more than you can chew.
Tyler Pearson: Exactly. I will say that the faculty members have been kind enough to run this in their courses with the understanding of certain caveats: we don't own the hardware, the network could go out at any time, bad nodes do occur, and so there are sometimes issues around connectivity. It's not 24/7, and it's not designed to be, because it's a pilot for us but also a pilot with the research community at large. The fact is that it has freed them up. They talk about how much easier it is to teach a class that has this common framework. What we also appreciate is the student participation: when we see a class running and we see twenty-five or thirty logins, we know, "Hey, meteorology's teaching their Introduction to Programming course right now."
And the feedback that we've been getting has been positive, right? They see it as a bonus; it's been working out well for them. What we're excited about, and maybe haven't mentioned, is that this cohort we've been testing with, at least in meteorology, is a freshman- and sophomore-level course, and the environments we create and the storage attached to them are persistent. So those students will be able to use this platform as long as the platform exists, and we do not see that going away anytime soon. Tracking them as they get into their upper-level courses and start to do more compute-intensive work, with this resource available to them, we're interested to see how it follows up.
One of the goals, when we talk about allowing a student or a faculty member in research to kick the tires, is that if they decide they really want this on their local systems, then we can go through the burden of installing it. So it's cutting back on the installs where people decide they don't want the software and then want you to get it off their machine; it's freed that up. When they don't have a barrier to the install and can get right into using the tool, the install problems don't seem as large, because they know what the tool can do, as opposed to when they're trying to install it prior...
Mark Laufersweiler: They decide if there's any value in it.
Gerry Bayne: Yeah.
Mark Laufersweiler: And then decide if they want to go through the work of getting it installed.
Gerry Bayne: Yeah. It's exciting stuff. Well, Mark, Tyler, thank you so much for your time.
Tyler Pearson: Thank you.
Mark Laufersweiler: Thank you.
This episode features:

Mark Laufersweiler
Research Data Specialist
University of Oklahoma

Tyler Pearson
Director of Digital Scholarship and Data Services
University of Oklahoma