The OA Book Usage Data Trust and the University of Michigan Press convened a National Science Foundation-supported workshop in April to explore how best to leverage shared cyberinfrastructure to support cross-platform systems integrations and advance Findable, Accessible, Interoperable, and Reusable (FAIR) usage data. Christina Drummond, Executive Director of the Open Access Book Usage Data Trust, and Charles Watkinson, Associate University Librarian for Publishing and Director of the University of Michigan Press, discuss the outcome of the workshop and future plans for this project.
Gerry Bayne: This is Gerry Bayne at the Coalition for Networked Information Spring 2023 Meeting. And I'm here with Christina Drummond, who's Executive Director of the OA Book Usage Data Trust based at the University of North Texas, and with Charles Watkinson, Associate University Librarian for Publishing and Director of the University of Michigan Press. He's also the 2022/2023 President of the Association of University Presses. Welcome, guys.
Christina Drummond: Thanks for having us.
Charles Watkinson: Thank you, Gerry. Great.
Gerry Bayne: So you guys were awarded an NSF grant to create a workshop, and it's called Exploring National Infrastructure for Public Access Usage and Impact Reporting. Can you tell us a little bit about receiving that grant and what your hopes were in creating this workshop?
Christina Drummond: Absolutely. So we wanted to bring together international experts and leading scholarly cyberinfrastructures to really explore what opportunities and shared interests we have around improving the FAIRness of usage data. At the OA Book Usage Data Trust, we've been working with a number of global stakeholders to ask questions around this issue, but they've all been book focused. And so we actually came together and said, well, where are those other interests? What does it mean now that we have a Nelson Memo? We know that data, journals, and books are all going to be put out there in increasing volumes in a publicly accessible way.
And so this particular workshop evolved to bring together diverse stakeholders to identify the challenges to cross-platform public and open analytics at scale, but also to identify what's needed to scaffold America's national infrastructure for scholarly output impact reporting in light of the Nelson Memo, and, I'll say, also in light of what's happening in Europe. There have been some really interesting developments around the European Open Science Cloud and its interoperability framework, and the work that we're leveraging within our data trust around industrial data spaces. Again, these are heavily funded through the European Commission and its Horizon Europe and Gaia-X grants. And so given what's happening in Europe, we wanted to ask what we should be looking to do here.
Gerry Bayne: Why do you think it's important to focus on this topic?
Charles Watkinson: Yes. So just to personalize this a little bit, I'm a publisher, and our organization, Michigan Publishing, publishes books, journals, data sets, and increasingly digital scholarship that doesn't fit neatly into those categories. More and more of those are open or public access. Achieving that open and public access involves a lot of commitment from the authors who are willing to do their work, the publisher, the library, and also funders. I think we're all looking for a return on investment, which is often expressed as evidence of impact. We are getting lots and lots of usage information coming our way, but we really don't know how best to use it, how best to aggregate it, how to analyze it, and how to communicate it in an ethical way that respects user privacy but also provides something usable for all those stakeholders who've invested so much effort.
And one of the particular issues we're facing is that usage information comes from lots of different channels. Even usage information alone comes through multiple platforms. And then you also have indicators of engagement from social media and from public policy documents. An awful lot of information is coming the publisher's way, which the publisher wants to pass back to the stakeholders, and it's also coming to the library. I think we are very unclear about what to do with all of it and what it all means. How much of it could really be said to be evidence of impact on the real world?
Gerry Bayne: So this is what we were talking about before we started recording. We had a conversation about what impact means when bots may be reading your content: how do you know that a real human is consuming it? Is that right?
Charles Watkinson: Absolutely. And also a real human consuming the content, but what are they doing with it? What are the changes they're making in the world around them? What's the provable effect of this research on real world issues? And that's what we mean by impact, and we need to get from where we are now, which requires some fairly basic work to clarify what we mean by all the usage. But we really want to ultimately get at some really good ideas, stories of real impact on real global challenges.
Gerry Bayne: That makes sense. Do you want to add anything to that?
Christina Drummond: Yeah. If I can add, I think what we've been learning, and I'll start from the book side, is that there are a lot of technologies that can facilitate this. You can write scripts to pull the data together. You can find ways. We have a number of services that exist today to aggregate information and share it back through dashboards and usage metrics services. But the challenges really come from, first of all, things like the COUNTER standard for usage reporting. It's not universally adopted. Many smaller organizations don't have the capacity to implement that standard.
So they may be using tools such as Google Analytics, so there's a lot of variety here as well. One of the reasons we wanted to have this workshop is to look for economies of scale. If each organization pulling together this data is separately trying to figure out how to compare the apples and oranges and bananas, can we do that in a shared infrastructure fashion? That would improve data quality and make the data available in a more timely fashion, while also reducing the resource burden on the individual publishers, libraries, platforms, and services that are all going through that process of data normalization and curation today.
So that's why we're looking at should there be shared infrastructure that can pick up that burden for everyone?
Gerry Bayne: So what do you hope will come out of this project?
Charles Watkinson: So we had an awesome workshop yesterday. We also had contributors from outside the workshop providing their briefings and we're going to connect back with them. And there were four recommendations or areas of recommendation that of course we will refine. But those are, firstly, we really need more education and advocacy. We need to really catch up a number of organizations on using these basic tools. Things like persistent identifiers, things like these standardized measures that already exist. And there's a big gap for especially smaller publishers in using those. So that's the first recommendation area. Secondly, we need more clarity around rights. There's kind of a chilling effect at the moment because nobody exactly knows what can be done with people's usage information. And we need to separate out what's kind of legally possible. So statute law versus contract law versus just the norms, like what are good norms to adopt?
And that leads to the third area, which is around values and principles. This is a very touchy area, especially as we come into looking at AI, reuse of data, privacy, and just general concerns, so we need a strongly value-based and principle-based framework. That means building from the communities to articulate those values and principles and get them adopted. And then the fourth thing is that there's a real need for this kind of data trust: the application of this industrial data space methodology to this world. Industrial data spaces are used in many other industries with much more complex competitive situations. It's an incredibly powerful framework, but now we really need to operationalize it. So there was a strong focus yesterday on the need for what was called a minimum lovable product, which is like a minimum viable product, but something where we really use product management techniques to build out something really powerful. So that's what came out.
Christina Drummond: If I could just add something to clarify. One thing I'll note about the industrial data spaces model, or IDS, is that it really serves to provide baseline data governance and stewardship for data sharing, when that data sharing involves not only public institutions and public or openly available data, but also proprietary, sensitive, private data. And if you think of usage, it's being generated not only by public libraries and public repositories, but also by commercial publishers and commercial platforms. And so we need something, and I'll [inaudible 00:09:13] note, it involves IP addresses, which in some parts of our world are considered to be personal information. So we need this type of platform to explore ways to simplify the data governance and the legal aspects of this data sharing, because that's really where the challenges are. It's not the technical side.
Charles Watkinson: And I should also credit our colleague Tasha Mellins-Cohen, who's the Executive Director of COUNTER, for the minimum lovable product. It's my new favorite term.
Gerry Bayne: I like that phrase. So how can folks get involved in this? If there's somebody listening who is connected with the publishing side of higher ed and research, what would you recommend they do?
Christina Drummond: So if they're interested in learning more about the data spaces application to usage data, I would recommend that they start with our website, oabookusage.org. They're also welcome to reach out to me by email if they'd like: [email protected].
Gerry Bayne: Is there anything else you'd like to add that we haven't touched on?
Charles Watkinson: I think this is great, Gerry, and this is important because all of us are generating usage data all the time. And there are really good reasons why open-access advocates and public-access advocates need to be able to use that data to answer this big question: with all this investment in public access and open access, what's it actually doing to create change in the world? But at the same time, it needs to be done in a really respectful, ethical, principles-grounded way, because this touches all of us and there are multiple potentials for misuse. We just need to put safeguards around what happens and have a mutual understanding.
Christina Drummond: And I just want to add to that: I think one of the reasons it's so important that we're talking about this today is to really think about how you build those safeguards. Right now in our industries, with respect to usage data, everyone is like, well, we can solve that if we harvest the data, pull it together, and provide a service. But there's inherent risk, privacy risk and institutional risk, in having all that data pulled into one place. And I'll note, we're at a point of innovation where the data spaces framework really looks at data sovereignty and at processing the data in transit in a rule-based way. So it's different from saying we'll pull it into a repository and figure it out at that point. And so I just want to flag the innovative component of what we're trying to do at the same time.
Gerry Bayne: Great. Christina, Charles, thank you so much for your time.
Christina Drummond: Thank you.
Charles Watkinson: Thank you.
This episode features:
Christina Drummond
Executive Director, OA Book Usage Data Trust
University of North Texas
Charles Watkinson
Associate University Librarian for Publishing and Director of the University of Michigan Press
University of Michigan