- This short report summarizes the results of a survey of the Coalition for Networked Information academic membership on authentication and authorization practices related to sharing user information with external content providers.
- In brief, content suppliers to research libraries authorize users either by origin IP address (perhaps established via a proxy) or by obtaining and examining user attributes from a trusted source such as a Shibboleth identity service.
- With proxy-based address authentication, re-identification of users by the content supplier, and the subsequent reuse of that re-identified data, does not seem to be much of a consideration for responding institutions.
- Little appears to be happening in terms of content vendors either returning usage data faceted by user attributes passed to them or providing detailed use logs that include anonymized unique identifiers (allowing the institution to re-attach attributes).
In June and July 2016 the Coalition for Networked Information (CNI) conducted a brief e-mail survey of its college and university members on authentication and authorization practices related to sharing user information with library-licensed external content providers (publishers, platform providers, and aggregators). The survey was sent to member representatives from both the library and information technology organizations at some 190 institutions. We asked about both technical and contractual approaches to the control and management of this data.
These results should be read with a number of strong caveats. We had responses from about 60 institutions, and we claim no statistical rigor in this work. This is a complex and nuanced area, and often the answer is "it varies from content supplier to content supplier." Not all responses were entirely clear. These results are best viewed as giving a sense of what's actually being done at present, and perhaps as offering some insight into trends and underlying thinking.
Results of the Survey
Broadly, content suppliers to research libraries authorize users either by origin IP address or by obtaining and examining user attributes from a trusted source (using Shibboleth as a mechanism and the InCommon Trust Federation as a business framework). Slightly over half of the respondents had implemented Shibboleth (with larger universities outnumbering smaller ones by about two to one). However, very few reported that they were using Shibboleth for content resources; most of the applications were in other areas. Even those using it for content resources said that it was used only selectively, with JSTOR, Project MUSE, and HathiTrust cited as examples.

It is worth noting that the list of InCommon sponsored partners [https://www.incommon.org/participants/] includes a number of major commercial and nonprofit publishers, such as Elsevier and the Association for Computing Machinery; while we did not specifically ask respondents about these, no respondent cited them as examples. About a dozen responses indicated that they were passing personally identifiable data (names, e-mail addresses, etc.) in attributes. Note that there is a much shorter list of Research and Scholarship Service Providers [https://incommon.org/federation/info/all-entity-categories.html#SPs] within the InCommon infrastructure; at present there are no publishers on this list, and most respondents say that they will pass personally identifiable data to these services.
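Attribute release of the kind described above is typically controlled on the institution's Shibboleth Identity Provider through an attribute filter policy. As a minimal sketch (the service provider entityID and the choice of attribute here are hypothetical, not drawn from any respondent), a policy releasing only a scoped affiliation, and no personally identifiable data, to a single content platform might look like:

```xml
<!-- Fragment of a Shibboleth IdP attribute-filter.xml; entityID is hypothetical -->
<AttributeFilterPolicy id="releaseToContentPlatform">
    <!-- Apply only when the requesting service provider is this content platform -->
    <PolicyRequirementRule xsi:type="Requester"
        value="https://sp.content-platform.example.org/shibboleth" />
    <!-- Release affiliation (e.g., member@university.edu) but nothing identifying -->
    <AttributeRule attributeID="eduPersonScopedAffiliation">
        <PermitValueRule xsi:type="ANY" />
    </AttributeRule>
</AttributeFilterPolicy>
```

Policies of this form are one reason attribute release requires per-vendor discussion: each service provider gets its own negotiated rule.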
All but five respondents have EZproxy or some variant (the remaining few are using VPN-based solutions). This seems to be the main (or only) way to handle access to off-campus content resources, and with IP-based authentication no personally identifiable data is passed to the content suppliers. A number of respondents use Shibboleth to manage access to the EZproxy system itself. Note that many content suppliers seem to have no plans to support Shibboleth, so EZproxy or something similar will clearly be required on an ongoing basis; recognizing this, some institutions simply adopted EZproxy as a standard mechanism for all external resources. Several respondents also noted that with a proxy solution it was easy to ensure that no personal data was passed to content suppliers, and that this was entirely within the library's control, avoiding complex discussions and potential lack of clarity about attribute release policies.
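With the proxy approach, each licensed resource is declared as a database stanza in the EZproxy configuration, and the content supplier sees only the proxy server's IP address; no user attributes are transmitted. A minimal sketch of such a stanza (the vendor name and hostnames are hypothetical):

```
# EZproxy config.txt database stanza; vendor hostnames are hypothetical
Title Example Journals Platform
URL https://journals.example.com
DJ journals.example.com
HJ www.journals.example.com
```

Because the library alone maintains this configuration and the authentication in front of it, the question of what personal data reaches the vendor never arises at the protocol level.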
We asked whether contracts with content suppliers contained language limiting the collection, retention, and reuse or resale of data about users and their activities. About 15 institutions made at least some effort to include language limiting retention or resale, though often this was inconsistent from one contract to the next. A number noted that this wasn't much of an issue because they weren't passing any personal data to the content suppliers in the first place. Re-identification of users by the content supplier (for example, by soliciting e-mail addresses for notifications of new content) and the subsequent reuse of that re-identified data does not seem to be much of a consideration for responding institutions.
Finally, a number of respondents mentioned contractual provisions requiring content providers to supply usage data back to institutions, most commonly following the NISO SUSHI work [http://www.niso.org/workrooms/sushi/faq/general] and Project COUNTER. Given the apparently very limited use of attribute passing to content providers, however, it seems that little is being done in terms of either content vendors returning usage data faceted by user attributes passed to them, or very detailed use logs that include anonymized unique identifiers passed from the institution and returned with the usage data, which the institution can then de-anonymize at various levels of specificity.
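The anonymized-identifier approach mentioned above can be illustrated with a short sketch (not drawn from any respondent's actual practice): the institution derives a one-way pseudonym keyed by a secret it alone holds, shares only the pseudonym with the vendor, and can later regenerate the mapping to re-attach user attributes to returned usage logs.

```python
import hashlib
import hmac

# Hypothetical institution-held secret; never shared with the content supplier.
SECRET_KEY = b"institution-held-secret"

def pseudonymize(user_id: str) -> str:
    """Derive a stable anonymized identifier from a campus user ID.

    The content provider sees only the HMAC output. The institution,
    which holds the key, can recompute the same token for any user and
    so re-attach attributes to usage logs returned by the vendor.
    """
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

# The same input always yields the same token, so usage events can be
# grouped per user without revealing identity to the vendor.
assert pseudonymize("jdoe") == pseudonymize("jdoe")
assert pseudonymize("jdoe") != pseudonymize("asmith")
```

Note that a scheme like this still permits re-identification by whoever holds the key, so the institution's own retention policies for the key and mapping matter as much as the vendor's.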
We have no immediate next steps planned for this work, though we are having conversations with the STM publishing community, NISO, and the TIER initiative. Readers may also be interested in a lengthy, much broader article on privacy and reader analytics that is currently in the final stages of preparation by the author.
My thanks to the member representatives who took the time to respond to this survey; in many cases this involved considerable coordination within their institutions. I am particularly grateful to those who shared not only what they were doing but also some of the reasoning behind the choices they had made. Thanks also to the CNI Steering Committee members who helped refine the questions we asked, and to Joan Lippincott and Diane Goldenberg-Hart of CNI for their help with the process.
Clifford Lynch is director of the Coalition for Networked Information.
© 2016 Clifford Lynch. The text of this article is licensed under Creative Commons BY 4.0.