Overcoming Big Data Challenges: IMS Caliper Analytics®, Linked Data & JSON-LD


The drive to deliver education at scale, coupled with a demand for measurable accountability, has spurred interest in applying “big data” principles to the business of education. Opportunities to tap new data sources, ask new questions and pursue new insights have grown as the learning technology ecosystem has expanded and as the definition of what constitutes learning has evolved beyond the formal classroom experience to include informal, social and experiential modes of acquiring knowledge and skills.

The challenges inherent in describing, collecting and exchanging learning activity data originating from such diverse sources are now formidable.

The Caliper Analytics® specification attempts to address the underlying interoperability challenge posed by these shifts in the learning technology landscape.  Released in November 2015, Caliper 1.0 provides an information model and controlled vocabulary for describing learning activities. Caliper’s information model is composed of profiles, each of which describes a learning event or an event that facilitates learning. The profiles provide a standardized set of concepts and terms that application designers and developers can draw upon to describe user interactions.

Annotating a reading, playing a video, taking a test or grading an assignment submission represent a subset of the activities or events that Caliper attempts to describe. Caliper 1.0 also introduced the Sensor API™ for marshalling and transmitting Caliper events to a target endpoint for storage and analysis. If achieved, industry-wide adoption of the Caliper specification would offer institutions the prospect of a more unified learning data environment in which to build new and innovative services designed to measure, infer, predict, report and visualize.

For Caliper 1.1, we are focusing on extensions to the information model, slimming down our event payload and refining our use of JavaScript Object Notation for Linked Data (JSON-LD) to represent Caliper event data. This last aspect of 1.1 work reflects our interest in enabling both data and semantic interoperability and forms the subject of this blog post.

Over the last decade, the advent of cloud-based, networked applications has led to changes in the way data is structured and represented. Data once considered strictly hierarchical, such as a curriculum, a course roster or a transcript, now frequently links out to other kinds of data. Modeling “bundles of data pointing to other bundles of data” now requires thinking in terms of graphs. Caliper event data presents us with similar structures. A Caliper Event may link to user or group data, institutional/organizational data, digital resources, courses and rosters, grades and credentials, application and session data and so on. JSON-LD provides us with the “representational horsepower” to describe these kinds of data linkages and specify how data is to be understood when published and shared across a network.*

The linked data principles first outlined by Tim Berners-Lee that inform today’s Semantic Web technologies as well as JSON-LD are relatively straightforward: use URIs as names for things; use HTTP URIs so that information about things (e.g., people, objects, concepts) can be retrieved using a standard format; refer to other relevant things by way of their HTTP URI identifiers to encourage further discovery of new relationships between things.

JSON-LD abides by these “rules”. It features a lightweight syntax and a JSON-based format for serializing linked data. It requires that globally-scoped entities and their attributes be uniquely identifiable using IRIs/URIs. JSON-LD also provides a means of expressing relationships between entities in one or more directed graphs. For machine-to-machine data exchange, JSON-LD provides a crucial mechanism for rendering comprehensible the underlying semantics of a JSON Document via a mapping of its terms to one or more published vocabularies.  

In a world where learners are interacting increasingly with an array of learning applications, the need to blend learning data generated from multiple sources and discern its meaning across application boundaries is of vital importance.

Let’s examine how Caliper leverages JSON-LD. We will draw our example Event from the Caliper Forum Profile (new for 1.1). The Forum Profile models a set of activities associated with online discussions involving instructors and students. The profile currently includes a ForumEvent, MessageEvent, NavigationEvent, ThreadEvent and ViewEvent.  Each Event describes a relationship formed between two entities, an actor and an object, resulting from some purposeful action undertaken by the actor in relation to the object at a moment in time and (optionally) within a given learning context. The Event properties actor, action and object form a data triple that echoes an RDF triple linking a subject to an object via a predicate. An action sequence mediated by the Forum Profile might involve a learner navigating to a forum, subscribing to it, viewing a thread, posting a message in reply to an earlier post and then marking the message as read.
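The actor/action/object pattern described above can be sketched in a few lines of Python. This is only an illustration, not part of any Caliper library: the Triple type and the event_to_triple helper are hypothetical names, and the identifiers are placeholder example.edu values.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, in the spirit of an RDF triple."""
    subject: str
    predicate: str
    obj: str


def event_to_triple(event: dict) -> Triple:
    """Reduce an event to its actor/action/object core (hypothetical helper)."""
    return Triple(
        subject=event["actor"]["id"],
        predicate=event["action"],
        obj=event["object"]["id"],
    )


# Placeholder values for illustration only.
event = {
    "type": "MessageEvent",
    "actor": {"id": "https://example.edu/users/554433", "type": "Person"},
    "action": "Posted",
    "object": {
        "id": "https://example.edu/sections/1/forums/2/topics/1/messages/2",
        "type": "Message",
    },
    "eventTime": "2016-12-15T10:15:00.000Z",
}

triple = event_to_triple(event)
print(triple.subject, triple.predicate, triple.obj)
```

The eventTime and any learning context sit outside the triple; they qualify the statement rather than form part of it.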

Now consider the following statement:

“Person X posted Message Y at 2016-12-15T10:15:00.000Z.”

We can represent this assertion as a Caliper MessageEvent expressed as a JSON-LD document to be serialized and sent over the wire to a target endpoint (see example 1 below).  To keep the illustration simple, I’ve stripped out of the document nearly all optional Caliper Event and Entity properties that would otherwise provide important details about the learning context in which this activity is situated.  

Example 1: Minimal Caliper MessageEvent referencing a remote JSON-LD context

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
  "type": "MessageEvent",
  "actor": {
    "id": "https://example.edu/users/554433",
    "type": "Person"
  },
  "action": "Posted",
  "object": {
    "id": "https://example.edu/sections/1/forums/2/topics/1/messages/2",
    "type": "Message",
    "body": "What does Caliper Event JSON-LD look like?",
    "dateCreated": "2016-12-15T10:15:00.000Z"
  },
  "eventTime": "2016-12-15T10:15:00.000Z",
  "uuid": "0d015a85-abf5-49ee-abb1-46dbd57fe64e"
}

Note that the document (i.e., the stuff inside the outer curly braces {…}) is well-formed JSON that’s easy to generate and easy to parse. Yet the document also features a @context, a special JSON-LD keyword that points to, in this case, a remote JSON document designed to map each term or key employed to an IRI (e.g., http://purl.imsglobal.org/caliper/MessageEvent) that links the term to a controlled vocabulary.  Inclusion of a JSON-LD context provides an economical way of communicating document semantics to services interested in consuming Caliper event data.

The Caliper entities referenced in the document (Person, Message) are largely self-describing via a type property set to terms defined in the active context. Each entity is also provisioned with a unique identifier. If the entity is a web resource the identifier should be in the form of a dereferenceable IRI; i.e., one capable of returning a representation of the resource. Enumerated values such as the “Posted” action can also be defined in the context as can values with associated data types (e.g., string, integer, boolean, datetime) that should be mapped to properties like body, dateCreated, eventTime and uuid.  Although intended for machine-to-machine exchange and consumption this Caliper document is human-readable and easily understood by the educated layperson. All of this is in keeping with linked data principles.
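To show how little machinery is needed to produce such a document, here is a minimal Python sketch that assembles the same kind of MessageEvent and serializes it with the standard library. The build_message_event and iso_millis helpers are hypothetical names invented for this illustration; they are not part of the Sensor API, and transmission to an endpoint is left out entirely.

```python
import json
import uuid
from datetime import datetime, timezone

CALIPER_CONTEXT = "http://purl.imsglobal.org/ctx/caliper/v1p1"


def iso_millis(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC with milliseconds."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_message_event(actor_id: str, message_id: str, body: str,
                        posted_at: datetime) -> dict:
    """Assemble a minimal MessageEvent as a plain dict (hypothetical helper)."""
    timestamp = iso_millis(posted_at)
    return {
        "@context": CALIPER_CONTEXT,
        "type": "MessageEvent",
        "actor": {"id": actor_id, "type": "Person"},
        "action": "Posted",
        "object": {
            "id": message_id,
            "type": "Message",
            "body": body,
            "dateCreated": timestamp,
        },
        "eventTime": timestamp,
        "uuid": str(uuid.uuid4()),  # a fresh random identifier per event
    }


event = build_message_event(
    actor_id="https://example.edu/users/554433",
    message_id="https://example.edu/sections/1/forums/2/topics/1/messages/2",
    body="What does Caliper Event JSON-LD look like?",
    posted_at=datetime(2016, 12, 15, 10, 15, tzinfo=timezone.utc),
)
payload = json.dumps(event, indent=2)
print(payload)
```

Because the event is ordinary JSON, json.dumps is all that is needed on the producing side; the @context carries the semantics for any consumer that wants to interpret the terms.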

I can also define a “local” context embedding term definitions within the document itself (see example 2 below). The @context viewed here is a slimmed down version of the Caliper context that includes only those terms relevant to describing our minimal MessageEvent.

Example 2: Minimal Caliper MessageEvent with a locally defined JSON-LD context [gist.github.com/arwhyte/fb5b8fd8f8a0e408d9de3518be579494]

{
  "@context": {
    "id": "@id",
    "type": "@type",
    "caliper": "http://purl.imsglobal.org/caliper/",
    "verb": "http://purl.imsglobal.org/vocab/caliper/action#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "MessageEvent": "caliper:MessageEvent",
    "Message": "caliper:Message",
    "Person": "caliper:Person",
    "actor": "caliper:actor",
    "action": {"@id": "caliper:action","@type": "@vocab"},
    "object": "caliper:object",
    "body": {"@id": "caliper:body","@type": "xsd:string"},
    "dateCreated": {"@id": "caliper:dateCreated","@type": "xsd:dateTime"},
    "eventTime": {"@id": "caliper:eventTime","@type": "xsd:dateTime"},
    "uuid": {"@id": "caliper:uuid","@type": "xsd:string"},
    "Posted": "verb:Posted"
  },
  "type": "MessageEvent",
  "actor": {
    "id": "https://example.edu/users/554433",
    "type": "Person"
  },
  "action": "Posted",
  "object": {
    "id": "https://example.edu/sections/1/forums/2/topics/1/messages/2",
    "type": "Message",
    "body": "What does Caliper Event JSON-LD look like?",
    "dateCreated": "2016-12-15T10:15:00.000Z"
  },
  "eventTime": "2016-12-15T10:15:00.000Z",
  "uuid": "0d015a85-abf5-49ee-abb1-46dbd57fe64e"
}

You can copy and paste the above example into the json-ld.org playground textarea input and explore the current crop of JSON-LD processing algorithms designed to transform or re-shape the document in various ways (compaction, expansion, framing, N-Quads serialization, etc.).
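For readers who want to see the term-to-IRI mapping without a JSON-LD processor, the following Python sketch resolves a few terms by hand. It is a deliberately simplified illustration and not the JSON-LD expansion algorithm: expand_term is a hypothetical helper, CONTEXT is a trimmed copy of the term definitions used in this post, and a real processor handles many more cases (keywords, @vocab, base IRIs, nested contexts).

```python
# A trimmed copy of the term definitions, for illustration only.
CONTEXT = {
    "caliper": "http://purl.imsglobal.org/caliper/",
    "verb": "http://purl.imsglobal.org/vocab/caliper/action#",
    "MessageEvent": "caliper:MessageEvent",
    "action": {"@id": "caliper:action", "@type": "@vocab"},
    "Posted": "verb:Posted",
}


def expand_term(term: str, context: dict) -> str:
    """Resolve a term to a full IRI using term and prefix lookups only."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        # Expanded term definitions keep their IRI under "@id".
        definition = definition["@id"]
    prefix, sep, suffix = definition.partition(":")
    # A compact IRI looks like "prefix:suffix" where prefix is a defined term;
    # "http://..." is skipped because its suffix starts with "//".
    if sep and prefix in context and not suffix.startswith("//"):
        return expand_term(prefix, context) + suffix
    return definition


for term in ("MessageEvent", "action", "Posted"):
    print(term, "->", expand_term(term, CONTEXT))
# MessageEvent -> http://purl.imsglobal.org/caliper/MessageEvent
# action -> http://purl.imsglobal.org/caliper/action
# Posted -> http://purl.imsglobal.org/vocab/caliper/action#Posted
```

A term that has no definition is returned unchanged by this sketch, which is a reminder that every term used in a document needs an entry in the active context if it is to carry shared meaning.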

Both these examples demonstrate how we can provision Caliper Event data representations with a simple mechanism designed to enhance comprehensibility as well as enforce accuracy of the terms shared among Caliper documents rendered from anywhere in the learning technology ecosystem. Nevertheless, our JSON-LD representation of a MessageEvent remains at heart a developer-friendly JSON document that can be rendered deterministically if required.

Caliper’s learning activity profiles provide a common, curated vocabulary for describing learning interactions at scale. JSON-LD provides a ready mechanism for communicating across a network what we mean by the terms we employ. The two in combination provide academic institutions and EdTech organizations a path to interoperable and well understood data representations of learning interactions.  It should prove a powerful combination.


Anthony Whyte is a member of the ITS Teaching and Learning Team at the University of Michigan and co-chair of the IMS Caliper Working Group.


*Phrases drawn from a conversation with my friend John Tibbetts, an enterprise architect whom I admire greatly.  15 December 2016.

Author’s notes: Officially JSON-LD specifies that objects be identified by Internationalized Resource Identifiers (IRIs) but folks can be easily forgiven if they think in terms of URIs/URLs.
The JSON-LD community has also defined an API for processing JSON-LD documents. For an interesting glimpse into the motivations prompting the creation of JSON-LD, see Manu Sporny: JSON-LD and Why I hate the Semantic Web [http://manu.sporny.org/2014/json-ld-origins-2/].