TL;DR — Building an LRS means committing to xAPI’s statement shape, picking indexes that survive analytics-style queries, and accepting that the schema is essentially fixed by the spec. Most of the engineering goes into query patterns and aggregation, not storage.
A Learning Record Store accepts xAPI statements, stores them, and lets you query them later. The xAPI specification fixes the statement shape, so the schema is not where the creativity goes. The creativity goes into indexes, retention, and aggregation queries that answer real questions about learner behaviour.
This is what running an in-house LRS inside Nexia Academy taught us.
The statement shape
xAPI fixes the statement at the top level: actor, verb, object, with optional result, context, timestamp, and authority. Each top-level field is itself a structured object.
Our storage table:
statements (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
actor_id TEXT NOT NULL, -- normalized from actor.mbox or account
verb_id TEXT NOT NULL, -- verb.id IRI
object_id TEXT NOT NULL, -- object.id IRI
stored TIMESTAMPTZ NOT NULL DEFAULT now(),
timestamp TIMESTAMPTZ NOT NULL,
statement JSONB NOT NULL, -- the raw statement
context_au TEXT, -- cmi5 Assignable Unit IRI if present
context_reg UUID, -- cmi5 registration if present
result_score REAL -- denormalized for fast queries
)
The denormalized columns are deliberate. JSONB extraction is cheap once but expensive in aggregations across millions of rows.
The indexes that matter
- (tenant_id, actor_id, timestamp DESC) — learner timeline; the most common query shape
- (tenant_id, object_id, timestamp DESC) — course or AU timeline; cohort analytics
- (tenant_id, verb_id, timestamp DESC) — verb-level analytics (“how many completions today?”)
- (tenant_id, context_reg) — full session replay for cmi5
- JSONB GIN index on
statementfor ad-hoc queries
The four query patterns we care about
- Learner timeline — every statement for a learner, time-ordered. The default report.
- Course completion funnel — for a course, what fraction of enrolled learners reached each milestone?
- Cohort comparison — for cohort A vs cohort B, completion rate, time-to-complete, average score.
- Per-question item analysis — for an assessment item, P-value (fraction correct), discrimination index, time spent.
Aggregation strategy
Rolling aggregations from the raw statements table get expensive past ~5M rows per tenant. We materialize daily roll-ups for the four query patterns above into separate tables and refresh nightly. Real-time queries hit the roll-up; “fresh today” toggles hit the raw table for the current day only.
What we wish we had done from day one
- Stored the actor identifier as a normalized canonical form, not a JSON path expression evaluated at query time
- Denormalized
result.score.scaledfrom the start (we added it later, painfully) - Tagged statements with the producing AU’s cmi5 launch URL on the way in; deriving it later from
contextis fragile - Captured
statement.timestampANDstatement.storedseparately — they diverge in mobile/offline scenarios - Built a “replay” debug view in week one, not month six
The validation layer
Every incoming statement is schema-validated against xAPI rules before storage: required fields, IRI well-formedness, ISO 8601 timestamps. Invalid statements get a 400 with a clear pointer to the offending path, not a generic error. This catches authoring-tool bugs early and prevents poisoned aggregations.
Frequently asked questions
When should I build vs buy an LRS?
Buy SCORM Cloud or Veracity if you need an LRS in two weeks. Build only when query patterns the SaaS LRS does not support are core to your product — typically deep multi-tenant analytics with custom aggregations.
PostgreSQL or a dedicated store for xAPI statements?
PostgreSQL with JSONB scales surprisingly well into millions of statements. Move to ClickHouse or similar only when aggregation queries cross the threshold where Postgres planning costs dominate.
Working on something similar?
T-Square architects, builds and operates production systems for learning, AI and custom software products. Talk to a senior engineer if you want a second opinion on your design or roadmap.
