ADR-0008: v1 game-film delivery — R2 + Worker-gated HLS
Status
Proposed.
The substantive investigation is the TGF-337 spike and its Confluence
page (TGF-337: Video Delivery MVP). This ADR records the decision the
spike points to; it moves to Accepted once the proof-of-concept plays a
game end-to-end with auth gating in place.
Context
TGF-335 needs to play existing game film in the browser, and the user wants video from v1 even on basic infrastructure. The TGF-337 spike characterized the catalog and weighed delivery options. Three findings constrain the decision:
- The catalog is HLS-friendly already. 1,203 files, 2.95 TiB; 99.3% H.264 video, 99.8% AAC audio, and every sampled file has a regular keyframe interval ≤10s (98.7% ≤6s). That is exactly the shape that segments into HLS with a lossless `ffmpeg -c copy` remux (no re-encode), except for ~10 outlier files (VP9/AV1/HEVC video, Opus/AC-3 audio) that need a one-off transcode.
- Storage is decided: Cloudflare R2. No egress fees and S3 compatibility, consistent with deploying the portal on Cloudflare Pages (ADR-0002). Roughly $150/mo per 10 TB versus ~$215/mo plus ~$75/TB egress on S3.
- HLS fans out into many requests. A stream is a manifest plus hundreds of segment objects, each a separate HTTP request. R2 presigned URLs authorize a single object, work only on the S3 endpoint (not a custom domain), and cap at a 7-day expiry — so they cannot gate a stream. Access control has to sit in front of every request.
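The remux-or-transcode split these findings imply can be sketched as a small classifier over per-file probe results. The `ProbeResult` shape and the function names are hypothetical (roughly what one might distill from `ffprobe` JSON output); codec names follow ffprobe's `codec_name` values, and the 10s keyframe ceiling is the spike's figure, kept as a guard even though every sampled file already meets it.

```typescript
// Hypothetical per-file summary distilled from ffprobe output.
interface ProbeResult {
  path: string;
  videoCodec: string;             // ffprobe codec_name: "h264", "hevc", "vp9", "av1", ...
  audioCodec: string | null;      // "aac", "opus", "ac3", ...; null if there is no audio stream
  maxKeyframeIntervalSec: number; // longest gap between consecutive keyframes
}

type Plan = "remux" | "transcode";

// Codecs that can be stream-copied into HLS/fMP4 for v1 (H.264 video, AAC audio).
const COPYABLE_VIDEO = new Set(["h264"]);
const COPYABLE_AUDIO = new Set(["aac"]);

// Stream copy can only cut segments at existing keyframes, so the GOP bounds
// segment length; 10s is the ceiling the spike observed across the sample.
const MAX_KEYFRAME_INTERVAL_SEC = 10;

function planFor(p: ProbeResult): Plan {
  const videoOk = COPYABLE_VIDEO.has(p.videoCodec);
  // Assumption: a file with no audio track can still be remuxed.
  const audioOk = p.audioCodec === null || COPYABLE_AUDIO.has(p.audioCodec);
  const gopOk = p.maxKeyframeIntervalSec <= MAX_KEYFRAME_INTERVAL_SEC;
  return videoOk && audioOk && gopOk ? "remux" : "transcode";
}

function triage(files: ProbeResult[]): Record<Plan, string[]> {
  const out: Record<Plan, string[]> = { remux: [], transcode: [] };
  for (const f of files) out[planFor(f)].push(f.path);
  return out;
}

const exampleTriage = triage([
  { path: "game-001.mp4", videoCodec: "h264", audioCodec: "aac", maxKeyframeIntervalSec: 2 },
  { path: "game-002.webm", videoCodec: "vp9", audioCodec: "opus", maxKeyframeIntervalSec: 5 },
]);
// exampleTriage → { remux: ["game-001.mp4"], transcode: ["game-002.webm"] }
```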
Auth already exists end-to-end: the portal authenticates with Clerk (ADR-0006) and the Griddy API validates Clerk JWTs (TGF-312).
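To make the fan-out constraint concrete: an HLS media playlist names one URI per segment and the player fetches each separately, so every one of those requests needs authorization. The sketch below is illustrative only (the playlist text and segment names are made up, and the 2-hour game length is an assumption); it pulls segment URIs out of a playlist and estimates the request count for one game at a ~6s segment target.

```typescript
// Per RFC 8216, a playlist line that is neither blank nor starts with "#" is a
// media URI. Tags such as EXT-X-MAP (the fMP4 init segment) carry their URI
// in an attribute instead, so they are not returned here.
function segmentUris(playlist: string): string[] {
  return playlist
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Rough per-viewing request count: the media playlist, the init segment, and
// one request per media segment (ignores retries and seeking).
function estimatedRequests(durationSec: number, targetSegmentSec: number): number {
  return 2 + Math.ceil(durationSec / targetSegmentSec);
}

const sample = [
  "#EXTM3U",
  "#EXT-X-VERSION:7",
  "#EXT-X-TARGETDURATION:6",
  "#EXT-X-PLAYLIST-TYPE:VOD",
  '#EXT-X-MAP:URI="init.mp4"',
  "#EXTINF:6.006,",
  "seg_00000.m4s",
  "#EXTINF:5.972,",
  "seg_00001.m4s",
  "#EXT-X-ENDLIST",
].join("\n");

const uris = segmentUris(sample);               // ["seg_00000.m4s", "seg_00001.m4s"]
const perGame = estimatedRequests(2 * 3600, 6); // 1202 requests for a 2-hour game
```

Each of those ~1,200 requests is a separate object fetch, which is why a per-object presigned URL cannot stand in for stream-level access control.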
Decision
Serve v1 game film as HLS from Cloudflare R2, gated by a Cloudflare Worker on a dedicated video origin, with manifests/segments produced by a one-time batch remux. Concretely:
- Packaging. A one-time batch job runs `ffmpeg -c copy` (or Shaka Packager / Bento4 `mp4hls`) per game, producing CMAF/fMP4 segments plus an `.m3u8` playlist at a ~6s target, and uploads them to R2. The ~10 non-conforming files get a transcode pass first. This is deliberately not on-demand packaging: the catalog is static history.
- Access control. The portal calls `GET /api/games/{id}/playback` with its Clerk JWT. Django checks entitlement and returns the manifest URL plus a short-lived, game-scoped token. A Worker on the video origin (e.g. `video.griddy.football`), bound to the R2 bucket, validates that token, exchanges it once for a Worker-set signed cookie so the credential rides every subsequent segment request, and streams objects from its bucket binding. Unauthorized requests to the manifest or any segment get a 403.
- CORS. The Worker echoes only explicitly allowed origins (the portal's dev and prod origins, never `*`, because requests are credentialed) with `Access-Control-Allow-Credentials: true`, allows `GET, HEAD` and the `Range` request header, and exposes `Content-Range`, `Accept-Ranges`, `Content-Length`, and `ETag` for scrubbing.
- Player. The portal uses `@vidstack/react`, which wraps `hls.js` for cross-browser HLS and falls back to native HLS on Safari, with `crossorigin="use-credentials"` so segment fetches carry the cookie.
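The packaging bullet can be sketched as a function that builds one game's ffmpeg invocation using ffmpeg's HLS muxer with fMP4 segments. The options used (`-hls_time`, `-hls_playlist_type`, `-hls_segment_type`, `-hls_fmp4_init_filename`, `-hls_segment_filename`) are standard hls-muxer options; the output layout (`index.m3u8`, `init.mp4`, `seg_%05d.m4s` per game directory) and the function name are hypothetical, and the real job may use Shaka Packager or Bento4 instead.

```typescript
// Arguments for a lossless remux of one game into fMP4 HLS via ffmpeg's hls muxer.
// -c copy re-encodes nothing, so cuts land only on existing keyframes:
// -hls_time is a target, and actual segment lengths follow each file's GOP.
function remuxArgs(inputPath: string, outDir: string, targetSec = 6): string[] {
  return [
    "-hide_banner",
    "-i", inputPath,
    "-map", "0:v:0",  // first video stream
    "-map", "0:a:0?", // first audio stream, if any ("?" makes the mapping optional)
    "-c", "copy",
    "-f", "hls",
    "-hls_time", String(targetSec),
    "-hls_playlist_type", "vod", // VOD playlist listing every segment
    "-hls_segment_type", "fmp4", // fMP4 segments instead of MPEG-TS
    "-hls_fmp4_init_filename", "init.mp4",
    "-hls_segment_filename", `${outDir}/seg_%05d.m4s`,
    `${outDir}/index.m3u8`,
  ];
}
```

The returned array can be handed directly to something like Node's `child_process.spawn("ffmpeg", args)`, which sidesteps shell-quoting the `%05d` pattern. Because stream copy cuts only on keyframes, segment durations track each file's GOP rather than landing on exactly 6s, which is why the spike's keyframe-interval figures matter.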
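The token-to-cookie exchange in the Access control bullet can be sketched as a Worker fetch handler. Everything concrete here is an assumption for illustration: the credential format (`gameId.expiry.signature`, HMAC-SHA256 under a secret shared with Django), the `VIDEO` and `TOKEN_SECRET` binding names, the `vt` cookie name and 4-hour TTL, and the `/games/<id>/<file>` layout. In the real design Django mints the token in Python; `mint` is included so the sketch can be exercised end-to-end. It uses only Web Crypto, `Request`/`Response`, and a minimal stand-in for the R2 binding, which a real R2 bucket binding should satisfy structurally.

```typescript
// Stand-in for the R2 bucket binding: R2's get() resolves to an object with a
// `body`, or null when the key is missing. Only what this sketch needs.
interface ObjectStore {
  get(key: string): Promise<{ body: ReadableStream | string } | null>;
}

interface Env {
  VIDEO: ObjectStore;   // hypothetical binding name for the R2 bucket
  TOKEN_SECRET: string; // hypothetical HMAC secret shared with Django
}

const COOKIE_NAME = "vt";           // hypothetical
const COOKIE_TTL_SEC = 4 * 60 * 60; // "a few hours": outlives a full game
const enc = new TextEncoder();

function b64url(buf: ArrayBuffer): string {
  const bytes = new Uint8Array(buf);
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  return btoa(bin).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function unb64url(s: string) {
  const padded = s.replace(/-/g, "+").replace(/_/g, "/") + "=".repeat((4 - (s.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
}

function hmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey("raw", enc.encode(secret), { name: "HMAC", hash: "SHA-256" }, false, ["sign", "verify"]);
}

// Credential format (hypothetical, shared by the URL token and the cookie):
// "<gameId>.<expiryEpochSec>.<base64url HMAC-SHA256 of the first two parts>".
async function mint(secret: string, gameId: string, expSec: number): Promise<string> {
  const payload = `${gameId}.${expSec}`;
  const sig = await crypto.subtle.sign("HMAC", await hmacKey(secret), enc.encode(payload));
  return `${payload}.${b64url(sig)}`;
}

async function verify(secret: string, cred: string, gameId: string, nowSec: number): Promise<boolean> {
  const parts = cred.split(".");
  if (parts.length !== 3) return false;
  const [id, exp, sig] = parts;
  if (id !== gameId || !(Number(exp) > nowSec)) return false; // wrong game or expired
  try {
    // subtle.verify does the MAC comparison, avoiding a hand-rolled string compare.
    return await crypto.subtle.verify("HMAC", await hmacKey(secret), unb64url(sig), enc.encode(`${id}.${exp}`));
  } catch {
    return false; // malformed base64 in the signature
  }
}

function readCookie(req: Request, name: string): string | null {
  for (const part of (req.headers.get("Cookie") ?? "").split(";")) {
    const [k, ...rest] = part.trim().split("=");
    if (k === name) return rest.join("=");
  }
  return null;
}

// Assumed URL layout /games/<gameId>/<file>, mirrored as R2 keys games/<gameId>/<file>.
async function handle(req: Request, env: Env, nowSec = Math.floor(Date.now() / 1000)): Promise<Response> {
  const url = new URL(req.url);
  const m = url.pathname.match(/^\/games\/([\w-]+)\/([\w.-]+)$/);
  if (!m) return new Response("Not found", { status: 404 });
  const [, gameId, file] = m;

  const token = url.searchParams.get("token");
  const cookie = readCookie(req, COOKIE_NAME);
  const headers = new Headers();

  if (token && (await verify(env.TOKEN_SECRET, token, gameId, nowSec))) {
    // One-time exchange: the short-lived token buys a game-scoped cookie that
    // the browser then attaches to every segment request under this path.
    const cred = await mint(env.TOKEN_SECRET, gameId, nowSec + COOKIE_TTL_SEC);
    headers.append("Set-Cookie",
      `${COOKIE_NAME}=${cred}; Path=/games/${gameId}/; Max-Age=${COOKIE_TTL_SEC}; HttpOnly; Secure; SameSite=None`);
  } else if (!cookie || !(await verify(env.TOKEN_SECRET, cookie, gameId, nowSec))) {
    return new Response("Forbidden", { status: 403 });
  }

  const obj = await env.VIDEO.get(`games/${gameId}/${file}`);
  if (!obj) return new Response("Not found", { status: 404, headers });
  return new Response(obj.body, { headers });
}

// Worker entry point (module syntax):
// export default { fetch: (req: Request, env: Env) => handle(req, env) };
```

Range handling, `Content-Type`, and CORS headers are left out to keep the gate readable. `SameSite=None` assumes the portal may be cross-site to the video origin (a localhost dev portal, for instance); the right attribute depends on how the two origins are actually deployed.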
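The CORS bullet maps onto a small header builder. The allowlist values are placeholders (a Vite-style localhost dev origin and an example prod origin), not the portal's real origins; the behavior that matters is echoing the exact request origin, since browsers refuse `Access-Control-Allow-Origin: *` on credentialed requests.

```typescript
// Placeholder allowlist: the portal's dev and prod origins (values are hypothetical).
const ALLOWED_ORIGINS = new Set(["http://localhost:5173", "https://portal.example.com"]);

// CORS headers for a credentialed (cookie-bearing) request. The exact request
// origin is echoed back because browsers reject "*" when credentials are involved.
function corsHeaders(requestOrigin: string | null): Headers {
  const h = new Headers({ Vary: "Origin" }); // responses differ per origin; keep caches honest
  if (requestOrigin !== null && ALLOWED_ORIGINS.has(requestOrigin)) {
    h.set("Access-Control-Allow-Origin", requestOrigin);
    h.set("Access-Control-Allow-Credentials", "true");
    h.set("Access-Control-Allow-Methods", "GET, HEAD"); // read on preflight
    h.set("Access-Control-Allow-Headers", "Range");     // read on preflight
    h.set("Access-Control-Expose-Headers", "Content-Range, Accept-Ranges, Content-Length, ETag");
  }
  return h;
}

// Preflight answer: the CORS headers and an empty 204 body. Disallowed origins
// get no Access-Control-* headers, so the browser blocks the follow-up request.
function preflight(req: Request): Response {
  return new Response(null, { status: 204, headers: corsHeaders(req.headers.get("Origin")) });
}
```

The allow-methods and allow-headers values are only consulted on preflight (`OPTIONS`) responses and the expose list only on actual `GET` responses; sending the same set on both is harmless and keeps the Worker simple.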
Consequences
- A new deployable enters the picture. The portal itself stays browser-only (no server component), but v1 video introduces a Cloudflare Worker and an R2 bucket as infrastructure. Same vendor as Pages (ADR-0002), so no new account or operational surface — but it is the first server-side code the project owns, and needs its own deploy and logging story.
- The v1 PoC validates this design locally, at zero cloud spend. The Worker runs under `wrangler dev` against Miniflare's local R2 simulation, and the packaging pipeline writes to that local store, so the auth gate, signed cookie, CORS, and range/scrubbing behavior run through the same code paths as in production while nothing is provisioned in Cloudflare. Promoting to real R2 (and the custom video origin) is a binding/config change, not a code change. This also keeps the apex-domain dependency off the PoC's critical path.
- Auth stays single-sourced. Entitlement decisions live in Django against the existing Clerk JWTs; the Worker only verifies a token Django minted. No second identity surface.
- Credential lifetime must outlive the viewing session, not just a single segment fetch. A game runs for hours, so the signed cookie carries a TTL of a few hours (or the portal silently re-mints the token). Baking one short expiry into per-segment URLs would make late segments 403 mid-game, which this design explicitly avoids.
- Packaged storage roughly doubles the catalog footprint. Remux output is ~1× the source, so budget ~3 TiB of segments alongside the originals (roughly $50/mo at R2's ~$15/TB-month, zero egress).
- Cheap path to v1.5 ABR. Vidstack + hls.js consume multi-rendition manifests with no client change, so adding a bitrate ladder later (TGF-338) is a packaging concern, not a player rewrite.
- `@vidstack/react` is a single-PR-reversible choice. Per the ADR bar it would normally live in the README stack table; it is recorded here only because it is inseparable from the delivery approach. Swapping it for bare `hls.js` + `<video>` later does not require a new ADR.
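The re-mint path from the credential-lifetime consequence can be sketched on the portal side as a timer that refreshes the credential a safety margin before it expires. `Playback`, `keepFresh`, the `expiresAt` field, the 5-minute margin, and both callbacks are assumptions for illustration; the ADR does not specify the playback endpoint's response shape.

```typescript
// Hypothetical shape of the playback endpoint's response.
interface Playback {
  manifestUrl: string;
  token: string;
  expiresAt: number; // epoch seconds; assumes the endpoint returns the expiry alongside the token
}

// Milliseconds until the portal should re-mint: `marginSec` before expiry, to
// cover clock skew and in-flight segment requests. Zero if already due.
function remintDelayMs(expiresAtSec: number, nowMs: number, marginSec = 300): number {
  return Math.max(0, (expiresAtSec - marginSec) * 1000 - nowMs);
}

// Keeps one viewing session's credential fresh until the returned stop() is
// called (e.g. when the player unmounts). Both callbacks are hypothetical:
// `fetchPlayback` calls GET /api/games/{id}/playback with the Clerk JWT, and
// `exchange` presents the new token to the video origin so the Worker
// re-issues its cookie.
function keepFresh(
  initial: Playback,
  fetchPlayback: () => Promise<Playback>,
  exchange: (p: Playback) => Promise<void>,
  retryMs = 30_000,
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;

  const schedule = (delayMs: number): void => {
    timer = setTimeout(async () => {
      if (stopped) return;
      try {
        const next = await fetchPlayback();
        await exchange(next);
        if (!stopped) schedule(remintDelayMs(next.expiresAt, Date.now()));
      } catch {
        // Retry shortly rather than giving up; the margin usually leaves time
        // before the current cookie actually expires.
        if (!stopped) schedule(retryMs);
      }
    }, delayMs);
  };

  schedule(remintDelayMs(initial.expiresAt, Date.now()));
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```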
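The v1.5 ABR point rests on the multivariant playlist. A bitrate ladder means writing one extra top-level playlist that references per-rendition media playlists, and hls.js accepts either a media playlist or a multivariant playlist at the same `src`. A minimal generator, with hypothetical names and no particular ladder implied:

```typescript
// One rendition in a future bitrate ladder.
interface Rendition {
  uri: string;       // that rendition's media playlist, e.g. "720p/index.m3u8"
  bandwidth: number; // peak bits per second (BANDWIDTH is required on EXT-X-STREAM-INF)
  width: number;
  height: number;
  codecs: string;    // RFC 6381 codec list, e.g. "avc1.640028,mp4a.40.2"
}

// Each EXT-X-STREAM-INF tag is followed by the URI of the variant it describes.
function multivariantPlaylist(renditions: Rendition[]): string {
  const lines = ["#EXTM3U"];
  for (const r of renditions) {
    lines.push(
      `#EXT-X-STREAM-INF:BANDWIDTH=${r.bandwidth},RESOLUTION=${r.width}x${r.height},CODECS="${r.codecs}"`,
      r.uri,
    );
  }
  return lines.join("\n") + "\n";
}
```

Because the player is only ever handed a playlist URL, pointing the playback endpoint at this top-level playlist instead of a single media playlist is the whole client-visible change; segments for every rendition would still be served through the same Worker gate.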
Alternatives considered
Serve raw progressive MP4 from R2, no packaging (ticket Option C-ish). Genuinely viable: nearly every file is progressive MP4 with H.264/AAC (the main exceptions are the ~10 codec outliers noted in Context) and plays natively via HTTP range requests with zero pipeline. Rejected as the headline approach because player assumption #8 commits to HLS and HLS buys adaptive bitrate for v1.5; it remains the fallback if the packaging job slips, since it ships nothing new.
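Both this fallback and HLS scrubbing depend on the Worker honoring byte-range requests. A sketch of single-range parsing following RFC 9110 semantics, returning an `{ offset, length }` pair (the shape R2's `get()` accepts as its `range` option) and the matching `Content-Range` value. Names are hypothetical, and ignoring multi-range requests (answering them with a full 200) is a simplifying assumption.

```typescript
interface ByteRange {
  offset: number; // first byte served
  length: number; // number of bytes served
}

// Parse a single-range "Range: bytes=..." header against an object of `size` bytes.
//   null            -> no usable Range header: serve 200 with the whole object
//   "unsatisfiable" -> serve 416 with "Content-Range: bytes */<size>"
//   ByteRange       -> serve 206 with contentRange(range, size)
function parseRange(header: string | null, size: number): ByteRange | null | "unsatisfiable" {
  if (header === null) return null;
  const m = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!m || (m[1] === "" && m[2] === "")) return null; // malformed or multi-range: ignore
  if (m[1] === "") {
    // Suffix form "bytes=-N": the final N bytes.
    const n = Number(m[2]);
    if (n === 0 || size === 0) return "unsatisfiable";
    const length = Math.min(n, size);
    return { offset: size - length, length };
  }
  const start = Number(m[1]);
  const last = m[2] === "" ? Infinity : Number(m[2]);
  if (last < start) return null; // invalid range-spec: ignore the header
  if (start >= size) return "unsatisfiable";
  const end = Math.min(last, size - 1); // clamp open or oversized ranges to the object
  return { offset: start, length: end - start + 1 };
}

function contentRange(r: ByteRange, size: number): string {
  return `bytes ${r.offset}-${r.offset + r.length - 1}/${size}`;
}
```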
S3 + CloudFront with signed URLs / signed cookies (ticket Option A). The most familiar pattern, and CloudFront signed cookies solve the fan-out cleanly. Rejected on cost (egress) and vendor consistency: we are already all-in on Cloudflare for Pages and storage.
R2 presigned URLs as the access mechanism (ticket Option B, naive form). Rejected on hard constraints: one object per URL, S3 endpoint only (no custom domain), 7-day max expiry. Workable for a single-file download, unworkable for a multi-segment stream.
On-demand / just-in-time packaging in the Worker. More flexible, no pre-baked storage cost. Rejected for v1 — the catalog is static, so JIT adds latency and complexity for no benefit. Revisit under TGF-338 if the library grows or becomes dynamic.
Reference
- TGF-337 — Spike: Minimum viable video delivery for v1, and its Confluence page TGF-337: Video Delivery MVP (catalog report + question-by-question determinations).
- TGF-335 — the player-component story this unblocks.
- TGF-338 — video delivery at scale (ABR ladders, JIT packaging) for v1.5+.
- ADR-0002 (Cloudflare Pages) — the platform-consistency principle.
- ADR-0006 (Clerk) / TGF-312 — the auth surface the playback endpoint builds on.