Video delivery at scale — build vs buy (TGF-338)
Spike deliverable 1 of 4. Companion to ADR-0009 (the decision), migration-plan.md, and backend-design.md.
This document compares the four delivery stacks named in TGF-338 — Mux, Cloudflare Stream, self-hosted R2 + Worker (the v1 stack, extended), and self-hosted AWS (S3 + CloudFront + MediaConvert) — on cost, developer experience, feature fit, and operational burden, and identifies the crossover points. It answers ticket questions #1, #2, #4, and #5.
TL;DR
For this catalog — large (~2.95 TiB), static history, already 99% H.264/AAC, served from inside the Cloudflare ecosystem with free egress — extending the self-hosted R2 + Worker stack wins at every viewing volume. Managed services (Mux, CF Stream) and AWS CloudFront all start at a higher fixed cost and grow with delivered minutes; R2 stays flat at roughly $120/mo because egress is free and storage is cheap. The managed services' value (turnkey transcoding, bundled analytics) does not offset a 3–5× monthly premium for a catalog that transcodes once and never churns.
The one piece worth buying is player analytics (Mux Data), which is sold decoupled from delivery, has a free tier, and works with the existing hls.js/vidstack player — so we get QoS metrics without the delivery premium.
Catalog sizing assumptions
All figures are projected to within ~1.5×, which is enough to separate "$100/mo" from "$1,000/mo", the same bar the v1 cost model used.
| Input | Value | Basis |
|---|---|---|
| Catalog size on disk | 2.95 TiB (~3,242 GB) | video_stats.md (TGF-337) |
| Games | ~1,204 | TGF-337 catalog report |
| Catalog content-minutes | ~120,000 min (~2,000 hr) | 3,242 GB ÷ ~27 MB/min mean (bitrate centroid ~2–3 Mbps, weighted to the 0.5–1 Mbps SD majority); ~1.5× estimate |
| Mean watched minutes / stream | ~30 min | PoC condensed game ran 32 min; users watch portions |
| ABR ladder storage multiple | ~2–2.5× source | 240/480/720/1080 renditions vs single remux (v1 was ~1×) |
Viewing-volume scenarios (carried from the v1 cost model, now expressed in delivered minutes at ~30 min/stream):
| Scenario | Streams/mo | Delivered min/mo |
|---|---|---|
| Pilot | 100 | 3,000 |
| Small | 1,000 | 30,000 |
| Active | 10,000 | 300,000 |
| Heavy | 100,000 | 3,000,000 |
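The sizing arithmetic in the two tables above can be reproduced in a few lines. This is a sketch that uses only the figures already stated; the constant and function names are illustrative, not taken from the v1 cost model.

```python
# Inputs copied from the sizing table above.
CATALOG_GB = 3_242        # 2.95 TiB expressed in decimal GB
MEAN_MB_PER_MIN = 27      # estimated mean source size per content-minute
MIN_PER_STREAM = 30       # mean watched minutes per stream

# Streams per month for each viewing-volume scenario.
SCENARIOS = {"Pilot": 100, "Small": 1_000, "Active": 10_000, "Heavy": 100_000}


def catalog_minutes(size_gb: float = CATALOG_GB,
                    mb_per_min: float = MEAN_MB_PER_MIN) -> float:
    """Estimate content-minutes from size on disk (1 GB = 1,000 MB)."""
    return size_gb * 1_000 / mb_per_min


def delivered_minutes(streams_per_month: int,
                      min_per_stream: int = MIN_PER_STREAM) -> int:
    """Delivered minutes per month for a given number of streams."""
    return streams_per_month * min_per_stream


if __name__ == "__main__":
    print(f"catalog: about {catalog_minutes():,.0f} content-minutes")
    for name, streams in SCENARIOS.items():
        print(f"{name}: {delivered_minutes(streams):,} delivered min/mo")
```

The catalog estimate comes out just over 120,000 minutes, which is the rounded figure used for the per-minute pricing below.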
Published pricing used (USD, early 2026)
Prices drift; treat as order-of-magnitude. Sources are each vendor's public pricing page as of the spike.
| Stack | Storage | Delivery / egress | Transcode | Notes |
|---|---|---|---|---|
| R2 + Worker | $0.015/GB-mo | $0 egress | one-time batch (own compute) | Workers $5/mo + $0.30/M reqs >10M; Class B reads $0.36/M |
| CF Stream | $5 / 1,000 min stored-mo | $1 / 1,000 min delivered | included | Bills on source minutes; renditions managed for you |
| Mux Video | ~$0.003/min-mo (encoded) | ~$0.001/min delivered | ~$0.0072/min one-time encode | Mux Data analytics sold separately (free < 100k views/mo) |
| AWS | S3 $0.023/GB-mo | CloudFront ~$0.085/GB egress | MediaConvert ~$0.0075–0.015/min | Egress is the dominant term |
Projected monthly cost
ABR-ladder storage is taken as ~6–7.5 TB for the self-hosted stacks (renditions + retained originals). Delivered GB assumes ~22.5 MB/min (~3 Mbps avg rendition).
| Scenario | R2 + Worker | CF Stream | Mux Video | AWS (S3+CF) |
|---|---|---|---|---|
| Pilot (3k min) | ~$120 | ~$603 | ~$363 | ~$146 |
| Small (30k min) | ~$120 | ~$630 | ~$390 | ~$197 |
| Active (300k min) | ~$121 | ~$900 | ~$660 | ~$714 |
| Heavy (3M min) | ~$145 | ~$3,600 | ~$3,360 | ~$5,878 |
| One-time transcode | ~$300–500 | included | ~$864 | ~$900–1,800 |
| Fixed storage floor | ~$115/mo | ~$600/mo | ~$360/mo | ~$140/mo |
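The CF Stream, Mux, and AWS columns follow directly from the published rates and the sizing assumptions, and can be recomputed as a sanity check. The sketch below assumes a linear model (fixed storage floor plus a per-delivered-minute charge); the AWS storage floor of $140 is taken from the table rather than derived, and all names are illustrative.

```python
CATALOG_MIN = 120_000            # stored content-minutes (sizing table)
GB_PER_DELIVERED_MIN = 0.0225    # 22.5 MB per delivered minute (~3 Mbps)

SCENARIO_MINUTES = {"Pilot": 3_000, "Small": 30_000,
                    "Active": 300_000, "Heavy": 3_000_000}


def cf_stream(delivered_min: float) -> float:
    """$5 per 1,000 min stored per month + $1 per 1,000 min delivered."""
    return CATALOG_MIN / 1_000 * 5.0 + delivered_min / 1_000 * 1.0


def mux_video(delivered_min: float) -> float:
    """~$0.003 per stored min per month + ~$0.001 per delivered min."""
    return CATALOG_MIN * 0.003 + delivered_min * 0.001


def aws(delivered_min: float, storage_floor: float = 140.0) -> float:
    """S3 storage floor (assumed, from the table) + CloudFront egress at $0.085/GB."""
    return storage_floor + delivered_min * GB_PER_DELIVERED_MIN * 0.085


def mux_one_time_encode() -> float:
    """One-time Mux encode at ~$0.0072 per content-minute."""
    return CATALOG_MIN * 0.0072


if __name__ == "__main__":
    for name, minutes in SCENARIO_MINUTES.items():
        print(f"{name:>6}: CF Stream ${cf_stream(minutes):,.0f}  "
              f"Mux ${mux_video(minutes):,.0f}  AWS ${aws(minutes):,.0f}")
    print(f"Mux one-time encode: ${mux_one_time_encode():,.0f}")
```

The R2 column is not modeled here because it depends on request counts rather than delivered minutes alone.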
Reading the table
- R2 is flat. Storage dominates and is fixed; read and Worker charges stay in the single to low double digits of dollars per month even at 100k streams because R2 reads are $0.36/M and egress is free. This is the same dynamic the v1 cost model found, carried forward with a larger (ABR) storage base.
- CF Stream and Mux are storage-bound at low volume. Their per-stored-minute pricing puts the floor at $360–600/mo before anyone watches anything — 3–5× the entire R2 bill. They only approach parity-of-growth at Heavy volume, and even then they are 20–30× more expensive in absolute terms.
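- AWS is the cheapest of the alternatives at Pilot/Small but the worst at scale. It is above R2 at every volume (~$146 vs ~$120 at Pilot), and because CloudFront egress grows linearly with delivered minutes it overtakes Mux between Small and Active and CF Stream between Active and Heavy, reaching ~40× R2 at Heavy.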
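To see why the R2 column barely moves, the request charges can be estimated from delivered minutes. This is a rough sketch under stated assumptions: ~6 s segments, one Worker invocation and one Class B read per segment, manifest fetches and any free read allowance ignored, and the ~$115 storage floor taken from the table. The names are illustrative.

```python
SEGMENT_SECONDS = 6                   # assumed segment length
STORAGE_FLOOR = 115.0                 # $/mo, from the table's storage floor row
WORKERS_BASE = 5.0                    # $/mo Workers plan
WORKERS_INCLUDED_REQS = 10_000_000    # requests covered by the base fee
WORKERS_PER_MILLION = 0.30            # $ per million requests beyond that
CLASS_B_PER_MILLION = 0.36            # $ per million R2 reads


def r2_monthly(delivered_min: float) -> float:
    """Estimated R2 + Worker bill for a month of delivered minutes."""
    segments = delivered_min * 60 / SEGMENT_SECONDS
    worker_overage = (max(0.0, segments - WORKERS_INCLUDED_REQS)
                      / 1e6 * WORKERS_PER_MILLION)
    reads = segments / 1e6 * CLASS_B_PER_MILLION
    return STORAGE_FLOOR + WORKERS_BASE + worker_overage + reads


if __name__ == "__main__":
    for name, minutes in [("Pilot", 3_000), ("Small", 30_000),
                          ("Active", 300_000), ("Heavy", 3_000_000)]:
        print(f"{name:>6}: ${r2_monthly(minutes):,.2f}")
```

Under these assumptions Heavy comes out a little below the table's ~$145, which is expected since manifest requests are left out; the point is that a 1,000× increase in delivered minutes adds only tens of dollars.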
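Crossover points
- There is no viewing volume at which any alternative beats R2 for this catalog. R2's storage-only floor (~$115/mo) sits *below* the managed services' storage-only floors ($360–600/mo), and its marginal delivery cost is ~$0.
- AWS's fixed cost (~$140/mo) is within ~20% of R2's but still above it, so AWS never undercuts R2; its marginal cost (egress, roughly $0.002 per delivered minute at 22.5 MB/min and $0.085/GB) then pulls the total away sharply beyond Small volume. The crossovers that do exist are among the alternatives: on the figures above, AWS overtakes Mux at roughly 240,000 delivered min/mo and CF Stream at roughly 500,000.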
- The managed services would only win if the catalog were small and churning (storage cheap, transcoding frequent) — the opposite of Griddy's static archive — or if their bundled features were independently worth the premium (addressed below).
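Because each stack is modeled as a fixed floor plus a per-minute rate, a crossover is just the intersection of two straight lines. The sketch below shows that calculation; the floors and rates are the ones used in this document, R2's marginal rate is taken as 0 per the "~$0" above, and the function name is illustrative.

```python
from typing import Optional


def break_even_minutes(floor_a: float, rate_a: float,
                       floor_b: float, rate_b: float) -> Optional[float]:
    """Delivered min/mo at which cost lines A and B intersect.

    Each line is floor + rate * minutes. Returns None when the lines are
    parallel or would only meet at a non-positive volume, meaning one stack
    is cheaper than the other at every real viewing volume.
    """
    if rate_a == rate_b:
        return None
    minutes = (floor_b - floor_a) / (rate_a - rate_b)
    return minutes if minutes > 0 else None


if __name__ == "__main__":
    aws_rate = 0.0225 * 0.085   # GB per delivered min × $ per GB egress
    print("AWS vs Mux:      ", break_even_minutes(140, aws_rate, 360, 0.001))
    print("AWS vs CF Stream:", break_even_minutes(140, aws_rate, 600, 0.001))
    print("R2 vs Mux:       ", break_even_minutes(115, 0.0, 360, 0.001))
    print("R2 vs CF Stream: ", break_even_minutes(115, 0.0, 600, 0.001))
```

With these inputs AWS meets Mux at roughly 240,000 delivered min/mo and CF Stream at roughly 500,000, while both R2 comparisons return `None`: a lower floor combined with a lower marginal rate leaves no crossover.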
Developer experience
| Stack | DX | Verdict |
|---|---|---|
| Mux | Best in class. Upload → playback URL in one API call; dashboard, signed playback, Data analytics, clipping API all turnkey. | Highest, but we have already built the equivalent gate/packaging for v1. |
| CF Stream | Strong; single API, native to our Cloudflare account, Stream Player included. | High; least-effort managed option given we are already on Cloudflare. |
| R2 + Worker | We own it. v1 already shipped the Worker gate, signed cookie, CORS, and -c copy packaging. Marginal DX cost for v1.5 is the ABR ladder + clip service. | Moderate up-front, zero vendor lock-in. The hard parts are done. |
| AWS | Heaviest. MediaConvert job templates, CloudFront signed cookies, OAC, S3 lifecycle — most moving parts. | Lowest; only justified if live streaming (MediaLive) were on the roadmap. It is not. |
Feature fit (v1.5 needs)
| Need (ticket) | Mux | CF Stream | R2 + Worker | AWS |
|---|---|---|---|---|
| ABR ladders (240/480/720/1080) | Automatic | Automatic | Batch transcode (one-time); player already consumes multi-rendition manifests (ADR-0008) | MediaConvert |
| Clip playback across catalog (TGF-339) | Clipping API (per-asset; cross-asset is manual) | Clipping (per-video) | Synthesized clip-manifest over stored segments — no re-encode, no new storage (see backend-design.md) | Manual (packager + manifest gen) |
| Analytics / QoS (#5) | Mux Data (best) | Stream analytics (good) | Build, or bolt on Mux Data standalone | Build (CloudFront logs + own pipeline) |
| Signed access at scale (#7) | Signed playback tokens | Signed URLs / tokens | Existing Django-token → Worker → signed-cookie; extend to session/entitlement scope | CloudFront signed cookies |
| DRM (#4) | Add-on (Widevine/FairPlay/PlayReady) | Add-on | Not built | SPEKE + DRM vendor |
Clip generation (#6) is the deciding feature for TGF-339 and the one place
self-hosting is better, not just cheaper: because we own the segment store, a
clip is a generated HLS media playlist that references the existing ABR
segments overlapping [start, end] — zero new storage, zero re-encode, segment
(~6 s) granularity for v1.5 with optional boundary trimming later. The managed
clipping APIs are per-asset and would force either re-rendering each clip
(storage/compute blow-up, the "server-side concat" anti-pattern the ticket
flags) or paying to store derived clips. Detailed in
backend-design.md.
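The clip-manifest idea can be illustrated with a small generator. This is a minimal sketch, not the design in backend-design.md: it assumes a single rendition of a single game, MPEG-TS segments (fMP4 would also need an `#EXT-X-MAP` line), and a hypothetical `Segment` record with made-up URIs. A clip spanning several games would additionally need `#EXT-X-DISCONTINUITY` tags between runs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    uri: str         # hypothetical object path of an existing ABR segment
    start: float     # seconds from the start of the source video
    duration: float  # segment length in seconds


def clip_playlist(segments: list[Segment], start: float, end: float) -> str:
    """Build a VOD media playlist from the segments overlapping [start, end]."""
    if end <= start:
        raise ValueError("end must be after start")
    # Keep every segment whose time span intersects the requested range.
    chosen = [s for s in segments
              if s.start < end and s.start + s.duration > start]
    if not chosen:
        raise ValueError("no segments overlap the requested range")

    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        # Target duration must be an integer no smaller than any segment.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(s.duration for s in chosen))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for s in chosen:
        lines.append(f"#EXTINF:{s.duration:.3f},")
        lines.append(s.uri)  # references the stored segment; nothing is copied
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    stored = [Segment(f"seg_{i:05d}.ts", i * 6.0, 6.0) for i in range(10)]
    print(clip_playlist(stored, start=13.0, end=25.0))
```

In the usage example a 13 s to 25 s request selects the three 6 s segments covering 12 s to 30 s, which is the segment-granularity behavior described above; trimming to exact boundaries is left out.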
DRM (#4) — decision: defer
The v1.5 catalog is owned/coaching and archival football film, not licensed broadcast content. The signed-cookie gate (entitlement enforced in Django, verified per-request at the Worker) is the appropriate control. DRM is deferred. Revisit only if Griddy licenses third-party broadcast footage whose rights contract mandates it — at which point Mux/CF Stream's DRM add-on becomes a reason to put that subset on a managed origin, not a reason to re-platform the whole catalog.
Analytics & QoS (#5) — bar for v1.5
Track per playback session: startup time, rebuffer ratio, completion rate, fatal error rate, plus rendition/bandwidth distribution. Self-hosting delivery does not mean building analytics: Mux Data is sold standalone, integrates with hls.js/vidstack via a small SDK, and is free under ~100k monthly views — covering Pilot through Active for ~$0. This is the recommended way to hit the QoS bar without taking on the delivery premium.
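For concreteness, the metrics named above could be aggregated from per-session records roughly as follows. This is a sketch under assumed definitions (rebuffer ratio as stalled time over stalled plus played time, completion as watching at least 90% of the asset); these are not Mux Data's definitions, and the `PlaybackSession` fields are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybackSession:
    startup_ms: float    # play request to first frame
    watched_s: float     # seconds of content actually played
    rebuffer_s: float    # seconds stalled after startup
    content_s: float     # length of the asset or clip
    fatal_error: bool    # session ended on an unrecoverable error


def summarize(sessions: list[PlaybackSession],
              completion_threshold: float = 0.9) -> dict[str, float]:
    """Aggregate the v1.5 QoS metrics over a batch of sessions."""
    if not sessions:
        raise ValueError("no sessions to summarize")
    watched = sum(s.watched_s for s in sessions)
    stalled = sum(s.rebuffer_s for s in sessions)
    total = watched + stalled
    completed = sum(
        1 for s in sessions
        if s.content_s > 0 and s.watched_s >= completion_threshold * s.content_s
    )
    return {
        "median_startup_ms": statistics.median(s.startup_ms for s in sessions),
        "rebuffer_ratio": stalled / total if total > 0 else 0.0,
        "completion_rate": completed / len(sessions),
        "fatal_error_rate": sum(s.fatal_error for s in sessions) / len(sessions),
    }
```

Rendition and bandwidth distribution are omitted here; they would be a histogram over per-session samples rather than a single ratio.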
Ops burden
| Stack | Ongoing ops |
|---|---|
| R2 + Worker | A Worker deploy + an R2 bucket (both already in v1). Batch transcode is one-time for a static catalog. Lowest ongoing burden precisely because nothing recurs. |
| CF Stream / Mux | Near-zero ops, but a recurring bill and a vendor dependency for the core product surface. |
| AWS | Highest: MediaConvert templates, CloudFront distributions/invalidation, signed-cookie key rotation, S3 lifecycle. |
The ops argument for "buy" is strongest when transcoding is continuous — i.e. once user-generated uploads land. That is out of scope for v1.5 (the catalog is fixed) but is the documented trigger to revisit (see ADR-0009).
Recommendation
Extend the self-hosted R2 + Worker stack for v1.5. Add a one-time ABR transcode pass, build the clip-manifest service in-house, and adopt Mux Data for player QoS. Keep Mux Video documented as the escape hatch for the day user uploads create a persistent transcode pipeline. Rationale and consequences are recorded in ADR-0009.