auge sees video — Apple Vision over video frames, from your terminal

01Reading a NASA timelapse, frame by frame

Real public-domain footage. The output below is the exact, unedited result of running auge over the clip on the left — the same "no mocks" rule as the rest of this site.

NASA/JPL-Caltech — “Building the Mars 2020 Rover” timelapse. Public domain (NASA). Source on Wikimedia Commons. Transcoded to H.264 for AVFoundation.

auge --video

$ auge --video nasa-mars2020-rover.mp4 --every 4s
video: 20.02s · 6 sampled frames
[t=0.00s] ocr: PIT CREW FOR MARS | Mars 2020 Gets Some
          Wheels | NASA | Jet Propulsion Laboratory |
          California Institute of Technology
[t=0.00s] classify: structure (30%)
[t=4.00s] classify: people (42%)
[t=8.00s] ocr: On June 13, engineers at NASA's Jet
          Propulsion Laboratory installed the suspension
          system and a set of test wheels on the rover.
[t=8.00s] classify: people (77%)
[t=12.00s] classify: people (76%)
[t=16.00s] ocr: The wheels will be replaced with flight
          models after the rover is shipped to Kennedy
          Space Center in Florida.
[t=16.00s] classify: people (75%)
[t=20.00s] classify: people (57%)

It reads the title card and the body captions verbatim, and notice the classification tracks the content over time: structure on the title slate, then people (up to 77%) once the engineers are at the bench. Same JSON envelope as every other mode — uniform snake_case, schema 2:

auge --video … --json --compact | jq

$ auge --video nasa-mars2020-rover.mp4 --every 8s --json --compact | jq
{
  "mode": "video",
  "metadata": { "on_device": true, "schema": "2", "version": "1.8.0" },
  "results": { "video": {
    "duration_seconds": 20.02,
    "frame_count": 3,
    "frames": [ { "time": 0, "ocr": ["PIT CREW FOR MARS", …], "classifications": [ … ] }, … ]
  } }
}

02And a JPL animation

NASA/JPL — PIA10149. Public domain (NASA). Source on Wikimedia Commons.

auge --classify

$ auge --classify nasa-jpl-pia10149.mp4 --every 5s
video: 8.00s · 2 sampled frames
[t=0.00s] classify: celestial_body, outdoor, recreation
[t=5.00s] classify: celestial_body, sport

# --trajectories on a video is honest about its limits:
$ auge --trajectories nasa-jpl-pia10149.mp4
error: [error] --trajectories on video is not yet
implemented; pass a single image frame

auge --video clip.mp4 --every 1s

OCR and classify every sampled frame.

auge --ocr clip.mp4 --every 2s

OCR only — read text as it appears over time.

auge --classify clip.mp4 --every 5s

Classify only — what's on screen, frame by frame.

Honest about limits. auge decodes video through Apple's AVFoundation, which on a stock Mac reads .mp4 / .mov / .m4v / .avi / .mkv but not .webm or .ogv (Wikimedia's native formats) — those need a one-line ffmpeg transcode first. Modes that need spatial context across frames (--trajectories) refuse a video with a clear error rather than returning a fake empty result.

03What shipped in v1.8.0 — a full audit

Video sampling rode in on a top-to-bottom audit of auge: a fan-out of reviewers read every source file, cross-checked usage against Apple's Vision docs, and adversarially verified each finding. Every confirmed bug below is fixed, with regression tests, and the suite is green.

source files audited

fixes shipped

308

unit tests passing

network calls (still)

Video a feature that was dead on arrival

--video, --ocr <video> and --classify <video> now work — path validation silently rejected every video extension, so the whole feature was unreachable.
--trajectories on a video failed loudly instead of returning a fabricated empty result.
Help text documented a video command that never worked; now it documents the real ones.

Correctness results that were quietly wrong

--top N above 10 was capped at 10, and --min-confidence below 0.01 did nothing — classify now returns the full ranked list and honours both flags.
Float16 Core ML outputs were emitted as all-zeros; now decoded for real.
Face-landmark points were face-box-relative but labelled like image coords — now mapped into the same image space as every other point.
Feature-print buffer reads are bounds-checked; optical-flow sampling stride fixed.

Honesty the site's first rule, applied inward

auge --all plain/markdown output now renders all 23 capabilities — it was showing only 4 (the rest hid in the JSON).
README and badges corrected to the real macOS 26 (Tahoe) baseline (they claimed 10.15+).
JSON keys unified to snake_case (schema 2); a failed encode now errors loudly instead of printing {}.

Robustness edges that could bite

Compiled Core ML temp models are cleaned up; oversized PDF pages are capped before rasterizing.
Directories, symlinked dirs, and non-finite --every values are rejected up front.
Mask-coverage guards a mismatched buffer; document text now includes lists and tables, not just paragraphs.

Method. The audit ran as a multi-agent workflow — parallel reviewers per file cluster plus three research tracks (Apple Vision API correctness, OpenAI-pipeline fit, golden-goal/honesty compliance) — with every finding adversarially re-verified against the source before it was accepted. Fixes preserve the project's non-negotiables: 100% on-device, zero dependencies, Swift 6 strict concurrency.