Point auge at a video file and it samples frames at a fixed interval, then runs Apple's on-device Vision OCR and image classification on each one. No upload, no API key, no network — the same 100%-on-device promise, now over time.
Real public-domain footage. The output below is the exact, unedited result of running auge over the clip on the left — the same "no mocks" rule as the rest of this site.
NASA/JPL-Caltech — “Building the Mars 2020 Rover” timelapse. Public domain (NASA). Source on Wikimedia Commons. Transcoded to H.264 for AVFoundation.
$ auge --video nasa-mars2020-rover.mp4 --every 4s video: 20.02s · 6 sampled frames [t=0.00s] ocr: PIT CREW FOR MARS | Mars 2020 Gets Some Wheels | NASA | Jet Propulsion Laboratory | California Institute of Technology [t=0.00s] classify: structure (30%) [t=4.00s] classify: people (42%) [t=8.00s] ocr: On June 13, engineers at NASA's Jet Propulsion Laboratory installed the suspension system and a set of test wheels on the rover. [t=8.00s] classify: people (77%) [t=12.00s] classify: people (76%) [t=16.00s] ocr: The wheels will be replaced with flight models after the rover is shipped to Kennedy Space Center in Florida. [t=16.00s] classify: people (75%) [t=20.00s] classify: people (57%)
It reads the title card and the body captions verbatim, and notice the classification tracks the content over time: structure on the title slate, then people (up to 77%) once the engineers are at the bench. Same JSON envelope as every other mode — uniform snake_case, schema 2:
$ auge --video nasa-mars2020-rover.mp4 --every 8s --json --compact | jq { "mode": "video", "metadata": { "on_device": true, "schema": "2", "version": "1.8.0" }, "results": { "video": { "duration_seconds": 20.02, "frame_count": 3, "frames": [ { "time": 0, "ocr": ["PIT CREW FOR MARS", …], "classifications": [ … ] }, … ] } } }
NASA/JPL — PIA10149. Public domain (NASA). Source on Wikimedia Commons.
$ auge --classify nasa-jpl-pia10149.mp4 --every 5s video: 8.00s · 2 sampled frames [t=0.00s] classify: celestial_body, outdoor, recreation [t=5.00s] classify: celestial_body, sport # --trajectories on a video is honest about its limits: $ auge --trajectories nasa-jpl-pia10149.mp4 error: [error] --trajectories on video is not yet implemented; pass a single image frame
auge --video clip.mp4 --every 1s
OCR and classify every sampled frame.
auge --ocr clip.mp4 --every 2s
OCR only — read text as it appears over time.
auge --classify clip.mp4 --every 5s
Classify only — what's on screen, frame by frame.
.mp4 / .mov / .m4v / .avi / .mkv but not .webm or .ogv (Wikimedia's native formats) — those need a one-line ffmpeg transcode first. Modes that need spatial context across frames (--trajectories) refuse a video with a clear error rather than returning a fake empty result.
Video sampling rode in on a top-to-bottom audit of auge: a fan-out of reviewers read every source file, cross-checked usage against Apple's Vision docs, and adversarially verified each finding. Every confirmed bug below is fixed, with regression tests, and the suite is green.
--video, --ocr <video> and --classify <video> now work — path validation silently rejected every video extension, so the whole feature was unreachable.--trajectories on a video failed loudly instead of returning a fabricated empty result.--top N above 10 was capped at 10, and --min-confidence below 0.01 did nothing — classify now returns the full ranked list and honours both flags.auge --all plain/markdown output now renders all 23 capabilities — it was showing only 4 (the rest hid in the JSON).snake_case (schema 2); a failed encode now errors loudly instead of printing {}.--every values are rejected up front.