v0.2.0 · MIT Licensed

See. Understand. Perceive.

A powerful Python CLI for video understanding using Google's Gemini API. Native video processing, YouTube support, cost estimation—all from your terminal.

$ pip install merleau

Features

True Video Understanding

The first CLI that actually understands video—not just frames.

Native Video Processing

Upload and analyze videos directly with Gemini's native multimodal understanding. No frame extraction required.
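
An uploaded file is processed on the server before it can be analyzed; the Quick Start output shows this as a "File state: ACTIVE" line. The wait can be sketched as a small polling loop. This is a minimal illustration, not merleau's actual code: `wait_until_active` and its parameters are hypothetical, and the `PROCESSING` and `FAILED` state names are assumed from the Gemini Files API.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports "ACTIVE".

    get_state is any callable returning the current file state as a string
    (hypothetical; in practice it would wrap a Files API lookup).
    Raises RuntimeError on "FAILED" and TimeoutError if the wait runs too long.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "ACTIVE":
            return state
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"file still {state} after {waited:.0f}s")
        # Not ready yet (e.g. "PROCESSING"): pause, then check again.
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.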

YouTube Support

Analyze YouTube videos directly via URL. Gemini's preview feature enables seamless video understanding from the web.

Long Video Support

Process videos two hours or longer using Gemini's context window of up to 2 million tokens. Perfect for lectures, meetings, and documentaries.

Audio + Visual

Combined audio-visual analysis without separate transcription steps. Understand speech, music, and visuals together.

Cost Transparency

See token usage and estimated costs for every analysis. Budget-friendly at $0.11-0.32 per hour of video.
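
To see where a per-hour figure like this can come from, here is a back-of-the-envelope sketch. It assumes roughly 300 input tokens per second of video and input prices of $0.10 and $0.30 per million tokens. Both are illustrative assumptions, not values taken from merleau, and output tokens are ignored. `estimate_cost_per_hour` is a hypothetical helper.

```python
def estimate_cost_per_hour(tokens_per_second: float,
                           usd_per_million_input_tokens: float) -> float:
    """Approximate input-token cost (USD) of one hour of video."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_input_tokens


# Assumed figures: ~300 tokens per second of video, and
# $0.10 / $0.30 per million input tokens for a cheaper / pricier model.
low = estimate_cost_per_hour(300, 0.10)
high = estimate_cost_per_hour(300, 0.30)
print(f"${low:.2f}-{high:.2f} per hour")
```

Real costs depend on the model chosen and on current pricing, so treat the per-hour range as an estimate.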

Simple CLI

One command to analyze any video. Customize prompts, choose models, and get results in seconds.

Quick Start

Perception in Seconds

Install and analyze your first video in under a minute.

# Install the package
$ pip install merleau

# Set your Gemini API key
$ export GEMINI_API_KEY="your-api-key"

# Analyze a video
$ ponty video.mp4
Uploading video: video.mp4
Upload complete. File URI: files/abc123
Waiting for file to be processed...
File state: ACTIVE

Analyzing video with gemini-2.5-flash...

--- Video Analysis ---
The video shows a product demonstration...

--- Usage Information ---
Prompt tokens: 45,231
Response tokens: 847
Total tokens: 46,078
Estimated cost: $0.007

# Summarize key points
$ ponty lecture.mp4 -p "Summarize the main topics covered"

# Extract action items from a meeting
$ ponty meeting.mp4 -p "List all action items and who is responsible"

# Analyze sports footage
$ ponty game.mp4 -p "Describe the key plays and turning points"

# Use a different model
$ ponty video.mp4 -m gemini-2.0-flash -p "What products are shown?"

# Hide cost information
$ ponty video.mp4 --no-cost
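
The same commands can be scripted when several videos need the same prompt. The sketch below builds the argument list from the flags shown above and runs `ponty` once per file. `build_ponty_command` and `analyze_all` are hypothetical helpers, and the sketch assumes `ponty` is on your PATH and prints its analysis to standard output.

```python
import subprocess
from typing import Dict, List, Optional


def build_ponty_command(video: str,
                        prompt: Optional[str] = None,
                        model: Optional[str] = None,
                        show_cost: bool = True) -> List[str]:
    """Build the argv list for one ponty invocation."""
    cmd = ["ponty", video]
    if prompt is not None:
        cmd += ["-p", prompt]
    if model is not None:
        cmd += ["-m", model]
    if not show_cost:
        cmd.append("--no-cost")
    return cmd


def analyze_all(videos: List[str], prompt: str) -> Dict[str, str]:
    """Run ponty on each video and collect its captured standard output."""
    results = {}
    for video in videos:
        completed = subprocess.run(
            build_ponty_command(video, prompt=prompt, show_cost=False),
            capture_output=True, text=True, check=True,
        )
        results[video] = completed.stdout
    return results
```

For example, `analyze_all(["lecture.mp4", "meeting.mp4"], "Summarize the main topics covered")` would return a dictionary mapping each file name to its analysis text.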

Why Gemini

The Only Choice for Native Video

Gemini is the only major AI provider with true video understanding.

Capability          | Gemini              | GPT-4o             | Claude
Native Video Upload | ✓ Direct upload     | ✗ Frame extraction | ✗ No support
Audio from Video    | ✓ Combined analysis | ✗ Separate Whisper | ✗ No support
Max Duration        | 2+ hours            | Minutes            | N/A
YouTube URLs        | ✓ Free preview      | ✗ No               | ✗ No
Cost per Hour       | $0.11-0.32          | ~$7.50             | N/A
"The world is not what I think, but what I live through."
— Maurice Merleau-Ponty, Phenomenology of Perception

CLI Reference

The ponty Command

Simple, powerful video analysis from your terminal.

ponty <video> | Analyze a video file. Supports MP4, MOV, AVI, and other common formats.
-p, --prompt  | Custom prompt for the analysis. Default: "Explain what happens in this video"
-m, --model   | Gemini model to use. Default: gemini-2.5-flash. Options: gemini-2.0-flash, gemini-1.5-pro
--no-cost     | Hide token usage and cost estimation from the output.
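
For readers who want to wrap or reimplement this interface, the options above map naturally onto Python's `argparse`. This is a sketch of one way to declare them, not merleau's actual source; `build_parser` is a hypothetical name, and the defaults are copied from the reference above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a ponty-like command line (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ponty", description="Analyze a video with Gemini."
    )
    parser.add_argument("video", help="video file to analyze")
    parser.add_argument(
        "-p", "--prompt",
        default="Explain what happens in this video",
        help="custom prompt for the analysis",
    )
    parser.add_argument(
        "-m", "--model",
        default="gemini-2.5-flash",
        help="Gemini model to use",
    )
    parser.add_argument(
        "--no-cost",
        action="store_true",
        help="hide token usage and cost estimation",
    )
    return parser


args = build_parser().parse_args(["video.mp4", "-m", "gemini-2.0-flash", "--no-cost"])
print(args.video, args.model, args.no_cost)
```

Omitted options fall back to the documented defaults, so `args.prompt` here is "Explain what happens in this video".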