Live perception

Watch your camera and audio become structured events

No audio/video is sent to external LLM providers
[Live demo widget: an Emotion Analysis panel scores eight emotions (anger, contempt, disgust, fear, happy, neutral, sadness, surprise) in real time, and a Content Scanning panel streams live video captions.]
Backed by Y Combinator

What is Pinch?

Pinch is a perception stack that extracts meaning from raw audio and video.

From chaos to structure

Raw sensor data (pixels, waveforms, noise) becomes structured events: emotion, speech, sound understanding, person engagement, environment analysis, and more.
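To make "structured events" concrete, the sketch below shows one way such a payload could be typed in TypeScript. The PerceptionEvent type, its field names, and the event kinds are illustrative assumptions, not Pinch's published schema.

```typescript
// Illustrative sketch only: this type and these field names are assumptions,
// not Pinch's published event schema.
type PerceptionEvent =
  | {
      kind: "emotion";
      timestampMs: number;
      label:
        | "anger" | "contempt" | "disgust" | "fear"
        | "happy" | "neutral" | "sadness" | "surprise";
      confidence: number; // 0..1
    }
  | { kind: "speech"; timestampMs: number; text: string; speaker?: string }
  | { kind: "engagement"; timestampMs: number; score: number }; // 0..1

// Consumers can switch on `kind` to react to each event type.
function handleEvent(event: PerceptionEvent): void {
  switch (event.kind) {
    case "emotion":
      console.log(`${event.label} (${Math.round(event.confidence * 100)}%) at ${event.timestampMs} ms`);
      break;
    case "speech":
      console.log(`"${event.text}" at ${event.timestampMs} ms`);
      break;
    case "engagement":
      console.log(`engagement ${event.score.toFixed(2)} at ${event.timestampMs} ms`);
      break;
  }
}
```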

Built for real-time or post-analysis

Use Pinch to give your agents awareness. Let them see when someone smiles, hear when they're confused, know when to respond. Or use Pinch to analyze your media library after the fact.

How it works

Power real-time agents or analyze your entire media library

Real-time

Give your agents awareness

Perfect for live calls, customer interactions, and agent-driven experiences

1. Stream live audio/video

Connect your call, meeting, or camera feed through our SDK

2. Get instant perception events

Receive emotion, engagement, and speech as structured events over WebSocket

3. React in the moment

Let your agent detect when a customer hesitates, when a student is confused, or when tone shifts, as in the sketch below
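A minimal sketch of that loop, assuming a WebSocket endpoint that pushes JSON events: the URL, the token query parameter, and the payload shape are hypothetical placeholders, not the actual Pinch SDK.

```typescript
// Sketch under assumptions: the endpoint URL, token parameter, and JSON payload
// shape are hypothetical placeholders; the real Pinch SDK may differ.
const socket = new WebSocket("wss://example-pinch-endpoint/perception?token=YOUR_API_KEY");

socket.addEventListener("open", () => {
  // In a real integration, the SDK streams your call, meeting, or camera feed here.
  console.log("perception session open");
});

socket.addEventListener("message", (message: MessageEvent<string>) => {
  // e.g. { kind: "emotion", label: "happy", confidence: 0.92, timestampMs: 1234 }
  const event = JSON.parse(message.data);

  // React in the moment: surface a strong signal to the agent as soon as it arrives.
  if (event.kind === "emotion" && event.confidence > 0.8) {
    console.log(`detected ${event.label}; let the agent decide how to respond`);
  }
});
```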

Async

Index and search your library

Perfect for post-production, compliance review, and training analysis

1. Send your media files

Upload recordings, archives, or raw footage via API

2. We extract everything

Every visual, every sound, every emotional beat becomes searchable

3. Search by meaning, not metadata

"Show clips where the rep sounded confident" — get exact timestamps back

Ready to build?

Get started with Pinch today

Talk to an engineer