Attention

The Cocktail Party Problem

Cocktail party, Imperial Hotel, Tokyo, 1961. Wikimedia Commons.

You're at a party and the room is loud. Dozens of conversations running simultaneously, music underneath, glasses, laughter, the whole wash of social noise. You're talking to someone. You're focused on them. And then, from somewhere behind you, a voice you weren't tracking says something, and your attention moves toward it before you've decided to listen. Not because it was louder than everything else. Because it was specific. Someone said a name you know, or described a situation you recognise, or said something so candid that it cut through the ambient noise the way a clear note cuts through static.

Cognitive scientists have been studying this for over seventy years. Colin Cherry, working at Imperial College London in the 1950s, called it “the cocktail party problem” and ran experiments to understand how the brain selects one voice from a room full of competing signals. What he found, and what subsequent research has confirmed, is that the selection isn't entirely voluntary. Your attention system is running a constant background filter on everything you're not consciously processing, and certain stimuli, the specific, the concrete, the personally relevant, will break through that filter involuntarily. You don't choose to notice. You just notice.

I've been thinking about this because I think it describes what happens in a social feed more precisely than any content strategy framework I've encountered. The feed is the cocktail party. Every podcast clip is a voice competing for attention. Most get filtered out. Some break through. And the qualities that determine which ones break through turn out to be the same ones Cherry identified: specificity, concreteness, the feeling that this particular thing is speaking to your particular situation.

· · ·
The Distinction

Extraction vs. Construction

Steenbeck flatbed editor, IBA TV Archive. Wikimedia Commons.

Thousands of clips get cut from podcast episodes and posted to social feeds every week. Most of them are competent. Clean audio, reasonable framing, captions that track the speech. They exist in the feed, they accumulate some views, and they recede without leaving much residue. There's nothing technically wrong with them. But there's nothing about them that makes the viewer's attention system flag them as worth pulling into focus, either.

Then, occasionally, one does more than hold attention. When it ends, it creates a small pressure, a sense that the thing you just watched belongs somewhere other than where you found it, in a friend's hands or in your own thinking later that evening. The clip didn't just occupy thirty seconds of your time. It gave you something you want to do something with.

The more I've tried to understand the difference between these two experiences, the more I've come to think of them as two distinct categories that share a surface resemblance but operate on fundamentally different principles. I've started calling them clips and excerpts.

A clip is extraction. You find a moment in an episode, cut around it, and post it. The content does the work. The editorial contribution is minimal: identify the moment, set the boundaries, deliver it.

An excerpt is construction. It has its own arc, however compressed. A beginning that orients, a middle that develops, and a close that resolves. It makes you want the full episode not because it withheld the conclusion, but because you experienced something complete and satisfying and want more of that quality of thinking. The relationship between an excerpt and its parent episode is closer to the relationship between a short story and the world it's set in than the relationship between a trailer and the film it's advertising.

Most of what gets posted from podcasts is clips, and there's nothing wrong with that. But the ones that travel, the ones that get sent from person to person with the implicit endorsement of everyone in the chain, tend to be excerpts. The distance between the two is where the interesting editorial questions live.

· · ·
The Blind Spot

The Curse of Knowledge

A person looking at a smartphone in the dark. Wikimedia Commons, CC BY-SA 4.0.

The most reliable way for a clip to fail is also the hardest for the person who made it to detect.

You've been inside the episode for hours. You know the guest's history, the conversational arc, the reference that was made at minute twelve that set up the moment you're now isolating. When you watch the clip back, it makes perfect sense. It has impact. It has context. Except the context isn't in the clip. It's in your memory of the episode, loaded into your experience of the clip invisibly, like subtitles only you can read.

The viewer has none of it. They're encountering this person mid-thought, mid-reference, mid-allusion to something that happened in a conversation they've never heard. If the clip opens with “And when sales sees marketing doing that...” the viewer has no way to fill in what “that” refers to. The sentence is taking for granted a shared understanding that doesn't exist. Every time the clip references something the viewer wasn't present for, it's a small act of exclusion: the content is treating them as someone who was already in the room, and they know they weren't. The cognitive cost of trying to fill in the blanks is small, but it's enough friction to let the thumb keep moving.

I learned this in the most direct way possible. I pulled a clip I was certain was excellent, sent it to a friend who'd never heard the episode, and watched her pause and say, “What's he talking about?” The clip was well-edited and the guest was compelling. But it was a window into a conversation she couldn't see into. I had sixty minutes of context she didn't have, and that context was doing all the work I thought the clip was doing.

The excerpts that travel all share one quality: they make complete sense to someone encountering them for the first time, knowing nothing. They contain their own setup, their own logic, their own resolution. No prior knowledge required, no references that depend on context the viewer doesn't have. Achieving this requires something close to deliberate amnesia: you have to forget everything you know about the episode and experience the clip as though you've never heard a word of the conversation. That act of becoming a stranger to your own material is one of the more important editorial skills in this kind of work, and one of the hardest to develop, because it works against the familiarity that makes you good at every other part of the job.

And the self-containment isn't only informational. It's emotional. A moment of vulnerability that's moving forty minutes into a conversation, after you've gotten to know the guest and begun to care about their experience, can feel unearned or even hollow in a thirty-second clip that hasn't done any of that emotional groundwork. Excerpts that travel carry their own emotional logic as well as their own informational logic, and testing for both usually means getting the clip in front of someone who has no relationship with the source material and watching how they respond.

· · ·
Technical

The Audio They Hear Is Not the Audio You Mixed

VU meters, Revox B77 MK II. Wikimedia Commons, CC BY-SA 4.0.

There's a technical dimension to this that I didn't fully appreciate until I started measuring, and that most editors I speak with still aren't accounting for in their workflow.

Podcast episodes are typically mastered at around −16 LUFS, the loudness target Apple recommends for its podcast directory. Social platforms normalise audio to roughly −14 LUFS. The 2 LU gap sounds insignificant on paper. In practice, it means that audio taken directly from a podcast master and placed into a Reel or Short will be noticeably quieter than the content surrounding it in the feed.

The viewer doesn't consciously think “this clip is quiet.” What they experience is a subtle reduction in presence, a feeling that this piece of content is somehow less immediate than what came before it, and their thumb, which was already in motion, completes the gesture. The clip had something to say but couldn't say it loudly enough to matter. This is the same perceptual bias that drove the music industry's “loudness wars” for decades: louder recordings, played back-to-back with quieter ones, are consistently perceived as more engaging regardless of their actual quality. The bias doesn't disappear just because the medium has changed from radio to social feeds.
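The size of that gap is easy to put a number on. Here is a minimal sketch of the arithmetic, assuming the integrated loudness of the clip has already been measured with a loudness meter; the function name `gain_to_target` is made up for illustration and is not part of any audio library.

```python
def gain_to_target(measured_lufs: float, target_lufs: float) -> tuple[float, float]:
    """Return (gain in dB, linear amplitude multiplier) needed to move
    a clip from its measured integrated loudness to the target loudness.

    Hypothetical helper for illustration only. A static gain change
    shifts integrated loudness by the same number of dB.
    """
    gain_db = target_lufs - measured_lufs
    linear = 10 ** (gain_db / 20)  # convert dB to an amplitude ratio
    return gain_db, linear


# Podcast master at -16 LUFS, platform reference at roughly -14 LUFS.
gain_db, linear = gain_to_target(-16.0, -14.0)
print(f"{gain_db:+.1f} dB, x{linear:.3f} amplitude")  # +2.0 dB, x1.259 amplitude
```

Raising the level by 2 dB also raises every peak by 2 dB, so the headroom on the master is worth checking before applying the gain.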

A professional mixing console in a recording studio. Wikimedia Commons.

The issue extends beyond loudness. Podcasts are mixed on studio monitors or decent headphones, environments with genuine bass response and clear mid-range separation. A substantial proportion of social video is consumed through phone speakers, which reproduce almost nothing below 200 Hz. A voice that sounds warm and full in the mixing environment can sound thin and slightly brittle through the driver on someone's phone, which is where a meaningful portion of the audience is encountering the work for the first time, often in acoustically unforgiving environments: kitchens, commuter trains, bathrooms with tile walls.

About two years ago, I started treating the audio for social clips as a separate deliverable from the episode audio. That means adjusting the EQ for phone speaker playback, raising the loudness target to match platform normalisation, and checking every clip on the cheapest, smallest speakers I own before posting. The improvement in engagement was consistent enough that it stopped feeling like an extra production step and became part of the standard workflow. The episode and the clip are mixed for different ears in different rooms. Treating them as interchangeable is leaving performance on the table for no reason other than convenience.
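To get a feel for how much of a signal a phone speaker discards, the roll-off can be roughly imitated in code. The sketch below is an assumption-laden stand-in, not a model of any real device: it treats the speaker as a first-order high-pass filter at 200 Hz, whereas real drivers roll off more steeply and unevenly. The function names are hypothetical, and it is no substitute for listening on the actual cheap speaker.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz=200.0):
    """First-order (6 dB/octave) high-pass filter.

    Assumption: a crude stand-in for a phone speaker's bass roll-off.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, seconds=1.0):
    """Generate a unit-amplitude test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def surviving_fraction(freq_hz, sample_rate=48000, cutoff_hz=200.0):
    """Fraction of a tone's RMS level that survives the filter."""
    tone = sine(freq_hz, sample_rate)
    filtered = high_pass(tone, sample_rate, cutoff_hz)
    skip = sample_rate // 10  # ignore the filter's start-up transient
    return rms(filtered[skip:]) / rms(tone[skip:])


# Compare a low tone, the cutoff itself, and a mid-range tone.
for freq in (100, 200, 1000):
    print(freq, "Hz:", round(surviving_fraction(freq), 2))
```

A tone well below the cutoff comes out at a fraction of its original level, while a 1 kHz tone passes nearly untouched. That is the thinning the paragraph above describes: the low end that gave the voice its warmth in the studio is the part that goes missing.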

· · ·
Selection

Would Someone Send This to a Particular Person?

Even a clip with clean audio and complete context can sit in a feed and go nowhere, though. Whether a clip merely gets viewed or actually travels through people's networks comes down to a question I've started using as a selection filter, one that has reshaped how I think about material more than anything else.

The question isn't “is this an interesting moment?” It's: would someone send this to a particular person?

Not share it in the abstract way you might tap a heart icon because you vaguely agree with the sentiment. Actually send it. Open a text thread with someone specific, paste the link, and type something like “this is you” or “told you so” or just the person's name with nothing else attached. The kind of sending where you're thinking of a specific face while your finger is on the button.

This is worth thinking about because the act of forwarding a clip to a friend isn't really about the clip. It's about the relationship. The content is the vehicle, but the real function is social. “I saw this and thought of you” is a way of saying “you're on my mind,” and the clip that enables that message is providing value that goes beyond whatever the guest was talking about. That's why specificity matters so disproportionately. A general observation about perseverance might be true and well-articulated, but it doesn't remind you of anyone in particular. It's too broad to trigger the matching process. A specific description of the particular flavour of exhaustion that comes from building something nobody else believes in yet, delivered by someone who's clearly lived through it, that description matches your friend who left her corporate job eight months ago to start something and hasn't had a proper night of sleep since. You send that clip to that friend before you've even finished watching it. The specificity of the content found a specific person in your life, and the urge to forward became almost automatic.

When I'm reviewing candidate moments from an episode, I try to picture the text message. If I can see the person someone would send this to, and I can almost hear what they'd type when they forwarded it, the clip is likely to travel. If I can't construct that mental image, the moment might still be worth posting, but I've stopped expecting it to move the way the good ones move.

· · ·
Strategy

Generosity Creates Curiosity

Withhold: cuts before the payoff. “To hear how this turns out…”
Give it away: complete thought, fully resolved. The viewer wants more.

There's a related instinct that took me longer than I'd like to admit to let go of, and I think it costs editors more than they realise.

When you're cutting from a longer conversation, there's a natural pull toward withholding. You want to end the clip just before the payoff, right at the moment of peak engagement, so the viewer has to seek out the full episode to hear how the thought resolves. It feels like sound strategy. You're creating desire by withholding satisfaction. Every marketing instinct says this is the right move.

What it actually creates, in practice, is a low-grade sense of being manipulated. The viewer was engaged. They were going along with the thought. And then the clip ended with an implicit “to hear how this turns out, listen to the full episode.” Here's what makes this backfire: a viewer who's been following a thought for twenty seconds has started to feel a kind of provisional ownership of the idea. They're invested. Cutting it off before resolution doesn't feel like an invitation to go deeper. It feels like something they were in the process of receiving was taken away, and the emotional response to that removal is stronger than the curiosity it's meant to generate. The withheld payoff doesn't read as intrigue. It reads as a small betrayal of the attention they'd already given.

The excerpts that consistently drive full-episode listens, in my experience, are the ones that give everything away. They deliver a complete thought, fully resolved. The viewer finishes feeling satisfied, and that satisfaction generates a specific form of curiosity: “If the thirty seconds they gave away for free were that considered, what does an hour of this person's thinking sound like?” Generosity creates curiosity. Withholding creates irritation. I resisted this conclusion for a long time before the pattern became too consistent to argue with.

· · ·
Tools

The Longlist Problem

I should say something about the AI clipping tools, because they're embedded in most production workflows now and the conversation around them tends toward either uncritical adoption or reflexive dismissal, neither of which matches what I've observed.

Tools like Opus Clip, Vizard, and the clipping features built into Riverside and Descript can take a ninety-minute episode and generate a longlist of candidate moments in minutes. They analyse transcripts for high-intensity statements, track vocal energy, look for segments that appear to be self-contained. As a method of generating a first pass quickly, they save genuine time and are worth using.

Where they fail, and fail consistently, is contextual judgment. An AI tool will surface a clip that opens mid-thought because the first sentence scored high for emotional intensity. It will cut a beat before the payoff because the transcript shows a pause it interprets as a natural endpoint. It will flag a moment that's technically energetic but tonally wrong for the show, the kind of clip that would confuse an existing audience and attract listeners who won't stay. Out of twenty AI-generated clips from a typical episode, I'd estimate two or three are usable without substantial reworking. That ratio has been stable across every tool I've used, and it roughly matches what I hear from other editors working at similar volumes.

The tools are useful for the longlist. Deciding which moments are actually worth developing, and shaping them once you've chosen, remain the parts of the process where human judgment does the work. The distinction between a clip and an excerpt, between extraction and construction, lives in the layer of contextual understanding that these tools don't yet possess. They can identify raw material. They can't yet hear the difference between a moment that will travel and one that will sit there.

· · ·
Craft

The Space Between the Notes

The last observation, and the one I find hardest to articulate, has to do with pacing.

Pacing inside a full episode and pacing inside a clip are different disciplines, and the instincts don't transfer directly. A two-second pause that feels contemplative inside an hour-long conversation can feel like dead air inside a forty-second clip. The viewer's proximity to departure, the fact that one thumb movement takes them somewhere else entirely, changes how silence is perceived. What registers as thoughtful at minute thirty-five registers as empty at second twelve.

The instinctive response to this, and the one I see most often from editors making the transition to short-form, is to tighten everything. Remove every pause. Close every gap between sentences. Keep the words arriving fast enough that the viewer never has an opportunity to consider leaving. The result is something that sounds polished and professional and is somehow drained of the quality that made the podcast worth clipping in the first place.

What I've learned, through enough edits that went too far in both directions, is that the pauses in a podcast clip aren't all the same. There's dead space and there's breathing room, and the difference between them is the difference between a gap where nothing is being communicated and a gap where something is settling. Dead space is filler seconds, false starts that lead nowhere, verbal tics that add no meaning. Those can go. Breathing room is the half-second after a strong statement where the weight of what was just said has a moment to register. It's the pause where you can almost see someone's next thought forming. Remove it and the statement still exists, but it arrives with less force, because the next thought crowds in before the previous one has had time to land.

Those pauses are what tell the viewer they're listening to a real conversation between people thinking in real time rather than a script being read at pace. They're functional. And protecting them while cutting everything around them is a significant part of what makes short-form editing from podcasts its own discipline rather than just a faster version of long-form editing.

When you get this balance right, the clip feels tighter than the source episode but still sounds like a room with people in it. When you get it wrong, the viewer can hear the editing even if they can't name what they're hearing. The human texture has been processed out, and what's left is efficient but inert.
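The distinction between dead space and breathing room can be roughed out as a first pass before the ear takes over. The sketch below rests on loud assumptions: it treats a phrase that ends in sentence-final punctuation as a stand-in for "a strong statement", which is a crude proxy at best. The half-second of breathing room comes from the description above. The 0.2-second cap on other gaps, the `Phrase` type, and `plan_pauses` are all hypothetical. It only proposes pause lengths, and deciding which pauses actually carry weight remains a listening job.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    start: float  # seconds in the source audio
    end: float    # seconds in the source audio
    text: str     # transcript text for this phrase


def plan_pauses(phrases, breathing_room=0.5, max_other_gap=0.2):
    """Return the pause (in seconds) to keep after each phrase except the last.

    Assumption: a gap after a completed sentence is treated as possible
    breathing room and may keep up to `breathing_room` seconds; every
    other gap is treated as possible dead space and is capped at
    `max_other_gap`. Pauses are only ever shortened, never lengthened.
    """
    kept = []
    for current, following in zip(phrases, phrases[1:]):
        gap = max(0.0, following.start - current.end)
        ends_thought = current.text.rstrip().endswith((".", "?", "!"))
        limit = breathing_room if ends_thought else max_other_gap
        kept.append(min(gap, limit))
    return kept


phrases = [
    Phrase(0.0, 2.0, "I quit the job."),
    Phrase(4.0, 5.0, "And, um,"),
    Phrase(6.5, 8.0, "nobody believed me."),
]
print(plan_pauses(phrases))  # [0.5, 0.2]
```

In this example the two-second pause after the completed statement is trimmed to half a second rather than removed, while the gap after the false start is cut down hard. Any pause the rule gets wrong is a reason to override it by hand.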

· · ·
Closing

The Signal in the Noise

These observations came from breaking down specific excerpts frame by frame, trying to understand the editorial decisions inside them and why some of those decisions worked while others didn't. The patterns are still forming. Each new breakdown refines some of these ideas, overturns others, and occasionally surfaces connections I hadn't considered.

The feed keeps scrolling. Thousands of clips enter it every day, and almost all of them will be filtered out by the same attentional machinery Colin Cherry was studying in 1953. The ones that break through will share certain qualities: they'll be specific rather than general, complete rather than teasing, concrete rather than abstract. They'll speak with enough particularity that someone's brain will match them to a real person in their life and feel the pull to forward.

And inside those clips, usually invisible at the speed they're consumed, will be a series of editorial decisions that nobody notices unless they slow the whole thing down and look at it frame by frame. That's what I've been doing. I plan to keep doing it.