How to · 10 min read
How to Generate AI B-roll for YouTube: The 60-Minute Workflow
Generate original AI B-roll for YouTube narration using Runway. Honest prompt workflow, credit math, and when stock footage still beats generation.
- Difficulty: intermediate
- Time needed: 1 hour
- Published

Disclosure: Some tool links below are affiliate links. If you sign up through one we may earn a commission — at no extra cost to you. We'd recommend the same tools either way.
Who this is for
You're narrating a YouTube video about an abstract topic — the history of interest rates, how compound ideas spread, the philosophy of attention, the future of an industry. There's no obvious footage. The topic lives in ideas, not in places or people. Stock libraries return the same generic business-person-pointing-at-laptop clip you've seen in forty thumbnails, and it pulls your channel toward generic.
This guide is for narration-style YouTubers producing 5–15 minute videos on conceptual subjects, who want B-roll that matches the idea on screen without looking like a Shutterstock reel. It assumes you're comfortable editing in Premiere, DaVinci, CapCut, or Final Cut and you can cut B-roll to narration.
What you'll need
- A narration track already recorded, or a detailed script timed for a 5–15 minute video. You need to know what you're covering before you generate visuals.
- A shot list: 12–20 lines describing the B-roll moments you want. More on this in Step 1.
- A Runway account on Pro ($28/mo) for a single video per month, or Unlimited ($76/mo) for weekly production. Standard at $12 is too credit-constrained for this workflow.
- An editor you already use (Premiere, DaVinci Resolve, Final Cut, CapCut Pro, or Descript for transcript-first editing).
- Disk space: 2–5 GB for a video's worth of generated clips and iterations you'll discard.
- Budget: $28–$76/month on Runway depending on cadence; optionally $10/month for Pika or free-tier Luma as secondary sources.
- Skill level: you've edited with B-roll before. Prompt writing is learned through iteration — expect the first video to be slow.
Step 1: Build a shot list tied to your narration timestamps
Action: Open your script or narration transcript. Mark the moments that need B-roll with specific visual descriptions.
Most narration videos cut to B-roll every 4–8 seconds. A 10-minute video with active B-roll has 70–120 cuts, but only 15–25 of those need original generated footage — the rest can be repeated shots, zooms, or your existing stock library. Identify the 15–25 moments where the narration lands on an idea that has no obvious visual and write a one-sentence shot description for each.
Good shot descriptions are specific about three things: subject (what's in frame), motion (what's happening), and mood (the visual tone). "A city street" is a bad prompt. "A low-angle shot of a wet city street at dusk, neon reflections in puddles, slow dolly forward, cinematic teal-orange grade" gives the model enough to work with.
Three prompt archetypes that work well for narration B-roll:
- Concept metaphors: "a single coin falling into a still pool, ripples expanding outward, top-down macro shot, slow motion."
- Historical or period visualization: "a 1920s trading floor, paper tickers flying, warm film grain, medium shot, handheld camera movement."
- Abstract texture: "ink diffusing into water in slow motion, deep blue and gold, studio lighting, macro close-up."
Prompts that work badly: anything requiring specific real people, recognizable brands, precise text on signs, or consistent characters across multiple shots.
Success signal: your shot list has 15–25 entries, each tied to a narration timestamp, each with subject + motion + mood.
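If you keep the shot list in a spreadsheet or CSV, the three ingredients can live in separate columns and be joined into prompts mechanically. A minimal sketch (the CSV layout and column names are our own convention, not anything Runway requires):

```python
import csv
import io

# Hypothetical shot-list format: one row per B-roll moment, with the three
# prompt ingredients (subject, motion, mood) in separate columns so weak
# entries are easy to spot and rewrite.
SHOT_LIST = """timestamp,subject,motion,mood
00:42,a single coin falling into a still pool,ripples expanding outward in slow motion,top-down macro shot
02:15,a 1920s trading floor with paper tickers flying,handheld camera movement,warm film grain
"""

def build_prompts(csv_text):
    """Join subject + motion + mood into one prompt string per shot."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["timestamp"], f'{r["subject"]}, {r["motion"]}, {r["mood"]}')
            for r in rows]

for ts, prompt in build_prompts(SHOT_LIST):
    print(ts, "->", prompt)
```

Keeping the columns separate also makes Step 5's refinements cheap: you rewrite one cell (say, the motion) and regenerate, instead of re-drafting the whole prompt.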
Step 2: Pick your model tier within Runway
Action: Decide whether each shot runs on Gen-4, Gen-4 Turbo, or Gen-4.5.
Runway's model choice has real cost implications:
- Gen-4 Turbo — faster, cheaper in credits, lower shot coherence. Best for fill B-roll where the viewer's attention is on the narration.
- Gen-4 — the workhorse. Strong prompt adherence, good camera moves, 187-second maximum per generation. Best for hero shots that carry a section of the video.
- Gen-4.5 — newer, better prompt-following, 90-second cap, significantly more expensive per second. Only use for the one or two shots that need to carry the video.
For a typical 10-minute narration video, mix 80% Gen-4 Turbo with 20% Gen-4 hero shots. Gen-4.5 is usually overkill unless you're producing a thumbnail-critical opener.
Failure mode: running everything on Gen-4.5 "for quality." You'll burn through a Pro-tier month's credits in three shots and discover the marginal quality gain doesn't read at YouTube player size.
Step 3: Generate in batches, not one at a time
Action: Queue 5–8 prompts back-to-back before reviewing any of them. Work on something else while they generate.
Runway generation takes 30 seconds to 3 minutes per clip depending on model and length. Watching the progress bar is the slowest way to produce B-roll. Paste your prompts in sequence, kick them off, and come back in 20 minutes with a batch to review.
Budget for a 1-in-5 hit rate on first-attempt prompts. Experienced Runway users hit 1-in-3; total beginners start at 1-in-10. This means a shot list of 20 B-roll moments typically requires 60–100 generations over the first few videos. Plan your credits accordingly.
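The attempt math is worth sanity-checking before you queue a batch. A quick sketch of the expectation, using the hit rates cited above (nothing here is an official Runway figure):

```python
def generations_needed(shots, hit_rate):
    """Expected number of attempts to land one keeper for each shot."""
    return round(shots / hit_rate)

# A 20-shot list at the hit rates cited above:
for label, rate in [("beginner (1-in-10)", 0.1),
                    ("typical (1-in-5)", 0.2),
                    ("experienced (1-in-3)", 1 / 3)]:
    print(label, "->", generations_needed(20, rate), "generations")
```

The 1-in-3 and 1-in-5 rows bracket the 60–100 range quoted above; the beginner row is why the first video or two should carry a bigger credit buffer.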
For hero shots you know will need multiple attempts, use Explore Mode on Runway's Unlimited tier: it's slower and slightly throttled, but it doesn't burn credits, so you can iterate freely. The Standard and Pro tiers pay per attempt, which creates financial pressure to accept weak output.
Success signal: you've queued a batch of 5–8 prompts and stepped away to do other work. You are not watching the generation bar.
Failure mode: generating one clip, judging it, regenerating with tweaks, and burning 20 minutes on a single 5-second shot. Batch processing plus batch review is 3x faster.
Step 4: Review, rate, and cull the generations
Action: Open each generated clip. Give it a fast keep/cut/maybe rating. Don't overthink.
The review heuristic: does this clip enhance the narration at its intended timestamp, or does it distract from it? B-roll serves the voiceover — if viewers notice the B-roll more than the words, the clip is too busy or too weird. Cut it.
Specific red flags that mean "cut, don't rescue":
- Garbled text or logos anywhere in frame. AI models render in-frame text wrong 30–50% of the time. Regenerate without text or composite real text over a clean plate in post.
- Anatomical glitches on humans — extra fingers, floating limbs, faces that drift. Viewers catch these instantly and lose immersion.
- Prompt bleed where the clip generates the wrong subject (you asked for a mountain, got a hill with a weird mountain-shaped tree). Don't try to salvage; regenerate.
- Style inconsistency with neighboring clips. If your previous clip is moody and cinematic and this one is cartoonish, the cut will jar. Match your channel's aesthetic across the batch.
Expect to cut 60–80% of first-attempt generations. That's normal. Rate the keepers on a 1–3 scale — 3s go in the final cut, 2s are fallbacks if you can't regenerate a better option.
Success signal: you've gone through 30+ clips in 15 minutes with fast decisions. You have 10–12 keepers out of that batch.
Step 5: Regenerate the gaps with refined prompts
Action: For each shot list entry that didn't produce a keeper, rewrite the prompt and regenerate.
Prompt refinement patterns that actually work:
- Add a camera move if the first attempt was static and boring. "Slow dolly in," "handheld shake," "whip pan" all produce more engaging B-roll than a locked shot.
- Specify a reference aesthetic like "35mm film grain," "anamorphic lens flares," or "Wes Anderson symmetry" if your channel has a consistent visual style.
- Reduce subject count if the first attempt was chaotic. One subject in frame beats three — AI models still struggle to track multiple moving objects coherently.
- Shorten the prompt if you wrote a paragraph. Focused 1-sentence prompts often outperform 4-sentence ones; the model gets confused by too many constraints.
If a specific shot has failed three times in a row, drop it from the shot list and substitute stock footage or a static card. Not every idea is generatable today. Cheaper alternatives like Pika or Luma Dream Machine sometimes succeed where Runway fails on specific aesthetics — worth a cross-tool attempt for a critical hero shot.
Success signal: every shot list entry has a keeper, even if a few are substitutions from stock or alternate tools.
Step 6: Organize and import into your editor
Action: Download all keepers. Rename them to match your shot list. Import into your editor.
File naming that saves time: 01-coin-ripple-gen4.mp4, 02-trading-floor-1920s-gen4turbo.mp4, etc. The leading number matches your shot-list order so the edit flows in sequence, and the model tag lets you regenerate from the right tier if you need to re-cut.
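Renaming by hand gets tedious past one video. A small sketch of the scheme above (the keeper list and folder layout are hypothetical; Runway's actual download filenames vary):

```python
from pathlib import Path

def target_name(order, slug, model):
    """Build the shot-list filename: zero-padded order + slug + model tag."""
    return f"{order:02d}-{slug}-{model}.mp4"

def rename_keepers(folder, keepers):
    """Rename downloads in place. keepers: (downloaded_name, slug, model)
    tuples, listed in shot-list order."""
    for i, (src, slug, model) in enumerate(keepers, start=1):
        p = Path(folder) / src
        p.rename(p.with_name(target_name(i, slug, model)))

print(target_name(1, "coin-ripple", "gen4"))  # 01-coin-ripple-gen4.mp4
print(target_name(2, "trading-floor-1920s", "gen4turbo"))
```

The zero-padded prefix keeps the clips sorted in shot-list order in any file browser and in your editor's media bin.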
Place each clip at its narration timestamp. Trim the head and tail — AI generations often have a 0.5-second ramp-up where the motion stabilizes. Cut that. Same for the tail, where motion sometimes drifts.
For a 10-minute video, budget 45 minutes for the edit pass (placement + trimming + sound design). The generation time you saved on sourcing original footage gets partially spent on matching AI clips to the audio. If your editor is Descript, the transcript-first workflow lets you drop clips against specific narration words, which speeds up the placement decision.
Success signal: your edit timeline has AI B-roll placed across 15–25 narration moments, head and tail trimmed, first pass watchable end-to-end.
Common pitfalls
- Generating without a shot list: exploratory prompting without a target wastes 80% of credits. Write the shot list first, even if it feels slow. It pays back immediately.
- Chasing realism where stylization would serve better: AI-generated "realistic" footage of people, places, or products looks fake. AI-generated "stylized" footage of concepts, textures, and metaphors looks intentional. Lean into stylization for B-roll.
- Ignoring audio design on AI clips: generated clips arrive silent. Dropping them into a narration video without any ambient sound under the voiceover leaves them feeling disconnected. Add a subtle sound bed matched to the visual (water sounds for water clips, room tone for interiors).
- Running through credits without a budget ceiling: Runway's credit math is punishing. Set a per-video budget (2,000 credits on Pro, for example) and stop generating when you hit it. Use fallback stock footage rather than double your credit spend chasing one difficult shot.
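A ceiling is easier to respect when you check it before each batch rather than after. A minimal sketch, assuming the 10-credits-per-second rate implied by this guide's credit figures (not an official Runway price):

```python
CREDITS_PER_SECOND = 10  # assumed rate for illustration, not an official price

def seconds_left(budget_credits, spent_credits):
    """Seconds of generation still fitting under the per-video ceiling."""
    return max(0, budget_credits - spent_credits) // CREDITS_PER_SECOND

# 2,000-credit ceiling on Pro, 1,500 credits already spent this video:
print(seconds_left(2000, 1500))  # 50
```

When the answer drops below the length of your next planned batch, that's the signal to switch the remaining shots to stock fallbacks.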
When not to use this approach
- Literal B-roll of real subjects: if you need footage of a specific person, a recognizable brand product, or a named place, AI generation won't get you there reliably. Stock footage, licensed archival, or a real shoot is the right path.
- Consistency-dependent narratives: if your video needs the same character or setting across 5+ shots, AI character consistency still breaks too often. Use real footage or illustrated assets instead.
- Low-budget first videos: if you're producing your first or second YouTube video and need to validate whether the channel works at all, free stock libraries (Pexels, Pixabay) plus your phone camera beat a $28–$76/month AI tool. Invest in generation after the channel has a format and an audience.
Bottom line
AI B-roll earns its place on narration channels covering abstract concepts that don't have obvious footage — history, philosophy, finance, ideas. For those videos, Runway Gen-4 produces visuals that beat generic stock for originality and match the tone of the narration without the cost of a real shoot.
Start with Runway Pro if you're producing one video a month and want to test the workflow; move to Unlimited if you're shipping weekly and iteration cycles are eating your credit budget.
Common questions
- Is Runway the only option or can I use cheaper tools?
- Runway Gen-4 is currently the best at prompt adherence and shot coherence, but Pika at $10/month for Standard and Luma Dream Machine with a usable free tier both produce shippable B-roll for abstract concepts. Pika tends toward stylized output — good for mood visuals, less good for realistic scenes. Luma generates longer clips but with less consistent camera control. For a first video, test all three with the same prompt and pick the one that matches your channel's aesthetic; for ongoing production, Runway's Unlimited at $76/month is the only tier that handles heavy iteration without credit anxiety.
- How much will one 8-minute YouTube video actually cost in credits?
- A narration video with 12–15 B-roll cuts averaging 5 seconds each needs roughly 60–75 seconds of finished footage. At a 1-in-5 hit rate on Runway Gen-4 (realistic for first-time prompters), that's 300–375 seconds of attempted generation, which burns 3,000–3,750 credits: more than Runway Pro's 2,250 credits/month. Once your hit rate improves toward 1-in-3, the same video needs roughly 180–225 seconds and 1,800–2,250 credits, so Pro covers one video a month; a second video in the same month runs short. If you produce weekly, Unlimited at $76/month pays back on the third video.
- Will the AI footage look obviously generated on my channel?
- Gen-4 footage for abstract or stylized subjects (financial charts dissolving into smoke, historical illustrations coming to life, concept visualizations) holds up. Footage of real people, recognizable places, or specific products looks generated within 2–3 seconds because consistency and detail break down. Use AI B-roll for the non-literal, and stock footage or real shots for anything the viewer would recognize. Viewers forgive stylized AI when it's clearly a visual choice; they don't forgive fake-realistic.
- Can I do this for free or close to it?
- Luma Dream Machine's free tier gives a handful of monthly generations, enough to evaluate the tool on one test video. Runway's 125 one-time free credits don't include Gen-4 video access, so they're not useful for this workflow. Realistic minimum for one produced video per month is Pika Standard at $10 or Runway Standard at $12 — expect to run out of credits mid-video at those tiers and either wait for the monthly reset or fall back to stock footage for the remaining clips.
- How long does this take end-to-end?
- About 60 minutes per video once you have a working prompt style: 10 minutes writing shot-list prompts, 20 minutes iterating on generations while you work on something else, 15 minutes reviewing and selecting keepers, 10 minutes downloading and organizing clips into your editor, 5 minutes for final placement. First attempt takes 2–3 hours because prompt iteration is learning-by-doing. Once you have a prompt template that matches your channel's look, generation becomes faster than sourcing comparable stock footage.