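Automatically Generating a 15-Second Ad Video from a Single Product Image

[Unverified]

📌 0. Series

Application	Title	Difficulty	Core technologies
Application 1	Create AI characters and profile images from 10 photos	⭐⭐⭐	FLUX.1 dev · LoRA · ComfyUI
Application 2	Clone your own voice with AI: automating a YouTube narrator	⭐⭐⭐	F5-TTS · Kokoro · Sesame CSM-1B
Application 3	One product image → automatically generated 15-second ad video	⭐⭐⭐⭐	Wan2.2 · HunyuanVideo 1.5 · LTX-2
Application 4	Automated factory defect inspection: NG/OK detection + robot coordinate extraction	⭐⭐~⭐⭐⭐⭐⭐	YOLOv12 · OpenCV · RealSense
Application 5	Video mood analysis → automatic BGM generation & sync	⭐⭐⭐	MusicGen · AudioCraft · Stable Audio Open
Application 6	A finished YouTube video from a one-line topic: an AI content automation pipeline	⭐⭐⭐⭐	LangGraph · CrewAI · AutoGen 0.4
Application 7	An AI that writes from photos: automatic product detail pages with a Vision LLM	⭐⭐⭐⭐	Qwen2.5-VL · InternVL3 · LLaVA-Next
Application 8	Let AI read your PDF documents: building an internal-knowledge RAG chatbot	⭐⭐⭐	LlamaIndex · ChromaDB · Qdrant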

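📌 1. Introduction

What we will build in this post

If you follow this post to the end, you can produce the results below from a single product image.

Input: 1 product image (e.g., a photo of a perfume bottle)
   ↓
ComfyUI + Wan2.2 Image-to-Video
   ↓
Output:
  ├─ A 5-second clip of the product rotating slowly
  ├─ A close-up clip in which the camera zooms in on the product
  ├─ An atmospheric clip in which the background changes
  └─ A 15-second ad video made by joining the clips above with FFmpeg

How Image-to-Video works: why a single image can become a video

Ordinary video: N image frames arranged in time order

AI video generation:
  1 input image
      ↓
  A diffusion model predicts the "next frames"
      ↓
  Natural in-between values are interpolated between frames (interpolation)
      ↓
  Result: a video that moves naturally, starting from the input image

💡 Wan2.2 fixes the input image as the first frame and generates the following frames by diffusion, guided by the prompt. In other words, the image is the "seed" and the prompt decides the "direction of growth".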

📌 2. Environment Setup

2-1. Model selection guide by VRAM

VRAM	Recommended model	Max resolution	Generation time (5-second video)
24GB+	Wan2.2 / HunyuanVideo 1.5	1280×720	5~15 min
16GB	Wan2.2 (fp8 quantized)	832×480	10~20 min
8~12GB	LTX-2	768×432	3~8 min
Under 8GB	LTX-2 (lightweight)	512×288	5~10 min

How to check your VRAM:
  nvidia-smi --query-gpu=memory.total --format=csv,noheader
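
The table lookup can also be scripted. Below is a minimal sketch, assuming the `nvidia-smi` command above prints a line such as `24564 MiB`. The function names `parse_nvidia_smi_gb` and `recommend_model` are hypothetical helpers, and treating the 12~16GB gap (which the table does not cover) as the LTX-2 tier is an assumption of this sketch.

```python
def parse_nvidia_smi_gb(output: str) -> int:
    """Parse the first line of `nvidia-smi --query-gpu=memory.total` output.

    Cards usually report slightly less than their nominal size
    (e.g. "24564 MiB" for a 24GB card), so the value is rounded
    to the nearest whole GB.
    """
    first_line = output.strip().splitlines()[0]
    mib = int(first_line.split()[0])
    return round(mib / 1024)


def recommend_model(vram_gb: float) -> dict:
    """Return the row of the VRAM table that matches the given VRAM size."""
    # (minimum VRAM in GB, model, max resolution, time for a 5-second video)
    tiers = [
        (24, "Wan2.2 / HunyuanVideo 1.5", "1280x720", "5~15 min"),
        (16, "Wan2.2 (fp8 quantized)", "832x480", "10~20 min"),
        (8, "LTX-2", "768x432", "3~8 min"),   # assumption: also used for 12~16GB
        (0, "LTX-2 (lightweight)", "512x288", "5~10 min"),
    ]
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    for min_vram, model, resolution, estimate in tiers:
        if vram_gb >= min_vram:
            return {"model": model, "resolution": resolution, "time": estimate}
```

For example, `recommend_model(parse_nvidia_smi_gb("24564 MiB"))` picks the 24GB+ row.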

2-2. Installing ComfyUI and each model

Install ComfyUI:

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

Install ComfyUI-Manager (manages the required nodes):

cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
# After restarting ComfyUI, check that the Manager tab appears in the browser

Install the required custom nodes (search for them in ComfyUI-Manager):

- ComfyUI-WanVideoWrapper       ← core Wan2.2 nodes
- ComfyUI-VideoHelperSuite      ← video saving & preview
- ComfyUI_rembg                 ← background removal
- ComfyUI-KJNodes               ← utility nodes

Download the Wan2.2 model:

# Install the HuggingFace CLI
pip install huggingface_hub

# Download the Wan2.2 I2V model (about 14GB)
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B-720P \
  --local-dir ./ComfyUI/models/wan2.2

# Lightweight fp8 version (for 16GB VRAM or less)
huggingface-cli download \
  Wan-AI/Wan2.2-I2V-A14B-480P \
  --local-dir ./ComfyUI/models/wan2.2-480p

Download the LTX-2 model (for 8~12GB VRAM):

huggingface-cli download \
  Lightricks/LTX-Video-2 \
  --local-dir ./ComfyUI/models/ltx2

Download HunyuanVideo 1.5 (for 24GB+ VRAM):

huggingface-cli download \
  tencent/HunyuanVideo \
  --local-dir ./ComfyUI/models/hunyuan

📌 3. Preparing the Product Image

3-1. Background removal (rembg)

To make the product the sole focus of the ad, remove the background or replace it with one of your choice.

pip install rembg pillow

from rembg import remove
from PIL import Image
import io

def remove_background(input_path, output_path):
    with open(input_path, 'rb') as f:
        input_data = f.read()

    # Remove the background
    output_data = remove(input_data)

    # Save as PNG (keeps the transparent background)
    image = Image.open(io.BytesIO(output_data)).convert("RGBA")
    image.save(output_path, "PNG")
    print(f"Background removed: {output_path}")

remove_background("./product.jpg", "./product_nobg.png")

Compositing a background after removal:

from PIL import Image

def composite_background(product_path, bg_color=(240, 240, 240), output_path="./product_final.jpg"):
    product = Image.open(product_path).convert("RGBA")

    # Create a solid-color background (fully opaque)
    background = Image.new("RGBA", product.size, bg_color + (255,))

    # Composite the product over the background
    composite = Image.alpha_composite(background, product).convert("RGB")
    composite.save(output_path, "JPEG", quality=95)
    print(f"Background composited: {output_path}")

# White background
composite_background("./product_nobg.png", bg_color=(255, 255, 255))
# Dark background (a more premium look)
composite_background("./product_nobg.png", bg_color=(20, 20, 20), output_path="./product_dark.jpg")

3-2. Matching resolution & aspect ratio

from PIL import Image

def resize_for_video(input_path, output_path, target="16:9", resolution=720):
    """
    target: "16:9" (YouTube / landscape), "9:16" (Shorts / portrait), "1:1" (Instagram)
    resolution: 720 or 480 (480 uses the 832x480 size from the VRAM table)
    """
    sizes = {
        720: {"16:9": (1280, 720), "9:16": (720, 1280), "1:1": (720, 720)},
        480: {"16:9": (832, 480),  "9:16": (480, 832),  "1:1": (480, 480)},
    }

    target_w, target_h = sizes[resolution][target]

    img = Image.open(input_path).convert("RGB")
    w, h = img.size

    # Center-crop to the target aspect ratio
    target_ratio = target_w / target_h
    current_ratio = w / h

    if current_ratio > target_ratio:
        # Source is too wide: trim the left and right edges
        new_w = int(h * target_ratio)
        left  = (w - new_w) // 2
        img   = img.crop((left, 0, left + new_w, h))
    else:
        # Source is too tall: trim the top and bottom edges
        new_h = int(w / target_ratio)
        top   = (h - new_h) // 2
        img   = img.crop((0, top, w, top + new_h))

    img = img.resize((target_w, target_h), Image.LANCZOS)
    img.save(output_path, "JPEG", quality=95)
    print(f"[{target}] Resize complete: {target_w}x{target_h} → {output_path}")

# For YouTube (landscape)
resize_for_video("./product_final.jpg", "./product_16x9.jpg", target="16:9")
# For Shorts/Reels (portrait)
resize_for_video("./product_final.jpg", "./product_9x16.jpg", target="9:16")

⚠️ Wan2.2 requires the width and height to be multiples of 16 (720, 480, and 288 are not multiples of 64). Use sizes such as 1280×720, 832×480, or 512×288.
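
A quick divisibility check can catch an invalid size before a long render starts. This is a minimal sketch: the three resolutions listed in the warning above are all divisible by 16, so the hypothetical helpers below default to that step, but the step is a parameter in case your node setup reports a different requirement.

```python
def is_valid_resolution(width: int, height: int, multiple: int = 16) -> bool:
    """Return True when both dimensions are divisible by `multiple`."""
    return width % multiple == 0 and height % multiple == 0


def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round `value` down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


# The sizes recommended in this post all pass the check with a step of 16.
for w, h in [(1280, 720), (832, 480), (512, 288)]:
    assert is_valid_resolution(w, h)
```

For an arbitrary size such as 1000×562, `snap_to_multiple` gives 992×560.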

📌 4. Building the Wan2.2 Workflow

4-1. ComfyUI node layout

Node connection order for a basic I2V workflow:

[Load Image] ──────────────────────────────────────────┐
                                                        ↓
[CLIP Text Encode (+)] → [Wan2.2 I2V Sampler] → [VAE Decode]
[CLIP Text Encode (-)] →                               ↓
                                            [Video Combine] → save MP4
[Wan2.2 Model Loader] ─────────────────────────────────┘
[VAE Loader] ──────────────────────────────────────────┘

Workflow JSON for ComfyUI (simplified: only node types and key inputs are shown):

{
  "nodes": [
    {"id": 1, "type": "LoadImage",          "title": "Load product image"},
    {"id": 2, "type": "WanVideoModelLoader","title": "Load Wan2.2 model",
     "inputs": {"model": "wan2.2/wan2.2-i2v-14b-720p.safetensors"}},
    {"id": 3, "type": "CLIPTextEncode",     "title": "Positive prompt"},
    {"id": 4, "type": "CLIPTextEncode",     "title": "Negative prompt"},
    {"id": 5, "type": "WanVideoSampler",    "title": "Video sampler",
     "inputs": {"steps": 30, "cfg": 6.0, "num_frames": 81}},
    {"id": 6, "type": "VAEDecodeVideo",     "title": "VAE decode"},
    {"id": 7, "type": "VHS_VideoCombine",   "title": "Save video",
     "inputs": {"frame_rate": 24, "format": "video/mp4"}}
  ]
}

💡 Dragging and dropping a workflow JSON file onto the ComfyUI browser window loads the workflow automatically.

4-2. Writing prompts

Basic structure:

[Product description] + [Camera movement] + [Mood] + [Quality keywords]

Camera movement keywords:

Zoom:
  zoom in slowly       → slow close-up
  zoom out smoothly    → pull back to a full shot
  extreme close-up     → emphasize texture and detail

Panning:
  pan left / pan right → move the camera left or right
  arc shot             → curve around the product
  orbit slowly         → slow 360-degree rotation

Tracking:
  dolly forward        → camera moves forward
  dolly backward       → camera moves backward
  tracking shot        → camera follows the product

Mood keywords:

Luxurious:    luxury, elegant, premium, cinematic lighting
Natural:      soft natural light, warm tones, lifestyle
Dynamic:      dynamic, fast motion, energetic
Emotional:    dreamy, bokeh, golden hour, film grain
Product ad:   product shot, studio lighting, clean background

Example prompts by product:

# Perfume ad
"A luxury perfume bottle on a marble surface,
 slow orbit camera movement, cinematic lighting,
 golden particles floating in air, soft bokeh background,
 premium product advertisement, 8K quality"

# Sneakers ad
"A pair of sneakers on a clean white surface,
 dynamic zoom in to show texture detail,
 studio lighting, energetic atmosphere,
 smooth camera movement, product shot quality"

# Skincare ad
"A skincare serum bottle surrounded by fresh flowers,
 slow zoom out, soft natural lighting, warm tones,
 dreamy bokeh, elegant and clean aesthetic,
 luxury beauty product advertisement"

Negative prompt (shared):

"blurry, shaky camera, distorted product, watermark,
 text overlay, low quality, overexposed, underexposed,
 duplicate product, deformed shape"
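
The four-part prompt structure described in this section can be assembled programmatically when generating many clips. This is a minimal sketch, and `build_prompt` is a hypothetical helper rather than part of ComfyUI or Wan2.2: it simply joins the four parts in the order given by the basic structure.

```python
def build_prompt(product: str, camera: str, mood: list, quality: list) -> str:
    """Join [product] + [camera movement] + [mood] + [quality] into one prompt.

    Empty or whitespace-only parts are skipped so optional fields can be omitted.
    """
    parts = [product, camera, *mood, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


perfume_prompt = build_prompt(
    product="A luxury perfume bottle on a marble surface",
    camera="slow orbit camera movement",
    mood=["cinematic lighting", "soft bokeh background"],
    quality=["premium product advertisement", "8K quality"],
)
print(perfume_prompt)
```

Taking a single string for the camera argument matches the advice to specify one camera movement per clip.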

4-3. Parameter settings (steps, CFG, length)

# Key Wan2.2 parameters explained

steps = 30
# Number of denoising iterations
# Lower is faster but reduces quality
# 20 → quick tests
# 30 → balanced (recommended)
# 50 → high-quality final render

cfg = 6.0
# Classifier-Free Guidance
# How strongly the output follows the prompt
# 4.0 → more creative, follows the prompt less
# 6.0 → balanced (recommended)
# 9.0 → strict prompt adherence (can look unnatural)

num_frames = 81
# Number of frames to generate
# At 24 FPS:
#  49 frames = about 2 seconds
#  81 frames = about 3.4 seconds  ← recommended for a single clip
# 121 frames = about 5 seconds

# Final settings (for 24GB VRAM)
params = {
    "model":        "wan2.2-i2v-14b-720p",
    "steps":        30,
    "cfg":          6.0,
    "num_frames":   81,      # 3.4-second clip
    "fps":          24,
    "resolution":   "1280x720",
    "sampler":      "unipc",
    "scheduler":    "simple",
    "seed":         42,      # fixed for reproducible results
}

# Settings for 16GB VRAM or less
params_low_vram = {
    "model":        "wan2.2-i2v-14b-480p",
    "steps":        20,
    "cfg":          6.0,
    "num_frames":   49,      # 2-second clip
    "fps":          24,
    "resolution":   "832x480",
}
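
The frame counts in this post (33, 49, 73, 81, 121, 169) all have the form 4n + 1. Assuming that shape is what the sampler expects, the sketch below converts a target duration into a frame count at the 24 FPS used above. `frames_for_duration` and `duration_of` are hypothetical helper names.

```python
def frames_for_duration(seconds: float, fps: int = 24) -> int:
    """Return the frame count of the form 4n + 1 closest to seconds * fps."""
    n = max(1, round(seconds * fps / 4))
    return 4 * n + 1


def duration_of(num_frames: int, fps: int = 24) -> float:
    """Return the clip length in seconds for a given frame count."""
    return num_frames / fps


for target in (2, 3, 5, 7):
    frames = frames_for_duration(target)
    print(f"{target}s -> {frames} frames ({duration_of(frames):.2f}s)")
```

Because of the extra frame, each clip runs about 0.04 seconds longer than the nominal duration at 24 FPS.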

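📌 5. Model Comparison

Wan2.2: best open-source quality

Features:
  - Presented here as the top-ranked open-source I2V model as of 2026
  - Introduces a MoE (Mixture of Experts) architecture
  - Top-tier prompt adherence
  - Most accurate camera movement control

Recommended when:
  - You need the highest-quality ad video
  - Camera movement matters for the shot
  - You have an RTX 3090 / 4090

Drawbacks:
  - Needs at least 16GB of VRAM (24GB for 720p)
  - Generation is fairly slow (10~15 min per clip at 720p)

HunyuanVideo 1.5: specialized for high resolution

Features:
  - Open source from Tencent, supports up to 1080p
  - Visual consistency (little distortion of the product shape)
  - Preserves product detail better than Wan2.2
  - Can run on a local GPU

Recommended when:
  - Preserving product detail (texture, color) matters
  - You need a high-resolution (1080p) ad video

Drawbacks:
  - 24GB+ of VRAM recommended
  - Fewer community workflows than Wan

LTX-2: fast and friendly to low-end hardware

Features:
  - Runs even on 8GB of VRAM
  - Generates 2~3 times faster than Wan2.2
  - Well suited to rapid prototyping

Recommended when:
  - You have a consumer GPU (RTX 3070/4060, etc.)
  - You are producing drafts or tests
  - You need fast iteration

Drawbacks:
  - Lower quality than Wan2.2
  - Limited support for high resolution (720p+)

Item	Wan2.2	HunyuanVideo 1.5	LTX-2
Quality	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Speed	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐
Minimum VRAM	16GB	24GB	8GB
Product preservation	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Camera control	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐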

📌 6. Combining & Editing Multiple Scenes

6-1. Strategy for generating each scene

Example layout: 15-second ad = 3 clips (3 s + 5 s + 7 s):

[Clip 1 - Opening: 3 s]
  Prompt: "product zoom out slowly, dark moody background,
             cinematic lighting, luxury atmosphere"
  Purpose: an impactful opening that shows the whole product for the first time

[Clip 2 - Detail: 5 s]
  Prompt: "extreme close-up of product texture, slow pan left,
             soft studio lighting, 8K detail, premium product shot"
  Purpose: emphasize material and design details

[Clip 3 - Closing: 7 s]
  Prompt: "product on elegant marble surface, slow orbit shot,
             golden particles, bokeh background, luxury advertisement"
  Purpose: convey the brand mood and leave a lingering impression

Generating the clips automatically with Python (using the ComfyUI API):

import requests, time

COMFYUI_URL = "http://127.0.0.1:8188"

def generate_clip(image_path, prompt, clip_name, num_frames=81):
    """Generate a video clip through the ComfyUI API"""

    # Node class names and input fields follow the custom nodes used in this
    # post; adjust them to match the nodes actually installed.
    # image_path must be a file in ComfyUI's input folder.
    workflow = {
        "1": {"class_type": "LoadImage",
              "inputs": {"image": image_path}},
        "2": {"class_type": "WanVideoModelLoader",
              "inputs": {"model": "wan2.2-i2v-14b-720p.safetensors"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "blurry, shaky, distorted, low quality",
                         "clip": ["2", 1]}},
        "5": {"class_type": "WanVideoSampler",
              "inputs": {"model": ["2", 0], "image": ["1", 0],
                         "positive": ["3", 0], "negative": ["4", 0],
                         "steps": 30, "cfg": 6.0,
                         "num_frames": num_frames, "fps": 24}},
        "6": {"class_type": "VHS_VideoCombine",
              "inputs": {"images": ["5", 0],
                         "frame_rate": 24,
                         "filename_prefix": clip_name,
                         "format": "video/mp4"}},
    }

    # Add the job to the queue
    response = requests.post(f"{COMFYUI_URL}/prompt",
                             json={"prompt": workflow})
    response.raise_for_status()  # stop early if ComfyUI rejected the workflow
    prompt_id = response.json()["prompt_id"]

    # Wait for completion (the job appears in /history once it has finished)
    while True:
        status = requests.get(f"{COMFYUI_URL}/history/{prompt_id}").json()
        if prompt_id in status:
            print(f"✅ [{clip_name}] generation complete")
            break
        print(f"⏳ [{clip_name}] generating...")
        time.sleep(10)

# Generate the 3 clips in sequence
clips = [
    ("product_16x9.jpg",
     "product zoom out slowly, dark moody background, cinematic lighting",
     "clip_01_opening", 73),    # 3 seconds
    ("product_16x9.jpg",
     "extreme close-up of product texture, slow pan left, soft studio lighting",
     "clip_02_detail", 121),    # 5 seconds
    ("product_dark.jpg",
     "product on marble surface, slow orbit shot, golden particles, bokeh",
     "clip_03_closing", 169),   # 7 seconds
]

for image, prompt, name, frames in clips:
    generate_clip(image, prompt, name, frames)
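
The polling loop in `generate_clip` waits indefinitely, so a stalled job would block the whole batch. Below is a minimal sketch of a generic polling helper with a timeout. `wait_until` is a hypothetical helper, and the 30-minute default is an assumption chosen to exceed the generation times quoted in this post.

```python
import time


def wait_until(check, timeout_s=1800.0, interval_s=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns a truthy value, then return that value.

    Raises TimeoutError if nothing truthy is returned within timeout_s seconds.
    `sleep` and `clock` are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)
```

Inside `generate_clip`, the `while True` loop could then be replaced by something like `wait_until(lambda: requests.get(f"{COMFYUI_URL}/history/{prompt_id}").json().get(prompt_id))`, which returns the history entry once the job has finished.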

6-2. Joining clips with FFmpeg + transitions

Basic concatenation:

# Create the clips.txt file
echo "file 'clip_01_opening.mp4'" > clips.txt
echo "file 'clip_02_detail.mp4'"  >> clips.txt
echo "file 'clip_03_closing.mp4'" >> clips.txt

# Simple concatenation (stream copy, no re-encoding)
ffmpeg -f concat -safe 0 -i clips.txt -c copy output_raw.mp4
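
When the clip list comes from a script, the list file can be written from Python instead of with `echo`. This is a minimal sketch: `write_concat_list` is a hypothetical helper, and it assumes the clips share the same codec, resolution, and frame rate, which the `-c copy` concatenation above requires.

```python
from pathlib import Path


def write_concat_list(clip_paths, list_path="clips.txt"):
    """Write an FFmpeg concat list file with one `file '...'` line per clip."""
    lines = []
    for clip in clip_paths:
        # A single quote inside a quoted path is written as '\'' in the list file.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    out = Path(list_path)
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out
```

Calling `write_concat_list(["clip_01_opening.mp4", "clip_02_detail.mp4", "clip_03_closing.mp4"])` produces the same `clips.txt` as the three `echo` commands.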

Concatenation with transitions (fade in/out):

import subprocess

def concat_with_fade(clips, output_path, fade_duration=0.5):
    """
    clips: [("clip_01.mp4", 3.0), ("clip_02.mp4", 5.0), ("clip_03.mp4", 7.0)]
           file name, clip length (seconds)
    """
    filter_parts  = []
    input_args    = []

    # One -i argument per clip
    for clip_path, _ in clips:
        input_args += ["-i", clip_path]

    # Build the fade-in/fade-out filter for each clip
    for i, (_, duration) in enumerate(clips):
        fade_out_start = duration - fade_duration
        filter_parts.append(
            f"[{i}:v]fade=t=in:st=0:d={fade_duration},"
            f"fade=t=out:st={fade_out_start}:d={fade_duration}[v{i}];"
        )

    # Concatenate the faded video streams (video only, no audio)
    concat_v = "".join([f"[v{i}]" for i in range(len(clips))])
    filter_parts.append(f"{concat_v}concat=n={len(clips)}:v=1:a=0[vout]")

    filter_complex = "".join(filter_parts)

    cmd = ["ffmpeg"] + input_args + [
        "-filter_complex", filter_complex,
        "-map", "[vout]",
        "-c:v", "libx264",
        "-crf", "18",
        "-preset", "slow",
        output_path
    ]

    subprocess.run(cmd, check=True)
    print(f"✅ Final video saved: {output_path}")

concat_with_fade(
    clips=[
        ("clip_01_opening.mp4", 3.0),
        ("clip_02_detail.mp4",  5.0),
        ("clip_03_closing.mp4", 7.0),
    ],
    output_path="./final_ad_15sec.mp4"
)

Adding BGM (ties in with Application 5):

# Merge the generated BGM file into the video
ffmpeg \
  -i ./final_ad_15sec.mp4 \
  -i ./bgm.wav \
  -c:v copy \
  -c:a aac \
  -shortest \
  -filter:a "volume=0.8" \
  ./final_ad_with_bgm.mp4

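📌 7. Checking Results & Troubleshooting

When the video is shaky

Cause 1: the CFG value is too high
  → Lower cfg from 9.0 to 6.0
  → Following the prompt too strongly causes inconsistencies between frames

Cause 2: the step count is too low
  → Raise steps from 15 to 30

Cause 3: the prompt contains conflicting keywords
  → Do not combine "dynamic fast motion" with "slow zoom"
  → Specify only one camera movement

Fix (post-processing):
  # Stabilize the video with FFmpeg (vidstab filters, two passes)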
  ffmpeg -i shaky.mp4 \
    -vf "vidstabdetect=stepsize=6:shakiness=8:accuracy=9" \
    -f null - 2>&1
  ffmpeg -i shaky.mp4 \
    -vf "vidstabtransform=zoom=5:smoothing=30" \
    stabilized.mp4
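
Cause 3 above (conflicting keywords) can be caught before rendering with a simple prompt check. This is a minimal sketch using naive substring matching. `find_prompt_conflicts` is a hypothetical helper, and the keyword list is taken from the camera movement keywords in section 4-2, leaving out "extreme close-up" on the assumption that it describes shot size rather than movement.

```python
CAMERA_MOVES = [
    "zoom in", "zoom out", "pan left", "pan right", "arc shot",
    "orbit", "dolly forward", "dolly backward", "tracking shot",
]
SPEED_CONFLICTS = [("slow", "fast")]


def find_prompt_conflicts(prompt: str) -> list:
    """Return a list of human-readable warnings for a prompt (empty if none)."""
    text = prompt.lower()
    issues = []
    moves = [m for m in CAMERA_MOVES if m in text]
    if len(moves) > 1:
        issues.append("multiple camera movements: " + ", ".join(moves))
    for a, b in SPEED_CONFLICTS:
        if a in text and b in text:
            issues.append(f"conflicting speed words: '{a}' and '{b}'")
    return issues


print(find_prompt_conflicts("dynamic fast motion, slow zoom in"))
```

Substring matching can produce false positives (for example, "fast" inside another word), so the warnings are hints rather than hard errors.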

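When the product gets deformed

Cause 1: num_frames is too high
  → Reduce from 81 frames to 49 frames
  → The longer the video, the more likely the product shape is to deform

Cause 2: the cfg value is too low
  → Raise cfg from 3.0 to 6.0
  → When it is low, the model generates freely and the product deforms

Cause 3: input image resolution
  → Use an input image of at least 512x512

Fixes:
  → Use HunyuanVideo 1.5 (specialized in preserving product shape)
  → Add "maintain product shape, no distortion" to the prompt
  → Apply a ControlNet workflow (locks onto a reference image)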

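Fixing VRAM OOM errors

# Step 1: lower the resolution
# Try 1280x720 → 832x480 → 512x288 in that order

# Step 2: use an fp8-quantized model
# Download the "wan2.2-i2v-14b-720p-fp8.safetensors" version
# Saves 40~50% of VRAM compared with the full model

# Step 3: reduce num_frames
# 81 → 49 → 33 frames

# Step 4: add ComfyUI launch options (shell)
#   --lowvram          : minimize VRAM usage
#   --fp8_e4m3fn-unet  : store the diffusion model weights in fp8
#   (--bf16-unet selects bf16 precision instead; pass only one precision flag)
python main.py --lowvram --fp8_e4m3fn-unet

# Step 5: clear the CUDA cache (Python)
import torch, gc
gc.collect()
torch.cuda.empty_cache()

# Real-time VRAM monitoring (shell)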
watch -n 1 nvidia-smi
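
Steps 1 and 3 above can be combined into an ordered list of fallback settings to try after an OOM error. This is a minimal sketch under stated assumptions: `fallback_settings` and `first_working` are hypothetical helpers, and `attempt` stands for any function of your own that runs one generation and returns True on success.

```python
RESOLUTIONS = ["1280x720", "832x480", "512x288"]   # step 1 order
FRAME_COUNTS = [81, 49, 33]                         # step 3 order


def fallback_settings():
    """Yield (resolution, num_frames) from most to least VRAM-hungry.

    Resolution is lowered first; frame count is reduced only at the
    smallest resolution, following the step order in this section.
    """
    for resolution in RESOLUTIONS:
        yield (resolution, FRAME_COUNTS[0])
    for frames in FRAME_COUNTS[1:]:
        yield (RESOLUTIONS[-1], frames)


def first_working(settings, attempt):
    """Return the first setting for which attempt(resolution, frames) is truthy."""
    for resolution, frames in settings:
        if attempt(resolution, frames):
            return (resolution, frames)
    return None
```

If every setting fails, `first_working` returns `None`, which is the point at which the fp8 model (step 2) or the launch options (step 4) are worth trying.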

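✅ Completion Checklist

Product image background removed & aspect ratio adjusted

ComfyUI + Wan2.2 installation verified

Single-clip (5-second) generation test succeeded

3 clips (opening / detail / closing) generated

Final 15-second ad video assembled with FFmpeg

BGM added (ties in with Application 5)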