
  • Task: pick one of the video datasets we surveyed, feed its video captions into the AudioLDM model to see what audio it produces, then combine each video (the one the caption belongs to) with the generated audio and check how badly the two are out of sync (a scripted version of this workflow is sketched after the list).
  • Procedure
    1. Set up the AudioLDM environment and run it

       # Optional
       conda create -n audioldm python=3.8; conda activate audioldm
       # Install AudioLDM
       pip3 install audioldm
              
       ### Text-to-Audio Generation: generate an audio guided by a text
       # The default --mode is "generation"
       audioldm -t "A hammer is hitting a wooden surface" 
       # Result will be saved in "./output/generation"
      

      Reference: GitHub - haoheliu/AudioLDM at dda0f54ab283ecdc1fe94ffc3182236cb8c343bf

      • Input Text: A hammer is hitting a wooden surface
      • Output Audio (generated audio):


    2. Video dataset: select four videos from WebVid, feed each video's caption to AudioLDM, and generate audio

      • Input Text: Travel blogger shoot a story on top of mountains. young man holds camera in forest.
        • video

        • generated audio

      • Input Text: Horse grazing - seperated on green screen
        • video

        • generated audio

      • Input Text: City traffic lights. blurred view
        • video

        • generated audio

      • Input Text: Young woman flexing muscles with barbell in gym.the coach helps her.
        • video

        • generated audio
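
  • Putting it together (sketch): the shell script below loops over the four captions, generates audio with the same audioldm CLI used in step 1, and muxes each result onto its clip with ffmpeg so the sync can be checked by eye. The local file names (video1.mp4 …, combined1.mp4 …) and the assumption that each run leaves a new .wav under ./output/generation are illustrative guesses, not part of the AudioLDM docs.

       #!/usr/bin/env bash
       # Batch text-to-audio generation + muxing for the four WebVid clips.
       # Assumption: the clips are saved locally as video1.mp4 ... video4.mp4.
       captions=(
         "Travel blogger shoot a story on top of mountains. young man holds camera in forest."
         "Horse grazing - seperated on green screen"
         "City traffic lights. blurred view"
         "Young woman flexing muscles with barbell in gym.the coach helps her."
       )

       for i in "${!captions[@]}"; do
         # Text-to-audio generation, same CLI call as in step 1
         audioldm -t "${captions[$i]}"

         # Pick up the most recently written wav (assumed default output location)
         wav=$(ls -t ./output/generation/*.wav | head -n 1)

         # Replace the video's audio track with the generated audio;
         # -shortest trims to the shorter of the two streams
         ffmpeg -y -i "video$((i+1)).mp4" -i "$wav" \
                -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest \
                "combined$((i+1)).mp4"
       done

    Playing each combined*.mp4 next to its original clip makes it easy to judge how far the generated audio is from matching the on-screen events.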

