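Recently I was watching a demo video recorded by a foreign colleague. He said two sentences in it so fast that I could not make them out even after several listens. What to do?
No sooner said than done: with the help of AI, it did not take long to write a small program, and the results turned out pretty well.
It was only a joke, but seriously, there is no resisting the tide of the times: a lot of professions have been "wiped out" by the AI wave.
In an era of booming artificial intelligence, many jobs are at risk of being replaced, and we programmers are no exception. The sad irony is that programmers write the programs that put themselves out of work.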
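Step 1: Extract the audio from the mp4 file into a WAV file
import ffmpeg  # provided by the ffmpeg-python package

def extract_audio_from_video(mp4_file, audio_file):
    """Extract audio from an MP4 file and save as WAV."""
    # ffmpeg infers the output format from the audio_file extension
    ffmpeg.input(mp4_file).output(audio_file).global_args('-loglevel', 'error').run()
    return audio_file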
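The same extraction can also be done by calling the ffmpeg command line directly. The following is a minimal sketch rather than the script's actual code: the function names `build_extract_command` and `extract_audio_with_cli` are hypothetical, and the choice of 16 kHz mono output is an assumption made because Whisper works on 16 kHz mono audio anyway.

```python
import subprocess


def build_extract_command(mp4_file, audio_file, sample_rate=16000):
    """Build an ffmpeg command that writes mono audio at the given sample rate."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-loglevel", "error",      # only print errors
        "-i", mp4_file,            # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        audio_file,
    ]


def extract_audio_with_cli(mp4_file, audio_file):
    """Run ffmpeg (assumed to be on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_extract_command(mp4_file, audio_file), check=True)
    return audio_file
```

Keeping the command construction in its own function makes it easy to inspect or log the exact ffmpeg invocation before running it.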
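Step 2: Call Whisper to transcribe the speech into text.
Whisper is impressive. It comes from OpenAI, so it is bound to be good, and it can turn the speech in a video into text. The code is as follows:
import whisper  # provided by the openai-whisper package

def transcribe_audio_with_whisper(audio_file, model_name="base"):
    """Transcribe audio to text using Whisper."""
    # Load Whisper model
    model = whisper.load_model(model_name)
    # Transcribe audio
    result = model.transcribe(audio_file)
    return result["text"]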
To produce a subtitle file, it is better to transcribe paragraph by paragraph and record the start and end time of each paragraph. The code is as follows:
def transcribe_audio_with_segments(audio_file, model_name="base", pause_threshold=0.5):
    """Transcribe audio and merge Whisper segments into timestamped paragraphs."""
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_file, word_timestamps=True)
    segments = result["segments"]
    sn = 0
    paragraphs = []
    current_paragraph = []
    previous_end_time = 0.0
    paragraph_start_time = 0.0
    for segment in segments:
        start_time = segment["start"]
        end_time = segment["end"]
        text = segment["text"].strip()
        # close the current paragraph if the pause before this segment exceeds the threshold
        if current_paragraph and start_time - previous_end_time > pause_threshold:
            sn += 1
            merged_text = " ".join(current_paragraph)
            # the paragraph ends where its last segment ended, not where the new segment ends
            paragraphs.append(f"\n{sn}\n[{format_time(paragraph_start_time)} --> {format_time(previous_end_time)}]\n{merged_text}")
            current_paragraph = []
        # the first segment of a paragraph sets its start time
        if not current_paragraph:
            paragraph_start_time = start_time
        current_paragraph.append(text)
        previous_end_time = end_time
    # the last paragraph gets the same numbered, timestamped header
    if current_paragraph:
        sn += 1
        merged_text = " ".join(current_paragraph)
        paragraphs.append(f"\n{sn}\n[{format_time(paragraph_start_time)} --> {format_time(previous_end_time)}]\n{merged_text}")
    return paragraphs
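The pause-based grouping can be tried out without loading a Whisper model. The sketch below is only an illustration: `group_segments` is a hypothetical helper, and the segment timestamps are made up. It assumes segments are dictionaries with `start`, `end` and `text` keys, as in the code above, and returns plain `(start, end, text)` tuples instead of formatted strings.

```python
def group_segments(segments, pause_threshold=0.5):
    """Merge consecutive segments into (start, end, text) paragraphs.

    A new paragraph starts whenever the gap between one segment's end and
    the next segment's start exceeds pause_threshold seconds.
    """
    paragraphs = []
    texts = []
    paragraph_start = None
    previous_end = None
    for segment in segments:
        gap_is_long = (
            previous_end is not None
            and segment["start"] - previous_end > pause_threshold
        )
        if texts and gap_is_long:
            # close the paragraph at the end of its last segment
            paragraphs.append((paragraph_start, previous_end, " ".join(texts)))
            texts = []
        if not texts:
            paragraph_start = segment["start"]
        texts.append(segment["text"].strip())
        previous_end = segment["end"]
    if texts:
        paragraphs.append((paragraph_start, previous_end, " ".join(texts)))
    return paragraphs


# Made-up timestamps, for illustration only
fake_segments = [
    {"start": 0.0, "end": 1.5, "text": " Life is not easy."},
    {"start": 1.6, "end": 3.0, "text": " It is not."},
    {"start": 4.2, "end": 6.0, "text": " Life's not fair."},
]
for start, end, text in group_segments(fake_segments):
    print(f"{start:.1f} -> {end:.1f}: {text}")
```

Here the first two segments are only 0.1 seconds apart, so they merge into one paragraph, while the 1.2 second pause before the third segment starts a new one.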
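Step 3: Call Google Translate to translate the text into Chinese.
With Google Translate, a powerful translation tool, English text can easily be translated into Chinese. The code is as follows:
import asyncio

from googletrans import Translator  # assumption: a googletrans release with the async API

async def translate_text(text, src, dest):
    """Translate text from the src language to the dest language."""
    translator = Translator()
    translated = await translator.translate(text, src=src, dest=dest)
    return translated.text

def do_translate(text_file, text, src, dest):
    # asyncio.run drives the coroutine from synchronous code
    translated_text = asyncio.run(translate_text(text, src, dest))
    # derive the output name by swapping the language code in the file name
    dest_file_name = text_file.replace(src, dest)
    with open(dest_file_name, "w", encoding="utf-8") as f:
        f.write(translated_text)
    print(f"### Translated Text:\n {translated_text}")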
def format_time(seconds):
    millis = int((float(seconds) % 1) * 1000)
    seconds = int(float(seconds))
    mins, secs = divmod(seconds, 60)
    hrs, mins = divmod(mins, 60)
    return f"{hrs:02}:{mins:02}:{secs:02},{millis:03}"
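For reference, here is one way a single subtitle entry could be assembled. This is a sketch under two assumptions: it follows the common SubRip (.srt) layout, in which the time range line has no surrounding brackets, and it rounds to whole milliseconds before splitting the value, because truncating a floating-point fraction can occasionally come out one millisecond low. The names `srt_timestamp` and `srt_cue` are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, rounded to the nearest millisecond."""
    total_ms = round(float(seconds) * 1000)
    hrs, rest = divmod(total_ms, 3_600_000)
    mins, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hrs:02}:{mins:02}:{secs:02},{millis:03}"


def srt_cue(index, start, end, text):
    """Build one cue: index line, time range line, text, then a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"


print(srt_cue(1, 0.62, 32.6, "Life is not easy."), end="")
```

Working in integer milliseconds keeps all the arithmetic exact after the single rounding step.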
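And with that, we can watch the video with Chinese subtitles. This step is usually done in the video player. In the popular VLC player, for example, while the mp4 file is playing, find the option for loading subtitles in the player's interface and select the .srt file generated by our program.
For detailed usage, see .
The script is written in Python. It converts an mp4 video into text with Whisper, translates that text from the default source language "en" into the default target language "zh-cn", and finally generates a subtitle file.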
brew install ffmpeg
pip install -r requirements.txt
./ -h
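This prints the usage instructions. The following parameters can be set:
- --input INPUT: the path of the input mp4 file;
- --output OUTPUT: the path of the output text file; if it is not specified, the output has the same name as the input file but with the extension ".srt";
- --model MODEL: the model used by Whisper, "small" by default;
- --src SRC: the source language, "en" (English) by default;
- --dest DEST: the target language, "zh-cn" (Chinese) by default;
- --format FORMAT: the output format; currently only "txt" and "srt" are supported.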
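The parameter list above could be wired up with argparse roughly as follows. This is a sketch, not the script's actual code: the function names `build_parser` and `parse_args` are hypothetical, the `-i` short flag is taken from the example command, and the default of "srt" for `--format` is an assumption, since the text does not state it.

```python
import argparse
from pathlib import Path


def build_parser():
    """Create a parser exposing the options described in the list above."""
    parser = argparse.ArgumentParser(description="Generate subtitles from an mp4 video.")
    parser.add_argument("-i", "--input", required=True, help="path of the input mp4 file")
    parser.add_argument("--output", help="output path (default: input name with .srt extension)")
    parser.add_argument("--model", default="small", help="Whisper model name")
    parser.add_argument("--src", default="en", help="source language")
    parser.add_argument("--dest", default="zh-cn", help="target language")
    # assumption: srt is the default output format
    parser.add_argument("--format", default="srt", choices=["txt", "srt"], help="output format")
    return parser


def parse_args(argv=None):
    """Parse arguments and fill in the default output path."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        # same name as the input file, but with the .srt extension
        args.output = str(Path(args.input).with_suffix(".srt"))
    return args


if __name__ == "__main__":
    print(parse_args(["-i", "talk.mp4"]))
```

Passing an explicit argument list to `parse_args` makes it possible to exercise the defaults without touching `sys.argv`.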
./ -i ./example/5_minutes_for_50_years.mp4
[00:00:00,620 --> 00:00:32,600]
I'm gonna talk to you about some things I've learned in my journey. Most from experience, some of them I've heard in passing, many of them I'm still practicing, but all of them I do believe are true. Life is not easy. It is not. Don't try to make it that way. Life's not fair. It never was. It is it now and it won't ever be. Do not fall into the trap, the entitlement trap, a feeling like you're a victim. You are not. Get over it and get on with it.
[00:00:28,079 --> 00:04:04,680]
So the question that we're gonna ask ourselves is what success is to us. What success is to you? Is it more money? That's fine. I got nothing against money. Maybe it's a healthy family. Maybe it's a happy marriage. Maybe it's to help others to be famous, to be spiritually sound, to leave the world a little bit better place than you found it. Continue to ask yourself that question. Now your answer may change over time and that's fine. But do yourself this favor. Whatever your answer is, don't choose anything that will jeopardize your soul. Prioritize who you are, who you want to be, and don't spend time with anything that antagonizes your character. Be brave, take the hill, but first answer that question what's my hill? For first, we have to define success for ourselves and then we have to put in the work to maintain it. Take that daily talent. Tend our guard. Keep the things that are important to us in good shape. Where you are not is as important as where you are. It is just as important where we are not as it is where we are. Look, the first step that leads to our identity life is usually not. I know who I am. I know who I am. That's not the first step. The first steps usually I know who I am not. Process of elimination. Defining ourselves by what we are not is the first step that leads us to really knowing who we are. You know, that group of friends that you hang out with that really might not bring out the best in you. They gossip too much, they're kind of shady. They really aren't going to be there for you in a pinch. How about that bar that we keep going to that we always seem to have the worst hangover problem? Or that computer screen, right? The computer screen that keeps giving us an excuse not to get out of the house and engage with the world and get some real human interaction. How about that food that we keep eating? Stuff that tastes so good going down, makes us feel like crap the next week? We feel a thardic when we keep putting on weight? 
Well, those people, those places, those things, stop giving them your time and energy. Just don't go there. I mean, put them down. And when you do this, when you do put them down, you put them down there. When you put them in your time, you inadvertently find yourself spending more time and in more places that are healthy for you, that bring you more joy. Why? Because you just eliminated the who's, the where's, the what's and the wins that were keeping you from your identity. Look, trust me, too many options. I promise you, too many options will make a tyrant of us all. Alright, so get rid of the excess, the wasted time, decrease your options. If you do this, you will have accidentally, almost innocently, put in front of you what is important to you. By processing elimination. Knowing who we are is hard. It's hard. Give yourself a break. Eliminate who you are not first. And you're going to find yourself where you need to be.
[00:03:59,139 --> 00:04:26,939]
Instead of creating outcomes that take from us, let's create more outcomes that pay us back. Fill us up. Keep your fire lit. Turn you on for the most amount of time in your future. We try our best. We don't always do our best.
[00:04:16,899 --> 00:04:50,120]
Architecture is a verb as well. And since we are the architects of our own lives, let's study the habits, the practices, the routines that we have that lead to and feed our success. Our joy, our honest pain, our laughter, our earned tears. Let's dissect that and give thanks for those things. And when we do that, guess what happens? We get better at them. And we have more to dissect.
[00:04:43,980 --> 00:05:00,120]
Be discerning. Choose it because you want it. Do it because you want to.
[00:04:52,819 --> 00:05:23,519]
We're going to make mistakes. You got to own them. Then you got to make amends. And then you got to move on. Guilt and regret kills many a man before their time. So turn the page. Get off the ride. You are the author of the book of your life.
Thank you.
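The complete code can be found at .
- You can download any video you like from YouTube or other websites and then use this script to create subtitles for it.
For example, you can use the website  to download videos.
- After downloading, you can also process the video, for example by trimming it with the following command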
ffmpeg -i input.mp4 -ss 00:00:07 -c:v copy -c:a copy output_trimmed.mp4
- Or change the video's resolution and bitrate with the following command
ffmpeg -i output_trimmed.mp4 -vf scale=320:180 -c:v libx264 -preset fast -crf 23 -c:a aac -b:a 192k output_180p.mp4
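The specific code can be found at . Pretty fun, isn't it? Go ahead and give it a try.
I hope you enjoyed this. Feel free to adjust and modify the script to suit your own needs and contribute your changes back to the code repository so that more people can benefit. If you have any other questions, just ask and I will do my best to answer.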
