Processing VTT subtitle files (vi + sed + awk)
Posted dingdingfish
I received a subtitle file with a .vtt extension. Part of its content looks like this:
00:00:00.030 --> 00:00:01.670 align:start position:0%
in<00:00:00.359><c> this</c><00:00:00.539><c> episode</c><00:00:00.989><c> we're</c><00:00:01.319><c> going</c><00:00:01.410><c> to</c><00:00:01.469><c> take</c><00:00:01.620><c> a</c>
00:00:01.670 --> 00:00:01.680 align:start position:0%
in this episode we're going to take a
00:00:01.680 --> 00:00:05.599 align:start position:0%
in this episode we're going to take a
step<00:00:01.979><c> into</c><00:00:02.280><c> the</c><00:00:02.659><c> unknown</c><00:00:03.980><c> it's</c><00:00:04.980><c> time</c><00:00:05.250><c> to</c><00:00:05.400><c> talk</c>
00:00:05.599 --> 00:00:05.609 align:start position:0%
step into the unknown it's time to talk
00:00:05.609 --> 00:00:13.039 align:start position:0%
step into the unknown it's time to talk
about<00:00:10.099><c> the</c><00:00:11.099><c> non</c><00:00:11.309><c> value</c><00:00:11.730><c> is</c><00:00:11.910><c> a</c><00:00:11.940><c> placeholder</c><00:00:12.360><c> for</c>
00:00:13.039 --> 00:00:13.049 align:start position:0%
about the non value is a placeholder for
00:00:13.049 --> 00:00:14.690 align:start position:0%
about the non value is a placeholder for
missing<00:00:13.440><c> or</c><00:00:13.710><c> not</c><00:00:13.980><c> applicable</c><00:00:14.190><c> information</c>
00:00:14.690 --> 00:00:14.700 align:start position:0%
missing or not applicable information
First, in vi, delete the lines that start with 00: using the command :g/^00:/d; then delete the lines that end with </c> using :g/<\/c>$/d. The output is as follows:
in this episode we're going to take a
in this episode we're going to take a
step into the unknown it's time to talk
step into the unknown it's time to talk
about the non value is a placeholder for
about the non value is a placeholder for
missing or not applicable information
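The same two deletions can also be done without opening an editor. The following is only a minimal sketch, assuming sed is available; sample.vtt is a hypothetical file name, and its content is a shortened stand-in for the excerpt above:

```shell
# Create a small sample in the same format as the excerpt (hypothetical file name).
cat > sample.vtt <<'EOF'
00:00:00.030 --> 00:00:01.670 align:start position:0%
in<00:00:00.359><c> this</c><00:00:00.539><c> episode</c>
00:00:01.670 --> 00:00:01.680 align:start position:0%
in this episode
EOF

# Same patterns as the two vi commands:
# drop lines starting with "00:", then drop lines ending in "</c>".
step1=$(sed -e '/^00:/d' -e '/<\/c>$/d' sample.vtt)
printf '%s\n' "$step1"
```

Only the plain-text line is left, because every other line in the sample is either a timestamp line or a line with inline timing tags.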
Next, use sed to remove the blank lines with the command sed -r '/^\s*$/d':
# sed -r '/^\s*$/d' /tmp/1
in this episode we're going to take a
in this episode we're going to take a
step into the unknown it's time to talk
step into the unknown it's time to talk
about the non value is a placeholder for
about the non value is a placeholder for
missing or not applicable information
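Note that \s is a GNU sed extension rather than POSIX syntax. As a hedged alternative, the POSIX character class [[:space:]] should match the same blank and whitespace-only lines. The input below is made up purely for illustration:

```shell
# Hypothetical input: two text lines separated by an empty line
# and a whitespace-only line.
input=$(printf 'first line\n\n   \nsecond line\n')

# [[:space:]] is the POSIX character class; the pattern matches lines
# that contain nothing but whitespace (including completely empty lines).
no_blank=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')
printf '%s\n' "$no_blank"
```

The pattern uses only basic regular expression syntax, so no -r option is needed in this variant.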
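Finally, use awk to remove the duplicate lines with the command awk '!x[$0]++', a very neat one-liner. Note that it drops every later repeat of a line and keeps only the first occurrence, not just consecutive repeats: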
in this episode we're going to take a
step into the unknown it's time to talk
about the non value is a placeholder for
missing or not applicable information
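To see why the one-liner works: x is an associative array keyed by the whole line ($0). The first time a line appears, x[$0] is 0, so !x[$0] is true and awk prints the line; the ++ then raises the count, so later copies are skipped. The sketch below, on made-up input, contrasts this with uniq, which only collapses adjacent repeats:

```shell
# Hypothetical input: a consecutive repeat (a, a) and a later,
# non-adjacent repeat of "a".
input=$(printf 'a\na\nb\na\n')

# Keep only the first occurrence of every line, wherever repeats appear.
seen_once=$(printf '%s\n' "$input" | awk '!x[$0]++')

# uniq only removes repeats that are next to each other.
adjacent_only=$(printf '%s\n' "$input" | uniq)

printf '%s\n' "$seen_once"
printf '%s\n' "$adjacent_only"
```

For the subtitle text above, the repeats are adjacent, so either tool would give the same result. The awk version would also remove a line that is legitimately spoken again later in the file, which may or may not be what you want.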