Variable number of lines in a regex for subtitles in JavaScript

【Title】Number of variable lines in regex for subtitles in javascript 【Posted】2020-12-06 19:09:08 【Question】:

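I am using a regular expression to build an array of subtitles, where each entry contains [number, start, end, text]:

/(\d+)\n([\d:,]+)\s+-{2}\>\s+([\d:,]+)\n([\s\S]*?(?=\n{2}|$))/gm

The problem is in the text part: if a subtitle has two or more lines of text, only the first line is captured and the rest are not read.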


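I don't want only the first line to be treated as the text; if there are further lines, they should be included as well.

You would be doing me a great favor by helping. Thank you.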

let subtitle = document.getElementById('subtitle').value;
console.log(_subtitle(subtitle));

function _subtitle(text) {

        let Subtitle = text;
        let Pattern = /(\d+)\n([\d:,]+)\s+-{2}\>\s+([\d:,]+)\n([\s\S]*?(?=\n{2}|$))/gm;
        let _regExp = new RegExp(Pattern);
        let result = [];

        if (typeof (text) != "string") throw "Sorry, Parser accept string only.";
        if (Subtitle === null) return Subtitle;

        // Normalize all line endings to \n before matching
        let Parse = Subtitle.replace(/\r\n|\r|\n/g, '\n');
        let Matches;

        while ((Matches = Pattern.exec(Parse)) != null) {

            result.push({
                Line: Matches[1],
                Start: Matches[2],
                End: Matches[3],
                Text: Matches[4],
            })

        }

        return result;

    }

#warning {
background-color:#e74e4e;
color:#fff;
font-family:Roboto;
padding:14px;
border-radius:4px;
margin-bottom:14px
}

textarea {
width:100%;
min-height:100px;
}

<div id="warning">The output is on the console</div>

<textarea id="subtitle">1
00:00:00,000 --> 00:00:00,600
Hi my friends

2
00:00:00,610 --> 00:00:01,050
In the first line, everything works properly
But there is a problem in the second line that I could not solve :(

3
00:00:01,080 --> 00:00:03,080
But then everything is in order and good

4
00:00:03,280 --> 00:00:05,280
You do me a great favor by helping me. Thankful</textarea>

【Comments】:

Why do you have ([\s\S]*?(?=\n{2}|$)) if you could match the rest with (.*)?

【Answer 1】:

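Replace /gm with /g. With the m flag, $ matches at the end of every line, so the lazy [\s\S]*? stops at the end of the first "text" line and the regex never tries to match anything after it: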

let subtitle = document.getElementById('subtitle').value;
console.log(_subtitle(subtitle));

function _subtitle(text) {
  let Subtitle = text;
  // No m flag: $ now matches only at the very end of the input
  let Pattern = /(\d+)\n([\d:,]+)\s+-{2}\>\s+([\d:,]+)\n([\s\S]*?(?=\n{2}|$))/g;
  let _regExp = new RegExp(Pattern);
  let result = [];

  if (typeof(text) != "string") throw "Sorry, Parser accept string only.";
  if (Subtitle === null) return Subtitle;

  // Normalize all line endings to \n before matching
  let Parse = Subtitle.replace(/\r\n|\r|\n/g, '\n');
  let Matches;

  while ((Matches = Pattern.exec(Parse)) != null) {
    result.push({
      Line: Matches[1],
      Start: Matches[2],
      End: Matches[3],
      Text: Matches[4],
    })

  }

  return result;
}

#warning {
  background-color: #e74e4e;
  color: #fff;
  font-family: Roboto;
  padding: 14px;
  border-radius: 4px;
  margin-bottom: 14px
}

textarea {
  width: 100%;
  min-height: 100px;
}

<div id="warning">The output is on the console</div>

<textarea id="subtitle">1
00:00:00,000 --> 00:00:00,600
Hi my friends

2
00:00:00,610 --> 00:00:01,050
In the first line, everything works properly
But there is a problem in the second line that I could not solve :(

3
00:00:01,080 --> 00:00:03,080
But then everything is in order and good

4
00:00:03,280 --> 00:00:05,280
You do me a great favor by helping me. Thankful</textarea>
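
To see the effect of the flag in isolation, here is a minimal sketch that runs the same pattern twice, once with gm and once with g, on a shortened sample. The `sample` text and the `cueTexts` helper are made up for this illustration and are not part of the original snippet; it assumes an environment with `String.prototype.matchAll` (any current browser or Node.js).

```javascript
// Shortened sample input: cue 2 has two lines of text.
const sample = [
  "1",
  "00:00:00,000 --> 00:00:00,600",
  "Hi my friends",
  "",
  "2",
  "00:00:00,610 --> 00:00:01,050",
  "First text line",
  "Second text line",
  "",
  "3",
  "00:00:01,080 --> 00:00:03,080",
  "Last cue",
].join("\n");

// Same pattern as in the answer, kept as a string so only the flags vary.
const source = String.raw`(\d+)\n([\d:,]+)\s+-{2}>\s+([\d:,]+)\n([\s\S]*?(?=\n{2}|$))`;

// Hypothetical helper: returns the captured text (group 4) of every cue.
function cueTexts(flags) {
  const pattern = new RegExp(source, flags);
  return [...sample.matchAll(pattern)].map((m) => m[4]);
}

const withMultiline = cueTexts("gm");   // $ matches at every line end
const withoutMultiline = cueTexts("g"); // $ matches only at end of input

console.log(withMultiline[1]);    // only the first text line of cue 2
console.log(withoutMultiline[1]); // both text lines of cue 2
```

With gm the lookahead's $ alternative succeeds at the first line break, so the lazy quantifier never needs to reach the blank line; with g the only ways to stop are a blank line (\n{2}) or the end of the input.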

