Suffix Automaton and Generalized Suffix Automaton: A Memo

Posted RainbowCrown

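Preface

I actually learned this thing a year or two ago.
Then I went too long without coding it and forgot all of it.

Today another problem needed it, so I took another look.
A quick skim was enough for the memory to come back.

Still, I really should review more often.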

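Suffix automaton

SAM for short.
There used to be a saying: "If you still can't write a SAM by the third year of junior high, you might as well retire."

Enough chatter. (Then again, wasn't everything above chatter?)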

Definitions

The only hard part of the suffix automaton is understanding its definitions.

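  • 1. Define endpos(t) as the set of positions in the main string at which occurrences of the substring t end.
    • For example, in abcabcabcaac (positions counted from 1), endpos(abc) = {3, 6, 9}.
  • 1.5. Define the right set of an automaton node as the set of substrings that share that node's endpos set.
  • 2. Define maxlen and minlen as the largest and smallest length among the strings in a node's right set.
  • 3. Define the fail pointer (an automaton without fail pointers has no soul).
    • Let the current node be y, and let its fail pointer point to x.
    • Then the endpos set of x strictly contains the endpos set of y.
    • If several nodes x satisfy this condition, choose the one with the largest maxlen.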
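
To make the endpos definition concrete, here is a small brute-force sketch that lists the end positions of a pattern by direct comparison. The helper name `endpos_bruteforce` is made up for illustration and is not how a SAM computes anything; it only mirrors definition 1.

```cpp
#include <string>
#include <vector>

// Brute-force endpos: every 1-indexed position where an occurrence of t ends in s.
// `endpos_bruteforce` is a hypothetical helper name, not taken from the original post.
std::vector<int> endpos_bruteforce(const std::string& s, const std::string& t) {
    std::vector<int> ends;
    if (t.empty() || t.size() > s.size()) return ends;
    for (std::size_t start = 0; start + t.size() <= s.size(); ++start) {
        // Compare the window s[start, start + |t|) against t.
        if (s.compare(start, t.size(), t) == 0) {
            ends.push_back(static_cast<int>(start + t.size()));  // 1-indexed end
        }
    }
    return ends;
}
```

Calling `endpos_bruteforce("abcabcabcaac", "abc")` and then the same with `"bc"` gives the same positions, so the two substrings would belong to one node, while `"c"` also ends at position 12 and therefore lands in a different node.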
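
Properties

  • 1. Clearly, if two substrings have exactly the same endpos set, then one of them must be a suffix of the other.
  • 2. It follows that the strings in one node are suffixes of one another, and their lengths cover every integer from minlen to maxlen without gaps.
  • 3.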
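
The definitions and properties above can be checked on a tiny single-string automaton. The sketch below uses the standard online construction; `TinySam`, `build_sam`, `find` and `minlen()` are names invented here. It also relies on the well-known identity minlen(y) = maxlen(fail(y)) + 1, which the post does not state, so treat that line as an added assumption rather than the author's wording.

```cpp
#include <array>
#include <string>
#include <vector>

// Minimal single-string suffix automaton sketch (lowercase letters only).
// `fail` and `maxlen` follow the post; `TinySam`, `find`, `minlen` are hypothetical names.
struct TinySam {
    std::vector<std::array<int, 26>> ch;  // transitions, -1 means "no edge"
    std::vector<int> fail, maxlen;
    int last = 0;                         // node of the whole prefix read so far

    TinySam() { new_node(0); fail[0] = -1; }  // node 0 is the root

    int new_node(int len) {
        std::array<int, 26> empty;
        empty.fill(-1);
        ch.push_back(empty);
        fail.push_back(0);
        maxlen.push_back(len);
        return static_cast<int>(ch.size()) - 1;
    }

    void extend(char c) {
        int x = c - 'a';
        int np = new_node(maxlen[last] + 1);
        int p = last;
        while (p != -1 && ch[p][x] == -1) { ch[p][x] = np; p = fail[p]; }
        if (p == -1) {
            fail[np] = 0;
        } else {
            int q = ch[p][x];
            if (maxlen[p] + 1 == maxlen[q]) {
                fail[np] = q;
            } else {
                // Split q: the clone nq keeps the strings of length <= maxlen[p] + 1.
                int nq = new_node(maxlen[p] + 1);
                ch[nq] = ch[q];
                fail[nq] = fail[q];
                fail[q] = fail[np] = nq;
                while (p != -1 && ch[p][x] == q) { ch[p][x] = nq; p = fail[p]; }
            }
        }
        last = np;
    }

    // Shortest length in node v (v != root): one more than the longest in fail[v].
    int minlen(int v) const { return maxlen[fail[v]] + 1; }

    // Node reached by reading t from the root, or -1 if t is not a substring.
    int find(const std::string& t) const {
        int v = 0;
        for (char c : t) {
            v = ch[v][c - 'a'];
            if (v == -1) return -1;
        }
        return v;
    }
};

TinySam build_sam(const std::string& s) {
    TinySam sam;
    for (char c : s) sam.extend(c);
    return sam;
}
```

On abcabcabcaac, `find("abc")` and `find("bc")` return the same node, its maxlen is 3 and its minlen is 2, and its fail pointer leads to the node of "c", whose endpos set is strictly larger.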

POJ3080 POJ3450 Corporate Identity (generalized suffix automaton || suffix array || KMP)

Beside other services, ACM helps companies to clearly state their “corporate identity”, which includes company logo but also other signs, like trademarks. One of such companies is Internet Building Masters (IBM), which has recently asked ACM for a help with their new identity. IBM do not want to change their existing logos and trademarks completely, because their customers are used to the old ones. Therefore, ACM will only change existing trademarks instead of creating new ones.

After several other proposals, it was decided to take all existing trademarks and find the longest common sequence of letters that is contained in all of them. This sequence will be graphically emphasized to form a new logo. Then, the old trademarks may still be used while showing the new identity.

Your task is to find such a sequence.

Input

The input contains several tasks. Each task begins with a line containing a positive integer N, the number of trademarks (2 ≤ N ≤ 4000). The number is followed by N lines, each containing one trademark. Trademarks will be composed only from lowercase letters, the length of each trademark will be at least 1 and at most 200 characters.

After the last trademark, the next task begins. The last task is followed by a line containing zero.

Output

For each task, output a single line containing the longest string contained as a substring in all trademarks. If there are several strings of the same length, print the one that is lexicographically smallest. If there is no such non-empty string, output the words “IDENTITY LOST” instead.

Sample Input

3
aabbaabb
abbababb
bbbbbabb
2
xyz
abc
0

Sample Output

abb
IDENTITY LOST

 

Problem:

Find the longest common substring of the n strings; if there are several, output the lexicographically smallest one.
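
Before reaching for any of the heavier tools, the task can be pinned down with a brute-force reference. This is only a sketch for cross-checking small inputs, not one of the three approaches named in the title and far too slow for N = 4000; the function name `longest_common_substring` is made up here.

```cpp
#include <string>
#include <vector>

// Brute-force reference: try every substring of the first trademark, longest first.
// `longest_common_substring` is a hypothetical helper, not part of the original solution.
// Returns "" when no non-empty common substring exists.
std::string longest_common_substring(const std::vector<std::string>& words) {
    if (words.empty()) return "";
    const std::string& first = words[0];
    for (std::size_t len = first.size(); len >= 1; --len) {
        std::string best;
        bool found = false;
        for (std::size_t start = 0; start + len <= first.size(); ++start) {
            std::string cand = first.substr(start, len);
            bool in_all = true;
            // The candidate must occur in every other trademark.
            for (std::size_t i = 1; i < words.size() && in_all; ++i) {
                in_all = words[i].find(cand) != std::string::npos;
            }
            // Among candidates of this length keep the lexicographically smallest.
            if (in_all && (!found || cand < best)) { best = cand; found = true; }
        }
        if (found) return best;
    }
    return "";
}
```

On the first sample this returns abb, and on the second it returns an empty string, which corresponds to printing IDENTITY LOST.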

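Approach:

KMP || suffix array || generalized suffix automaton. I won't go into detail, I have a calculus class to get to. The code is fairly brute-force; I'll try to optimize it after class.

Comparison:

A suffix array is already sorted, so the lexicographically smallest answer is easy to find. With a suffix automaton you have to search it the way you would search a trie.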

#include<iostream>
#include<cstdio>
#include<algorithm>
#include<cstring>
#include<memory>
#include<cmath>
using namespace std;
int n,len,ans,Max,now; // now = index of the string currently being inserted
const int maxn=4000010;
char s[2100],cap[2100]; // cap holds the candidate answer during dfs
struct SAM
{
    int ch[maxn][26],fa[maxn],maxlen[maxn],Last,sz;
    // size[x] = number of distinct input strings that contain the substrings of node x
    // nxt[x]  = index of the last string already counted into size[x]
    int root,nxt[maxn],size[maxn];bool Flag;
    void init()
    {
        sz=0;Flag=false;
        root=++sz;
        memset(size,0,sizeof(size));
        memset(ch[1],0,sizeof(ch[1]));
        memset(nxt,0,sizeof(nxt));
    }
    void add(int x)
    {
        int np=++sz,p=Last;Last=np;
        memset(ch[np],0,sizeof(ch[np]));
        maxlen[np]=maxlen[p]+1;
        while(p&&!ch[p][x]) ch[p][x]=np,p=fa[p];
        if(!p) fa[np]=1;
        else {
            int q=ch[p][x];
            if(maxlen[p]+1==maxlen[q]) fa[np]=q;
            else {
                // split q: the clone nq also inherits q's counters
                int nq=++sz;
                memcpy(ch[nq],ch[q],sizeof(ch[q]));size[nq]=size[q]; nxt[nq]=nxt[q];
                maxlen[nq]=maxlen[p]+1;
                fa[nq]=fa[q];
                fa[q]=fa[np]=nq;
                while(p&&ch[p][x]==q) ch[p][x]=nq,p=fa[p];
            }
        }
        // walk up the fa chain and count the current string once per node
        for(;np;np=fa[np])
          if(nxt[np]!=now) {
              size[np]++;
              nxt[np]=now;
          }else break;
    }
   /* void dfs(int x,int d){// print the answer (old version)
       if(Flag||d>n) return;
       if(d==n){ puts(cap); Flag=true; return; }
          for(int i=0;i<26;i++)
          if(ch[x][i]&&size[ch[x][i]]==n){ cap[d]=i+'a'; dfs(ch[x][i],d+1); cap[d]=0; }
    }*/
    void dfs(int x,int d){// print the answer
       // only follow paths that spell the longest string of the node they reach
       if(d!=maxlen[x]||d>ans||Flag) return;
        if(maxlen[x]==ans&&size[x]>=n) { puts(cap); Flag=true; return; }
          // letters are tried in alphabetical order, so the first hit is the smallest
          for(int i=0;i<26;++i)
          if(ch[x][i]){ cap[d]=i+'a'; dfs(ch[x][i],d+1); cap[d]=0; }
    }
};
SAM Sam;
int main()
{
    while(~scanf("%d",&n)&&n){
        Sam.init();
        for(int i=1;i<=n;i++) {
            scanf("%s",s+1);
            Sam.Last=Sam.root; // restart from the root for every new string
            len=strlen(s+1);
            now=i;
            for(int j=1;j<=len;j++) Sam.add(s[j]-'a');
        }
        ans=0;
        // the answer length is the largest maxlen among nodes shared by all n strings
        for(int i=1;i<=Sam.sz;i++)
            if(Sam.size[i]==n&&Sam.maxlen[i]>ans) ans=Sam.maxlen[i];
        if(ans)  Sam.dfs(1,0);
        else printf("IDENTITY LOST\n");
    }
    return 0;
}

 
