POJ 2774: String Hashing, Solution Report

Posted jasmine-


Problem:

 http://poj.org/problem?id=2774

A - Long Long Message

The little cat is majoring in physics in the capital of Byterland. A piece of sad news comes to him these days: his mother is getting ill. Being worried about spending so much on railway tickets (Byterland is such a big country, and he has to spend 16 hours on train to his hometown), he decided only to send SMS with his mother.

The little cat lives in an unrich family, so he frequently comes to the mobile service center, to check how much money he has spent on SMS. Yesterday, the computer of service center was broken, and printed two very long messages. The brilliant little cat soon found out:

1. All characters in messages are lowercase Latin letters, without punctuations and spaces.
2. All SMS has been appended to each other – (i+1)-th SMS comes directly after the i-th one – that is why those two messages are quite long.
3. His own SMS has been appended together, but possibly a great many redundancy characters appear leftwards and rightwards due to the broken computer.
E.g: if his SMS is “motheriloveyou”, either long message printed by that machine, would possibly be one of “hahamotheriloveyou”, “motheriloveyoureally”, “motheriloveyouornot”, “bbbmotheriloveyouaaa”, etc.
4. For these broken issues, the little cat has printed his original text twice (so there appears two very long messages). Even though the original text remains the same in two printed messages, the redundancy characters on both sides would be possibly different.

You are given those two very long messages, and you have to output the length of the longest possible original text written by the little cat.

Background:
The SMS in Byterland mobile service are charging in dollars-per-byte. That is why the little cat is worrying about how long could the longest original text be.

Why ask you to write a program? There are four reasons:
1. The little cat is so busy these days with physics lessons;
2. The little cat wants to keep what he said to his mother secret;
3. POJ is such a great Online Judge;
4. The little cat wants to earn some money from POJ, and try to persuade his mother to see the doctor :(

 

 

Input

Two strings with lowercase letters on two of the input lines individually. Number of characters in each one will never exceed 100000.

 

Output

A single line with a single integer number – what is the maximum length of the original text written by the little cat.

 

Sample input

yeshowmuchiloveyoumydearmotherreallyicannotbelieveit
yeaphowmuchiloveyoumydearmother

 

Sample output

27

 

Problem summary: output the length of the longest common substring of the two strings. Each string has at most 100000 characters.

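Approach:

  Binary search on the substring length, and use hash values to check whether the two strings share a substring of that length.

  First compare the lengths of the two strings and use the shorter length as the initial r. The initial l is 0 (note: 0, not 1, because there may be no common substring at all). Then compute the prefix hashes of each string and store them in arrays.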
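The prefix-hash idea can be sketched as follows. This is a minimal illustration, not the code used in the accepted solution further down: the struct name `PrefixHash` and its members are made up for this example, and it indexes prefixes so that `h[i]` covers the first i characters (the post's `hasa[i]` covers the first i+1). The base 31, the mapping 'a' -> 1 and the wrap-around modulo 2^64 follow the post.

```cpp
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hypothetical helper: base-31 prefix hashes that wrap modulo 2^64.
struct PrefixHash {
    std::vector<ull> h;   // h[i]  = hash of the first i characters
    std::vector<ull> pw;  // pw[i] = 31^i (mod 2^64)

    explicit PrefixHash(const std::string& s)
        : h(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            h[i + 1] = h[i] * 31 + (ull)(s[i] - 'a' + 1);  // 'a' maps to 1, not 0
            pw[i + 1] = pw[i] * 31;
        }
    }

    // Hash of the substring s[pos .. pos+len-1], in O(1).
    ull get(std::size_t pos, std::size_t len) const {
        return h[pos + len] - h[pos] * pw[len];
    }
};
```

With this layout, equal substrings always produce equal values, for example `PrefixHash("abcabc").get(0, 3)` and `.get(3, 3)`; unequal substrings can collide in principle, which is the usual caveat of hashing.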

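To check whether a common substring of length mid exists: starting from the hash of the first mid characters, roll through one string to get the hashes of all its substrings of length mid in O(n), store them in an array, and sort it. Then compute the hash of each length-mid substring of the other string in turn and use binary_search to look it up in the sorted array.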
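The check described above could look roughly like this when written with `std::string` and `std::vector`. The function names `windowHashes` and `hasCommon` are invented for this sketch; the accepted code below does the same work with global arrays inside `yes`. A hash match is treated as a string match, so a collision would give a false positive.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hashes of all windows of length len in s (assumes 1 <= len <= s.size()).
std::vector<ull> windowHashes(const std::string& s, std::size_t len) {
    ull top = 1;  // 31^len, used to drop the character leaving the window
    ull cur = 0;  // hash of the current window
    for (std::size_t i = 0; i < len; ++i) {
        top *= 31;
        cur = cur * 31 + (ull)(s[i] - 'a' + 1);
    }
    std::vector<ull> out(1, cur);
    for (std::size_t i = len; i < s.size(); ++i) {
        // Slide by one: append s[i], remove s[i - len].
        cur = cur * 31 + (ull)(s[i] - 'a' + 1) - (ull)(s[i - len] - 'a' + 1) * top;
        out.push_back(cur);
    }
    return out;
}

// True if a and b appear to share a substring of length len (by hash).
bool hasCommon(const std::string& a, const std::string& b, std::size_t len) {
    if (len == 0) return true;  // the empty string is always shared
    if (len > a.size() || len > b.size()) return false;
    std::vector<ull> ha = windowHashes(a, len);
    std::sort(ha.begin(), ha.end());
    std::vector<ull> hb = windowHashes(b, len);
    for (std::size_t i = 0; i < hb.size(); ++i)
        if (std::binary_search(ha.begin(), ha.end(), hb[i])) return true;
    return false;
}
```

For instance, `hasCommon("abcde", "xxcdex", 3)` finds the shared window "cde", while no window of length 4 is shared.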

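Complexity: each check is sort + binary_search, i.e. O(n log n); with the outer binary search over the length the total is O(n log² n).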

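Pitfalls:

  This solution lets unsigned long long overflow naturally at 2^64, so no explicit modulo operation is needed. However, every variable used to compute hash values must be unsigned long long; otherwise the values are effectively reduced by different moduli and the comparisons go wrong.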
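A small illustration of the mismatch, assuming a platform where `unsigned int` is 32 bits wide (the function names `hash64` and `hash32` are made up for this sketch): the same formula accumulated in two different widths is reduced modulo 2^64 in one case and modulo 2^32 in the other, so the two results disagree as soon as the true value no longer fits in 32 bits.

```cpp
#include <string>

typedef unsigned long long ull;

// Base-31 hash accumulated in 64 bits: implicitly reduced modulo 2^64.
ull hash64(const std::string& s) {
    ull h = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = h * 31 + (ull)(s[i] - 'a' + 1);
    return h;
}

// Same formula accumulated in unsigned int: implicitly reduced modulo 2^32
// (assumption: unsigned int has 32 bits on this platform).
unsigned int hash32(const std::string& s) {
    unsigned int h = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = h * 31 + (unsigned int)(s[i] - 'a' + 1);
    return h;
}
```

For a string of ten 'z' characters the true hash value is far above 2^32 but still below 2^64, so `hash64` returns it exactly while `hash32` returns only its low 32 bits. Comparing a value of one kind against a value of the other would report a mismatch for identical strings.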

  The left bound of the binary search must start at 0, because there may be no common substring at all.
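One common way to write such a search so that 0 is a valid result is sketched below. It is a variant, not the loop used in the accepted code, and `largestFeasible` is a name invented here. It assumes the predicate is monotone (true up to some length, false afterwards) and true at 0.

```cpp
#include <functional>

// Largest x in [0, hi] with ok(x) true, assuming ok is monotone and ok(0) holds.
int largestFeasible(int hi, const std::function<bool(int)>& ok) {
    int lo = 0;  // start at 0: "no common substring" is a legal answer
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;  // round up so that lo = mid makes progress
        if (ok(mid)) lo = mid;             // length mid works, try longer
        else hi = mid - 1;                 // length mid fails, answer is shorter
    }
    return lo;
}
```

In this problem `hi` would be the length of the shorter string and `ok` the hash-based check for a given length.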

Accepted code:

#include <iostream>
#include <algorithm>
#include <cstring>
#include <cstdio>
using namespace std;
typedef unsigned long long ull;
const int maxn = 1e5 + 10;
char stra[maxn], strb[maxn], strc[maxn];
ull hasa[maxn], hasb[maxn];   // prefix hashes: hasa[i] covers stra[0..i]
int lena, lenb;
ull mi[maxn];                 // mi[i] = 31^i (mod 2^64)
ull hasat[maxn], hasbt;       // must be ull, because they store hash values

// Is there a common substring of length x?
bool yes(int x) {
    if (x == 0) return true;  // the empty string is always common; also avoids reading hasa[-1]
    hasat[0] = hasa[x - 1];
    for (int i = 1; i < lena - x + 1; i++)
        hasat[i] = hasat[i - 1] * 31 + (stra[i + x - 1] - 'a' + 1)
                   - (ull)(stra[i - 1] - 'a' + 1) * mi[x];

    sort(hasat, hasat + lena - x + 1);
    hasbt = hasb[x - 1];
    for (int i = 1; i < lenb - x + 2; i++) {
        if (binary_search(hasat, hasat + lena - x + 1, hasbt)) return true;
        // On the last iteration this reads the terminating '\0'; the result is never used.
        hasbt = hasbt * 31 + (strb[i + x - 1] - 'a' + 1)
                - (ull)(strb[i - 1] - 'a' + 1) * mi[x];
    }
    return false;
}

int main(int argc, char** argv) {
    mi[0] = 1;     // precompute the powers of 31
    for (int i = 1; i < maxn; i++) mi[i] = mi[i - 1] * 31;
    while (scanf("%s%s", stra, strb) == 2) {
        lena = strlen(stra), lenb = strlen(strb);
        if (lena > lenb) {    // make string a the shorter one
            int temp = lena;
            lena = lenb;
            lenb = temp;
            strcpy(strc, stra);
            strcpy(stra, strb);
            strcpy(strb, strc);
        }
        hasb[0] = strb[0] - 'a' + 1;     // preprocess string b
        for (int i = 1; i < lenb; i++)
            // Why map 'a' to 1 rather than 0? With 0, different strings such as "a" and "aa" would hash to the same value.
            hasb[i] = strb[i] - 'a' + 1 + hasb[i - 1] * 31;

        hasa[0] = stra[0] - 'a' + 1;     // preprocess string a
        for (int i = 1; i < lena; i++)
            hasa[i] = stra[i] - 'a' + 1 + hasa[i - 1] * 31;

        int l = 0, r = lena, mid;    // l must start at 0: the answer can be 0
        while (l < r - 1) {
            mid = (l + r) >> 1;
            if (yes(mid)) l = mid;
            else r = mid - 1;
        }
        // The answer now lies in [l, r]; take the largest length that passes.
        for (int i = r; i >= l; i--) {
            if (yes(i)) {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}

 
