Part 1: Hash Algorithm Template and a Summary of Simple Introductory Problems

Posted 2021-04-12 by the Harbin Engineering University CCF CSP Center

Introduction to the Hash Algorithm

Hash Algorithm Template

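// Double hashing has not been needed so far; it will be added here once it is.

// Hashing is generally used for string deduplication and string matching problems.

// When the length is not fixed, binary search + hashing can reduce the complexity.

// When the string length is fixed, a sliding window (two pointers) + hashing can reduce the complexity.

// With 2D hashing, the sliding-window step zeroes out the rows that are no longer needed, adds the current row, shifts the whole pattern down, and then checks whether the hash values are equal.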

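#include <string.h>

typedef unsigned long long ull;

const int maxn = 1e5 + 5;

ull hash_[maxn], xp[maxn]; // hash_[i]: hash of the suffix starting at i; xp[i]: 13331^i

// Precompute the powers of the base. Unsigned overflow wraps, so all arithmetic is modulo 2^64.
void init()
{
    xp[0] = 1;
    for (int i = 1; i < maxn; i++)
        xp[i] = xp[i - 1] * 13331; // 13331 is a conventional "magic" base; other odd values larger than the alphabet size should work too
}

// Compute the suffix hashes of str (assumed to contain only 'A'..'Z').
void make_hash(char str[])
{
    int len = strlen(str);
    hash_[len] = 0;
    for (int i = len - 1; i >= 0; i--)
    {
        hash_[i] = hash_[i + 1] * 13331 + str[i] - 'A' + 1;
    }
}

// Return the hash of the substring that starts at i and has length L.
ull Get_hash(int i, int L)
{
    return hash_[i] - hash_[i + L] * xp[L];
}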
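The note above about binary search + hashing can be illustrated with a classic use: finding the length of the longest common prefix of two suffixes of one string. A minimal sketch, assuming uppercase input as in the template; the names `SuffixHash` and `lcp` are hypothetical and do not come from the original post:

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long long ull;

const ull BASE = 13331; // same base as the template above

// Hypothetical wrapper around the template's suffix hash.
struct SuffixHash {
    int n;
    std::vector<ull> h, xp; // h[i]: hash of s[i..n), xp[k]: BASE^k (mod 2^64)

    explicit SuffixHash(const std::string& s)
        : n((int)s.size()), h(s.size() + 1, 0), xp(s.size() + 1, 1) {
        for (int i = 1; i <= n; i++) xp[i] = xp[i - 1] * BASE;
        for (int i = n - 1; i >= 0; i--)
            h[i] = h[i + 1] * BASE + (ull)(s[i] - 'A' + 1); // assumes 'A'..'Z'
    }

    // Hash of the substring that starts at i and has length len.
    ull get(int i, int len) const {
        assert(i >= 0 && len >= 0 && i + len <= n);
        return h[i] - h[i + len] * xp[len];
    }
};

// Length of the longest common prefix of the suffixes starting at i and j.
// Binary search on the length: if a prefix of length m matches, every shorter one does too.
int lcp(const SuffixHash& sh, int i, int j) {
    int lo = 0, hi = sh.n - (i > j ? i : j); // the answer lies in [lo, hi]
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (sh.get(i, mid) == sh.get(j, mid)) lo = mid; // prefixes of length mid agree
        else hi = mid - 1;
    }
    return lo;
}
```

For the string ABABAB, `lcp(sh, 0, 2)` compares ABABAB with ABAB and gives 4, barring a hash collision. Each query needs O(log n) hash comparisons instead of O(n) character comparisons.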

Hash Problem 1

Problem Description

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).

One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample

Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Output

1
3
0

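The task is to count how many times string A occurs in string B (occurrences may overlap). This is really a KMP template problem, but it can also be solved easily with hashing.

We compute the hash of A, then enumerate every start position in B, take the substring of length lena (the length of A) that begins there, and check whether its hash equals the hash of A.

The code is as follows: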
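Why comparing window hashes works can be seen from a short derivation. It assumes the letter values A = 1, ..., Z = 26 and the base B = 13331 from the template above, with all arithmetic taken modulo 2^64; the symbols v, B, n, and H are notation introduced here, not from the original post. The second half applies it to the second sample (W = AZA, T = AZAZAZA):

```latex
\begin{aligned}
\mathrm{hash}[i] &= \sum_{j=i}^{n-1} v(s_j)\, B^{\,j-i}, \\
\mathrm{hash}[i] - \mathrm{hash}[i+L]\cdot B^{L}
  &= \sum_{j=i}^{i+L-1} v(s_j)\, B^{\,j-i} = H(s_i \dots s_{i+L-1}). \\[6pt]
H(\texttt{AZA}) &= 1 + 26B + B^{2}, \\
i \in \{0, 2, 4\}:\quad H(\texttt{AZA}) &= 1 + 26B + B^{2} \quad \text{(match)}, \\
i \in \{1, 3\}:\quad H(\texttt{ZAZ}) &= 26 + B + 26B^{2} \quad \text{(no match)}.
\end{aligned}
```

Three windows match, which agrees with the sample output of 3. Equal hashes do not strictly prove equal strings, since collisions are possible, so the check is probabilistic.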

#include <stdio.h>
#include <string.h>

typedef unsigned long long ull;

const int maxn = 1e6 + 5;

ull xp[maxn], hash_1[maxn], hash_2[maxn];
char str[maxn], str2[maxn];

// Precompute the powers of the base (arithmetic wraps modulo 2^64).
void init()
{
    xp[0] = 1;
    for (int i = 1; i < maxn; i++)
        xp[i] = xp[i - 1] * 13331;
}

// get_hash(i, L, hash_) returns the hash of the substring that starts at i and has length L.
ull get_hash(int i, int L, ull hash_[])
{
    return hash_[i] - hash_[i + L] * xp[L];
}

// Fill hash_ with the suffix hashes of str and return the length of str.
int make_hash(char str[], ull hash_[])
{
    int len = strlen(str);
    hash_[len] = 0;
    for (int i = len - 1; i >= 0; i--)
    {
        hash_[i] = hash_[i + 1] * 13331 + (str[i] - 'A' + 1); // the input is uppercase, so offset from 'A'
    }
    return len;
}

int main()
{
    init();
    int t;
    scanf("%d", &t);
    while (t--)
    {
        int ans = 0;
        scanf("%s%s", str, str2);
        int len1 = make_hash(str, hash_1);   // word W
        int len2 = make_hash(str2, hash_2);  // text T
        ull tmp = get_hash(0, len1, hash_1); // hash of the whole word
        for (int i = 0; i < len2 - len1 + 1; i++) // mind the boundary: the last valid start is len2 - len1
        {
            if (get_hash(i, len1, hash_2) == tmp)
                ans++;
        }
        printf("%d\n", ans); // one answer per line
    }
    return 0;
}
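The template notes mention double hashing without showing it. One common form keeps two independent hashes under different moduli and treats two substrings as equal only when both hashes agree, which makes accidental collisions far less likely. A minimal sketch for the counting problem above, assuming uppercase input; the moduli, the bases, and the names `DoubleHash` and `count_occurrences` are illustrative choices, not from the original post:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef unsigned long long ull;

// Assumption: illustrative moduli and bases; any two different large primes would do.
const ull MOD1 = 1000000007ULL, MOD2 = 998244353ULL;
const ull BASE1 = 131, BASE2 = 13331;

struct DoubleHash {
    int n;
    std::vector<ull> h1, h2, p1, p2; // suffix hashes and base powers under each modulus

    explicit DoubleHash(const std::string& s)
        : n((int)s.size()), h1(n + 1, 0), h2(n + 1, 0), p1(n + 1, 1), p2(n + 1, 1) {
        for (int i = 1; i <= n; i++) {
            p1[i] = p1[i - 1] * BASE1 % MOD1;
            p2[i] = p2[i - 1] * BASE2 % MOD2;
        }
        for (int i = n - 1; i >= 0; i--) {
            ull c = (ull)(s[i] - 'A' + 1); // assumes 'A'..'Z'
            h1[i] = (h1[i + 1] * BASE1 + c) % MOD1;
            h2[i] = (h2[i + 1] * BASE2 + c) % MOD2;
        }
    }

    // Pair of hashes of the substring that starts at i and has length len.
    std::pair<ull, ull> get(int i, int len) const {
        assert(i >= 0 && len >= 0 && i + len <= n);
        ull a = (h1[i] + MOD1 - h1[i + len] * p1[len] % MOD1) % MOD1;
        ull b = (h2[i] + MOD2 - h2[i + len] * p2[len] % MOD2) % MOD2;
        return std::make_pair(a, b);
    }
};

// Count (possibly overlapping) occurrences of w in t by comparing hash pairs.
int count_occurrences(const std::string& w, const std::string& t) {
    if (w.size() > t.size()) return 0;
    DoubleHash hw(w), ht(t);
    int m = (int)w.size();
    std::pair<ull, ull> target = hw.get(0, m);
    int ans = 0;
    for (int i = 0; i + m <= (int)t.size(); i++)
        if (ht.get(i, m) == target) ans++;
    return ans;
}
```

Unlike the natural-overflow version, every multiplication here is reduced explicitly, so the two hashes behave independently. The cost is roughly twice the arithmetic per comparison.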

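CCF CSP Software Proficiency Certification

Copy: CSP Certification Center, Technical Department

Editor: Wang Yuyao (王语尧)

CCF CSP Software Proficiency Certification Center

Harbin Engineering University Certification Center

Office: 21A331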
