Part 1: Hash Algorithm Template and a Summary of Simple Introductory Problems
Posted by the Harbin Engineering University CCF CSP Center
Introduction to Hash Algorithms
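As a brief sketch of the idea behind the template in the next section: each suffix of the string is treated as a polynomial in a fixed base, evaluated in unsigned 64-bit arithmetic so that overflow acts as reduction modulo 2^64. The symbols c_k, B, H_i and n below are notation introduced here for illustration and do not appear in the original code.

```latex
\begin{align*}
c_k &= \texttt{str}[k] - \texttt{'A'} + 1, \qquad B = 13331, \qquad H_n = 0 \\
H_i &= H_{i+1} \cdot B + c_i \;=\; \sum_{k=i}^{n-1} c_k \, B^{\,k-i} \pmod{2^{64}} \\
\mathrm{hash}(i, L) &= H_i - H_{i+L} \cdot B^{L} \;=\; \sum_{k=i}^{i+L-1} c_k \, B^{\,k-i} \pmod{2^{64}}
\end{align*}
```

The last line is why a substring hash can be read off in constant time: multiplying H_{i+L} by B^L aligns its terms with the tail of H_i, so the subtraction leaves only the L characters starting at position i.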
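Hash Algorithm Template
// Double hashing has not been needed so far; it will be added here once it is.
// Hashing is generally used for string deduplication and string matching.
// When the target length is not fixed, binary search + hashing can reduce the complexity.
// When the substring length is fixed, a sliding window (two pointers) + hashing can reduce the complexity.
// For 2D hashing, the sliding window zeroes out the rows that are no longer needed, adds the current row,
// shifts the pattern down as a whole, and then checks whether the hash values are equal.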
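#include<string.h>
typedef unsigned long long ull;
const int maxn=1e5+5;
ull hash_[maxn],xp[maxn];  // hash_[i]: hash of the suffix starting at i; xp[i]: 13331^i
void init()
{
    xp[0]=1;
    for(int i=1;i<maxn;i++)
        xp[i]=xp[i-1]*13331;  // 13331 is an empirical base; other bases generally work too
    return;
}
void make_hash(char str[])  // compute the suffix hashes of str
{
    int len=strlen(str);
    hash_[len]=0;
    for(int i=len-1;i>=0;i--)
    {
        // unsigned overflow wraps around, i.e. arithmetic is modulo 2^64
        hash_[i]=hash_[i+1]*13331+str[i]-'A'+1;
    }
    return ;
}
ull Get_hash(int i,int L)  // hash of the substring starting at i with length L
{
    return hash_[i]-hash_[i+L]*xp[L];
}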
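The note above about binary search + hashing can be illustrated with a small sketch. It computes the longest common prefix of two suffixes of one string by binary searching on the length and comparing substring hashes. The `SuffixHash` struct and its `get`/`lcp` members are hypothetical names introduced here, not part of the original template, and the sketch assumes uppercase input and ignores hash collisions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long long ull;

// Hypothetical helper (not from the original post): suffix hashes of one
// string, built the same way as make_hash/Get_hash in the template.
struct SuffixHash {
    int n;                   // length of the string
    std::vector<ull> h, xp;  // h[i]: hash of suffix i; xp[i]: 13331^i

    explicit SuffixHash(const std::string &s)
        : n((int)s.size()), h(s.size() + 1, 0), xp(s.size() + 1, 1) {
        for (int i = 1; i <= n; i++) xp[i] = xp[i - 1] * 13331;
        for (int i = n - 1; i >= 0; i--)
            h[i] = h[i + 1] * 13331 + (s[i] - 'A' + 1);
    }

    // Hash of the substring starting at i with length L.
    ull get(int i, int L) const {
        assert(i >= 0 && L >= 0 && i + L <= n);
        return h[i] - h[i + L] * xp[L];
    }

    // Length of the longest common prefix of the suffixes starting at i and j.
    // If the prefixes of length mid match, every shorter prefix matches too,
    // so the answer can be found by binary search on the length.
    int lcp(int i, int j) const {
        int lo = 0, hi = n - std::max(i, j);
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (get(i, mid) == get(j, mid)) lo = mid;
            else hi = mid - 1;
        }
        return lo;
    }
};
```

For instance, with `SuffixHash sh("ABABAC")`, the call `sh.lcp(0, 2)` compares "ABABAC" with "ABAC" and returns 3. Each query costs O(log n) hash comparisons instead of a character-by-character scan.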
Hash Problem 1
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample
Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Output
1
3
0
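The task is to count how many times string A (the word W) occurs in string B (the text T), with overlapping occurrences allowed. This is essentially a KMP template problem, but it can also be solved easily with hashing.
We compute the hash of A, then enumerate every starting position in B, take the substring of length lena that starts there, and check whether its hash equals the hash of A.
The code is as follows: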
#include<stdio.h>
#include<iostream>
#include<algorithm>
#include<string.h>
using namespace std;
typedef unsigned long long ull;
const int maxn = 1e6+5;
ull xp[maxn],hash_1[maxn],hash_2[maxn];
void init()
{
    xp[0]=1;
    for(int i=1;i<maxn;i++)
        xp[i]=xp[i-1]*13331;
}
ull get_hash(int i,int L,ull hash_[])  // hash of the substring starting at position i with length L
{
    return hash_[i]-hash_[i+L]*xp[L];
}
int make_hash(char str[],ull hash_[])  // fill in the suffix hashes of str and return its length
{
    int len=strlen(str);
    hash_[len]=0;
    for(int i=len-1;i>=0;i--)
    {
        // the alphabet is 'A'..'Z', so offset from 'A' to keep values in 1..26
        hash_[i]=hash_[i+1]*13331+(str[i]-'A'+1);
    }
    return len;
}
char str[maxn],str2[maxn];
int main()
{
    init();
    int t;
    scanf("%d",&t);
    while(t--)
    {
        int ans=0;
        scanf("%s%s",str,str2);
        int len1=make_hash(str,hash_1);
        int len2=make_hash(str2,hash_2);
        ull tmp=get_hash(0,len1,hash_1);  // hash of the whole word W
        for(int i=0;i<len2-len1+1;i++)  // mind the boundary: the last valid start is len2-len1
        {
            if(get_hash(i,len1,hash_2)==tmp)
                ans++;
        }
        printf("%d\n",ans);  // one answer per line, as the problem requires
    }
    return 0;
}
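Since the problem is described above as a KMP template problem, a KMP-based counter can serve as a collision-free cross-check for the hash solution. The following is a minimal sketch of that alternative technique, not the author's code; the function name `count_occurrences` and the array name `fail` are chosen here for illustration.

```cpp
#include <string>
#include <vector>

// Count (possibly overlapping) occurrences of w in t using KMP.
int count_occurrences(const std::string &w, const std::string &t) {
    int m = (int)w.size();
    if (m == 0) return 0;

    // fail[i]: length of the longest proper prefix of w[0..i]
    // that is also a suffix of w[0..i].
    std::vector<int> fail(m, 0);
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && w[i] != w[k]) k = fail[k - 1];
        if (w[i] == w[k]) k++;
        fail[i] = k;
    }

    int count = 0;
    int k = 0;  // number of characters of w currently matched
    for (char c : t) {
        while (k > 0 && c != w[k]) k = fail[k - 1];
        if (c == w[k]) k++;
        if (k == m) {         // full match ending at this character
            count++;
            k = fail[k - 1];  // fall back so overlapping matches are found
        }
    }
    return count;
}
```

On the sample data, `count_occurrences("AZA", "AZAZAZA")` returns 3, matching the expected output. Comparing its results with the hash solution on random uppercase strings is one way to gain confidence that no hash collision affects the answer.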
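CCF CSP (Certified Software Professional)
Copy: CSP Certification Center, Technical Department
Editor: Wang Yuyao (王语尧)
CCF CSP Certification Center
Harbin Engineering University Certification Center
Office: 21A331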