An STL-based dictionary-building module: an attempt at simulating a search-engine algorithm
Posted by savennist
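This topic comes from the UVA problem Searching the Web: https://vjudge.net/problem/UVA-1597
Following the problem statement, I build a dictionary from the words of articles that are supplied in a fixed input format, so that later lookups are fast.
To build the dictionary, I nest an STL set inside an STL map. Each word maps to the set of (article, line) positions where it occurs, and this structure is filled in while the articles are parsed: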
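As a rough illustration of that nested structure and of the lookup it is meant to speed up, here is a minimal sketch. The names Position, Index, add_occurrence, lookup and example_usage are hypothetical and do not appear in the post's own code, which uses newpair and BIGMAP instead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical names for illustration only.
using Position = std::pair<int, int>;                     // (article, line)
using Index = std::map<std::string, std::set<Position>>;  // word -> positions

// Record one occurrence of `word`.
// operator[] creates an empty set the first time a word is seen.
void add_occurrence(Index& index, const std::string& word, int article, int line) {
    index[word].insert(Position(article, line));
}

// Return every position of `word`, or an empty set if it never occurred.
// find() is used instead of operator[] so that a lookup never inserts a key.
std::set<Position> lookup(const Index& index, const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? std::set<Position>() : it->second;
}

// Small usage example; the asserts state the expected results.
void example_usage() {
    Index index;
    add_occurrence(index, "computer", 1, 14);
    add_occurrence(index, "computer", 2, 24);
    add_occurrence(index, "computer", 2, 24);  // duplicate, ignored by the set
    assert(lookup(index, "computer").size() == 2);
    assert(lookup(index, "missing").empty());
    assert(index.size() == 1);  // looking up "missing" did not add a key
}
```

Both std::map and std::set keep their keys ordered, so inserting and finding a word each take logarithmic time in the number of stored entries.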
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <iostream>
#include <limits>
#include <map>
#include <set>
#include <sstream>
#include <string>
using namespace std;

// One occurrence of a word: the article it belongs to and its input line.
struct newpair
{
    int article;
    int line;
    // std::set needs an ordering. Comparing only `line` is sufficient here
    // because line numbers are counted across the whole input.
    bool operator<(const newpair& b) const
    {
        return this->line < b.line;
    }
};
typedef map<string, set<newpair> > BIGMAP;
typedef set<newpair>::const_iterator SET_pair_ITER;
typedef map<string, set<newpair> >::iterator BIGMAP_iter;

const int MAXLINES = 1600;
BIGMAP maper;          // word -> set of occurrences
string psd[MAXLINES];  // input lines after normalization
int maxline;

// Print every word followed by all of its occurrences.
int checkmaper()
{
    for (BIGMAP_iter it = maper.begin(); it != maper.end(); ++it)
    {
        cout << it->first;                    // the word (string)
        const set<newpair>& cyc = it->second; // its occurrences
        for (SET_pair_ITER iter = cyc.begin(); iter != cyc.end(); ++iter)
        {
            cout << " article " << iter->article << " line " << iter->line << endl;
        }
    }
    return 0;
}

// Record that word `aim` occurs in article `articlenum` on line `linenum`.
void buildmaper(const string& aim, int articlenum, int linenum)
{
    newpair m;
    m.article = articlenum;
    m.line = linenum;
    maper[aim].insert(m);
}

// Read n articles, each terminated by a line of asterisks.
// Returns the total number of lines read.
int readin()
{
    int n;
    cin >> n;
    // Discard the rest of the count line so that getline starts at the
    // first line of the first article.
    cin.ignore(numeric_limits<streamsize>::max(), '\n');
    int cur = 0;
    for (int i = 0; i < n && cur < MAXLINES; cur++)
    {
        if (!getline(cin, psd[cur])) break;  // premature end of input
        if (psd[cur].find("***") != string::npos) { i++; continue; }  // next article
        // Lowercase letters and turn everything else into a separator.
        for (string::iterator it = psd[cur].begin(); it != psd[cur].end(); ++it)
        {
            unsigned char ch = static_cast<unsigned char>(*it);
            *it = isalpha(ch) ? static_cast<char>(tolower(ch)) : ' ';
        }
        stringstream ss(psd[cur]);
        string chr;
        while (ss >> chr) buildmaper(chr, i, cur);
    }
    return cur;
}

int main()
{
    freopen("input.txt", "r", stdin);
    freopen("ans.txt", "w", stdout);
    maxline = readin();
    checkmaper();
    return 0;
}
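The code above touches on a fair amount of C++ and a little low-level background, listed here:
1. Common stringstream operations
2. The basic STL containers map and set
3. Operator overloading in a struct
4. Working with iterators
5. The basic principle of implementing map and set with a red-black tree
For the details, please refer to my other blog posts and to the code above.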
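The stringstream item is the one that does most of the parsing work, so a small standalone sketch may help. It assumes the same normalization rule as the code above (letters are lowercased, every other character becomes a space). The helper names split_words and example_usage are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of text into lowercase words.
std::vector<std::string> split_words(std::string line) {
    // Normalize in place: keep letters (lowercased), blank out the rest.
    for (char& ch : line) {
        unsigned char u = static_cast<unsigned char>(ch);
        ch = std::isalpha(u) ? static_cast<char>(std::tolower(u)) : ' ';
    }
    // operator>> on a string stream skips runs of whitespace,
    // so each extraction yields exactly one word.
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

// Small usage example; the asserts state the expected results.
void example_usage() {
    std::vector<std::string> w = split_words("Research in analysis (i.e., the");
    assert(w.size() == 6);  // research, in, analysis, i, e, the
    assert(w[0] == "research");
    assert(w[3] == "i" && w[4] == "e");
}
```

This also shows why a stray one-letter entry such as "e" ends up in the dictionary: "i.e." is split at the periods.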
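The one place in the code above where a bug is easy to introduce is the set: because set keeps its elements sorted, the newpair struct must overload operator<. Note that the overload compares only line. That is enough here because line numbers are counted across the whole input rather than restarting in each article, so two entries with the same line always belong to the same article and the set correctly treats them as one entry.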
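To make the role of the comparison concrete, the sketch below contrasts a line-only ordering with one that compares both fields. It is an illustrative variant, not the post's code: Occurrence, ByLineOnly and example_usage are hypothetical names, and std::tie is swapped in as one common way to write a two-field comparison.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical variant of newpair that orders by article first, then line.
struct Occurrence {
    int article;
    int line;
    bool operator<(const Occurrence& other) const {
        return std::tie(article, line) < std::tie(other.article, other.line);
    }
};

// Line-only ordering, written as a separate comparator for contrast.
struct ByLineOnly {
    bool operator()(const Occurrence& a, const Occurrence& b) const {
        return a.line < b.line;
    }
};

// Small usage example; the asserts state the expected results.
void example_usage() {
    std::set<Occurrence> full;
    full.insert(Occurrence{0, 5});
    full.insert(Occurrence{1, 5});  // other article, same line: kept
    full.insert(Occurrence{0, 5});  // exact duplicate: ignored
    assert(full.size() == 2);

    std::set<Occurrence, ByLineOnly> by_line;
    by_line.insert(Occurrence{0, 5});
    by_line.insert(Occurrence{1, 5});  // equivalent under line-only: dropped
    assert(by_line.size() == 1);
}
```

A set considers two elements equal when neither is less than the other, so a comparison that ignores a field silently merges entries that differ only in that field. If line numbers were ever restarted per article, the two-field version would be the safer choice.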
Below is a sample run.
Input:
4
one repeat repeat repeat
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
one two repeat repeat repeat repeat
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
one two three repeat repeat repeat repeat repeat
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
one two three four repeat repeat repeat repeat repeat repeat
I am very very very happy!
What about you?
**********
Output:
a article 0 line 1
 article 0 line 4
 article 0 line 6
 article 1 line 16
about article 3 line 34
adhere article 0 line 8
affecting article 0 line 5
afford article 1 line 17
alone article 1 line 17
am article 3 line 33
analysis article 2 line 22
and article 0 line 7
 article 1 line 16
 article 2 line 23
 article 2 line 27
 article 2 line 29
be article 0 line 4
 article 1 line 18
books article 1 line 13
 article 1 line 18
both article 2 line 25
but article 1 line 15
by article 2 line 26
came article 1 line 15
can article 2 line 28
cause article 0 line 4
class article 1 line 16
commerce article 0 line 3
 article 0 line 5
complete article 2 line 30
computer article 1 line 14
 article 2 line 24
copyright article 2 line 26
could article 1 line 16
 article 1 line 19
course article 1 line 12
dan article 1 line 15
development article 2 line 25
device article 0 line 6
devices article 0 line 2
did article 1 line 12
digital article 0 line 2
 article 0 line 6
e article 2 line 22
effective article 2 line 25
essential article 2 line 24
evaluation article 2 line 22
exchange article 2 line 29
family article 1 line 16
fees article 1 line 18
for article 0 line 3
 article 2 line 26
 article 2 line 27
four article 3 line 32
from article 1 line 15
general article 2 line 27
graduate article 1 line 19
happy article 3 line 33
hardly article 1 line 16
her article 1 line 14
 article 1 line 17
(The rest of the output is omitted.)
OK