An STL-Based Dictionary Builder: An Attempt at Simulating a Search Engine Algorithm

Posted by savennist


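This exercise comes from the UVA problem Searching the Web: https://vjudge.net/problem/UVA-1597

Following the problem statement, I build a dictionary from the words of articles supplied in a fixed input format, so that later lookups are fast.

To build the dictionary, I nest the STL map and set containers into one data structure that records where every word in the articles occurs: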

#include<map>
#include<iostream>
#include<set>
#include<algorithm>
#include<string>
#include<cctype>
#include<cstdio>
#include<limits>
#include<sstream>
using namespace std;

// One occurrence of a word: the article it belongs to and its input line.
struct newpair
{
    int article;
    int line;
    // std::set needs a strict weak ordering: order by article, then by line.
    bool operator<(const newpair& b) const
    {
        if(article!=b.article) return article<b.article;
        return line<b.line;
    }
};
typedef map<string,set<newpair> > BIGMAP;
typedef set<newpair>::const_iterator SET_pair_ITER;
typedef BIGMAP::const_iterator BIGMAP_iter;

BIGMAP maper;      // word -> every (article, line) where it occurs
string psd[1600];  // input lines, lowercased, with non-letters blanked out
int maxline;

// Print the whole dictionary, one occurrence per output line.
int checkmaper()
{
    for(BIGMAP_iter it=maper.begin();it!=maper.end();++it)
    {
        cout<<(it->first);//string-type
        const set<newpair>& cyc=it->second;//set<newpair>-type, taken by reference
        for(SET_pair_ITER iter=cyc.begin();iter!=cyc.end();++iter)
        {
            cout<<"  article "<<iter->article<<" line "<<iter->line<<endl;
        }
    }
    return 0;
}

// Record that the word aim occurs in article articlenum on line linenum.
void buildmaper(string aim,int articlenum,int linenum)
{
    newpair m;
    m.article=articlenum;
    m.line=linenum;
    maper[aim].insert(m);
}

// Read n articles, each terminated by a line of asterisks; return the line count.
int readin()
{
    int n;
    cin>>n;
    // Skip the rest of the first line; cin>>c would instead swallow
    // the first letter of the first article.
    cin.ignore(numeric_limits<streamsize>::max(),'\n');
    int cur=0;
    for(int i=0;i<n;cur++)
    {
        if(!getline(cin,psd[cur])) break;//stop if the input ends early
        if(psd[cur].find("***")!=string::npos){i++;continue;}//the next article
        for(string::iterator it=psd[cur].begin();it!=psd[cur].end();++it)
        {
            unsigned char ch=*it;
            if(isalpha(ch)) *it=tolower(ch);
            else *it=' ';//anything that is not a letter becomes a separator
        }
        stringstream ss(psd[cur]);
        string chr;
        while(ss>>chr) buildmaper(chr,i,cur);
    }
    return cur;
}

int main()
{
    freopen("input.txt","r",stdin);
    freopen("ans.txt","w",stdout);
    maxline=readin();
    checkmaper();
    return 0;
}
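The point of building the dictionary is that a later query becomes a single map lookup. The following is only a minimal sketch of such a lookup, not part of the original program: `Posting`, `Index`, `addWord`, and `linesContaining` are hypothetical names standing in for `newpair`, `BIGMAP`, and `buildmaper`.

```cpp
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for newpair: one occurrence of a word.
struct Posting {
    int article;
    int line;
    bool operator<(const Posting& other) const {
        return std::tie(article, line) < std::tie(other.article, other.line);
    }
};

// Hypothetical stand-in for BIGMAP.
using Index = std::map<std::string, std::set<Posting>>;

// Record that `word` occurs on `line` of `article`; duplicates are ignored by the set.
void addWord(Index& index, const std::string& word, int article, int line) {
    index[word].insert(Posting{article, line});
}

// Return the lines on which `word` occurs, in ascending (article, line) order.
// find() is used instead of operator[] so a miss does not create an empty entry.
std::vector<int> linesContaining(const Index& index, const std::string& word) {
    std::vector<int> lines;
    auto it = index.find(word);
    if (it == index.end()) return lines;
    for (const Posting& p : it->second) lines.push_back(p.line);
    return lines;
}
```

With the sample data further down, looking up "books" in such an index would return lines 13 and 18, and a word that never occurs would return an empty vector.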

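The code above draws on a fair amount of C++ knowledge and a little lower-level knowledge, listed here:

1. Common stringstream operations

2. The basic STL containers map and set

3. Operator overloading in a struct

4. Working with iterators

5. The basic principle of implementing map and set with a red-black tree

For the implementation details, see my other blog posts and the code above.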
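As an illustration of the first item, the word-splitting step can be isolated into a small function. This is a sketch under my own naming (`tokenize` is a hypothetical helper, not a function from the program above); it applies the same rule as the program: letters are lowercased, every other character becomes a space, and a string stream then splits on whitespace.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of text into lowercase words.
std::vector<std::string> tokenize(std::string line) {
    for (char& ch : line) {
        // Cast first: passing a negative char to isalpha/tolower is undefined.
        unsigned char u = static_cast<unsigned char>(ch);
        ch = std::isalpha(u) ? static_cast<char>(std::tolower(u)) : ' ';
    }
    std::istringstream in(line);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);  // operator>> skips runs of spaces
    return words;
}
```

Under this rule "(i.e., the" splits into "i", "e", "the", and "middle-class" splits into "middle" and "class", which is why "e" and "class" appear as separate entries in the output listing.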

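The one place in the code above where a bug creeps in easily is the set: std::set keeps its elements sorted and treats two elements as duplicates when neither compares less than the other, so the newpair struct must overload operator<.

A sample run follows.

Input: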
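To make the pitfall concrete, the sketch below compares two orderings. `LineOnly` and `ArticleThenLine` are hypothetical structs written for this illustration. An ordering that looks at `line` alone is safe only while line numbers are unique across all articles, which holds in the program above because it numbers lines with one global counter; if lines were numbered per article, occurrences would be silently dropped.

```cpp
#include <cstddef>
#include <set>
#include <tuple>

// Hypothetical: orders by line only, so equal lines count as duplicates.
struct LineOnly {
    int article;
    int line;
    bool operator<(const LineOnly& o) const { return line < o.line; }
};

// Hypothetical: orders by article first, then by line.
struct ArticleThenLine {
    int article;
    int line;
    bool operator<(const ArticleThenLine& o) const {
        return std::tie(article, line) < std::tie(o.article, o.line);
    }
};

// Insert "line 3 of article 0" and "line 3 of article 1" and count survivors.
std::size_t countLineOnly() {
    std::set<LineOnly> s;
    s.insert(LineOnly{0, 3});
    s.insert(LineOnly{1, 3});  // rejected: equivalent to the first element
    return s.size();           // 1
}

std::size_t countArticleThenLine() {
    std::set<ArticleThenLine> s;
    s.insert(ArticleThenLine{0, 3});
    s.insert(ArticleThenLine{1, 3});  // kept: differs in article
    return s.size();                  // 2
}
```

Comparing every field that distinguishes two occurrences costs almost nothing and removes the dependence on how lines happen to be numbered.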

4
one   repeat  repeat  repeat
A manufacturer, importer, or seller of
digital media devices may not (1) sell,
or offer for sale, in interstate commerce,
or (2) cause to be transported in, or in a
manner affecting, interstate commerce,
a digital media device unless the device
includes and utilizes standard security
technologies that adhere to the security
system standards.
**********
one two   repeat  repeat  repeat   repeat
Of course, Lisa did not necessarily
intend to read his books. She might
want the computer only to write her
midterm. But Dan knew she came from
a middle-class family and could hardly
afford the tuition, let alone her reading
fees. Books might be the only way she
could graduate
**********
one two three   repeat   repeat  repeat  repeat   repeat
Research in analysis (i.e., the evaluation
of the strengths and weaknesses of
computer system) is essential to the
development of effective security, both
for works protected by copyright law
and for information in general. Such
research can progress only through the
open publication and exchange of
complete scientific results
**********
one two three   four   repeat  repeat   repeat  repeat  repeat   repeat
I am very very very happy!
What about you?
**********

Output:

a  article 0 line 1
  article 0 line 4
  article 0 line 6
  article 1 line 16
about  article 3 line 34
adhere  article 0 line 8
affecting  article 0 line 5
afford  article 1 line 17
alone  article 1 line 17
am  article 3 line 33
analysis  article 2 line 22
and  article 0 line 7
  article 1 line 16
  article 2 line 23
  article 2 line 27
  article 2 line 29
be  article 0 line 4
  article 1 line 18
books  article 1 line 13
  article 1 line 18
both  article 2 line 25
but  article 1 line 15
by  article 2 line 26
came  article 1 line 15
can  article 2 line 28
cause  article 0 line 4
class  article 1 line 16
commerce  article 0 line 3
  article 0 line 5
complete  article 2 line 30
computer  article 1 line 14
  article 2 line 24
copyright  article 2 line 26
could  article 1 line 16
  article 1 line 19
course  article 1 line 12
dan  article 1 line 15
development  article 2 line 25
device  article 0 line 6
devices  article 0 line 2
did  article 1 line 12
digital  article 0 line 2
  article 0 line 6
e  article 2 line 22
effective  article 2 line 25
essential  article 2 line 24
evaluation  article 2 line 22
exchange  article 2 line 29
family  article 1 line 16
fees  article 1 line 18
for  article 0 line 3
  article 2 line 26
  article 2 line 27
four  article 3 line 32
from  article 1 line 15
general  article 2 line 27
graduate  article 1 line 19
happy  article 3 line 33
hardly  article 1 line 16
her  article 1 line 14
  article 1 line 17

(The rest of the output is omitted.)

OK
