ES IK Pinyin Plugin: Pitfalls and Fixes


While indexing documents into Elasticsearch recently, I suddenly started getting the following error:

ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards startOffset=0,endOffset=3,lastStartOffset=3 for field 'title.pinyin']]

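Judging by the field name in the message, title.pinyin, the error appears to be raised while the text is being analyzed with the IK pinyin analyzer.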
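To make the message easier to read, here is a minimal Python sketch of the three conditions it names. It is a simplified illustration only, not Lucene's actual code, and the function name check_offsets and the sample token offsets are made up for the example.

```python
def check_offsets(tokens):
    """Check (start, end) token offsets against the rules named in the error.

    Each token must have start >= 0, end >= start, and a start offset that is
    not smaller than the start offset of the previous token.
    """
    last_start = 0
    for start, end in tokens:
        if start < 0 or end < start or start < last_start:
            raise ValueError(
                "offsets must not go backwards "
                f"startOffset={start},endOffset={end},lastStartOffset={last_start}"
            )
        last_start = start


# A well-behaved token stream: start offsets never decrease.
check_offsets([(0, 1), (1, 2), (2, 3)])

# A stream shaped like the one in the error: a token starting at 3,
# followed by a token that jumps back to offset 0.
try:
    check_offsets([(3, 4), (0, 3)])
except ValueError as exc:
    message = str(exc)
    print(message)
```

In other words, some token produced for title.pinyin reported a start offset of 0 after an earlier token had already started at offset 3.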

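Not every document fails on insert; only documents whose title field contains special characters do. Searching online, the commonly cited cause is a bug in the IK pinyin plugin, and the suggested fix is simply to upgrade the plugin.

However, I am using Alibaba Cloud's managed Elasticsearch service, where plugins cannot be upgraded, so that option was out.

Another explanation says that when the parameter "ignore_pinyin_offset" is set to false and data is bulk-written to a pinyin-analyzed field, the exception "startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards" is thrown, so setting ignore_pinyin_offset to true should be enough.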

Configuration before the change:

Configuration after the change:
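The change amounts to flipping a single flag on the pinyin token filter. The Python sketch below shows the idea as plain dictionaries that could be serialized into index settings. Only ignore_pinyin_offset comes from the text above; the filter name my_pinyin is borrowed from later in this post, and the other option values are assumptions chosen for illustration.

```python
import json

# Hypothetical "before" settings for the pinyin token filter.
# keep_full_pinyin and keep_original are illustrative values, not the
# author's actual configuration.
pinyin_filter_before = {
    "type": "pinyin",
    "keep_full_pinyin": True,
    "keep_original": True,
    "ignore_pinyin_offset": False,
}

# The "after" settings differ from "before" in exactly one flag.
pinyin_filter_after = {**pinyin_filter_before, "ignore_pinyin_offset": True}

# Keys whose values changed between the two versions.
changed = {
    key
    for key in pinyin_filter_after
    if pinyin_filter_after[key] != pinyin_filter_before[key]
}

print(json.dumps({"filter": {"my_pinyin": pinyin_filter_after}}, indent=2))
```

Analysis settings cannot be changed on an open index, so a change like this normally means closing the index or recreating it and then reindexing.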

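After making the change I reindexed the documents, and the same error came back.

After two days of going back and forth, it occurred to me: since this is a bug in this particular plugin version and Alibaba Cloud will not upgrade it, why not work around it instead?

Early in the investigation I had noticed that the bug only appears when the title contains special characters. So adding one more filter ahead of the pinyin step, to strip those characters out, should make the problem go away.

So I first defined a filter: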
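A filter of this kind might look like the sketch below. The name specialCharactersFilter comes from this post, and pattern_replace is a standard Elasticsearch filter type, but the regular expression itself is an assumption, not the pattern the author used. Because Python's re module does not support \p{...} classes, the local check uses a separate, roughly equivalent Python pattern.

```python
import json
import re

# Hypothetical filter definition (the pattern uses Java regex syntax, as
# Elasticsearch expects): remove anything that is not a letter, a digit,
# or whitespace.
special_characters_filter = {
    "type": "pattern_replace",
    "pattern": r"[^\p{L}\p{N}\s]+",
    "replacement": "",
}

# Local approximation for experimenting in Python. For str patterns, \w is
# Unicode-aware, so Chinese characters are kept (as is the underscore).
LOCAL_PATTERN = re.compile(r"[^\w\s]+")


def strip_special(text: str) -> str:
    """Approximate what the filter would do to a piece of text."""
    return LOCAL_PATTERN.sub(special_characters_filter["replacement"], text)


print(json.dumps({"filter": {"specialCharactersFilter": special_characters_filter}}, indent=2))
print(strip_special("刘德华(Andy)★"))
```

Trying a few real titles that previously failed against the pattern is a cheap way to confirm that it removes the characters that trigger the error.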

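Then I added this filter to the analyzer:

With this in place, text analyzed by ik_pinyin_analyzer first passes through the regular expression in specialCharactersFilter, which removes all special characters, and only then is converted to pinyin by my_pinyin.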
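Put together, the analysis settings could be shaped like the following sketch. The names ik_pinyin_analyzer, specialCharactersFilter and my_pinyin come from this post; the ik_max_word tokenizer, the regular expression and the pinyin options are assumptions for illustration. The point it demonstrates is the ordering: token filters in a custom analyzer run in the order listed, so the cleanup filter must come before the pinyin filter.

```python
import json

# Hypothetical analysis settings; only the three names mentioned in the
# post are taken from the source, everything else is illustrative.
analysis_settings = {
    "analysis": {
        "filter": {
            "specialCharactersFilter": {
                "type": "pattern_replace",
                "pattern": r"[^\p{L}\p{N}\s]+",  # assumed pattern (Java regex)
                "replacement": "",
            },
            "my_pinyin": {
                "type": "pinyin",
                "ignore_pinyin_offset": True,
            },
        },
        "analyzer": {
            "ik_pinyin_analyzer": {
                "type": "custom",
                "tokenizer": "ik_max_word",  # assumed tokenizer
                # Order matters: strip special characters, then run pinyin.
                "filter": ["specialCharactersFilter", "my_pinyin"],
            }
        },
    }
}

filter_chain = analysis_settings["analysis"]["analyzer"]["ik_pinyin_analyzer"]["filter"]

print(json.dumps(analysis_settings, indent=2, ensure_ascii=False))
```

A dictionary like this would be sent as the settings body when creating the index, after which the documents are indexed again.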

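After this change, the error above no longer occurs.

Note: the settings above are specific to my setup. Adapt them to your own conventions and requirements rather than copying them verbatim; all that matters is that you create a new filter and place it ahead of the pinyin filter.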

NHibernate: Pitfalls and Fixes When Migrating from .NET Framework to .NET Standard (.NET Core 2.2)

First, here is the session-creation code used on .NET Framework:

 

    using System.Reflection;
    using NHibernate;
    using NHibernate.Cfg;
    using NHibernate.Mapping.ByCode;

    public class SessionBuilder
    {
        private static ISessionFactory _sessionFactory = null;

        public SessionBuilder()
        {
            if (_sessionFactory == null)
            {
                // Create the ISessionFactory
                _sessionFactory = GetSessionFactory();
            }
        }

        /// <summary>
        /// Creates the ISessionFactory
        /// </summary>
        /// <returns></returns>
        public static ISessionFactory GetSessionFactory()
        {
            //HibernatingRhinos.Profiler.Appender.NHibernate.NHibernateProfiler.Initialize();

            // Collect the mapping-by-code classes from this assembly
            var mappers = new ModelMapper();
            mappers.AddMappings(Assembly.GetExecutingAssembly().GetExportedTypes());

            // Read the hibernate-configuration section, then add the compiled mappings
            var cfg = new Configuration().Configure();
            cfg.AddDeserializedMapping(mappers.CompileMappingForAllExplicitlyAddedEntities(), "");

            return cfg.BuildSessionFactory();
        }

        /// <summary>
        /// Opens an ISession
        /// </summary>
        /// <returns></returns>
        public static ISession GetSession()
        {
            if (_sessionFactory == null || _sessionFactory.IsClosed)
            {
                // Create the ISessionFactory
                _sessionFactory = GetSessionFactory();
            }

            // Assumed intent: hand back a new session from the factory
            return _sessionFactory.OpenSession();
        }

        #region Open a new session
        public static ISession OpenSession()
        {
            return _sessionFactory.OpenSession();
        }
        #endregion
    }

To talk to the database, NHibernate first has to be configured in web.config (the database is SQL Server):

<configuration>
  <configSections>
    <section name="hibernate-configuration" type="NHibernate.Cfg.ConfigurationSectionHandler, NHibernate" />
  </configSections>
  <!--NHibernate configuration begins-->
  <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
    <session-factory>
      <property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
      <property name="connection.driver_class">NHibernate.Driver.SqlClientDriver</property>
      <property name="dialect">NHibernate.Dialect.MsSql2012Dialect</property>
      <property name="show_sql">false</property>
      <property name="connection.connection_string_name">ylsdai</property>
      <property name="adonet.batch_size">30</property>
      <property name="generate_statistics">false</property>
      <property name="format_sql">true</property>
      <property name="command_timeout">60</property>
      <property name="current_session_context_class">web</property>
      <!--<property name="cache.provider_class">NHibernate.Caches.SysCache2.SysCacheProvider,NHibernate.Caches.SysCache2</property>
      <property name="cache.default_expiration">120</property>
      <property name="cache.use_second_level_cache">true</property>
      <property name="cache.use_query_cache">true</property>-->
    </session-factory>
  </hibernate-configuration>
  <!--NHibernate configuration ends-->
  <connectionStrings>
    <!--test_db-->
    <add name="ylsdai" connectionString="data source=0.0.0.1,111;database=test_db;uid=sa;pwd=123456" providerName="System.Data.SqlClient" />
  </connectionStrings>
</configuration>

The mapping class:

using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

namespace ClassMapping
{
    #region CityMap
    public class CityMap : ClassMapping<City>
    {
        public CityMap()
        {
            SelectBeforeUpdate(true);
            DynamicUpdate(true);
            //Cache(p => p.Usage(CacheUsage.ReadWrite));
            Id(p => p.CityId, map => map.Generator(Generators.Native));
            Property(p => p.OldCityId);
            Property(p => p.ParentId);
            Property(p => p.CityName);
            Property(p => p.EnCityName);
            Property(p => p.CityImgUrl);
            Property(p => p.LatLng);
            Property(p => p.Keywords);
            Property(p => p.IsRecommend);
            Property(p => p.IsDepart);
            Property(p => p.AreaId);
            Property(p => p.CityContent);
        }
    }
    #endregion
}

 

Once the entities are written, everything is ready to use.

 

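Migrating to .NET Core, however, ran into the following problems:

1. The .NET Framework way of creating a session no longer works.

2. The NHibernate settings in the config file can no longer be read.

3. Because of problem 1, the mapping classes cannot be used either.

Fortunately, there is an open-source helper framework that works on .NET Core, Fluent NHibernate, and it solved the problems above, although using it came with pitfalls of its own:

1. Almost all of the documentation online assumes the mappings live in XML. When a real project keeps its mappings in .cs files, moving a large number of them from .cs files to .xml files becomes extremely tedious.

The eventual fix was to change the GetSessionFactory method of the session builder as follows: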

        /// <summary>
        /// Creates the ISessionFactory
        /// </summary>
        /// <returns></returns>
        public static ISessionFactory GetSessionFactory()
        {
            // Despite its name, this variable holds the loaded Assembly
            var assemblyName = Assembly.Load("ClassMapping");

            // Settings that used to live in the hibernate-configuration section
            NHibernate.Cfg.Configuration setCfg(NHibernate.Cfg.Configuration c)
            {
                c.Properties.Add("show_sql", "true");
                c.Properties.Add("adonet.batch_size", "1000");
                c.Properties.Add("generate_statistics", "false");
                c.Properties.Add("format_sql", "true");
                c.Properties.Add("command_timeout", "60");
                c.Properties.Add("current_session_context_class", "web");
                return c;
            }

            return Fluently.Configure()
                .Database(FluentNHibernate.Cfg.Db.MsSqlConfiguration.MsSql2012
                    .ConnectionString("Server=0.0.0.1,111;Database=test_db;Uid=sa;Pwd=123456;"))
                .Mappings(m => m.FluentMappings.AddFromAssembly(assemblyName))
                .ExposeConfiguration(c => new SchemaUpdate(c).Execute(true, false))
                .ExposeConfiguration(c => setCfg(c))
                //.ExposeConfiguration(f => f.SetInterceptor(new SqlStatementInterceptor()))
                .BuildSessionFactory();
        }

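The mapping classes are brought in by this line:

.Mappings(m => m.FluentMappings.AddFromAssembly(assemblyName))

It registers the mapping classes found in the loaded assembly, so the mapping classes can be moved into a separate new project named ClassMapping. Each mapping class then changes as follows: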

using Entity;
using FluentNHibernate.Mapping;

namespace ClassMapping
{
    #region CityMap
    public class CityMap : ClassMap<City>
    {
        public CityMap()
        {
            Table("City");

            //SelectBeforeUpdate(true);
            //DynamicUpdate(true);
            //Cache(p => p.Usage(CacheUsage.ReadWrite));
            Id(p => p.CityId);
            Map(p => p.OldCityId);
            Map(p => p.ParentId);
            Map(p => p.CityName);
            Map(p => p.EnCityName);
            Map(p => p.CityImgUrl);
            Map(p => p.LatLng);
            Map(p => p.Keywords);
            Map(p => p.IsRecommend);
            Map(p => p.IsDepart);
            Map(p => p.AreaId);
            Map(p => p.CityContent);
        }
    }
    #endregion
}

 

Relationship mappings also have to be migrated.

In the previous version they looked like this:

            OneToOne(p => p.City, map =>
            {
                map.Cascade(Cascade.All);
                map.Lazy(LazyRelation.Proxy);
            }); // one-to-one

            ManyToOne(p => p.Province, map => map.Column("ProvinceId")); // many-to-one

            Bag(p => p.Counties, map =>
            {
                map.Key(k => k.Column("CountyId"));
                map.OrderBy(k => k.SortId);
            }, rel => rel.OneToMany()); // one-to-many

In the .NET Core version the equivalents are:

HasOne(p => p.City).Cascade.All().LazyLoad(); // one-to-one

//References<Province>(r => r.Province).Column("ProvinceId").ForeignKey("ProvinceId").Cascade.None(); // many-to-one; this caused problems in practice

HasMany(p => p.Counties).KeyColumn("CountyId").OrderBy("CountyId").LazyLoad(); // one-to-many

 

The City entity:

    [Serializable]
    public class City : BaseEntity
    {
        /// <summary>
        /// CityId
        /// </summary>
        public virtual int CityId { get; set; }

        /// <summary>
        /// Version
        /// </summary>
        public virtual int Version { get; set; }

        /// <summary>
        /// OldCityId
        /// </summary>
        public virtual int OldCityId { get; set; }

        /// <summary>
        /// ParentId
        /// </summary>
        public virtual int ParentId { get; set; }

        /// <summary>
        /// City name
        /// </summary>
        public virtual string CityName { get; set; }

        /// <summary>
        /// City name in English
        /// </summary>
        public virtual string EnCityName { get; set; }

        /// <summary>
        /// CityImgUrl
        /// </summary>
        public virtual string CityImgUrl { get; set; }

        /// <summary>
        /// Latitude and longitude
        /// </summary>
        public virtual string LatLng { get; set; }

        /// <summary>
        /// Keywords
        /// </summary>
        public virtual string Keywords { get; set; }

        /// <summary>
        /// Whether the city is recommended
        /// </summary>
        public virtual bool IsRecommend { get; set; }

        /// <summary>
        /// IsDepart
        /// </summary>
        public virtual bool IsDepart { get; set; }

        /// <summary>
        /// Flight area
        /// </summary>
        public virtual int AreaId { get; set; }

        /// <summary>
        /// City description
        /// </summary>
        public virtual string CityContent { get; set; }
    }

 

Below is the helper that works with the session to make database calls:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;
    using NHibernate;

    /// <summary>
    /// NHibernate helper
    /// </summary>
    /// <typeparam name="T"></typeparam>
    public class DbHelper<T> where T : BaseEntity
    {
        private readonly ISession _session = SessionBuilder.GetSession();
        //protected static readonly ILog Log = LogManager.GetLogger(MethodBase.GetCurrentMethod().DeclaringType);

        public DbHelper() { }

        public DbHelper(ISession session)
        {
            _session = session;
        }

        #region Get an entity
        /// <summary>
        /// Gets an entity (first-level cache)
        /// </summary>
        /// <param name="id"></param>
        /// <returns></returns>
        public T Load(int id)
        {
            return _session.Load<T>(id);
        }
        #endregion

        #region Get an entity (cached)
        /// <summary>
        /// Gets an entity (second-level cache)
        /// </summary>
        /// <param name="id"></param>
        /// <returns></returns>
        public T Get(int id)
        {
            return _session.Get<T>(id);
        }
        #endregion

        #region Evict an entity
        /// <summary>
        /// Evicts an entity from the session
        /// </summary>
        /// <param name="obj"></param>
        public void Evict(object obj)
        {
            _session.Evict(obj);
        }
        #endregion

        /// <summary>
        /// Creates a query from a SQL statement
        /// </summary>
        /// <param name="sql"></param>
        public ISQLQuery CreateSqlQuery(string sql)
        {
            return _session.CreateSQLQuery(sql);
        }

        /// <summary>
        /// Gets a criteria query
        /// </summary>
        /// <param name="hql">Passed to CreateCriteria as the entity name</param>
        /// <returns></returns>
        public ICriteria GetCriteria(string hql)
        {
            return _session.CreateCriteria(hql);
        }

        #region Get all rows
        /// <summary>
        /// Gets all rows
        /// </summary>
        /// <param name="cacheable">Whether to cache the query</param>
        /// <returns></returns>
        public IEnumerable<T> GetAll(bool cacheable)
        {
            var ic = _session.CreateCriteria(typeof(T));
            IEnumerable<T> list = ic.SetCacheable(cacheable).List<T>() ?? new List<T>();
            return list;
        }
        #endregion

        #region Insert or update
        /// <summary>
        /// Inserts an entity
        /// </summary>
        /// <param name="entity"></param>
        /// <returns>The generated identifier</returns>
        public int Save(T entity)
        {
            var id = 0;
            var session = this._session;
            ITransaction tan = session.BeginTransaction();

            try
            {
                entity = session.Merge(entity);
            }
            catch (Exception e)
            {
                //Log.DebugFormat($"Save124,Exception:{e.Message},entity:{JsonConvert.SerializeObject(entity)}");
            }

            try
            {
                tan.Begin();
                id = (int)session.Save(entity);
                session.Flush();
                tan.Commit();
            }
            catch (Exception e)
            {
                //Log.DebugFormat($"Save136,Exception:{e.Message},entity:{JsonConvert.SerializeObject(entity)}");
                tan.Rollback();
                throw;
            }

            return id;
        }
        #endregion

        #region Update
        /// <summary>
        /// Updates an entity
        /// </summary>
        /// <param name="entity"></param>
        /// <returns>1 on success</returns>
        public int Update(T entity)
        {
            int result = 0;
            ITransaction tan = _session.BeginTransaction();
            var session = this._session;
            //entity = (T)_session.Merge(entity);
            try
            {
                entity = session.Merge(entity);
            }
            catch (Exception e)
            {
                //Log.DebugFormat($"Update163,Exception:{e.Message},entity:{JsonConvert.SerializeObject(entity)}");
            }

            try
            {
                tan.Begin();
                _session.Update(entity);
                _session.Flush();
                tan.Commit();
                result++;
            }
            catch (Exception e)
            {
                //Log.DebugFormat($"Update175,Exception:{e.Message},entity:{JsonConvert.SerializeObject(entity)}");
                tan.Rollback();
                throw;
            }

            return result;
        }
        #endregion

        #region Delete one row
        /// <summary>
        /// Deletes one row by id
        /// </summary>
        /// <param name="id"></param>
        /// <returns>1 on success</returns>
        public int DeleteModelById(int id)
        {
            int result = 0;
            object item = Get(id);
            //ITransaction tan = _session.BeginTransaction();
            try
            {
                //tan.Begin();
                _session.Delete(item);
                _session.Flush();
                //tan.Commit();
                result++;
            }
            catch (Exception)
            {
                //tan.Rollback();
                throw;
            }

            return result;
        }
        #endregion

        #region Delete an entity object
        /// <summary>
        /// Deletes an entity object
        /// </summary>
        /// <param name="entity"></param>
        /// <returns>1 on success</returns>
        public int DeleteModel(BaseEntity entity)
        {
            var result = 0;
            //ITransaction tan = _session.BeginTransaction();
            try
            {
                //tan.Begin();
                _session.Delete(entity);
                _session.Flush();
                //tan.Commit();
                result++;
            }
            catch (Exception)
            {
                //tan.Rollback();
                throw;
            }

            return result;
        }
        #endregion

        /// <summary>
        /// Deletes rows with a SQL statement
        /// </summary>
        /// <param name="sql"></param>
        public void DeleteList(string sql)
        {
            // ExecuteUpdate is the call meant for DML statements
            // (the listing originally called UniqueResult here)
            _session.CreateSQLQuery(sql).ExecuteUpdate();
        }

        /// <summary>
        /// Deletes a list of entities
        /// </summary>
        /// <param name="models"></param>
        public void DeleteList(IList<T> models)
        {
            foreach (var model in models)
            {
                DeleteModel(model);
            }
        }

        public bool IsExist(Expression<Func<T, bool>> keyWhere)
        {
            var ss = _session.QueryOver<T>().Where(keyWhere);
            return ss.RowCount() > 0;
        }

        #region GetQuery
        /// <summary>
        /// GetQuery
        /// </summary>
        /// <returns></returns>
        public IQueryable<T> GetQuery()
        {
            try
            {
                return _session.Query<T>();
            }
            catch (Exception e)
            {
                // Fall back to a fresh session if the current one fails
                var session = SessionBuilder.GetSession();
                return session.Query<T>();
            }
        }
        #endregion

        #region GetQueryOver
        /// <summary>
        /// GetQueryOver with a filter
        /// </summary>
        /// <returns></returns>
        public IQueryOver<T, T> GetQueryOver(Expression<Func<T, bool>> keyWhere)
        {
            return _session.QueryOver<T>().Where(keyWhere);
        }
        #endregion

        #region GetQueryOver
        /// <summary>
        /// GetQueryOver
        /// </summary>
        /// <returns></returns>
        public IQueryOver<T, T> GetQueryOver()
        {
            return _session.QueryOver<T>();
        }
        #endregion

        #region Query by HQL
        /// <summary>
        /// Creates a query from an HQL string
        /// </summary>
        /// <param name="strHql"></param>
        /// <returns></returns>
        public IQuery GetQueryByHql(string strHql)
        {
            return _session.CreateQuery(strHql);
        }
        #endregion

        #region Query by SQL
        /// <summary>
        /// Creates a query from a SQL string
        /// </summary>
        /// <param name="strSql"></param>
        /// <returns></returns>
        public IQuery GetQueryBySql(string strSql)
        {
            return _session.CreateSQLQuery(strSql);
        }
        #endregion
    }

 
