PostgreSQL: Sorting Chinese Text by Pinyin and by Stroke Count

Posted 2023-05-06 chuangsi


1. Introduction

A default PostgreSQL installation does not sort Chinese text by pinyin or by stroke count.

Consider this example:

postgres=# create table t(id int, col2 varchar(32));
CREATE TABLE
postgres=# insert into t values(1, '东城'), (2, '西城'), (3, '石景山'), (4, '海淀'), (5, '朝阳');
INSERT 0 5
postgres=# select * from t order by col2;
 id |  col2
----+--------
  1 | 东城
  5 | 朝阳
  4 | 海淀
  3 | 石景山
  2 | 西城
(5 rows)

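This is the ordering produced by the C collation, which compares strings byte by byte. For UTF-8 text that amounts to Unicode code point order, not pinyin or stroke order.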
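
To check which collation a database sorts with by default, and to reproduce the ordering above on demand, something like the following can be run. This is a minimal sketch: it assumes the table t from the example, reads the pg_database system catalog, and uses the built-in "C" collation.

```sql
-- Default collation and character classification of each database
SELECT datname, datcollate, datctype FROM pg_database;

-- Request the C collation explicitly; for UTF-8 text this compares bytes,
-- which matches Unicode code point order
SELECT * FROM t ORDER BY col2 COLLATE "C";
```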

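2. Implementation and practice

Chinese-aware sorting requires PostgreSQL to be configured with --with-icu, so the ICU packages libicu-devel and libicu must be installed beforehand.

1) Build and install: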

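wget https://ftp.postgresql.org/pub/source/v15.2/postgresql-15.2.tar.gz
# Unpack the source and enter the source tree before configuring
tar -xzf postgresql-15.2.tar.gz
cd postgresql-15.2

# Build dependencies; openssl-devel is needed because of --with-openssl below
sudo su -c "yum install libicu-devel libicu libxml2-devel libxslt-devel openssl-devel"

./configure --prefix=/usr/pgsql-15-icu --with-icu --with-libxml --with-libxslt --with-openssl

# Build and install everything except the documentation
make -j 4 world-bin
sudo su -c "make install-world-bin"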

2) Environment variables

env15.sh

export PGROOT=/usr/pgsql-15-icu
export PGHOME=/var/lib/pgsql/15
export PGPORT=5432
export PGDATA=$PGHOME/data
export PATH=$PGROOT/bin:$PATH
export LD_LIBRARY_PATH=$PGROOT/lib:$LD_LIBRARY_PATH

source env15.sh

3) Initialize the database cluster:

initdb -D $PGDATA -U postgres -E UTF8 --lc-collate=C --lc-ctype=en_US.UTF8 --locale-provider=icu --icu-locale=C

4) Start the database and verify

Create the table and data:
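
The original post leaves this step empty; presumably it reuses the table from section 1. Below is a sketch under that assumption, preceded by a check that the ICU collations used in the following queries exist in the new cluster. pg_collation is the system catalog of collations, and collprovider is 'i' for ICU.

```sql
-- ICU collations for Chinese that were imported when the cluster was created
SELECT collname, collprovider
FROM pg_collation
WHERE collname LIKE 'zh%' AND collprovider = 'i';

-- Same table and rows as in section 1 (assumed, not shown in the original)
CREATE TABLE t (id int, col2 varchar(32));
INSERT INTO t VALUES (1, '东城'), (2, '西城'), (3, '石景山'), (4, '海淀'), (5, '朝阳');
```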

Sort again:

By pinyin (collate "zh-x-icu"):

postgres=# select * from t order by col2 collate "zh-x-icu";
 id |  col2
----+--------
  5 | 朝阳
  1 | 东城
  4 | 海淀
  3 | 石景山
  2 | 西城
(5 rows)

By pinyin (collate "zh-Hans-x-icu"):

postgres=# select * from t order by col2 collate "zh-Hans-x-icu";
 id |  col2
----+--------
  5 | 朝阳
  1 | 东城
  4 | 海淀
  3 | 石景山
  2 | 西城
(5 rows)

By stroke count (collate "zh-Hant-x-icu"):

postgres=# select * from t order by col2 collate "zh-Hant-x-icu";
 id |  col2
----+--------
  1 | 东城
  3 | 石景山
  2 | 西城
  4 | 海淀
  5 | 朝阳
(5 rows)

Notes:

zh: sorts by pinyin
zh-Hant: Traditional Chinese; sorts by the stroke count of the stored characters
zh-Hans: Simplified Chinese; sorts by pinyin

The query results above are the part to pay attention to.
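
If the sort rule should be spelled out rather than implied by the script subtag, ICU also accepts an explicit collation type in the locale string. The sketch below assumes an ICU-enabled build and the table t from above. The names zh_pinyin, zh_stroke and t_col2_pinyin_idx are made up for illustration and do not come from the original post.

```sql
-- Named collations with an explicit ICU collation type (co = pinyin / stroke)
CREATE COLLATION zh_pinyin (provider = icu, locale = 'zh-u-co-pinyin');
CREATE COLLATION zh_stroke (provider = icu, locale = 'zh-u-co-stroke');

SELECT * FROM t ORDER BY col2 COLLATE zh_pinyin;
SELECT * FROM t ORDER BY col2 COLLATE zh_stroke;

-- An index built with a given collation can be considered by the planner
-- for ORDER BY col2 COLLATE "zh-x-icu"
CREATE INDEX t_col2_pinyin_idx ON t (col2 COLLATE "zh-x-icu");
```

Spelling out the collation type keeps the intent visible in the schema, whereas "zh-Hant-x-icu" relies on stroke order being the default for that locale.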

References:

[1] https://github.com/digoal/blog/

[2] https://www.postgresql.org/docs/current/collation.html

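PostgreSQL Indexes

An index is a special lookup structure that the database engine uses to speed up data retrieval. Put simply, an index is a pointer to data in a table. A database index is very much like the index of a book.
Take the index pages of a Chinese dictionary as an analogy: with an index ordered by pinyin, stroke count, or radical, we can quickly find the character we need.
Indexes speed up SELECT queries and WHERE clauses, but they slow down data modification through UPDATE and INSERT statements. Creating or dropping an index does not affect the data in the table.
Indexes are created with the CREATE INDEX statement, which lets you name the index, specify the table and the column or columns to index, and indicate whether the index is in ascending or descending order.
An index can also be unique, similar to a UNIQUE constraint, in that it prevents duplicate entries in the column or combination of columns it covers.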
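
As an illustration of the ordering option, here is a small sketch. It assumes the COMPANY table used in the example further down in this article, which has a salary column. The index name is hypothetical.

```sql
-- Index stored in descending salary order, with NULLs placed last
CREATE INDEX company_salary_desc_idx ON company (salary DESC NULLS LAST);

-- A query whose ORDER BY matches the index definition
SELECT id, name, salary
FROM company
ORDER BY salary DESC NULLS LAST
LIMIT 10;
```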

The CREATE INDEX command

The basic syntax of CREATE INDEX is:

CREATE INDEX index_name ON table_name (column_name);

Index types

Single-column indexes
A single-column index is created on just one column of a table. The basic syntax is:

CREATE INDEX index_name
ON table_name (column_name);

Composite indexes
A composite (multicolumn) index is created on two or more columns of a table. The basic syntax is:

CREATE INDEX index_name
ON table_name (column1_name, column2_name);

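Whether you create a single-column or a composite index, choose the columns that appear most often in the filter conditions of your WHERE clauses.
If only one column is used, choose a single-column index; if several columns are used together, choose a composite index.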
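
The choice between the two can be sketched as follows, assuming the COMPANY table from the example further down (columns name and age). The index names and the literal values are illustrative only.

```sql
-- Queries filter on age alone: a single-column index is enough
CREATE INDEX company_age_idx ON company (age);
EXPLAIN SELECT * FROM company WHERE age = 30;

-- Queries filter on name and age together: a composite index fits better
CREATE INDEX company_name_age_idx ON company (name, age);
EXPLAIN SELECT * FROM company WHERE name = 'Paul' AND age = 32;
```

EXPLAIN shows the plan the planner picks. On a very small table it may still prefer a sequential scan, so the plans are worth checking against realistic data volumes.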
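
Unique indexes
A unique index serves data integrity as well as performance: it does not allow duplicate values to be inserted into the table. The basic syntax is: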

CREATE UNIQUE INDEX index_name
on table_name (column_name);
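
A quick way to see the integrity effect is a scratch table. The table demo_unique and its index name below are hypothetical and exist only for this sketch.

```sql
CREATE TABLE demo_unique (id int, name text);
CREATE UNIQUE INDEX demo_unique_name_idx ON demo_unique (name);

INSERT INTO demo_unique VALUES (1, 'Paul');
-- This second insert is rejected with a unique-violation error,
-- because 'Paul' already exists in the indexed column
INSERT INTO demo_unique VALUES (2, 'Paul');
```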

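Partial indexes
A partial index is built over a subset of a table. The subset is defined by a conditional expression, and the index contains only the rows that satisfy it. The basic syntax is:

CREATE INDEX index_name
ON table_name (column_name) WHERE conditional_expression;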
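
A concrete sketch of a partial index, again assuming the COMPANY table with a salary column. The index name and the salary thresholds are arbitrary.

```sql
-- Index only the rows with a high salary
CREATE INDEX company_high_salary_idx ON company (salary) WHERE salary > 50000;

-- A query whose WHERE clause implies the index predicate can use the index
SELECT id, name, salary FROM company WHERE salary > 60000;
```

Because rows that fail the predicate are not indexed, the index stays smaller and inserts of low-salary rows do not have to update it.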

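Implicit indexes
An implicit index is one that the database server creates automatically when an object is created. Indexes are created automatically for primary key constraints and unique constraints.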
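
The implicit indexes can be observed through the pg_indexes view. The table demo_implicit below is a hypothetical scratch table for this sketch.

```sql
CREATE TABLE demo_implicit (
    id    int PRIMARY KEY,   -- creates the index demo_implicit_pkey
    email text UNIQUE        -- creates the index demo_implicit_email_key
);

-- Both indexes appear even though no CREATE INDEX was issued
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'demo_implicit';
```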

Example

The following example creates an index on the SALARY column of the COMPANY table:

highgo=# CREATE INDEX salary_index ON COMPANY (salary);

Now use the \d company command to list all indexes on the COMPANY table:

highgo=# \d company

The result is shown below. company_pkey is an implicit index that was created when the table was created:

highgo=# \d company
                  Table "public.company"
 Column  |     Type      | Collation | Nullable | Default
---------+---------------+-----------+----------+---------
 id      | integer       |           | not null |
 name    | text          |           | not null |
 age     | integer       |           | not null |
 address | character(50) |           |          |
 salary  | real          |           |          |
Indexes:
    "company_pkey" PRIMARY KEY, btree (id)
    "salary_index" btree (salary)

You can use the \di command to list all indexes in the database:

highgo=# \di
                    List of relations
 Schema |      Name       | Type  |  Owner   |   Table
--------+-----------------+-------+----------+------------
 public | company_pkey    | index | postgres | company
 public | department_pkey | index | postgres | department
 public | salary_index    | index | postgres | company
(3 rows)

DROP INDEX (dropping an index)

An index can be dropped with PostgreSQL's DROP INDEX command:

DROP INDEX index_name;

The following statement drops the index created earlier:

highgo=# DROP INDEX salary_index;

After that, salary_index no longer appears in the list of indexes:

highgo=# \di
                    List of relations
 Schema |      Name       | Type  |  Owner   |   Table
--------+-----------------+-------+----------+------------
 public | company_pkey    | index | postgres | company
 public | department_pkey | index | postgres | department
(2 rows)

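When should indexes be avoided?
Although indexes are meant to improve database performance, there are situations in which they should be avoided.
Consider the following guidelines when deciding whether to use an index:

Indexes should not be used on small tables.
Indexes should not be used on tables that undergo frequent, large batch updates or inserts.
Indexes should not be used on columns that contain a large number of NULL values.
Indexes should not be used on columns that are frequently modified.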
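
Whether an existing index earns its keep can be judged from the statistics views rather than from rules of thumb alone. The following is a minimal sketch using the pg_stat_user_indexes view. It assumes statistics have been collected over a representative workload.

```sql
-- Indexes that are rarely or never scanned, together with their size on disk
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```

An index with a scan count near zero that still has to be maintained on every write is a candidate for DROP INDEX, provided it does not back a primary key or unique constraint.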
