MySQL DBA Advanced Operations Study Notes: MySQL Character Sets

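1.2 MySQL character sets

1.2.1 Introduction to MySQL character sets

Put simply, a character set is a collection of written symbols together with their encodings and comparison rules.

MySQL character set support involves two concepts: the character set (CHARACTER SET) and the collation (COLLATION). The character set defines how MySQL stores string data, and the collation defines how strings are compared. In the CREATE DATABASE statement shown earlier, CHARACTER SET latin1 is the database character set and COLLATE latin1_swedish_ci is its collation. For details, see the character set chapter (chapter 10) of the MySQL manual.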

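1.2.2 Common MySQL character sets

The character sets most commonly used with MySQL are latin1, gbk, utf8, and utf8mb4; section 1.2.4 shows each with its default collation and maximum bytes per character.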

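1.2.3 How to choose a suitable character set

a. If the application handles text in many scripts and is published to different countries and regions, choose a Unicode character set. In MySQL that means UTF-8 (three bytes per Chinese character). UTF-8 is also the better choice when the application handles mostly English with a small amount of Chinese.

b. If only Chinese needs to be supported, the data volume is large, and performance requirements are high, GBK is an option. Each Chinese character takes two bytes (ASCII characters take one), so Chinese text is more compact than in UTF-8, which helps workloads with heavy computation or sorting.

c. Mobile internet services may need utf8mb4, because MySQL's utf8 stores at most three bytes per character and cannot hold four-byte characters such as emoji.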
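
The byte counts behind this choice are easy to check. The following is a minimal sketch that uses Python's built-in codecs as stand-ins for MySQL's utf8/utf8mb4 and gbk character sets; the helper name `encoded_len` is made up for illustration.

```python
def encoded_len(text: str, codec: str) -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(codec))

# A Chinese character: 3 bytes in UTF-8, 2 bytes in GBK.
print(encoded_len("汉", "utf-8"))   # 3
print(encoded_len("汉", "gbk"))     # 2

# An ASCII letter: 1 byte in both encodings.
print(encoded_len("a", "utf-8"))    # 1
print(encoded_len("a", "gbk"))      # 1

# An emoji needs 4 bytes in UTF-8, more than MySQL's 3-byte utf8 can hold,
# which is why utf8mb4 exists.
print(encoded_len("\U0001F600", "utf-8"))  # 4
```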

1.2.4 Listing the character sets the MySQL server supports

[[email protected] ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET"

The four most commonly used:

[[email protected] ~]# mysql -uroot -p123456 -e "SHOW CHARACTER SET;"|egrep "gbk|utf8|latin1"|awk '{print $0}'
latin1    cp1252 West European    latin1_swedish_ci    1
gbk    GBK Simplified Chinese    gbk_chinese_ci    2
utf8    UTF-8 Unicode    utf8_general_ci    3
utf8mb4    UTF-8 Unicode    utf8mb4_general_ci    4

Check MySQL's current character set settings:

mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

Note: by default, character_set_client, character_set_connection, and character_set_results follow the operating system locale and change together. Here the locale is:

[[email protected] ~]# cat /etc/sysconfig/i18n 
LANG="zh_CN.UTF-8"
[[email protected] ~]# echo $LANG
zh_CN.UTF-8

1.3 What character sets does MySQL use by default?

a. First, look at the character sets MySQL sets by default:

mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | gb2312                           |
| character_set_connection | gb2312                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | gb2312                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

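The character set variables have the following meanings:

Variable_name            | Value  | Meaning
character_set_client     | latin1 | client character set
character_set_connection | latin1 | connection character set
character_set_database   | latin1 | database character set; set in the configuration file or when creating the database or table
character_set_results    | latin1 | character set for returned results
character_set_server     | latin1 | server character set; set in the configuration file or when creating the database or table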

After changing the Linux locale variable, check how the MySQL character sets change:

[[email protected] ~]# echo $LANG
zh_CN.UTF-8
[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

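character_set_client, character_set_connection, and character_set_results now match the system locale: all three changed to utf8. character_set_database and character_set_server stay at latin1.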
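
How the client arrives at these values can be pictured as a lookup from the locale's codeset to a MySQL character set name. The sketch below only illustrates that idea and is not the mysql client's actual code; the function `charset_from_lang`, the mapping table, and the latin1 fallback are all assumptions.

```python
import os

# Hypothetical mapping from locale codesets to MySQL character set names;
# only the values that appear in these notes are listed.
CODESET_TO_MYSQL = {
    "utf-8": "utf8",
    "utf8": "utf8",
    "gb2312": "gb2312",
    "gbk": "gbk",
}

def charset_from_lang(lang: str, default: str = "latin1") -> str:
    """Guess the client character set from a LANG value such as zh_CN.UTF-8."""
    if "." not in lang:
        return default  # e.g. "C" or "en_US": no codeset given
    codeset = lang.split(".", 1)[1].split("@", 1)[0].lower()
    return CODESET_TO_MYSQL.get(codeset, default)

print(charset_from_lang("zh_CN.UTF-8"))    # utf8
print(charset_from_lang("zh_CN.GB2312"))   # gb2312
print(charset_from_lang(os.environ.get("LANG", "")))  # depends on the shell
```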

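1.4 What does SET NAMES latin1 actually do?

Whether the Linux locale is gb2312 or utf8, Chinese data inserted with the default settings comes back garbled.

a. Querying the data at this point shows garbled text: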

mysql> use cuizhong
Database changed
mysql> select * from student
-> ;
+----+---------------------+
| id | name                |
+----+---------------------+
|  1 | zhangsan            |
|  2 | lisi                |
|  3 | wanger              |
|  4 | xiaozhang           |
|  5 | xiaowang            |
|  6 | ???                 |
|  7 | ?°?o¢  |
|  8 | ??è?¤èˉ?   |
|  9 | ?????  |
+----+---------------------+
9 rows in set (0.10 sec)

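b. Running SET NAMES with the matching character set fixes the garbled output.

(1) First check the character sets of the database and the table: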

mysql> show create database cuizhong\G
*************************** 1. row ***************************
   Database: cuizhong
Create Database: CREATE DATABASE `cuizhong` /*!40100 DEFAULT CHARACTER SET latin1 */
1 row in set (0.00 sec)
mysql> show create table student\G
*************************** 1. row ***************************
   Table: student
Create Table: CREATE TABLE `student` (
  `id` int(4) NOT NULL AUTO_INCREMENT,
  `name` char(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

(2) Both the database and the table use latin1, so running SET NAMES latin1 makes the client-side character sets match and the text is no longer garbled.

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from student;
+----+-----------+
| id | name      |
+----+-----------+
|  1 | zhangsan  |
|  2 | lisi      |
|  3 | wanger    |
|  4 | xiaozhang |
|  5 | xiaowang  |
|  6 | ???       |
|  7 | 小红      |
|  8 | 不认识    |
|  9 | 李四      |
+----+-----------+

(3) SET NAMES changed the following three variables: character_set_client, character_set_connection, and character_set_results.

mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
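
Why matching the three variables to the column's character set removes the garbling can be simulated on raw bytes. This is a rough sketch under two assumptions: the rows were inserted from a UTF-8 terminal over a latin1 connection, and Python's `latin-1` codec (ISO 8859-1) is an acceptable byte-transparent stand-in for MySQL's latin1, which is actually based on cp1252. The helper `read_back` is hypothetical.

```python
# Bytes a UTF-8 terminal sends for the name in row 7.
sent = "小红".encode("utf-8")

# With client = connection = latin1 and a latin1 column, no conversion
# happens: each byte is stored as one latin1 "character".
stored = sent.decode("latin-1")          # 6 single-byte characters

def read_back(stored_text: str, results_charset: str) -> bytes:
    """Bytes the server returns after converting to character_set_results."""
    return stored_text.encode(results_charset)

# character_set_results = utf8: every latin1 character is converted to
# UTF-8, so the terminal shows garbage instead of the name.
garbled = read_back(stored, "utf-8").decode("utf-8")
print(garbled == "小红")                  # False

# character_set_results = latin1 (after SET NAMES latin1): the original
# bytes pass through unchanged and the UTF-8 terminal renders them.
restored = read_back(stored, "latin-1").decode("utf-8")
print(restored)                           # 小红
```

The data itself never changes between the two reads; only the conversion applied on the way out differs.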

1.5 What does the mysql option --default-character-set=latin1 do?

(1) First check the current MySQL character sets:

[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

(2) Log in to mysql with the --default-character-set=latin1 option:

[[email protected]~]# mysql -uroot -p123456 --default-character-set=latin1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.5.32 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> 

(3) Now check the MySQL character sets again:

mysql> show variables like 'character_set%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

(4) Logging in with the option is also a temporary change: it applies only to that connection, and a login without the option reverts to the previous values.

[[email protected]~]# mysql -uroot -p123456 --default-character-set=latin1 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

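1.6 How to ensure data inserted into MySQL is not garbled

1.6.1 Unify the client and server character sets

(1) The MySQL character sets listed below (client side and server side) must all be set to the same character set for inserted Chinese data to be displayed correctly. The Linux locale should also match the database character set wherever possible.

(2) The output of show variables like 'character_set%'; is as follows:

  Variable_name            | Value  | Meaning
① character_set_client     | latin1 | client character set
② character_set_connection | latin1 | connection character set
③ character_set_database   | latin1 | database character set
④ character_set_results    | latin1 | character set for returned results
⑤ character_set_server     | latin1 | server character set; set in the configuration file or when creating the database or table

Of these, ①, ② and ④ follow the Linux locale by default. Running SET NAMES latin1 after logging in, or logging in with mysql --default-character-set=latin1, both set the client, connection, and results character sets to latin1, which resolves the garbled-insert problem. The same setting can be made permanent in the [client] section of my.cnf:

[client]
default-character-set=latin1

Note: no server restart is needed; the setting applies to new client connections.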

[[email protected] ~]# sed -n "18,22p" /etc/my.cnf 
[client]
#password    = your_password
port        = 3306
socket        = /usr/local/mysql/tmp/mysql.sock
default-character-set = latin1
[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | latin1                           |
| character_set_connection | latin1                           |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | latin1                           |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

(3) With the client character set changed, querying the table no longer needs SET NAMES and the output is not garbled:

[[email protected] ~]# mysql -uroot -p123456 -e "select * from cuizhong.student;"
+----+-----------+
| id | name      |
+----+-----------+
|  1 | zhangsan  |
|  2 | lisi      |
|  3 | wanger    |
|  4 | xiaozhang |
|  5 | xiaowang  |
|  6 | ???       |
|  7 | 小红      |
|  8 | 不认识    |
|  9 | 李四      |
+----+-----------+

1.6.2 Change the MySQL server character set

(1) Modify the my.cnf parameters as follows:

[mysqld]
default-character-set = latin1    # for 5.1 and earlier
character-set-server = latin1     # for 5.5

(2) Check the current character sets before the change:

[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | latin1                           |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | latin1                           |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

(3) Check the modified parameter:

[[email protected] ~]# sed -n "26,27p" /etc/my.cnf 
[mysqld]
character-set-server = utf8

(4) Restart the MySQL service (restarts are generally not allowed in production):

[[email protected] ~]# /etc/init.d/mysqld restart
Shutting down MySQL.. SUCCESS! 
Starting MySQL.. SUCCESS!

(5) Check the character sets after the change:

[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

Note: the parameter set under [mysqld] above changes the following two variables:

| Variable_name            | Value |
| character_set_database   | utf8  |
| character_set_server     | utf8  |

At this point, changing the system locale again no longer changes the MySQL character sets:

[[email protected] ~]# cat /etc/sysconfig/i18n 
LANG="zh_CN.GB2312"
#LANG="zh_CN.UTF-8"
[[email protected] ~]# source /etc/sysconfig/i18n 
[[email protected] ~]# mysql -uroot -p123456 -e "show variables like 'character_set%';"
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+

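1.6.3 Summary: unifying client and server character sets

Principle for avoiding garbled text: for mixed Chinese and English environments utf8 is recommended, and the character set should be the same across the Linux system, client, server, database, table, and application.

1. Set the Linux system locale to utf8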

[[email protected] ~]# cat /etc/sysconfig/i18n 
LANG="zh_CN.UTF-8"

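Note: the terminal program used to connect to the Linux host, for example Xshell or SecureCRT, must be set to the same character set.

2. MySQL client

Temporary:

SET NAMES latin1;

Permanent:

Set default-character-set in the [client] section of my.cnf. This has the same effect as SET NAMES latin1 and applies to every new login.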

3. Server

Modify the my.cnf parameters:

[mysqld]
default-character-set = latin1    # for 5.1 and earlier
character-set-server = latin1     # for 5.5
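
A quick way to confirm that the client and server settings agree is to read both sections of the configuration file. The sketch below uses Python's `configparser`, which only approximates my.cnf syntax (directives such as `!include` would not parse); the sample file contents and the helper `configured_charsets` are made up for illustration.

```python
import configparser

SAMPLE_MY_CNF = """
[client]
port = 3306
default-character-set = utf8

[mysqld]
character-set-server = utf8
"""

def configured_charsets(text: str) -> dict:
    """Return the client and server character sets found in my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options without a value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(text)
    return {
        "client": parser.get("client", "default-character-set", fallback=None),
        "server": parser.get("mysqld", "character-set-server", fallback=None),
    }

charsets = configured_charsets(SAMPLE_MY_CNF)
print(charsets)
print("consistent:", charsets["client"] == charsets["server"])
```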

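4. Databases and tables: specify the character set when creating the database

CREATE DATABASE cuizhong_utf8 DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

COLLATE is followed by the name of the collation. SHOW CHARACTER SET lists each supported character set together with its default collation: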

mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode               | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode              | utf16_general_ci    |      4 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode              | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
39 rows in set (0.00 sec)

5. Application character set

Use the Simplified Chinese UTF-8 build of the application, for example:

http://download.comsenz.com/Discuzx/3.2/Discuz_X3.2_SC_UTF8.zip

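1.7 How to change the character set of a production database and its tables

Steps for changing the character set of existing data

For an existing database, the character set cannot be changed simply with "ALTER DATABASE ... CHARACTER SET ..." or "ALTER TABLE tablename CHARACTER SET ...". Neither statement converts data that is already stored; they only affect tables or data created afterwards.
To change the character set of existing records, export the data, adjust the character set, and import it again.

Change the database default character set:

ALTER DATABASE [your db name] CHARACTER SET [your character set];

The following walks through converting a latin1 database to GBK.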

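(1) Export the table structure:

mysqldump -uroot -p123456 --default-character-set=latin1 -d dbname > alltable.sql

--default-character-set sets the character set used for the connection (--default-character-set=gbk, for example, connects using GBK); -d exports only the table structure.

(2) Edit alltable.sql and change latin1 to gbk:

SET NAMES gbk;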

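(3) Make sure the data is no longer being updated, then export all data. The data goes to a separate file, alldata.sql, so that it does not overwrite the schema dump from step (1):

mysqldump -uroot -p123456 --quick --no-create-info --extended-insert --default-character-set=latin1 dbname > alldata.sql

Option notes:

--quick: for dumping large tables. Forces mysqldump to retrieve rows from the server one at a time instead of retrieving the whole result set and buffering it in memory before writing it out.

--no-create-info: do not write CREATE TABLE statements.

--extended-insert: use multiple-row INSERT syntax with several VALUES lists. The dump file is smaller, I/O is lower, and the import is much faster.

--default-character-set=latin1: export the data in its original character set, so that all Chinese text in the dump file is readable and is not saved as garbled text.

(4) Open alldata.sql and change SET NAMES latin1 to SET NAMES gbk (or change the system-wide server and client character sets).

(5) Create the database:

CREATE DATABASE dbname DEFAULT CHARSET gbk;

(6) Create the tables by running alltable.sql:

mysql -uroot -p123456 dbname < alltable.sql

(7) Import the data:

mysql -uroot -p123456 dbname < alldata.sql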
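
The two mysqldump invocations used in this procedure differ only in a few options, which can be made explicit by building the argument lists in one place. This is a sketch, not a tested backup script: the helper `dump_command` is hypothetical, the commands are only printed and never executed, and `-p` without a value is used so the password is prompted for rather than embedded.

```python
def dump_command(db: str, charset: str, schema_only: bool) -> list:
    """Build a mysqldump argument list for a schema-only or data-only dump."""
    args = ["mysqldump", "-uroot", "-p", f"--default-character-set={charset}"]
    if schema_only:
        args.append("--no-data")          # long form of -d
    else:
        args += ["--quick", "--no-create-info", "--extended-insert"]
    args.append(db)
    return args

# Schema dump, then data dump, both read in the original latin1 encoding.
print(" ".join(dump_command("dbname", "latin1", schema_only=True)))
print(" ".join(dump_command("dbname", "latin1", schema_only=False)))
```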

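Summary: converting latin1 to utf8

  1. Export the CREATE DATABASE and CREATE TABLE statements and use sed to change the character set to utf8 in bulk.

  2. Export all data.

  3. Change the MySQL server and client character sets to utf8.

  4. Drop the original databases, tables, and data.

  5. Import the new CREATE DATABASE and CREATE TABLE statements.

  6. Import all the data.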
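
The bulk rewrite of the schema dump in the first step can be sketched as a single substitution. The function `retarget_charset` below is a hypothetical helper and the sample schema is modeled on the student table shown earlier. It rewrites only `CHARSET=...` and `CHARACTER SET ...` declarations; explicit `COLLATE latin1_...` clauses are left alone and would need separate handling.

```python
import re

def retarget_charset(schema_sql: str, old: str, new: str) -> str:
    """Replace charset declarations such as CHARSET=latin1 in a schema dump."""
    pattern = re.compile(
        r"\b(CHARSET|CHARACTER SET)(\s*=\s*|\s+)" + re.escape(old) + r"\b",
        re.IGNORECASE,
    )
    # Keep the keyword and separator as written, swap only the charset name.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + new, schema_sql)

schema = (
    "CREATE TABLE `student` (\n"
    "  `id` int(4) NOT NULL AUTO_INCREMENT,\n"
    "  `name` char(20) NOT NULL,\n"
    "  PRIMARY KEY (`id`)\n"
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
)
print(retarget_charset(schema, "latin1", "utf8"))
```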
