Summary of Common Character Encodings and Converting Between Them in Code

Posted 2020-08-31

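Character encodings are the language computers use to exchange text, and they play a central role in how text is displayed. This article introduces several common encodings and shows how to convert between them.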

1. ASCII

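ASCII is the earliest and most basic of these encodings. Originally only the low 7 bits of each byte were used, giving 128 codes for control characters, punctuation, digits, and English letters. The name stands for "American Standard Code for Information Interchange". Because ASCII has no room for the letters and symbols of other languages, it was later extended by using the highest bit as well. The codes 128 to 255 are called the "extended character set".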
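The 7-bit rule can be checked directly on a byte string. The following is a minimal sketch; the function name `isPureAscii` is a hypothetical helper introduced here for illustration, not part of any library.

```cpp
#include <string>

// Hypothetical helper: returns true if every byte has its highest bit
// clear, i.e. every byte is in the original 7-bit ASCII range 0-127.
bool isPureAscii(const std::string& bytes)
{
    for (unsigned char b : bytes) {
        if (b & 0x80) {
            return false; // highest bit set: value is in 128-255
        }
    }
    return true;
}
```

Any byte for which this returns false belongs to the 128 to 255 range, whose meaning depends on which extension or multi-byte encoding is in use.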

2. GB2312

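GB2312 is a Chinese extension of ASCII. To represent Chinese characters it does not use the ASCII extended character set. Instead it reassigns the values 128 to 255 and uses two bytes per Chinese character. Bytes 0 to 127 keep their ASCII meaning, while two consecutive bytes that are both greater than 127 represent one Chinese character: the high byte lies in 0xA1~0xF7 and the low byte in 0xA1~0xFE. These two-byte codes also cover numeric symbols, Latin letters, Greek letters, and Japanese kana. Punctuation is encoded in two bytes as well, which is the origin of full-width punctuation. Later, to gain more capacity, GBK kept the requirement that the high byte be greater than 127 but dropped the requirement that the low byte also be greater than 127 (in GBK the low byte ranges over 0x40~0xFE, excluding 0x7F). GBK keeps every GB2312 code assignment unchanged.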
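The byte ranges above are enough to split a GB2312 byte stream into characters. The sketch below counts characters that way. `countGb2312Chars` is a hypothetical name, and the function only checks the byte ranges; it does not verify that a given two-byte code is actually assigned in the standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the characters in a GB2312-encoded byte string.
// Returns -1 if a byte falls outside the ranges described in the text.
int countGb2312Chars(const std::string& bytes)
{
    int count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {
            i += 1; // 0-127: a single-byte ASCII character
        } else if (lead >= 0xA1 && lead <= 0xF7 && i + 1 < bytes.size()) {
            unsigned char trail = static_cast<unsigned char>(bytes[i + 1]);
            if (trail < 0xA1 || trail > 0xFE) {
                return -1; // low byte outside 0xA1-0xFE
            }
            i += 2; // one two-byte character
        } else {
            return -1; // invalid high byte, or truncated two-byte character
        }
        ++count;
    }
    return count;
}
```

For example, the bytes 0x41 0xD6 0xD0 are counted as two characters: the ASCII letter A followed by the two-byte code 0xD6D0, which is 中 in GB2312.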

3. Unicode

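Because each script had its own encoding, exchanging text between systems was difficult. ISO (the International Organization for Standardization) set out to solve this with a single unified encoding. In its original 16-bit form (UCS-2), every Unicode character occupies two bytes. The code space was later extended up to U+10FFFF, so characters outside the first 65,536 code points need more than 16 bits. Unicode keeps the ASCII characters at their original values and simply writes them in two bytes, so their high byte is 0. All other characters were assigned new values, so GBK and Unicode are not compatible, and converting between them requires a lookup table.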
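To make the "high byte is 0" point concrete, here is a small sketch that writes ASCII text as two-byte units. The name `asciiToTwoByteUnits` is hypothetical, and the high-byte-first (big-endian) order is an assumption made for readability; in-memory `wchar_t` strings on Windows store the low byte first.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes each ASCII character as a two-byte unit,
// high byte first. Throws if the input contains a non-ASCII byte.
std::vector<unsigned char> asciiToTwoByteUnits(const std::string& ascii)
{
    std::vector<unsigned char> out;
    out.reserve(ascii.size() * 2);
    for (unsigned char c : ascii) {
        if (c > 0x7F) {
            throw std::invalid_argument("input is not pure ASCII");
        }
        out.push_back(0x00); // high byte is always 0 for ASCII
        out.push_back(c);    // low byte keeps the ASCII value
    }
    return out;
}
```

With this sketch, "Hi" becomes the four bytes 0x00 0x48 0x00 0x69.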

4. UTF-8

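As computers became widespread, an encoding was needed that could transmit Unicode efficiently, which led to UTF-8. It works in 8-bit units and is simply another representation of the same Unicode code points. The rules are as follows, where each x is a bit of the original Unicode value:

0x0000~0x007F (Unicode) -> 0xxxxxxx (UTF-8)
0x0080~0x07FF (Unicode) -> 110xxxxx 10xxxxxx (UTF-8)
0x0800~0xFFFF (Unicode) -> 1110xxxx 10xxxxxx 10xxxxxx (UTF-8)
0x10000~0x10FFFF (Unicode) -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (UTF-8)

UTF-8 therefore uses 1, 2, or 3 bytes for values up to 0xFFFF, and 4 bytes for values above that.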
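The bit patterns above translate almost line for line into code. The following is a minimal sketch of an encoder for a single code point. `encodeUtf8` is a hypothetical name, and returning an empty string for invalid input is a choice made here for brevity. It additionally covers the four-byte form that the UTF-8 standard defines for code points above U+FFFF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// Returns an empty string for values that are not encodable
// (surrogates U+D800-U+DFFF, or anything above U+10FFFF).
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp <= 0x7F) {
        out += static_cast<char>(cp);                         // 0xxxxxxx
    } else if (cp <= 0x7FF) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;                                       // surrogate: reject
        }
        out += static_cast<char>(0xE0 | (cp >> 12));          // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));          // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F)); // 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
    }
    return out;
}
```

For example, `encodeUtf8(0x4E2D)` (the character 中) yields the three bytes 0xE4 0xB8 0xAD, and `encodeUtf8(0x41)` yields the single byte 0x41.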

5. Converting between encodings in C++

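#include <windows.h>  // MultiByteToWideChar, WideCharToMultiByte
#include <cstring>    // memset
#include <string>     // std::string, std::wstring

// Declared as static members inside the class KKLogObject:
static std::wstring MBytesToWString(const char* lpcszString);
static std::string WStringToMBytes(const wchar_t* lpwcszWString);
static std::wstring UTF8ToWString(const char* lpcszString);
static std::string WStringToUTF8(const wchar_t* lpwcszWString);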

// ANSI (system code page, CP_ACP) to Unicode (UTF-16)
std::wstring KKLogObject::MBytesToWString(const char* lpcszString)
{
    // First call: query the required length in wchar_t units, including the terminating null.
    int unicodeLen = ::MultiByteToWideChar(CP_ACP, 0, lpcszString, -1, NULL, 0);
    wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
    memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));
    // Second call: convert into the zero-filled buffer.
    ::MultiByteToWideChar(CP_ACP, 0, lpcszString, -1, pUnicode, unicodeLen);
    std::wstring wString(pUnicode);
    delete [] pUnicode;
    return wString;
}

// Unicode (UTF-16) to ANSI (system code page, CP_ACP)
std::string KKLogObject::WStringToMBytes(const wchar_t* lpwcszWString)
{
    // First call: query the required buffer size in bytes, including the terminating null.
    int iTextLen = ::WideCharToMultiByte(CP_ACP, 0, lpwcszWString, -1, NULL, 0, NULL, NULL);
    char* pElementText = new char[iTextLen + 1];
    memset(pElementText, 0, (iTextLen + 1) * sizeof(char));
    // Second call: convert into the zero-filled buffer.
    ::WideCharToMultiByte(CP_ACP, 0, lpwcszWString, -1, pElementText, iTextLen, NULL, NULL);
    std::string strReturn(pElementText);
    delete [] pElementText;
    return strReturn;
}

// UTF-8 to Unicode (UTF-16)
std::wstring KKLogObject::UTF8ToWString(const char* lpcszString)
{
    // First call: query the required length in wchar_t units, including the terminating null.
    int unicodeLen = ::MultiByteToWideChar(CP_UTF8, 0, lpcszString, -1, NULL, 0);
    wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
    memset(pUnicode, 0, (unicodeLen + 1) * sizeof(wchar_t));
    // Second call: convert into the zero-filled buffer.
    ::MultiByteToWideChar(CP_UTF8, 0, lpcszString, -1, pUnicode, unicodeLen);
    std::wstring wstrReturn(pUnicode);
    delete [] pUnicode;
    return wstrReturn;
}

// Unicode (UTF-16) to UTF-8
std::string KKLogObject::WStringToUTF8(const wchar_t* lpwcszWString)
{
    // First call: query the required buffer size in bytes, including the terminating null.
    int iTextLen = ::WideCharToMultiByte(CP_UTF8, 0, lpwcszWString, -1, NULL, 0, NULL, NULL);
    char* pElementText = new char[iTextLen + 1];
    memset(pElementText, 0, (iTextLen + 1) * sizeof(char));
    // Second call: convert into the zero-filled buffer.
    ::WideCharToMultiByte(CP_UTF8, 0, lpwcszWString, -1, pElementText, iTextLen, NULL, NULL);
    std::string strReturn(pElementText);
    delete [] pElementText;
    return strReturn;
}
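
The functions above depend on the Windows API. As a rough illustration of what UTF8ToWString does internally, here is a portable sketch that decodes UTF-8 into UTF-16 code units by hand. The name `utf8ToUtf16` is hypothetical; it uses `char16_t` rather than `wchar_t` so the unit width is 16 bits on every platform, and it throws on malformed input, which is a design choice made for this sketch rather than the behavior of the Windows functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical portable sketch: decodes UTF-8 into UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8ToUtf16(const std::string& utf8)
{
    // Smallest code point that legitimately needs 1, 2, 3, or 4 bytes.
    static const std::uint32_t minValue[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F); // append the 6 payload bits
        }
        i += extra + 1;

        // Reject overlong forms, surrogates, and out-of-range values.
        if (cp < minValue[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid code point in UTF-8 input");
        }

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp); // fits in one 16-bit unit
        } else {
            cp -= 0x10000;                    // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes 0xE4 0xB8 0xAD decode to the single unit 0x4E2D, while the four-byte sequence 0xF0 0x9F 0x98 0x80 decodes to the surrogate pair 0xD83D 0xDE00.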
