C# hexadecimal value 0x12, is an invalid character

【Posted】: 2014-01-29 22:43:32

【Question】:

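I am loading a lot of XML documents, and some of them fail with errors such as "hexadecimal value 0x12, is an invalid character"; the offending character varies from file to file. How can I remove these characters?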

【Comments】:

You almost certainly have a BOM at the start of your document; see en.wikipedia.org/wiki/Byte_order_mark

【Answer 1】:

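I did a little research on this.

The ASCII table has 128 symbols. Here is a small piece of test code that substitutes each ASCII symbol into a document in turn and tries to load the result as an XML document.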

// Requires: using System; using System.IO; using System.Xml;
static public void RegexTry()
{
    // test.xml is expected to contain the placeholder character 'П'.
    string xmlfile;
    using (StreamReader stream = new StreamReader(@"test.xml"))
    {
        xmlfile = stream.ReadToEnd();
    }

    for (int i = 0; i < 128; i++)
    {
        char t = (char) i;

        // Substitute the placeholder with the character under test.
        string text = xmlfile.Replace('П', t);

        XmlDocument xml = new XmlDocument();
        try
        {
            xml.LoadXml(text);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Char(" + i.ToString() + "): " + t + " => error! " + ex.Message);
            continue;
        }

        Console.WriteLine("Char(" + i.ToString() + "): " + t + " => fine!");
    }

    Console.ReadKey();
}

It prints the following:

Char(0): => error! '.', hexadecimal value 0x00, is an invalid character. Line 5, position 7.
Char(1): => error! '', hexadecimal value 0x01, is an invalid character. Line 5, position 7.
Char(2): => error! '', hexadecimal value 0x02, is an invalid character. Line 5, position 7.
Char(3): => error! '', hexadecimal value 0x03, is an invalid character. Line 5, position 7.
Char(4): => error! '', hexadecimal value 0x04, is an invalid character. Line 5, position 7.
Char(5): => error! '', hexadecimal value 0x05, is an invalid character. Line 5, position 7.
Char(6): => error! '', hexadecimal value 0x06, is an invalid character. Line 5, position 7.
Char(7): => error! '', hexadecimal value 0x07, is an invalid character. Line 5, position 7.
Char(8): => error! '', hexadecimal value 0x08, is an invalid character. Line 5, position 7.
Char(9):     => fine!
Char(10): 
 => fine!
Char(11): => error! '', hexadecimal value 0x0B, is an invalid character. Line 5, position 7.
Char(12): => error! '', hexadecimal value 0x0C, is an invalid character. Line 5, position 7.
Char(13): 
 => fine!
Char(14): => error! '', hexadecimal value 0x0E, is an invalid character. Line 5, position 7.
Char(15): => error! '', hexadecimal value 0x0F, is an invalid character. Line 5, position 7.
Char(16): => error! '', hexadecimal value 0x10, is an invalid character. Line 5, position 7.
Char(17): => error! '', hexadecimal value 0x11, is an invalid character. Line 5, position 7.
Char(18): => error! '', hexadecimal value 0x12, is an invalid character. Line 5, position 7.
Char(19): => error! '', hexadecimal value 0x13, is an invalid character. Line 5, position 7.
Char(20): => error! '', hexadecimal value 0x14, is an invalid character. Line 5, position 7.
Char(21): => error! '', hexadecimal value 0x15, is an invalid character. Line 5, position 7.
Char(22): => error! '', hexadecimal value 0x16, is an invalid character. Line 5, position 7.
Char(23): => error! '', hexadecimal value 0x17, is an invalid character. Line 5, position 7.
Char(24): => error! '', hexadecimal value 0x18, is an invalid character. Line 5, position 7.
Char(25): => error! '', hexadecimal value 0x19, is an invalid character. Line 5, position 7.
Char(26): => error! '', hexadecimal value 0x1A, is an invalid character. Line 5, position 7.
Char(27): => error! '', hexadecimal value 0x1B, is an invalid character. Line 5, position 7.
Char(28): => error! '', hexadecimal value 0x1C, is an invalid character. Line 5, position 7.
Char(29): => error! '', hexadecimal value 0x1D, is an invalid character. Line 5, position 7.
Char(30): => error! '', hexadecimal value 0x1E, is an invalid character. Line 5, position 7.
Char(31): => error! '', hexadecimal value 0x1F, is an invalid character. Line 5, position 7.
Char(32):   => fine!
Char(33): ! => fine!
Char(34): " => fine!
Char(35): # => fine!
Char(36): $ => fine!
Char(37): % => fine!
Char(38): & => error! An error occurred while parsing EntityName. Line 5, position 8.
Char(39): ' => fine!
Char(40): ( => fine!
Char(41): ) => fine!
Char(42): * => fine!
Char(43): + => fine!
Char(44): , => fine!
Char(45): - => fine!
Char(46): . => fine!
Char(47): / => fine!
Char(48): 0 => fine!
Char(49): 1 => fine!
Char(50): 2 => fine!
Char(51): 3 => fine!
Char(52): 4 => fine!
Char(53): 5 => fine!
Char(54): 6 => fine!
Char(55): 7 => fine!
Char(56): 8 => fine!
Char(57): 9 => fine!
Char(58): : => fine!
Char(59): ; => fine!
Char(60): < => error! The '<' character, hexadecimal value 0x3C, cannot be included in a name. Line 5, position 13.
Char(61): = => fine!
Char(62): > => fine!
Char(63): ? => fine!
Char(64): @ => fine!
Char(65): A => fine!
Char(66): B => fine!
Char(67): C => fine!
Char(68): D => fine!
Char(69): E => fine!
Char(70): F => fine!
Char(71): G => fine!
Char(72): H => fine!
Char(73): I => fine!
Char(74): J => fine!
Char(75): K => fine!
Char(76): L => fine!
Char(77): M => fine!
Char(78): N => fine!
Char(79): O => fine!
Char(80): P => fine!
Char(81): Q => fine!
Char(82): R => fine!
Char(83): S => fine!
Char(84): T => fine!
Char(85): U => fine!
Char(86): V => fine!
Char(87): W => fine!
Char(88): X => fine!
Char(89): Y => fine!
Char(90): Z => fine!
Char(91): [ => fine!
Char(92): \ => fine!
Char(93): ] => fine!
Char(94): ^ => fine!
Char(95): _ => fine!
Char(96): ` => fine!
Char(97): a => fine!
Char(98): b => fine!
Char(99): c => fine!
Char(100): d => fine!
Char(101): e => fine!
Char(102): f => fine!
Char(103): g => fine!
Char(104): h => fine!
Char(105): i => fine!
Char(106): j => fine!
Char(107): k => fine!
Char(108): l => fine!
Char(109): m => fine!
Char(110): n => fine!
Char(111): o => fine!
Char(112): p => fine!
Char(113): q => fine!
Char(114): r => fine!
Char(115): s => fine!
Char(116): t => fine!
Char(117): u => fine!
Char(118): v => fine!
Char(119): w => fine!
Char(120): x => fine!
Char(121): y => fine!
Char(122): z => fine!
Char(123): { => fine!
Char(124): | => fine!
Char(125): } => fine!
Char(126): ~ => fine!
Char(127):  => fine!  

You can see that many symbols cannot appear in XML text. To strip them, we can use Regex.Replace:

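// Requires: using System.Text.RegularExpressions;
static string ReplaceHexadecimalSymbols(string txt)
{
    // Note: \x26 is '&', so this pattern also strips every ampersand.
    string r = "[\x00-\x08\x0B\x0C\x0E-\x1F\x26]";
    return Regex.Replace(txt, r, "", RegexOptions.Compiled);
}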

P.S. Sorry if everyone already knows this.

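【Comments】:

The biggest question is probably how these characters got into the XML document in the first place.

You should not use trial and error here. Consult the standard. My answer contains the relevant part. Trial and error led you to write a regex that removes every & character from an XML document. That will not end well!

【Answer 2】: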

The XML specification defines the valid characters like this:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

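As you can see, #x12 is not a valid character in an XML document.

You ask how to remove them, but I don't think that is the question you should be asking. They simply should not be there. You should reject any such document as malformed. Simply removing the invalid characters sidesteps the real problem.

If you are the one creating the offending documents, then you need to fix the code that generates them so that it produces valid XML.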
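To make the Char production above concrete, here is a minimal sketch of a checker that reports where the first invalid character sits, so that a document can be rejected with a useful message rather than silently repaired. The class and method names are hypothetical, not part of any library, and the sketch assumes the document has already been decoded into a .NET string (UTF-16).

```csharp
using System;

public static class XmlCharCheck
{
    // True if the code point matches the XML 1.0 Char production quoted above.
    public static bool IsXmlChar(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first invalid UTF-16 code unit, or -1 if every
    // character is allowed. (Hypothetical helper for illustration.)
    public static int IndexOfFirstInvalidChar(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // A well-formed pair encodes a code point in #x10000-#x10FFFF.
                i++;
                continue;
            }
            // Lone surrogates fall outside every allowed range and are reported.
            if (!IsXmlChar(text[i])) return i;
        }
        return -1;
    }
}
```

A caller could run this before parsing and log the offset and character code for whoever produces the file.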

【Comments】:

【Answer 3】:

I think x26 "&" is a valid character, and XML can deserialize it.

So, to replace the illegal characters, we should use:

// Replace illegal character in XML documents with blank
// See here for reference http://www.w3.org/TR/xml/#charsets
var regex = "[\x00-\x08\x0B\x0C\x0E-\x1F]";
xml = Regex.Replace(xml, regex, String.Empty, RegexOptions.Compiled);
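The difference between the two patterns is easiest to see on a document that contains an entity reference. The following is only an illustrative sketch: the class, field, and method names are made up for this example, and the patterns are written as verbatim strings so that the regex engine, not the C# compiler, interprets the `\xNN` escapes.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class AmpersandDemo
{
    // Pattern from Answer 1: control characters plus '&' (0x26).
    public static readonly Regex ControlsAndAmp =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F\x26]");

    // Pattern from this answer: control characters only.
    public static readonly Regex ControlsOnly =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F]");

    // Strips the matched characters, parses the result, and returns the
    // text content of the root element.
    public static string InnerTextAfter(Regex pattern, string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(pattern.Replace(xml, string.Empty));
        return doc.DocumentElement.InnerText;
    }
}
```

For the input `"<a>1 &lt; 2\u0012</a>"`, `ControlsOnly` yields the text `1 < 2`, while `ControlsAndAmp` turns `&lt;` into the literal text `lt;` and yields `1 lt; 2`.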

【Comments】:

【Answer 4】:

The Regex solution runs very fast, even on a 100 MB XML document.

The following expression string does the job.

"[\x00-\x08\x0B\x0C\x0E-\x1F]"
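If holding a 100 MB document in a single string is a concern, the same character class can be applied while streaming. This is a rough sketch under that assumption, not a measured improvement over the regex; the class and method names are hypothetical. It drops exactly the characters matched by the expression above and copies everything else through unchanged.

```csharp
using System.IO;

public static class XmlControlCharFilter
{
    // Copies input to output in chunks, skipping the characters matched by
    // [\x00-\x08\x0B\x0C\x0E-\x1F] (control characters other than tab, LF, CR).
    public static void Copy(TextReader input, TextWriter output)
    {
        var buffer = new char[8192];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                bool illegal = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
                if (!illegal)
                {
                    // Compact the kept characters to the front of the buffer.
                    buffer[kept++] = c;
                }
            }
            output.Write(buffer, 0, kept);
        }
    }
}
```

It could be used with a `StreamReader` over the source file and a `StreamWriter` over a cleaned copy, which is then handed to the XML parser.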

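【Comments】:

【Answer 5】:

This is essentially a special case of this question. I suggest you use one of the answers there.

【Comments】:

【Answer 6】:

Just update these functions with the fix provided by jhon above; you will have to find where these functions are used in your own code. I have tested it, and it should work for you.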

    private static void WriteDataTableToExcelWorksheet(DataTable dt, WorksheetPart worksheetPart)
    {
        var worksheet = worksheetPart.Worksheet;
        var sheetData = worksheet.GetFirstChild<SheetData>();

        string cellValue = "";

        //  Create a Header Row in our Excel file, containing one header for each Column of data in our DataTable.
        //
        //  We'll also create an array, showing which type each column of data is (Text or Numeric), so when we come to write the actual
        //  cells of data, we'll know if to write Text values or Numeric cell values.
        int numberOfColumns = dt.Columns.Count;
        bool[] IsNumericColumn = new bool[numberOfColumns];

        string[] excelColumnNames = new string[numberOfColumns];
        for (int n = 0; n < numberOfColumns; n++)
            excelColumnNames[n] = GetExcelColumnName(n);

        //
        //  Create the Header row in our Excel Worksheet
        //
        uint rowIndex = 1;

        var headerRow = new Row { RowIndex = rowIndex };  // add a row at the top of spreadsheet
        sheetData.Append(headerRow);

        for (int colInx = 0; colInx < numberOfColumns; colInx++)
        {
            DataColumn col = dt.Columns[colInx];
            AppendTextCell(excelColumnNames[colInx] + "1", col.ColumnName, headerRow);
            IsNumericColumn[colInx] = (col.DataType.FullName == "System.Decimal") || (col.DataType.FullName == "System.Int32");
        }

        //
        //  Now, step through each row of data in our DataTable...
        //
        double cellNumericValue = 0;
        foreach (DataRow dr in dt.Rows)
        {
            // ...create a new row, and append a set of this row's data to it.
            ++rowIndex;
            var newExcelRow = new Row { RowIndex = rowIndex };  // add the next data row
            sheetData.Append(newExcelRow);

            for (int colInx = 0; colInx < numberOfColumns; colInx++)
            {
                cellValue = dr.ItemArray[colInx].ToString();

                // Create cell with data
                if (IsNumericColumn[colInx])
                {
                    //  For numeric cells, make sure our input data IS a number, then write it out to the Excel file.
                    //  If this numeric value is NULL, then don't write anything to the Excel file.
                    cellNumericValue = 0;
                    if (double.TryParse(cellValue, out cellNumericValue))
                    {
                        cellValue = ReplaceHexadecimalSymbols(cellNumericValue.ToString());
                        AppendNumericCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow);
                    }
                }
                else
                {
                    //  For text cells, just write the input data straight out to the Excel file.
                    AppendTextCell(excelColumnNames[colInx] + rowIndex.ToString(), cellValue, newExcelRow);
                }
            }
        }
    }

    static string ReplaceHexadecimalSymbols(string txt)
    {
        //  Note: \x26 is '&', so this pattern also removes ampersands from the cell text.
        string r = "[\x00-\x08\x0B\x0C\x0E-\x1F\x26]";
        return Regex.Replace(txt, r, "", RegexOptions.Compiled);
    }

    private static void AppendTextCell(string cellReference, string cellStringValue, Row excelRow)
    {
        //  Add a new Excel Cell to our Row 
        Cell cell = new Cell() { CellReference = cellReference, DataType = CellValues.String };
        CellValue cellValue = new CellValue();
        cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue);
        cell.Append(cellValue);
        excelRow.Append(cell);
    }

    private static void AppendNumericCell(string cellReference, string cellStringValue, Row excelRow)
    {
        //  Add a new Excel Cell to our Row 
        Cell cell = new Cell() { CellReference = cellReference };
        CellValue cellValue = new CellValue();
        cellValue.Text = ReplaceHexadecimalSymbols(cellStringValue);
        cell.Append(cellValue);
        excelRow.Append(cell);
    }

If you need further help, let me know.

【Comments】:

【Answer 7】:

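What worked for me was checking for escaped special characters. The characters that failed were written as character codes beginning with '&', which the code below looks for in the '&#x' form.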

    // Requires: using System; using System.Globalization; using System.Text;
    public static string CleanInvalidEscapedXmlCharacters(string s)
    {
        if (s == null) return null;

        StringBuilder sbOutput = new StringBuilder();
        char ch;

        //keeps track of which character the previous character was.
        bool hitAmp = false;
        bool hitPound = false;
        bool hitX = false;

        string escapedHold = "";

        for (int i = 0; i < s.Length; i++)
        {
            ch = s[i];

            //check this first so that the x gets ignored.
            if (hitX)
            {
                //found the end of the escaped portion
                if (ch == ';')
                {
                    //decode the hex digits; the decoded character is validated below.
                    ch = (char) Int32.Parse(escapedHold, NumberStyles.AllowHexSpecifier);

                    escapedHold = "";
                    hitX = false;
                    hitPound = false;
                }
                else
                {
                    //found another digit in the escaped portion
                    escapedHold += ch;
                    continue;
                }
            }

            if (hitPound)
            {
                if (ch == 'x')
                {
                    //found &#x
                    hitX = true;
                    continue;
                }
                else
                {
                    //found &# but no x
                    //reset hits and output &# and current character.
                    hitAmp = false;
                    hitPound = false;
                    sbOutput.Append('&');
                    sbOutput.Append('#');
                    sbOutput.Append(ch);
                    continue;
                }
            }

            if (ch == '&')
            {
                //found an initial &
                hitAmp = true;
                continue;
            }
            if (hitAmp)
            {
                if (ch == '#')
                {
                    //found &#
                    hitPound = true;
                    hitAmp = false;
                    continue;
                }
                else
                {
                    //found & but no # so this is something like &lt;
                    //reset hits and output the & and current character
                    hitAmp = false;
                    hitPound = false;
                    sbOutput.Append('&');
                    sbOutput.Append(ch);
                    continue;
                }
            }

            if (!hitAmp && !hitPound && !hitX)
            {
                //keep only characters allowed by the XML Char production (BMP ranges)
                if ((ch >= 0x0020 && ch <= 0xD7FF) ||
                    (ch >= 0xE000 && ch <= 0xFFFD) ||
                    ch == 0x0009 || ch == 0x000A || ch == 0x000D)
                {
                    sbOutput.Append(ch);
                }
            }
        }

        return sbOutput.ToString();
    }
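A more compact variation on the same idea, offered only as a sketch and not as the answer's own method: match numeric character references with a regular expression, drop those whose code point is not allowed by the XML Char production, and leave every other reference exactly as written. Unlike the state machine above, this version also handles the decimal `&#NN;` form and does not decode valid references into raw characters. All names here are hypothetical.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class XmlCharRefCleaner
{
    // Matches hexadecimal (&#x12;) and decimal (&#18;) character references.
    private static readonly Regex CharRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    public static string RemoveInvalidCharRefs(string s)
    {
        if (s == null) return null;

        return CharRef.Replace(s, m =>
        {
            long code;
            bool parsed = m.Groups[1].Success
                ? long.TryParse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier,
                                CultureInfo.InvariantCulture, out code)
                : long.TryParse(m.Groups[2].Value, NumberStyles.None,
                                CultureInfo.InvariantCulture, out code);

            // Keep the reference text untouched when it names a legal character;
            // otherwise remove the whole reference.
            return parsed && IsXmlChar(code) ? m.Value : string.Empty;
        });
    }

    // XML 1.0 Char production.
    private static bool IsXmlChar(long c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }
}
```

Named entities such as `&lt;` never match the pattern, so they pass through unchanged.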

【Comments】:
