How to interpret an Excel number format string to determine whether the value should be parsed by DateTime.FromOADate

【Posted】: 2011-08-02 17:43:14

【Question】:

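How can I create a function "bool IsDateTime" that reliably determines whether an Excel number format string such as "[$-409]h:mm:ss AM/PM;@" indicates that the numeric value is a DateTime that should be passed to DateTime.FromOADate?

I have already figured out what [$-409] is: Excel Number Format: What is "[$-409]"?. It is just a locale code.

I have also read a little about number format strings being split by semicolons into up to four format sections: http://office.microsoft.com/en-us/excel-help/create-or-delete-a-custom-number-format-HP005199500.aspx?CTT=5&origin=HP005198679 and here http://www.ozgrid.com/Excel/excel-custom-number-formats.htm

For example, would it be reliable to simply search for occurrences of date/time format characters such as h, m, s, y, d? How would Excel interpret it?

In case the question is unclear: when you read an Excel file and look at a date/time value, you are actually looking at a plain old double, because that is how Excel stores it. To determine whether it is just a plain double or a double that should be passed to DateTime.FromOADate, you have to interpret the custom number format string. So I am asking how to interpret such a string, which may or may not refer to a date/time value, in order to decide whether the double should be converted to a DateTime via DateTime.FromOADate. Furthermore, if the conversion to a DateTime succeeds, I then need to convert the Excel number format string into an equivalent .NET DateTime format string, so that I can display the date/time value the way Excel would via DateTime.ToString(converted_format_string).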
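To make the storage point concrete, here is a minimal sketch of how the same double reads as a number or as a date depending on a flag derived from the number format. `OaDateDemo` and `Describe` are names made up for this illustration; the only framework call relied on is `DateTime.FromOADate`.

```csharp
using System;
using System.Globalization;

public static class OaDateDemo
{
    // Hypothetical helper: renders a raw cell value either as a plain number
    // or as a date/time, depending on what the caller concluded from the format.
    public static string Describe(double cellValue, bool formatIsDateTime)
    {
        if (!formatIsDateTime)
            return cellValue.ToString(CultureInfo.InvariantCulture);

        // Integer part: days since 1899-12-30. Fractional part: time of day.
        DateTime dt = DateTime.FromOADate(cellValue);
        return dt.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture);
    }
}
```

With this sketch, `OaDateDemo.Describe(40757.75, false)` gives "40757.75", while `OaDateDemo.Describe(40757.75, true)` gives "2011-08-02 18:00:00". The value alone cannot tell you which reading is intended, which is why the format string has to be inspected.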

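【Answer 1】:

You can check whether a cell uses any of the built-in date formats by calling the CELL function and asking for the format. If the cell uses a built-in date format, it returns "D" followed by a number.

For example:

=IF(LEFT(CELL("format", A1),1)="D",TRUE,FALSE)

For the more general case, I would first check that the cell is a number (ISNUMBER()) and within the range for a date (i.e. between 0 and TODAY(), which is 39296 today). Then I would check the number format for the presence of at least one d, m, y, h, M or s, as that ought to mean the cell contains a date.

Hope this helps,

Dave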
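The character check suggested here can be sketched on the .NET side as follows. This is only an assumption-laden illustration, not part of either answer: `ExcelFormatSniffer` and `LooksLikeDateFormat` are hypothetical names, and the sketch first removes quoted text, backslash-escaped characters, and bracketed blocks such as `[$-409]` or `[Red]`, since letters in those places are not date/time codes (a format like £0,,"M" would otherwise be a false positive).

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcelFormatSniffer
{
    // Parts of a format section that may contain letters without being date/time codes:
    // "quoted text", backslash-escaped characters, and [bracketed] blocks.
    private static readonly Regex NonCodeParts = new Regex(
        @"""[^""]*""|\\.|\[[^\]]*\]", RegexOptions.Compiled);

    // Any remaining y, m, d, h or s is taken as a date/time code.
    private static readonly Regex DateTimeCode = new Regex(
        "[ymdhs]", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool LooksLikeDateFormat(string numberFormat)
    {
        if (string.IsNullOrEmpty(numberFormat))
            return false;

        // Only the first section (positive numbers) is inspected.
        // Simplification: a ';' inside quoted text would split in the wrong place.
        string firstSection = numberFormat.Split(';')[0];
        string codesOnly = NonCodeParts.Replace(firstSection, string.Empty);
        return DateTimeCode.IsMatch(codesOnly);
    }
}
```

Under these assumptions, "[$-409]h:mm:ss AM/PM;@" and "yyyy-mm-dd" are reported as date/time formats, while "0.00", "General" and £0,,"M" are not. One known gap: a format consisting only of an elapsed-time code such as "[h]" is stripped along with the other bracketed blocks and therefore reported as non-date.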

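【Comments】:

Thanks, but perhaps my question was unclear. DateTime.FromOADate is a .NET Framework method, so I am talking about interpreting the Excel number format string inside a .NET application. I use EPPlus to read the XLSX file, and for built-in format strings it automatically creates a DateTime value. When a custom format string is used, however, the value is left as a double, indistinguishable from any other number. The only way to decide whether DateTime.FromOADate should be called is to look at the number format string and work out whether it is meant for a date/time value.

@Trinyko, yes, sorry, I thought you could do more of the checking in Excel before exporting. Your solution looks good. The only time I can think of where a number format could contain those characters without being a date is when text has been added to the format. For example, to show numbers in millions as £1M you could use the number format £0,,"M". Hope this helps, Dave

【Answer 2】: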

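I implemented a class to parse Excel number format strings. It looks at the first section (of the four possible sections in a format string) and uses a regular expression to capture the date/time-specific custom format characters such as "y", "m", "d", "h", "s", and "AM/PM", returning null if none are found. This first step simply determines whether the format string is meant for a date/time value, and leaves us with an ordered list of object-oriented logical date/time format specifiers for further processing.

Assuming the format string has been determined to be for a date/time value, the captured and classified values are sorted into the order in which they appeared in the original format string.

Next, it applies Excel-specific formatting quirks, such as deciding whether "m" means month or minute: it is interpreted as "minute" only if it comes right after an "h" or right before an "s" (literal text is allowed in between, so it is not strictly "immediately" before/after). Excel also forces "h" to display in 24-hour time when "AM/PM" is not specified, so if no "AM/PM" is found the hour specifier is converted to uppercase "H" (24-hour time in .NET); otherwise it is kept as lowercase "h" (12-hour time in .NET). It also converts "AM/PM" to the .NET equivalent "tt", and blanks out the conditional expressions, which cannot be included in a plain .NET DateTime format string.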

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections;

namespace utilities.data
{
    public enum NumberFormatCaptureType
    {
        Condition,
        LiteralText,
        Year,
        Month,
        Day,
        Hour,
        Minute,
        Second,
        AMPM
    }

    public class NumberFormatTypedCapture
    {
        private class ClassificationPair
        {
            public string Name;
            public NumberFormatCaptureType Type;
            public bool IndicatesDateTimeValue;
        }

        //Note: quoted literal text and backslash escapes get no special treatment here, so letters inside them can be classified as date/time codes.
        private static readonly Regex regex = new Regex( @"(?<c>\[[^]]*])*((?<y>yyyy|yy)|(?<m>mmmm|mmm|mm|m)|(?<d>dddd|ddd|dd|d)|(?<h>hh|h)|(?<s>ss|s)|(?<t>AM/PM)|(?<t>am/pm)|(?<l>.))*", RegexOptions.Singleline | RegexOptions.ExplicitCapture | RegexOptions.Compiled );
        private static readonly ClassificationPair[] classifications = new ClassificationPair[]
        {
            new ClassificationPair() { Name="c", Type=NumberFormatCaptureType.Condition, IndicatesDateTimeValue=false },
            new ClassificationPair() { Name="y", Type=NumberFormatCaptureType.Year, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="m", Type=NumberFormatCaptureType.Month, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="d", Type=NumberFormatCaptureType.Day, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="h", Type=NumberFormatCaptureType.Hour, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="s", Type=NumberFormatCaptureType.Second, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="t", Type=NumberFormatCaptureType.AMPM, IndicatesDateTimeValue=true },
            new ClassificationPair() { Name="l", Type=NumberFormatCaptureType.LiteralText, IndicatesDateTimeValue=false }
        };
        private Capture Capture;
        private string mutable_value;
        public NumberFormatCaptureType Type;

        public NumberFormatTypedCapture( Capture c, NumberFormatCaptureType t )
        {
            this.Capture = c;
            this.Type = t;
            mutable_value = c.Value;
        }

        public int Index
        {
            get { return Capture.Index; }
        }

        public string Value
        {
            get { return mutable_value; }
            set { mutable_value = value; }
        }

        public int Length
        {
            get { return mutable_value.Length; }
        }

        //Returns the equivalent .NET DateTime format string, or null if the Excel format is not a date/time format.
        public static string ConvertToDotNetDateTimeFormat( string number_format )
        {
            string[] number_formats = number_format.Split( ';' );
            Match m = regex.Match( number_formats[0] );
            bool date_time_formatting_encountered = false;
            bool am_pm_encountered = false;

            //Classify the captured values into typed NumberFormatTypedCapture instances
            List<NumberFormatTypedCapture> segments = new List<NumberFormatTypedCapture>();
            foreach (ClassificationPair classification in classifications)
            {
                CaptureCollection captures = m.Groups[classification.Name].Captures;
                if (classification.IndicatesDateTimeValue && captures.Count > 0)
                {
                    date_time_formatting_encountered = true;
                    if (classification.Type == NumberFormatCaptureType.AMPM)
                        am_pm_encountered = true;
                }
                segments.AddRange( captures.Cast<Capture>().Select<Capture,NumberFormatTypedCapture>( (capture) => new NumberFormatTypedCapture( capture, classification.Type ) ) );
            }

            //Not considered a date time format unless it has at least one instance of a date/time format character
            if (!date_time_formatting_encountered)
                return null;

            //Sort the captured values in the order they were found in the original string.
            Comparison<NumberFormatTypedCapture> comparison = (x,y) => (x.Index < y.Index) ? -1 : ((x.Index > y.Index) ? 1 : 0);
            segments.Sort( comparison );

            //Begin conversion of the captured Excel format characters to .NET DateTime format characters
            for (int i = 0; i < segments.Count; i++)
            {
                NumberFormatTypedCapture c = segments[i];
                switch (c.Type)
                {
                    case NumberFormatCaptureType.Hour: //In the absence of AM/PM, Excel forces hours to display in 24-hour time
                        if (am_pm_encountered)
                            c.Value = c.Value.ToLower(); //.NET lowercase "h" formats hours in 12-hour time
                        else
                            c.Value = c.Value.ToUpper(); //.NET uppercase "H" formats hours in 24-hour time
                        break;
                    case NumberFormatCaptureType.Month: //The "m" (month) designator is interpreted as minutes by Excel when found after an Hours indicator or before a Seconds indicator.
                        NumberFormatTypedCapture prev_format_character = GetAdjacentDateTimeVariable( segments, i, -1 );
                        NumberFormatTypedCapture next_format_character = GetAdjacentDateTimeVariable( segments, i, 1 );
                        if ((prev_format_character != null && prev_format_character.Type == NumberFormatCaptureType.Hour) || (next_format_character != null && next_format_character.Type == NumberFormatCaptureType.Second))
                            c.Type = NumberFormatCaptureType.Minute; //Format string is already lowercase (Excel seems to force it to lowercase), so just leave it lowercase and set the type to Minute
                        else
                            c.Value = c.Value.ToUpper(); //Month indicator is uppercase in .NET framework
                        break;
                    case NumberFormatCaptureType.AMPM: //AM/PM indicator is "tt" in .NET framework
                        c.Value = "tt";
                        break;
                    case NumberFormatCaptureType.Condition: //Conditional formatting is not supported in .NET framework
                        c.Value = String.Empty;
                        break;
                    //case NumberFormatCaptureType.Text: //Merge adjacent text elements
                        //break;
                }
            }

            //Now that the individual captures have been blanked out or converted to the .NET DateTime format string, concatenate it all together and return the final format string.
            StringBuilder sb = new StringBuilder();
            foreach (NumberFormatTypedCapture c in segments)
                sb.Append( c.Value );
            return sb.ToString();
        }

        //Finds the nearest date/time capture before (direction = -1) or after (direction = 1) the current one, skipping conditions and literal text.
        private static NumberFormatTypedCapture GetAdjacentDateTimeVariable( List<NumberFormatTypedCapture> captures, int current, int direction )
        {
            for (current += direction; current >= 0 && current < captures.Count; current += direction)
            {
                NumberFormatTypedCapture capture = captures[current];
                if (capture.Type != NumberFormatCaptureType.Condition && capture.Type != NumberFormatCaptureType.LiteralText)
                    return capture;
            }
            return null;
        }
    }
}
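The month-versus-minute rule is the least obvious part of the conversion, so here it is again as a small standalone sketch. It is a simplification rather than a replacement for the class: `MinuteRule` is a hypothetical name, and the sketch assumes the format has already been split into lowercase tokens (for example "h", ":", "mm", ":", "ss").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinuteRule
{
    // True when the token is a run of one repeated letter, e.g. "hh" for 'h'.
    private static bool IsCode(string token, char letter)
    {
        return token.Length > 0 && token.All(ch => ch == letter);
    }

    // Date/time codes are not skipped when looking for a neighbour; everything else is literal text.
    private static bool IsDateTimeCode(string token)
    {
        if (token.Equals("am/pm", StringComparison.OrdinalIgnoreCase))
            return true;
        return "ymdhs".Any(letter => IsCode(token, letter));
    }

    // Nearest date/time code before (step = -1) or after (step = 1) position i, or null.
    private static string Neighbour(IList<string> tokens, int i, int step)
    {
        for (int j = i + step; j >= 0 && j < tokens.Count; j += step)
            if (IsDateTimeCode(tokens[j]))
                return tokens[j];
        return null;
    }

    // An "m" token means minutes only if the previous code is hours or the next code is seconds.
    public static bool IsMinute(IList<string> tokens, int i)
    {
        if (!IsCode(tokens[i], 'm'))
            return false;
        string prev = Neighbour(tokens, i, -1);
        string next = Neighbour(tokens, i, 1);
        return (prev != null && IsCode(prev, 'h')) || (next != null && IsCode(next, 's'));
    }
}
```

For the tokens of "h:mm:ss", the "mm" at index 2 is classified as minutes because the preceding code is "h". For "yyyy-mm-dd", the "mm" stays a month because its neighbours are "yyyy" and "dd".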

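The NumberFormatTypedCapture class can be used in the following context, which reads string values into a DataTable from the columns of an Excel file that have non-empty headers. Specifically, it tries to obtain a valid DateTime instance and, if one is found, tries to construct a valid .NET DateTime format string from the Excel number format string. If both of those steps succeed, it stores the formatted date/time string in the DataTable; otherwise it converts whatever value is present to a string (making sure to strip rich-text formatting first, if present):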

//Fragment from inside a method that returns a DataTable.
//Requires: using System.Data; using System.Linq; using OfficeOpenXml; (EPPlus)
using (ExcelPackage package = new ExcelPackage( fileUpload.FileContent ))
{
    Dictionary<string,string> converted_dt_format_strings = new Dictionary<string,string>();
    ExcelWorksheet sheet = package.Workbook.Worksheets.First();
    int end_column = sheet.Dimension.End.Column;
    int end_row = sheet.Dimension.End.Row;

    DataTable datatable = new DataTable();

    //Construct columns
    int i_row = 1;
    List<int> valid_columns = new List<int>();
    for (int i_col = 1; i_col <= end_column; i_col++)
    {
        ExcelRange range = sheet.Cells[i_row, i_col];
        string field_name_text = range.IsRichText ? range.RichText.Text : (range.Value ?? String.Empty).ToString();
        if (!String.IsNullOrEmpty( field_name_text )) //Only columns with a non-empty header are read
        {
            valid_columns.Add( i_col );
            datatable.Columns.Add( field_name_text, typeof(string) );
        }
    }

    int valid_column_count = valid_columns.Count;
    for (i_row = 2; i_row <= end_row; i_row++)
    {
        DataRow row = datatable.NewRow();
        for (int i_col = 0; i_col < valid_column_count; i_col++)
        {
            ExcelRange range = sheet.Cells[i_row, valid_columns[i_col]];

            //Attempt to acquire a DateTime value from the cell
            DateTime? d = null;
            try
            {
                if (range.Value is DateTime)
                    d = (DateTime)range.Value;
                else if (range.Value is double)
                    d = DateTime.FromOADate( (double)range.Value );
                else
                    d = null;
            }
            catch
            {
                d = null; //FromOADate throws for values outside the valid OLE Automation date range
            }

            string field_value_text = range.IsRichText ? (range.RichText.Text ?? String.Empty) : (range.Value ?? String.Empty).ToString(); //Acquire plain text string version of the object, which will be used if a formatted DateTime string cannot be produced
            string field_value_dt_text = null;

            if (d.HasValue)
            {
                try
                {
                    string excel_number_format = range.Style.Numberformat.Format;
                    string date_time_format = null;
                    if (excel_number_format != null)
                    {
                        //Cache conversions (including null results) so each distinct format string is parsed only once
                        if (!converted_dt_format_strings.TryGetValue( excel_number_format, out date_time_format ))
                        {
                            date_time_format = NumberFormatTypedCapture.ConvertToDotNetDateTimeFormat( excel_number_format );
                            converted_dt_format_strings.Add( excel_number_format, date_time_format );
                        }
                        if (date_time_format != null) //Appears to have Date/Time formatting applied to it
                            field_value_dt_text = d.Value.ToString( date_time_format );
                    }
                }
                catch
                {
                    field_value_dt_text = null;
                }
            }

            row[i_col] = (field_value_dt_text == null) ? field_value_text : field_value_dt_text;
        }
        datatable.Rows.Add( row );
    }
    return datatable;
}
