Using .NET, how can you find the MIME type of a file based on the file signature, not the extension

【Posted】: 2010-09-08 16:20:09

【Question】:

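I am looking for a simple way to get the MIME type of a file when the file extension is incorrect or missing, something similar to this question, only in .NET.

【Comments】:

This sounds similar to this question.

When the requirement explicitly says not to use the extension, I wish I could remove all the "fake answers" that still rely on the file extension!

This may be an old question, but the problem still exists. I would downvote every answer here because they only check Windows executables by content; what about Linux or iOS executables, or other dangerous files?

@PhillipH Write an answer for those.

【Answer 1】:

I wrote a MIME type validator. Feel free to use it.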

// Requires: using System.Collections.Generic; using System.Linq;
private readonly Dictionary<string, byte[]> _mimeTypes = new Dictionary<string, byte[]>
{
    { "image/jpeg", new byte[] { 255, 216, 255 } },
    { "image/jpg", new byte[] { 255, 216, 255 } },
    { "image/pjpeg", new byte[] { 255, 216, 255 } },
    { "image/apng", new byte[] { 137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82 } },
    { "image/png", new byte[] { 137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82 } },
    { "image/bmp", new byte[] { 66, 77 } },
    { "image/gif", new byte[] { 71, 73, 70, 56 } },
};

private bool ValidateMimeType(byte[] file, string contentType)
{
    // An unknown content type fails validation instead of throwing on a null signature.
    if (!_mimeTypes.TryGetValue(contentType, out var signature))
    {
        return false;
    }

    // The file is valid when its leading bytes equal the expected signature.
    return file.Take(signature.Length).SequenceEqual(signature);
}

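【Comments】:

【Answer 2】:

If you want to host your ASP.NET solution in a non-Windows environment, HeyRed.Mime.MimeGuesser.GuessMimeType from NuGet is the ultimate solution.

File-extension mapping is very insecure. If an attacker uploads a file with a misleading extension, a mapping dictionary will, for example, allow an executable to be distributed inside a .jpg file. Therefore, always use a content-sniffing library to determine the real content type.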

 public static string MimeTypeFrom(byte[] dataBytes, string fileName)
 {
     var contentType = HeyRed.Mime.MimeGuesser.GuessMimeType(dataBytes);
     if (string.IsNullOrEmpty(contentType))
     {
         // Fall back to the extension-based map only when content sniffing yields nothing.
         return HeyRed.Mime.MimeTypesMap.GetMimeType(fileName);
     }
     return contentType;
 }

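【Comments】:

The best library I have tried so far. It found the content type of every file I put in the folder. Plus .NET Core support!

Simply great. I also tried many libraries (NuGet packages, custom classes...). This one is the closest to file -bi [filename] on UNIX systems.

【Answer 3】:

Hi, I have adapted the Winista.MimeDetect project to .NET Core/Framework, with a fallback to urlmon.dll. Feel free to use it: nuget package.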

   //init
   var mimeTypes = new MimeTypes();

   //usage by filepath
   var mimeType1 = mimeTypes.GetMimeTypeFromFile(filePath);

【Comments】:

The GitHub code sample here is wrong: github.com/GetoXs/MimeDetect. There is no overload mimeTypes.GetMimeTypeFromFile(bytes);

【Answer 4】:

I found several problems when running this code:

UInt32 mimetype;
FindMimeFromData(0, null, buffer, 256, null, 0, out mimetype, 0);

If you try to run it on x64/Win10, you will get

AccessViolationException "Attempted to read or write protected memory.
This is often an indication that other memory is corrupt"

Thanks to the post "PtrToStringUni doesnt work in windows 10" and to @xanatos.

I modified my solution to run under x64 and .NET Core 2.1:

   [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, 
    SetLastError = false)]
    static extern int FindMimeFromData(IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.I1, 
        SizeParamIndex=3)]
        byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);

   string getMimeFromFile(byte[] fileSource)
   {
       // Only the first 256 bytes are passed to FindMimeFromData.
       byte[] buffer = new byte[256];
       using (Stream stream = new MemoryStream(fileSource))
       {
           if (stream.Length >= 256)
               stream.Read(buffer, 0, 256);
           else
               stream.Read(buffer, 0, (int)stream.Length);
       }

       try
       {
           IntPtr mimeTypePtr;
           FindMimeFromData(IntPtr.Zero, null, buffer, buffer.Length,
               null, 0, out mimeTypePtr, 0);

           string mime = Marshal.PtrToStringUni(mimeTypePtr);
           Marshal.FreeCoTaskMem(mimeTypePtr);
           return mime;
       }
       catch (Exception)
       {
           return "unknown/unknown";
       }
   }

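Thanks.

【Comments】:

【Answer 5】:

If you are using .NET Framework 4.5 or later, there is now a MimeMapping.GetMimeMapping(filename) method that returns a string containing the MIME mapping for the file name passed in. Note that this uses the file extension, not the data in the file itself.

The documentation is at http://msdn.microsoft.com/en-us/library/system.web.mimemapping.getmimemapping

【Comments】:

This worked for me and needs only one line of code: var mimetype = System.Web.MimeMapping.GetMimeMapping(<pathToFile>);

This does not answer the original question, "if the file extension is incorrect or missing". GetMimeMapping only uses the extension and a static dictionary of MIME entries.

I found this class useful, though :)

I suggest editing your answer to note that this uses the file extension internally, which is easy to fake.

Normally I don't downvote answers, but since this one is misleading, I did. The question is about not trusting the file extension.

【Answer 6】:

I ended up using the Winista MimeDetector from Netomatix. The source code can be downloaded for free after you create an account: http://www.netomatix.com/Products/DocumentManagement/MimeDetector.aspx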

MimeTypes g_MimeTypes = new MimeTypes("mime-types.xml");
sbyte [] fileData = null;

using (System.IO.FileStream srcFile = new System.IO.FileStream(strFile, System.IO.FileMode.Open))
{
    byte [] data = new byte[srcFile.Length];
    srcFile.Read(data, 0, (Int32)srcFile.Length);
    fileData = Winista.Mime.SupportUtil.ToSByteArray(data);
}

MimeType oMimeType = g_MimeTypes.GetMimeType(fileData);

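This is part of another question answered here: Alternative to FindMimeFromData method in Urlmon.dll one which has more MIME types. I think it is the best solution to this problem.

【Comments】:

【Answer 7】:

When using a Windows Azure Web role, or any other host that runs your application in limited trust, do not forget that you will not be allowed to access the registry or unmanaged code. A hybrid approach, combining a try-catch around the registry lookup with an in-memory dictionary, looks like a good solution that covers everything.

I use this code to do it: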

public class DefaultMimeResolver : IMimeResolver
{
    private readonly IFileRepository _fileRepository;

    public DefaultMimeResolver(IFileRepository fileRepository)
    {
        _fileRepository = fileRepository;
    }

    // Note: declaring ppwzMimeOut as UInt32 truncates the returned pointer in a 64-bit process.
    [DllImport(@"urlmon.dll", CharSet = CharSet.Auto)]
    private static extern System.UInt32 FindMimeFromData(
        System.UInt32 pBC, [MarshalAs(UnmanagedType.LPStr)] System.String pwzUrl,
         [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
         System.UInt32 cbSize,
         [MarshalAs(UnmanagedType.LPStr)] System.String pwzMimeProposed,
         System.UInt32 dwMimeFlags,
         out System.UInt32 ppwzMimeOut,
         System.UInt32 dwReserverd);


    public string GetMimeTypeFromFileExtension(string fileExtension)
    {
        if (string.IsNullOrEmpty(fileExtension))
        {
            throw new ArgumentNullException("fileExtension");
        }

        string mimeType = GetMimeTypeFromList(fileExtension);

        if (String.IsNullOrEmpty(mimeType))
        {
            mimeType = GetMimeTypeFromRegistry(fileExtension);
        }

        return mimeType;
    }

    public string GetMimeTypeFromFile(string filePath)
    {
        if (string.IsNullOrEmpty(filePath))
        {
            throw new ArgumentNullException("filePath");
        }

        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException("File not found : ", filePath);
        }

        // Lookup order: in-memory list, then registry, then content sniffing.
        string mimeType = GetMimeTypeFromList(Path.GetExtension(filePath).ToLower());

        if (String.IsNullOrEmpty(mimeType))
        {
            mimeType = GetMimeTypeFromRegistry(Path.GetExtension(filePath).ToLower());

            if (String.IsNullOrEmpty(mimeType))
            {
                mimeType = GetMimeTypeFromFileInternal(filePath);
            }
        }

        return mimeType;
    }

    private string GetMimeTypeFromList(string fileExtension)
    {
        string mimeType = null;

        if (fileExtension.StartsWith("."))
        {
            fileExtension = fileExtension.TrimStart('.');
        }

        if (!String.IsNullOrEmpty(fileExtension) && _mimeTypes.ContainsKey(fileExtension))
        {
            mimeType = _mimeTypes[fileExtension];
        }

        return mimeType;
    }

    private string GetMimeTypeFromRegistry(string fileExtension)
    {
        string mimeType = null;
        try
        {
            RegistryKey key = Registry.ClassesRoot.OpenSubKey(fileExtension);

            if (key != null && key.GetValue("Content Type") != null)
            {
                mimeType = key.GetValue("Content Type").ToString();
            }
        }
        catch (Exception)
        {
            // Empty. When this code is running in limited mode accessing registry is not allowed.
        }

        return mimeType;
    }

    private string GetMimeTypeFromFileInternal(string filePath)
    {
        string mimeType = null;

        if (!File.Exists(filePath))
        {
            return null;
        }

        byte[] byteBuffer = new byte[256];

        using (FileStream fileStream = _fileRepository.Get(filePath))
        {
            if (fileStream.Length >= 256)
            {
                fileStream.Read(byteBuffer, 0, 256);
            }
            else
            {
                fileStream.Read(byteBuffer, 0, (int)fileStream.Length);
            }
        }

        try
        {
            UInt32 MimeTypeNum;

            FindMimeFromData(0, null, byteBuffer, 256, null, 0, out MimeTypeNum, 0);

            IntPtr mimeTypePtr = new IntPtr(MimeTypeNum);
            string mimeTypeFromFile = Marshal.PtrToStringUni(mimeTypePtr);

            Marshal.FreeCoTaskMem(mimeTypePtr);

            if (!String.IsNullOrEmpty(mimeTypeFromFile) && mimeTypeFromFile != "text/plain" && mimeTypeFromFile != "application/octet-stream")
            {
                mimeType = mimeTypeFromFile;
            }
        }
        catch
        {
            // Empty. 
        }

        return mimeType;
    }

    private readonly Dictionary<string, string> _mimeTypes = new Dictionary<string, string>
        {
            { "ai", "application/postscript" },
            { "aif", "audio/x-aiff" },
            { "aifc", "audio/x-aiff" },
            { "aiff", "audio/x-aiff" },
            { "asc", "text/plain" },
            { "atom", "application/atom+xml" },
            { "au", "audio/basic" },
            { "avi", "video/x-msvideo" },
            { "bcpio", "application/x-bcpio" },
            { "bin", "application/octet-stream" },
            { "bmp", "image/bmp" },
            { "cdf", "application/x-netcdf" },
            { "cgm", "image/cgm" },
            { "class", "application/octet-stream" },
            { "cpio", "application/x-cpio" },
            { "cpt", "application/mac-compactpro" },
            { "csh", "application/x-csh" },
            { "css", "text/css" },
            { "dcr", "application/x-director" },
            { "dif", "video/x-dv" },
            { "dir", "application/x-director" },
            { "djv", "image/vnd.djvu" },
            { "djvu", "image/vnd.djvu" },
            { "dll", "application/octet-stream" },
            { "dmg", "application/octet-stream" },
            { "dms", "application/octet-stream" },
            { "doc", "application/msword" },
            { "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
            { "dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template" },
            { "docm", "application/vnd.ms-word.document.macroEnabled.12" },
            { "dotm", "application/vnd.ms-word.template.macroEnabled.12" },
            { "dtd", "application/xml-dtd" },
            { "dv", "video/x-dv" },
            { "dvi", "application/x-dvi" },
            { "dxr", "application/x-director" },
            { "eps", "application/postscript" },
            { "etx", "text/x-setext" },
            { "exe", "application/octet-stream" },
            { "ez", "application/andrew-inset" },
            { "gif", "image/gif" },
            { "gram", "application/srgs" },
            { "grxml", "application/srgs+xml" },
            { "gtar", "application/x-gtar" },
            { "hdf", "application/x-hdf" },
            { "hqx", "application/mac-binhex40" },
            { "htc", "text/x-component" },
            { "htm", "text/html" },
            { "html", "text/html" },
            { "ice", "x-conference/x-cooltalk" },
            { "ico", "image/x-icon" },
            { "ics", "text/calendar" },
            { "ief", "image/ief" },
            { "ifb", "text/calendar" },
            { "iges", "model/iges" },
            { "igs", "model/iges" },
            { "jnlp", "application/x-java-jnlp-file" },
            { "jp2", "image/jp2" },
            { "jpe", "image/jpeg" },
            { "jpeg", "image/jpeg" },
            { "jpg", "image/jpeg" },
            { "js", "application/x-javascript" },
            { "kar", "audio/midi" },
            { "latex", "application/x-latex" },
            { "lha", "application/octet-stream" },
            { "lzh", "application/octet-stream" },
            { "m3u", "audio/x-mpegurl" },
            { "m4a", "audio/mp4a-latm" },
            { "m4b", "audio/mp4a-latm" },
            { "m4p", "audio/mp4a-latm" },
            { "m4u", "video/vnd.mpegurl" },
            { "m4v", "video/x-m4v" },
            { "mac", "image/x-macpaint" },
            { "man", "application/x-troff-man" },
            { "mathml", "application/mathml+xml" },
            { "me", "application/x-troff-me" },
            { "mesh", "model/mesh" },
            { "mid", "audio/midi" },
            { "midi", "audio/midi" },
            { "mif", "application/vnd.mif" },
            { "mov", "video/quicktime" },
            { "movie", "video/x-sgi-movie" },
            { "mp2", "audio/mpeg" },
            { "mp3", "audio/mpeg" },
            { "mp4", "video/mp4" },
            { "mpe", "video/mpeg" },
            { "mpeg", "video/mpeg" },
            { "mpg", "video/mpeg" },
            { "mpga", "audio/mpeg" },
            { "ms", "application/x-troff-ms" },
            { "msh", "model/mesh" },
            { "mxu", "video/vnd.mpegurl" },
            { "nc", "application/x-netcdf" },
            { "oda", "application/oda" },
            { "ogg", "application/ogg" },
            { "pbm", "image/x-portable-bitmap" },
            { "pct", "image/pict" },
            { "pdb", "chemical/x-pdb" },
            { "pdf", "application/pdf" },
            { "pgm", "image/x-portable-graymap" },
            { "pgn", "application/x-chess-pgn" },
            { "pic", "image/pict" },
            { "pict", "image/pict" },
            { "png", "image/png" },
            { "pnm", "image/x-portable-anymap" },
            { "pnt", "image/x-macpaint" },
            { "pntg", "image/x-macpaint" },
            { "ppm", "image/x-portable-pixmap" },
            { "ppt", "application/vnd.ms-powerpoint" },
            { "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation" },
            { "potx", "application/vnd.openxmlformats-officedocument.presentationml.template" },
            { "ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow" },
            { "ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12" },
            { "pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12" },
            { "potm", "application/vnd.ms-powerpoint.template.macroEnabled.12" },
            { "ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12" },
            { "ps", "application/postscript" },
            { "qt", "video/quicktime" },
            { "qti", "image/x-quicktime" },
            { "qtif", "image/x-quicktime" },
            { "ra", "audio/x-pn-realaudio" },
            { "ram", "audio/x-pn-realaudio" },
            { "ras", "image/x-cmu-raster" },
            { "rdf", "application/rdf+xml" },
            { "rgb", "image/x-rgb" },
            { "rm", "application/vnd.rn-realmedia" },
            { "roff", "application/x-troff" },
            { "rtf", "text/rtf" },
            { "rtx", "text/richtext" },
            { "sgm", "text/sgml" },
            { "sgml", "text/sgml" },
            { "sh", "application/x-sh" },
            { "shar", "application/x-shar" },
            { "silo", "model/mesh" },
            { "sit", "application/x-stuffit" },
            { "skd", "application/x-koan" },
            { "skm", "application/x-koan" },
            { "skp", "application/x-koan" },
            { "skt", "application/x-koan" },
            { "smi", "application/smil" },
            { "smil", "application/smil" },
            { "snd", "audio/basic" },
            { "so", "application/octet-stream" },
            { "spl", "application/x-futuresplash" },
            { "src", "application/x-wais-source" },
            { "sv4cpio", "application/x-sv4cpio" },
            { "sv4crc", "application/x-sv4crc" },
            { "svg", "image/svg+xml" },
            { "swf", "application/x-shockwave-flash" },
            { "t", "application/x-troff" },
            { "tar", "application/x-tar" },
            { "tcl", "application/x-tcl" },
            { "tex", "application/x-tex" },
            { "texi", "application/x-texinfo" },
            { "texinfo", "application/x-texinfo" },
            { "tif", "image/tiff" },
            { "tiff", "image/tiff" },
            { "tr", "application/x-troff" },
            { "tsv", "text/tab-separated-values" },
            { "txt", "text/plain" },
            { "ustar", "application/x-ustar" },
            { "vcd", "application/x-cdlink" },
            { "vrml", "model/vrml" },
            { "vxml", "application/voicexml+xml" },
            { "wav", "audio/x-wav" },
            { "wbmp", "image/vnd.wap.wbmp" },
            { "wbxml", "application/vnd.wap.wbxml" },
            { "wml", "text/vnd.wap.wml" },
            { "wmlc", "application/vnd.wap.wmlc" },
            { "wmls", "text/vnd.wap.wmlscript" },
            { "wmlsc", "application/vnd.wap.wmlscriptc" },
            { "wrl", "model/vrml" },
            { "xbm", "image/x-xbitmap" },
            { "xht", "application/xhtml+xml" },
            { "xhtml", "application/xhtml+xml" },
            { "xls", "application/vnd.ms-excel" },
            { "xml", "application/xml" },
            { "xpm", "image/x-xpixmap" },
            { "xsl", "application/xml" },
            { "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" },
            { "xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template" },
            { "xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12" },
            { "xltm", "application/vnd.ms-excel.template.macroEnabled.12" },
            { "xlam", "application/vnd.ms-excel.addin.macroEnabled.12" },
            { "xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12" },
            { "xslt", "application/xslt+xml" },
            { "xul", "application/vnd.mozilla.xul+xml" },
            { "xwd", "image/x-xwindowdump" },
            { "xyz", "chemical/x-xyz" },
            { "zip", "application/zip" }
        };
}

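【Comments】:

I would sincerely appreciate any comment about the downvotes; I would really like to understand any potential flaws in this code.

I can't see any try-catch block in your code where you use GetMimeTypeFromFileInternal. So it looks like by default you only check the file extension, which does not really help if you are unsure of the actual content of the file. And I still can't tell whether GetMimeTypeFromFileInternal works under limited trust in Azure. If not, why is it still in the code?

In a restricted execution context, the code can be limited to using only the list. Yes, there are many more extensions, but only the developer knows the application's context and can add more to the list. And for sure, try-catch is a good addition.

【Answer 8】:

Edit: just use Mime Detective.

I use byte array sequences to determine the correct MIME type of a given file. The advantage over just looking at the extension in the file name is that if a user renames a file to bypass certain file-type upload restrictions, an extension check will not catch it, whereas reading the file signature from the byte array will.

Here is a C# example: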

public class MimeType
{
    private static readonly byte[] BMP = { 66, 77 };
    private static readonly byte[] DOC = { 208, 207, 17, 224, 161, 177, 26, 225 };
    private static readonly byte[] EXE_DLL = { 77, 90 };
    private static readonly byte[] GIF = { 71, 73, 70, 56 };
    private static readonly byte[] ICO = { 0, 0, 1, 0 };
    private static readonly byte[] JPG = { 255, 216, 255 };
    private static readonly byte[] MP3 = { 255, 251, 48 };
    private static readonly byte[] OGG = { 79, 103, 103, 83, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0 };
    private static readonly byte[] PDF = { 37, 80, 68, 70, 45, 49, 46 };
    private static readonly byte[] PNG = { 137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82 };
    private static readonly byte[] RAR = { 82, 97, 114, 33, 26, 7, 0 };
    private static readonly byte[] SWF = { 70, 87, 83 };
    private static readonly byte[] TIFF = { 73, 73, 42, 0 };
    private static readonly byte[] TORRENT = { 100, 56, 58, 97, 110, 110, 111, 117, 110, 99, 101 };
    private static readonly byte[] TTF = { 0, 1, 0, 0, 0 };
    private static readonly byte[] WAV_AVI = { 82, 73, 70, 70 };
    private static readonly byte[] WMV_WMA = { 48, 38, 178, 117, 142, 102, 207, 17, 166, 217, 0, 170, 0, 98, 206, 108 };
    private static readonly byte[] ZIP_DOCX = { 80, 75, 3, 4 };

    public static string GetMimeType(byte[] file, string fileName)
    {

        string mime = "application/octet-stream"; //DEFAULT UNKNOWN MIME TYPE

        //Ensure that the filename isn't empty or null
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return mime;
        }

        //Get the file extension
        string extension = Path.GetExtension(fileName) == null
                               ? string.Empty
                               : Path.GetExtension(fileName).ToUpper();

        //Get the MIME Type
        if (file.Take(2).SequenceEqual(BMP))
        {
            mime = "image/bmp";
        }
        else if (file.Take(8).SequenceEqual(DOC))
        {
            mime = "application/msword";
        }
        else if (file.Take(2).SequenceEqual(EXE_DLL))
        {
            mime = "application/x-msdownload"; //both use same mime type
        }
        else if (file.Take(4).SequenceEqual(GIF))
        {
            mime = "image/gif";
        }
        else if (file.Take(4).SequenceEqual(ICO))
        {
            mime = "image/x-icon";
        }
        else if (file.Take(3).SequenceEqual(JPG))
        {
            mime = "image/jpeg";
        }
        else if (file.Take(3).SequenceEqual(MP3))
        {
            mime = "audio/mpeg";
        }
        else if (file.Take(14).SequenceEqual(OGG))
        {
            if (extension == ".OGX")
            {
                mime = "application/ogg";
            }
            else if (extension == ".OGA")
            {
                mime = "audio/ogg";
            }
            else
            {
                mime = "video/ogg";
            }
        }
        else if (file.Take(7).SequenceEqual(PDF))
        {
            mime = "application/pdf";
        }
        else if (file.Take(16).SequenceEqual(PNG))
        {
            mime = "image/png";
        }
        else if (file.Take(7).SequenceEqual(RAR))
        {
            mime = "application/x-rar-compressed";
        }
        else if (file.Take(3).SequenceEqual(SWF))
        {
            mime = "application/x-shockwave-flash";
        }
        else if (file.Take(4).SequenceEqual(TIFF))
        {
            mime = "image/tiff";
        }
        else if (file.Take(11).SequenceEqual(TORRENT))
        {
            mime = "application/x-bittorrent";
        }
        else if (file.Take(5).SequenceEqual(TTF))
        {
            mime = "application/x-font-ttf";
        }
        else if (file.Take(4).SequenceEqual(WAV_AVI))
        {
            mime = extension == ".AVI" ? "video/x-msvideo" : "audio/x-wav";
        }
        else if (file.Take(16).SequenceEqual(WMV_WMA))
        {
            mime = extension == ".WMA" ? "audio/x-ms-wma" : "video/x-ms-wmv";
        }
        else if (file.Take(4).SequenceEqual(ZIP_DOCX))
        {
            mime = extension == ".DOCX" ? "application/vnd.openxmlformats-officedocument.wordprocessingml.document" : "application/x-zip-compressed";
        }

        return mime;
    }
}

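Note that I handle the DOCX file type differently, because a DOCX is really just a ZIP file. In this case, once I have verified that it has that sequence, I simply check the file extension. For some people this example is far from complete, but you can easily add your own types.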
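As a possible alternative to trusting the extension for ZIP-based formats, the archive's entries can be inspected instead. The following is only a sketch: the helper name RefineZipMimeType is hypothetical, and the part names it looks for (word/document.xml, xl/workbook.xml) are the conventional locations in Office Open XML packages, which is an assumption rather than a guarantee. Macro-enabled variants use the same layout and are not distinguished here.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class OpenXmlSniffer
{
    // Hypothetical helper: call it only after the "PK" signature (80, 75, 3, 4) has matched.
    public static string RefineZipMimeType(Stream zipStream)
    {
        try
        {
            using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Assumption: these are the conventional main-part names, not a spec guarantee.
                if (archive.GetEntry("word/document.xml") != null)
                    return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
                if (archive.GetEntry("xl/workbook.xml") != null)
                    return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
            }
        }
        catch (InvalidDataException)
        {
            // The data could not be read as a ZIP archive; keep the generic type.
        }
        return "application/x-zip-compressed";
    }
}
```

This reads the ZIP central directory rather than only the header, so it needs the whole file (or a seekable stream), not just the first few bytes.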

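If you want to add more MIME types, you can get the byte array sequences for many different file types from here. Also, here is another good resource concerning file signatures.

If all else fails, what I often do is step through several files of the particular type I am looking for and look for a pattern in the files' byte sequences. In the end, this is still only basic verification and cannot determine a file type with 100% certainty.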
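The same signature check can also be written as a lookup table, which avoids a growing if-else chain when more types are added. This is a minimal sketch, not the answer's code: the FileSignature and SignatureTable names are made up for illustration, and only a handful of signatures are listed (the PDF entry uses the shorter "%PDF-" prefix).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type pairing a MIME type with its leading "magic" bytes.
public sealed class FileSignature
{
    public FileSignature(string mimeType, params byte[] magic)
    {
        MimeType = mimeType;
        Magic = magic;
    }

    public string MimeType { get; }
    public byte[] Magic { get; }
}

public static class SignatureTable
{
    private static readonly FileSignature[] Signatures =
    {
        new FileSignature("image/png", 137, 80, 78, 71, 13, 10, 26, 10),
        new FileSignature("application/pdf", 37, 80, 68, 70, 45),
        new FileSignature("image/gif", 71, 73, 70, 56),
        new FileSignature("image/jpeg", 255, 216, 255),
        new FileSignature("image/bmp", 66, 77),
    };

    public static string Detect(byte[] header)
    {
        // Try longer signatures first so a short prefix cannot shadow a more specific one.
        var match = Signatures
            .OrderByDescending(s => s.Magic.Length)
            .FirstOrDefault(s => header.Length >= s.Magic.Length
                              && header.Take(s.Magic.Length).SequenceEqual(s.Magic));

        return match?.MimeType ?? "application/octet-stream";
    }
}
```

Adding a type then means adding one table row. As with the original, a match on a short signature such as "BM" is only a hint and can produce false positives.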

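【Comments】:

Thanks @ROFLwTIME - I have enhanced this somewhat for the case where we have a byte array but no file name/extension. (Of course, for some MIME types it needs a default, or further enhancement to identify them properly.) If anyone wants me to post the code, let me know.

+1 for using bytes. Now you can even take the expected number of bytes for a specific MIME type to test against (not the default 256). However, I would go for a struct here, with the extension, bytes, and MIME type as properties, and perhaps keep a dictionary of predefined structs. That would save me from the endless if-else checks.

The problem with this approach is that, for example, a text file starting with "MZ" will be interpreted as an .EXE file. In other words, you should at least consider the extension in all cases, plus longer signatures or per-format heuristics, to avoid false positives.

@Nutshell I believe XLS has a 0 byte at the end while DOC does not, so check for XLS first and then DOC. As for XLSX/DOCX, they do share the same signature (ZIP), so to tell them apart you need to look deeper than the header. For example, XLSX files have the string "xl/_rels/workbook.xml.rels" near the header, while DOCX files have the string "word/_rels/document.xml.rels" near the header. This is just one of many ways of trying to distinguish these specific types, and it certainly won't cover 100% of scenarios (for example, a ZIP file containing DOCX/XLSX files).

Hi all. I am the one who forked the original FileTypeDetective into MimeDetective on GitHub. I am glad if it is useful. I have talked with the developer, trailmax. We have changed the license to MIT!

【Answer 9】:

@Steve Morgan and @Richard Gourlay, this is a great solution, thank you. One small drawback is that when a file contains 255 bytes or fewer, the MIME type sometimes comes out as "application/octet-stream", which is slightly inaccurate for files that would be expected to yield "text/plain". I have updated your original method to handle this case as follows:

If the number of bytes in the file is less than or equal to 255 and the inferred MIME type is "application/octet-stream", create a new byte array consisting of the original file bytes repeated n times until the total number of bytes is >= 256. Then recheck the MIME type on that new byte array.

Modified method: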

Imports System.Runtime.InteropServices

<DllImport("urlmon.dll", CharSet:=CharSet.Auto)> _
Private Shared Function FindMimeFromData(pBC As System.UInt32, <MarshalAs(UnmanagedType.LPStr)> pwzUrl As System.String, <MarshalAs(UnmanagedType.LPArray)> pBuffer As Byte(), cbSize As System.UInt32, <MarshalAs(UnmanagedType.LPStr)> pwzMimeProposed As System.String, dwMimeFlags As System.UInt32, _
ByRef ppwzMimeOut As System.UInt32, dwReserverd As System.UInt32) As System.UInt32
End Function
Private Function GetMimeType(ByVal f As FileInfo) As String
    'See http://***.com/questions/58510/using-net-how-can-you-find-the-mime-type-of-a-file-based-on-the-file-signature
    Dim returnValue As String = ""
    Dim fileStream As FileStream = Nothing
    Dim fileStreamLength As Long = 0
    Dim fileStreamIsLessThanBByteSize As Boolean = False

    Const byteSize As Integer = 255
    Const bbyteSize As Integer = byteSize + 1

    Const ambiguousMimeType As String = "application/octet-stream"
    Const unknownMimeType As String = "unknown/unknown"

    Dim buffer As Byte() = New Byte(byteSize) {}
    Dim fnGetMimeTypeValue As New Func(Of Byte(), Integer, String)(
        Function(_buffer As Byte(), _bbyteSize As Integer) As String
            Dim _returnValue As String = ""
            Dim mimeType As UInt32 = 0
            FindMimeFromData(0, Nothing, _buffer, _bbyteSize, Nothing, 0, mimeType, 0)
            Dim mimeTypePtr As IntPtr = New IntPtr(mimeType)
            _returnValue = Marshal.PtrToStringUni(mimeTypePtr)
            Marshal.FreeCoTaskMem(mimeTypePtr)
            Return _returnValue
        End Function)

    If (f.Exists()) Then
        Try
            fileStream = New FileStream(f.FullName(), FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
            fileStreamLength = fileStream.Length()

            If (fileStreamLength >= bbyteSize) Then
                fileStream.Read(buffer, 0, bbyteSize)
            Else
                fileStreamIsLessThanBByteSize = True
                fileStream.Read(buffer, 0, CInt(fileStreamLength))
            End If

            returnValue = fnGetMimeTypeValue(buffer, bbyteSize)

            If (returnValue.Equals(ambiguousMimeType, StringComparison.OrdinalIgnoreCase) AndAlso fileStreamIsLessThanBByteSize AndAlso fileStreamLength > 0) Then
                'Duplicate the stream content until the stream length is >= bbyteSize to get a more deterministic mime type analysis.
                Dim currentBuffer As Byte() = buffer.Take(fileStreamLength).ToArray()
                Dim repeatCount As Integer = Math.Floor((bbyteSize / fileStreamLength) + 1)
                Dim bBufferList As List(Of Byte) = New List(Of Byte)
                While (repeatCount > 0)
                    bBufferList.AddRange(currentBuffer)
                    repeatCount -= 1
                End While
                Dim bbuffer As Byte() = bBufferList.Take(bbyteSize).ToArray()
                returnValue = fnGetMimeTypeValue(bbuffer, bbyteSize)
            End If
        Catch ex As Exception
            returnValue = unknownMimeType
        Finally
            If (fileStream IsNot Nothing) Then fileStream.Close()
        End Try
    End If
    Return returnValue
End Function
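
For C# readers, the padding step on its own can be sketched as below. This is only an illustration of the repeat-until-256-bytes idea: the SniffBuffer class and RepeatToLength method are hypothetical names, and the call to FindMimeFromData on the padded buffer is left out.

```csharp
using System;

public static class SniffBuffer
{
    // Hypothetical helper: repeats a short file's bytes until the buffer reaches targetLength.
    public static byte[] RepeatToLength(byte[] content, int targetLength = 256)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        // Nothing to repeat for an empty file, and nothing to do if it is already long enough.
        if (content.Length == 0 || content.Length >= targetLength) return content;

        var padded = new byte[targetLength];
        for (int i = 0; i < targetLength; i++)
        {
            // Wrap around the original content so it repeats end to end.
            padded[i] = content[i % content.Length];
        }
        return padded;
    }
}
```

The padded array would then be sniffed only when the first attempt returned "application/octet-stream", as in the VB method.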

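【Comments】:

This is the problem I ran into, and your idea of duplicating the bytes is great. I had to implement it in C#, but using the file length and a buffer holding the first bytes of the file, I was able to loop over all the missing bytes and copy bytes within the array to repeat the file (for each index I simply copied the byte located one file length earlier in the array).

【Answer 10】:

This answer is a copy of the author's answer (Richard Gourlay), improved to fix issues on IIS 8 / Win2012 (where the function could crash the application pool), based on Rohland's comment pointing to http://www.pinvoke.net/default.aspx/urlmon.findmimefromdata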

using System.Runtime.InteropServices;

...

public static string GetMimeFromFile(string filename)
{
    if (!File.Exists(filename))
        throw new FileNotFoundException(filename + " not found");

    const int maxContent = 256;

    var buffer = new byte[maxContent];
    using (var fs = new FileStream(filename, FileMode.Open))
    {
        if (fs.Length >= maxContent)
            fs.Read(buffer, 0, maxContent);
        else
            fs.Read(buffer, 0, (int) fs.Length);
    }

    var mimeTypePtr = IntPtr.Zero;
    try
    {
        var result = FindMimeFromData(IntPtr.Zero, null, buffer, maxContent, null, 0, out mimeTypePtr, 0);
        if (result != 0)
        {
            throw Marshal.GetExceptionForHR(result);
        }

        return Marshal.PtrToStringUni(mimeTypePtr);
    }
    catch (Exception)
    {
        return "unknown/unknown";
    }
    finally
    {
        // Free the native string exactly once on every path (the original could free it twice).
        if (mimeTypePtr != IntPtr.Zero)
        {
            Marshal.FreeCoTaskMem(mimeTypePtr);
        }
    }
}

[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
private static extern int FindMimeFromData(IntPtr pBC,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
    [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)] byte[] pBuffer,
    int cbSize,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
    int dwMimeFlags,
    out IntPtr ppwzMimeOut,
    int dwReserved);

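【Comments】:

Nice C+P answer. Chris

【Answer 11】:

This class uses the previous answers to try three different approaches: a hard-coded list based on the extension, the FindMimeFromData API, and the registry.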

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

using Microsoft.Win32;

namespace YourNamespace
{
    public static class MimeTypeParser
    {
        [DllImport(@"urlmon.dll", CharSet = CharSet.Auto)]
        private extern static System.UInt32 FindMimeFromData(
                System.UInt32 pBC,
                [MarshalAs(UnmanagedType.LPStr)] System.String pwzUrl,
                [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
                System.UInt32 cbSize,
                [MarshalAs(UnmanagedType.LPStr)] System.String pwzMimeProposed,
                System.UInt32 dwMimeFlags,
                out System.UInt32 ppwzMimeOut,
                System.UInt32 dwReserverd
        );

        public static string GetMimeType(string sFilePath)
        {
            string sMimeType = GetMimeTypeFromList(sFilePath);

            if (String.IsNullOrEmpty(sMimeType))
            {
                sMimeType = GetMimeTypeFromFile(sFilePath);

                if (String.IsNullOrEmpty(sMimeType))
                {
                    sMimeType = GetMimeTypeFromRegistry(sFilePath);
                }
            }

            return sMimeType;
        }

        public static string GetMimeTypeFromList(string sFileNameOrPath)
        {
            string sMimeType = null;
            // TrimStart avoids the exception Substring(1) throws when there is no extension.
            string sExtensionWithoutDot = Path.GetExtension(sFileNameOrPath).TrimStart('.').ToLower();

            if (!String.IsNullOrEmpty(sExtensionWithoutDot) && spDicMIMETypes.ContainsKey(sExtensionWithoutDot))
            {
                sMimeType = spDicMIMETypes[sExtensionWithoutDot];
            }

            return sMimeType;
        }

        public static string GetMimeTypeFromRegistry(string sFileNameOrPath)
        {
            string sMimeType = null;
            string sExtension = Path.GetExtension(sFileNameOrPath).ToLower();
            RegistryKey pKey = Registry.ClassesRoot.OpenSubKey(sExtension);

            if (pKey != null && pKey.GetValue("Content Type") != null)
            {
                sMimeType = pKey.GetValue("Content Type").ToString();
            }

            return sMimeType;
        }

        public static string GetMimeTypeFromFile(string sFilePath)
        {
            string sMimeType = null;

            if (File.Exists(sFilePath))
            {
                byte[] abytBuffer = new byte[256];

                using (FileStream pFileStream = new FileStream(sFilePath, FileMode.Open))
                {
                    if (pFileStream.Length >= 256)
                    {
                        pFileStream.Read(abytBuffer, 0, 256);
                    }
                    else
                    {
                        pFileStream.Read(abytBuffer, 0, (int)pFileStream.Length);
                    }
                }

                try
                {
                    UInt32 unMimeType;

                    FindMimeFromData(0, null, abytBuffer, 256, null, 0, out unMimeType, 0);

                    IntPtr pMimeType = new IntPtr(unMimeType);
                    string sMimeTypeFromFile = Marshal.PtrToStringUni(pMimeType);

                    Marshal.FreeCoTaskMem(pMimeType);

                    if (!String.IsNullOrEmpty(sMimeTypeFromFile) && sMimeTypeFromFile != "text/plain" && sMimeTypeFromFile != "application/octet-stream")
                    {
                        sMimeType = sMimeTypeFromFile;
                    }
                }
                catch { }
            }

            return sMimeType;
        }

        private static readonly Dictionary<string, string> spDicMIMETypes = new Dictionary<string, string>
        {
            { "ai", "application/postscript" },
            { "aif", "audio/x-aiff" },
            { "aifc", "audio/x-aiff" },
            { "aiff", "audio/x-aiff" },
            { "asc", "text/plain" },
            { "atom", "application/atom+xml" },
            { "au", "audio/basic" },
            { "avi", "video/x-msvideo" },
            { "bcpio", "application/x-bcpio" },
            { "bin", "application/octet-stream" },
            { "bmp", "image/bmp" },
            { "cdf", "application/x-netcdf" },
            { "cgm", "image/cgm" },
            { "class", "application/octet-stream" },
            { "cpio", "application/x-cpio" },
            { "cpt", "application/mac-compactpro" },
            { "csh", "application/x-csh" },
            { "css", "text/css" },
            { "dcr", "application/x-director" },
            { "dif", "video/x-dv" },
            { "dir", "application/x-director" },
            { "djv", "image/vnd.djvu" },
            { "djvu", "image/vnd.djvu" },
            { "dll", "application/octet-stream" },
            { "dmg", "application/octet-stream" },
            { "dms", "application/octet-stream" },
            { "doc", "application/msword" },
            { "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
            { "dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template" },
            { "docm", "application/vnd.ms-word.document.macroEnabled.12" },
            { "dotm", "application/vnd.ms-word.template.macroEnabled.12" },
            { "dtd", "application/xml-dtd" },
            { "dv", "video/x-dv" },
            { "dvi", "application/x-dvi" },
            { "dxr", "application/x-director" },
            { "eps", "application/postscript" },
            { "etx", "text/x-setext" },
            { "exe", "application/octet-stream" },
            { "ez", "application/andrew-inset" },
            { "gif", "image/gif" },
            { "gram", "application/srgs" },
            { "grxml", "application/srgs+xml" },
            { "gtar", "application/x-gtar" },
            { "hdf", "application/x-hdf" },
            { "hqx", "application/mac-binhex40" },
            { "htc", "text/x-component" },
            { "htm", "text/html" },
            { "html", "text/html" },
            { "ice", "x-conference/x-cooltalk" },
            { "ico", "image/x-icon" },
            { "ics", "text/calendar" },
            { "ief", "image/ief" },
            { "ifb", "text/calendar" },
            { "iges", "model/iges" },
            { "igs", "model/iges" },
            { "jnlp", "application/x-java-jnlp-file" },
            { "jp2", "image/jp2" },
            { "jpe", "image/jpeg" },
            { "jpeg", "image/jpeg" },
            { "jpg", "image/jpeg" },
            { "js", "application/x-javascript" },
            { "kar", "audio/midi" },
            { "latex", "application/x-latex" },
            { "lha", "application/octet-stream" },
            { "lzh", "application/octet-stream" },
            { "m3u", "audio/x-mpegurl" },
            { "m4a", "audio/mp4a-latm" },
            { "m4b", "audio/mp4a-latm" },
            { "m4p", "audio/mp4a-latm" },
            { "m4u", "video/vnd.mpegurl" },
            { "m4v", "video/x-m4v" },
            { "mac", "image/x-macpaint" },
            { "man", "application/x-troff-man" },
            { "mathml", "application/mathml+xml" },
            { "me", "application/x-troff-me" },
            { "mesh", "model/mesh" },
            { "mid", "audio/midi" },
            { "midi", "audio/midi" },
            { "mif", "application/vnd.mif" },
            { "mov", "video/quicktime" },
            { "movie", "video/x-sgi-movie" },
            { "mp2", "audio/mpeg" },
            { "mp3", "audio/mpeg" },
            { "mp4", "video/mp4" },
            { "mpe", "video/mpeg" },
            { "mpeg", "video/mpeg" },
            { "mpg", "video/mpeg" },
            { "mpga", "audio/mpeg" },
            { "ms", "application/x-troff-ms" },
            { "msh", "model/mesh" },
            { "mxu", "video/vnd.mpegurl" },
            { "nc", "application/x-netcdf" },
            { "oda", "application/oda" },
            { "ogg", "application/ogg" },
            { "pbm", "image/x-portable-bitmap" },
            { "pct", "image/pict" },
            { "pdb", "chemical/x-pdb" },
            { "pdf", "application/pdf" },
            { "pgm", "image/x-portable-graymap" },
            { "pgn", "application/x-chess-pgn" },
            { "pic", "image/pict" },
            { "pict", "image/pict" },
            { "png", "image/png" },
            { "pnm", "image/x-portable-anymap" },
            { "pnt", "image/x-macpaint" },
            { "pntg", "image/x-macpaint" },
            { "ppm", "image/x-portable-pixmap" },
            { "ppt", "application/vnd.ms-powerpoint" },
            { "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation" },
            { "potx", "application/vnd.openxmlformats-officedocument.presentationml.template" },
            { "ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow" },
            { "ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12" },
            { "pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12" },
            { "potm", "application/vnd.ms-powerpoint.template.macroEnabled.12" },
            { "ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12" },
            { "ps", "application/postscript" },
            { "qt", "video/quicktime" },
            { "qti", "image/x-quicktime" },
            { "qtif", "image/x-quicktime" },
            { "ra", "audio/x-pn-realaudio" },
            { "ram", "audio/x-pn-realaudio" },
            { "ras", "image/x-cmu-raster" },
            { "rdf", "application/rdf+xml" },
            { "rgb", "image/x-rgb" },
            { "rm", "application/vnd.rn-realmedia" },
            { "roff", "application/x-troff" },
            { "rtf", "text/rtf" },
            { "rtx", "text/richtext" },
            { "sgm", "text/sgml" },
            { "sgml", "text/sgml" },
            { "sh", "application/x-sh" },
            { "shar", "application/x-shar" },
            { "silo", "model/mesh" },
            { "sit", "application/x-stuffit" },
            { "skd", "application/x-koan" },
            { "skm", "application/x-koan" },
            { "skp", "application/x-koan" },
            { "skt", "application/x-koan" },
            { "smi", "application/smil" },
            { "smil", "application/smil" },
            { "snd", "audio/basic" },
            { "so", "application/octet-stream" },
            { "spl", "application/x-futuresplash" },
            { "src", "application/x-wais-source" },
            { "sv4cpio", "application/x-sv4cpio" },
            { "sv4crc", "application/x-sv4crc" },
            { "svg", "image/svg+xml" },
            { "swf", "application/x-shockwave-flash" },
            { "t", "application/x-troff" },
            { "tar", "application/x-tar" },
            { "tcl", "application/x-tcl" },
            { "tex", "application/x-tex" },
            { "texi", "application/x-texinfo" },
            { "texinfo", "application/x-texinfo" },
            { "tif", "image/tiff" },
            { "tiff", "image/tiff" },
            { "tr", "application/x-troff" },
            { "tsv", "text/tab-separated-values" },
            { "txt", "text/plain" },
            { "ustar", "application/x-ustar" },
            { "vcd", "application/x-cdlink" },
            { "vrml", "model/vrml" },
            { "vxml", "application/voicexml+xml" },
            { "wav", "audio/x-wav" },
            { "wbmp", "image/vnd.wap.wbmp" },
            { "wbxml", "application/vnd.wap.wbxml" },
            { "wml", "text/vnd.wap.wml" },
            { "wmlc", "application/vnd.wap.wmlc" },
            { "wmls", "text/vnd.wap.wmlscript" },
            { "wmlsc", "application/vnd.wap.wmlscriptc" },
            { "wrl", "model/vrml" },
            { "xbm", "image/x-xbitmap" },
            { "xht", "application/xhtml+xml" },
            { "xhtml", "application/xhtml+xml" },
            { "xls", "application/vnd.ms-excel" },
            { "xml", "application/xml" },
            { "xpm", "image/x-xpixmap" },
            { "xsl", "application/xml" },
            { "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" },
            { "xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template" },
            { "xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12" },
            { "xltm", "application/vnd.ms-excel.template.macroEnabled.12" },
            { "xlam", "application/vnd.ms-excel.addin.macroEnabled.12" },
            { "xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12" },
            { "xslt", "application/xslt+xml" },
            { "xul", "application/vnd.mozilla.xul+xml" },
            { "xwd", "image/x-xwindowdump" },
            { "xyz", "chemical/x-xyz" },
            { "zip", "application/zip" }
        };
    

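[Comments]:

Don't forget to wrap the registry access in a try-catch: you will not be allowed to read the registry in restricted mode, which is the case for Azure Web Roles running in limited trust and for any other host with limited trust.

[Answer 12]:

I found a hard-coded solution; I hope it helps someone: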

using System.Collections.Generic;
using System.IO;

public static class MIMEAssistant
{
  private static readonly Dictionary<string, string> MIMETypesDictionary = new Dictionary<string, string>
  {
    { "ai", "application/postscript" },
    { "aif", "audio/x-aiff" },
    { "aifc", "audio/x-aiff" },
    { "aiff", "audio/x-aiff" },
    { "asc", "text/plain" },
    { "atom", "application/atom+xml" },
    { "au", "audio/basic" },
    { "avi", "video/x-msvideo" },
    { "bcpio", "application/x-bcpio" },
    { "bin", "application/octet-stream" },
    { "bmp", "image/bmp" },
    { "cdf", "application/x-netcdf" },
    { "cgm", "image/cgm" },
    { "class", "application/octet-stream" },
    { "cpio", "application/x-cpio" },
    { "cpt", "application/mac-compactpro" },
    { "csh", "application/x-csh" },
    { "css", "text/css" },
    { "dcr", "application/x-director" },
    { "dif", "video/x-dv" },
    { "dir", "application/x-director" },
    { "djv", "image/vnd.djvu" },
    { "djvu", "image/vnd.djvu" },
    { "dll", "application/octet-stream" },
    { "dmg", "application/octet-stream" },
    { "dms", "application/octet-stream" },
    { "doc", "application/msword" },
    { "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
    { "dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template" },
    { "docm", "application/vnd.ms-word.document.macroEnabled.12" },
    { "dotm", "application/vnd.ms-word.template.macroEnabled.12" },
    { "dtd", "application/xml-dtd" },
    { "dv", "video/x-dv" },
    { "dvi", "application/x-dvi" },
    { "dxr", "application/x-director" },
    { "eps", "application/postscript" },
    { "etx", "text/x-setext" },
    { "exe", "application/octet-stream" },
    { "ez", "application/andrew-inset" },
    { "gif", "image/gif" },
    { "gram", "application/srgs" },
    { "grxml", "application/srgs+xml" },
    { "gtar", "application/x-gtar" },
    { "hdf", "application/x-hdf" },
    { "hqx", "application/mac-binhex40" },
    { "htm", "text/html" },
    { "html", "text/html" },
    { "ice", "x-conference/x-cooltalk" },
    { "ico", "image/x-icon" },
    { "ics", "text/calendar" },
    { "ief", "image/ief" },
    { "ifb", "text/calendar" },
    { "iges", "model/iges" },
    { "igs", "model/iges" },
    { "jnlp", "application/x-java-jnlp-file" },
    { "jp2", "image/jp2" },
    { "jpe", "image/jpeg" },
    { "jpeg", "image/jpeg" },
    { "jpg", "image/jpeg" },
    { "js", "application/x-javascript" },
    { "kar", "audio/midi" },
    { "latex", "application/x-latex" },
    { "lha", "application/octet-stream" },
    { "lzh", "application/octet-stream" },
    { "m3u", "audio/x-mpegurl" },
    { "m4a", "audio/mp4a-latm" },
    { "m4b", "audio/mp4a-latm" },
    { "m4p", "audio/mp4a-latm" },
    { "m4u", "video/vnd.mpegurl" },
    { "m4v", "video/x-m4v" },
    { "mac", "image/x-macpaint" },
    { "man", "application/x-troff-man" },
    { "mathml", "application/mathml+xml" },
    { "me", "application/x-troff-me" },
    { "mesh", "model/mesh" },
    { "mid", "audio/midi" },
    { "midi", "audio/midi" },
    { "mif", "application/vnd.mif" },
    { "mov", "video/quicktime" },
    { "movie", "video/x-sgi-movie" },
    { "mp2", "audio/mpeg" },
    { "mp3", "audio/mpeg" },
    { "mp4", "video/mp4" },
    { "mpe", "video/mpeg" },
    { "mpeg", "video/mpeg" },
    { "mpg", "video/mpeg" },
    { "mpga", "audio/mpeg" },
    { "ms", "application/x-troff-ms" },
    { "msh", "model/mesh" },
    { "mxu", "video/vnd.mpegurl" },
    { "nc", "application/x-netcdf" },
    { "oda", "application/oda" },
    { "ogg", "application/ogg" },
    { "pbm", "image/x-portable-bitmap" },
    { "pct", "image/pict" },
    { "pdb", "chemical/x-pdb" },
    { "pdf", "application/pdf" },
    { "pgm", "image/x-portable-graymap" },
    { "pgn", "application/x-chess-pgn" },
    { "pic", "image/pict" },
    { "pict", "image/pict" },
    { "png", "image/png" },
    { "pnm", "image/x-portable-anymap" },
    { "pnt", "image/x-macpaint" },
    { "pntg", "image/x-macpaint" },
    { "ppm", "image/x-portable-pixmap" },
    { "ppt", "application/vnd.ms-powerpoint" },
    { "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation" },
    { "potx", "application/vnd.openxmlformats-officedocument.presentationml.template" },
    { "ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow" },
    { "ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12" },
    { "pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12" },
    { "potm", "application/vnd.ms-powerpoint.template.macroEnabled.12" },
    { "ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12" },
    { "ps", "application/postscript" },
    { "qt", "video/quicktime" },
    { "qti", "image/x-quicktime" },
    { "qtif", "image/x-quicktime" },
    { "ra", "audio/x-pn-realaudio" },
    { "ram", "audio/x-pn-realaudio" },
    { "ras", "image/x-cmu-raster" },
    { "rdf", "application/rdf+xml" },
    { "rgb", "image/x-rgb" },
    { "rm", "application/vnd.rn-realmedia" },
    { "roff", "application/x-troff" },
    { "rtf", "text/rtf" },
    { "rtx", "text/richtext" },
    { "sgm", "text/sgml" },
    { "sgml", "text/sgml" },
    { "sh", "application/x-sh" },
    { "shar", "application/x-shar" },
    { "silo", "model/mesh" },
    { "sit", "application/x-stuffit" },
    { "skd", "application/x-koan" },
    { "skm", "application/x-koan" },
    { "skp", "application/x-koan" },
    { "skt", "application/x-koan" },
    { "smi", "application/smil" },
    { "smil", "application/smil" },
    { "snd", "audio/basic" },
    { "so", "application/octet-stream" },
    { "spl", "application/x-futuresplash" },
    { "src", "application/x-wais-source" },
    { "sv4cpio", "application/x-sv4cpio" },
    { "sv4crc", "application/x-sv4crc" },
    { "svg", "image/svg+xml" },
    { "swf", "application/x-shockwave-flash" },
    { "t", "application/x-troff" },
    { "tar", "application/x-tar" },
    { "tcl", "application/x-tcl" },
    { "tex", "application/x-tex" },
    { "texi", "application/x-texinfo" },
    { "texinfo", "application/x-texinfo" },
    { "tif", "image/tiff" },
    { "tiff", "image/tiff" },
    { "tr", "application/x-troff" },
    { "tsv", "text/tab-separated-values" },
    { "txt", "text/plain" },
    { "ustar", "application/x-ustar" },
    { "vcd", "application/x-cdlink" },
    { "vrml", "model/vrml" },
    { "vxml", "application/voicexml+xml" },
    { "wav", "audio/x-wav" },
    { "wbmp", "image/vnd.wap.wbmp" },
    { "wbxml", "application/vnd.wap.wbxml" },
    { "wml", "text/vnd.wap.wml" },
    { "wmlc", "application/vnd.wap.wmlc" },
    { "wmls", "text/vnd.wap.wmlscript" },
    { "wmlsc", "application/vnd.wap.wmlscriptc" },
    { "wrl", "model/vrml" },
    { "xbm", "image/x-xbitmap" },
    { "xht", "application/xhtml+xml" },
    { "xhtml", "application/xhtml+xml" },
    { "xls", "application/vnd.ms-excel" },
    { "xml", "application/xml" },
    { "xpm", "image/x-xpixmap" },
    { "xsl", "application/xml" },
    { "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" },
    { "xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template" },
    { "xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12" },
    { "xltm", "application/vnd.ms-excel.template.macroEnabled.12" },
    { "xlam", "application/vnd.ms-excel.addin.macroEnabled.12" },
    { "xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12" },
    { "xslt", "application/xslt+xml" },
    { "xul", "application/vnd.mozilla.xul+xml" },
    { "xwd", "image/x-xwindowdump" },
    { "xyz", "chemical/x-xyz" },
    { "zip", "application/zip" }
  };

  public static string GetMIMEType(string fileName)
  {
    // Get the file extension, lower-cased to match the dictionary keys.
    string extension = Path.GetExtension(fileName).ToLowerInvariant();

    if (extension.Length > 0 &&
        MIMETypesDictionary.ContainsKey(extension.Remove(0, 1)))
    {
      return MIMETypesDictionary[extension.Remove(0, 1)];
    }
    return "unknown/unknown";
  }
}
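A variant of the lookup above, sketched under two assumptions: the dictionary is built with StringComparer.OrdinalIgnoreCase so the extension never has to be lower-cased, and unknown extensions fall back to "application/octet-stream" rather than "unknown/unknown". The class name MimeLookupSketch and the three-entry table are illustrative only; the full list would go in its place.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MimeLookupSketch
{
    // Hypothetical, abbreviated table. OrdinalIgnoreCase makes "PNG" and "png" the same key.
    private static readonly Dictionary<string, string> Types =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "jpg", "image/jpeg" },
            { "png", "image/png" },
            { "pdf", "application/pdf" },
        };

    public static string GetMimeType(string fileName, string fallback = "application/octet-stream")
    {
        // GetExtension returns "" when there is no extension, so TrimStart is safe here.
        string extension = Path.GetExtension(fileName).TrimStart('.');
        return Types.TryGetValue(extension, out string mime) ? mime : fallback;
    }
}
```

TryGetValue does one hash lookup where ContainsKey followed by the indexer does two. Like the original, this is still purely extension-based and says nothing about the file's content.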
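[Comments]:

This is based on the file name. It may be useful to someone, though not to the OP, who wanted detection by file content.

A subset of this list is also handy for mapping WebImage.ImageFormat back to a mime type. Thanks!

Depending on your goal, you may want to return "application/octet-stream" instead of "unknown/unknown".

Since my edit was rejected, I'll post it here: the extension must be all lowercase, otherwise it will not be found in the dictionary.

@JalalAldeenSaa'd - IMHO a better fix is to pass StringComparer.OrdinalIgnoreCase to the dictionary constructor. Ordinal comparison is faster than invariant, and you get rid of .ToLower() and its variants.

[Answer 13]: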

IIS 7 or later

Use this code, but note that you need to be an administrator on the server.

// ServerManager lives in the Microsoft.Web.Administration assembly.
using Microsoft.Web.Administration;

public bool CheckMimeMapExtension(string fileExtension)
{
    try
    {
        using (ServerManager serverManager = new ServerManager())
        {
            // connects to default app.config
            var config = serverManager.GetApplicationHostConfiguration();
            var staticContent = config.GetSection("system.webServer/staticContent");
            var mimeMap = staticContent.GetCollection();

            foreach (var mimeType in mimeMap)
            {
                if (((String)mimeType["fileExtension"]).Equals(fileExtension, StringComparison.OrdinalIgnoreCase))
                    return true;
            }
        }
        return false;
    }
    catch (Exception ex)
    {
        Console.WriteLine("An exception has occurred: \n{0}", ex.Message);
        Console.Read();
    }

    return false;
}

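[Comments]:

What about spoofing?

[Answer 14]:

If someone is up for it, they could port the excellent Perl module File::Type to .NET. Its code is a set of file-header magic-number lookups, or regular-expression matches, one per file type.

Here is a .NET file type detection library, http://filetypedetective.codeplex.com/, but at the moment it detects only a small number of file types.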
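A magic-number lookup of this kind can be sketched in C# roughly as follows. This is a minimal illustration, not a port of File::Type: the class name SignatureSniffer and its five-entry table are assumptions, and the signatures listed are only the well-known leading bytes of each format.

```csharp
using System;
using System.Linq;

public static class SignatureSniffer
{
    // A few well-known magic numbers; a real table would be far longer.
    private static readonly (string Mime, byte[] Magic)[] Signatures =
    {
        ("image/png",       new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }),
        ("image/jpeg",      new byte[] { 0xFF, 0xD8, 0xFF }),
        ("image/gif",       new byte[] { 0x47, 0x49, 0x46, 0x38 }),   // "GIF8"
        ("application/pdf", new byte[] { 0x25, 0x50, 0x44, 0x46 }),   // "%PDF"
        ("application/zip", new byte[] { 0x50, 0x4B, 0x03, 0x04 }),   // "PK" 03 04
    };

    // Returns the first MIME type whose signature is a prefix of the header, or null.
    public static string Detect(byte[] header)
    {
        foreach (var (mime, magic) in Signatures)
        {
            if (header.Length >= magic.Length &&
                header.Take(magic.Length).SequenceEqual(magic))
            {
                return mime;
            }
        }
        return null;
    }
}
```

A prefix match like this cannot tell container formats apart: a .docx file starts with the zip signature, so it is reported as application/zip unless the archive contents are inspected as well.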

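[Comments]:

[Answer 15]:

In the end I did use urlmon.dll. I thought there would be an easier way, but this works. I'm including the code to help anyone else, and so that I can find it again if I need it.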

using System;
using System.IO;
using System.Runtime.InteropServices;

...

    [DllImport(@"urlmon.dll", CharSet = CharSet.Auto)]
    private extern static System.UInt32 FindMimeFromData(
        System.IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] System.String pwzUrl,
        [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
        System.UInt32 cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] System.String pwzMimeProposed,
        System.UInt32 dwMimeFlags,
        // Pointer-sized, so the returned address is not truncated in a 64-bit process.
        out System.IntPtr ppwzMimeOut,
        System.UInt32 dwReserved
    );

    public static string getMimeFromFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException(filename + " not found");

        byte[] buffer = new byte[256];
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length >= 256)
                fs.Read(buffer, 0, 256);
            else
                fs.Read(buffer, 0, (int)fs.Length);
        }
        try
        {
            System.IntPtr mimeTypePtr;
            FindMimeFromData(System.IntPtr.Zero, null, buffer, 256, null, 0, out mimeTypePtr, 0);
            string mime = Marshal.PtrToStringUni(mimeTypePtr);
            Marshal.FreeCoTaskMem(mimeTypePtr);
            return mime;
        }
        catch (Exception)
        {
            return "unknown/unknown";
        }
    }

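[Comments]:

It's probably whatever is mapped in the registry.

@flq, @mkmurray msdn.microsoft.com/en-us/library/…

I had trouble with this code when hosting it in IIS on Windows 8. Using the implementation on pinvoke.net (which differs slightly) solved it: pinvoke.net/default.aspx/urlmon.findmimefromdata

I've been testing this code with IIS 7 and it hasn't worked for me. I have a CSV file that I'm testing with. I've been changing the CSV's extension (to .png, .jpeg, etc.) and the mimetype changes with the extension (image/png, image/jpeg). I may be wrong, but my understanding was that Urlmon.dll uses the file's metadata to determine the mimetype, not just its extension.

Doesn't work for 64-bit applications, look here: ***.com/questions/18358548/…

[Answer 16]:

In Urlmon.dll there is a function called FindMimeFromData.

From the documentation:

MIME type detection, or "data sniffing," refers to the process of determining an appropriate MIME type from binary data. The final result depends on a combination of server-supplied MIME type headers, file extension, and/or the data itself. Usually, only the first 256 bytes of data are significant.

So, read the first (up to) 256 bytes from the file and pass them to FindMimeFromData.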
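The reading step can be sketched as below. The helper name HeaderReader.ReadHeader is hypothetical; it only collects the leading bytes and leaves the call into FindMimeFromData (or any other sniffer) to the caller.

```csharp
using System;
using System.IO;

public static class HeaderReader
{
    // Reads at most maxBytes from the start of the file; shorter files yield a shorter array.
    public static byte[] ReadHeader(string path, int maxBytes = 256)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var buffer = new byte[maxBytes];
            int total = 0;
            int read;
            // Stream.Read may return fewer bytes than requested, so loop until full or end of file.
            while (total < maxBytes &&
                   (read = stream.Read(buffer, total, maxBytes - total)) > 0)
            {
                total += read;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

Returning an array trimmed to the bytes actually read lets the caller pass the true length as the buffer size, rather than a zero-padded 256-byte block.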

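[Comments]:

How reliable is this method?

According to ***.com/questions/4833113/…, this function can only determine 26 types, so I don't think it's reliable. For example, a '*.docx' file is detected as 'application/x-zip-compressed'.

I suppose that's because a docx is, on the face of it, a zip file.

Docx is a zip file, and the mimetype for .docx is "application/vnd.openxmlformats-officedocument.wordprocessingml.document". While this can be determined by binary inspection alone, that's probably not the most efficient way to do it, and in most cases you'd have to read more than the first 256 bytes.

I think this question is still relevant in the 2020s. See this answer.

The FileSignatures project seems somewhat more reliable and lets you control exactly which file types to match.

[Answer 17]:

I think the right answer is a combination of Steve Morgan's and Serguei's answers. That's how Internet Explorer does it. The pinvoke call to FindMimeFromData works for only 26 hard-coded mime types. It will also return an ambiguous mime type (such as text/plain or application/octet-stream) even when a more specific, more appropriate mime type exists. If it fails to give a good mime type, you can go to the registry for a more specific one. The server registry may have more up-to-date mime types.

Reference: http://msdn.microsoft.com/en-us/library/ms775147(VS.85).aspx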

[Comments]:

[Answer 18]:

I use a hybrid solution:

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    [DllImport (@"urlmon.dll", CharSet = CharSet.Auto)]
    private extern static System.UInt32 FindMimeFromData(
        System.IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] System.String pwzUrl,
        [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
        System.UInt32 cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] System.String pwzMimeProposed,
        System.UInt32 dwMimeFlags,
        // Pointer-sized, so the returned address is not truncated in a 64-bit process.
        out System.IntPtr ppwzMimeOut,
        System.UInt32 dwReserved
    );

    private string GetMimeFromRegistry (string Filename)
    {
        string mime = "application/octet-stream";
        string ext = System.IO.Path.GetExtension(Filename).ToLower();
        Microsoft.Win32.RegistryKey rk = Microsoft.Win32.Registry.ClassesRoot.OpenSubKey(ext);
        if (rk != null && rk.GetValue("Content Type") != null)
            mime = rk.GetValue("Content Type").ToString();
        return mime;
    }

    public string GetMimeTypeFromFileAndRegistry (string filename)
    {
        if (!File.Exists(filename))
        {
           return GetMimeFromRegistry (filename);
        }

        byte[] buffer = new byte[256];

        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length >= 256)
                fs.Read(buffer, 0, 256);
            else
                fs.Read(buffer, 0, (int)fs.Length);
        }

        try
        {
            System.IntPtr mimeTypePtr;

            FindMimeFromData(System.IntPtr.Zero, null, buffer, 256, null, 0, out mimeTypePtr, 0);

            string mime = Marshal.PtrToStringUni(mimeTypePtr);

            Marshal.FreeCoTaskMem(mimeTypePtr);

            // Sniffing gave nothing specific, so fall back to the extension's registry entry.
            if (string.IsNullOrWhiteSpace (mime) ||
                mime == "text/plain" || mime == "application/octet-stream")
            {
                return GetMimeFromRegistry (filename);
            }

            return mime;
        }
        catch (Exception)
        {
            return GetMimeFromRegistry (filename);
        }
    }

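[Comments]:

Thanks for the code. It partially works. For 'doc' and 'tif' files it returns 'application/octet-stream'. Are there any other options?

It would be nice to see a hybrid of the extension dictionary above and urlmon.

@PranavShah, note that a server's knowledge of mime types (what the registry lookup returns) depends on the software installed on that server. A basic Windows install or a dedicated web server can't be relied on to know the mime types of third-party software that it has no need to install. It should know what a .doc file is, though.

[Answer 19]:

I found this one useful. For VB.NET developers: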

    Public Shared Function GetFromFileName(ByVal fileName As String) As String
        ' GetFromExtension strips the leading dot itself, so a name without an extension no longer throws.
        Return GetFromExtension(Path.GetExtension(fileName))
    End Function

    Public Shared Function GetFromExtension(ByVal extension As String) As String
        If extension.StartsWith("."c) Then
            extension = extension.Remove(0, 1)
        End If

        ' The dictionary keys are all lower case.
        extension = extension.ToLowerInvariant()

        If MIMETypesDictionary.ContainsKey(extension) Then
            Return MIMETypesDictionary(extension)
        End If

        Return "unknown/unknown"
    End Function

    Private Shared ReadOnly MIMETypesDictionary As New Dictionary(Of String, String)() From { _
            {"ai", "application/postscript"}, _
            {"aif", "audio/x-aiff"}, _
            {"aifc", "audio/x-aiff"}, _
            {"aiff", "audio/x-aiff"}, _
            {"asc", "text/plain"}, _
            {"atom", "application/atom+xml"}, _
            {"au", "audio/basic"}, _
            {"avi", "video/x-msvideo"}, _
            {"bcpio", "application/x-bcpio"}, _
            {"bin", "application/octet-stream"}, _
            {"bmp", "image/bmp"}, _
            {"cdf", "application/x-netcdf"}, _
            {"cgm", "image/cgm"}, _
            {"class", "application/octet-stream"}, _
            {"cpio", "application/x-cpio"}, _
            {"cpt", "application/mac-compactpro"}, _
            {"csh", "application/x-csh"}, _
            {"css", "text/css"}, _
            {"dcr", "application/x-director"}, _
            {"dif", "video/x-dv"}, _
            {"dir", "application/x-director"}, _
            {"djv", "image/vnd.djvu"}, _
            {"djvu", "image/vnd.djvu"}, _
            {"dll", "application/octet-stream"}, _
            {"dmg", "application/octet-stream"}, _
            {"dms", "application/octet-stream"}, _
            {"doc", "application/msword"}, _
            {"dtd", "application/xml-dtd"}, _
            {"dv", "video/x-dv"}, _
            {"dvi", "application/x-dvi"}, _
            {"dxr", "application/x-director"}, _
            {"eps", "application/postscript"}, _
            {"etx", "text/x-setext"}, _
            {"exe", "application/octet-stream"}, _
            {"ez", "application/andrew-inset"}, _
            {"gif", "image/gif"}, _
            {"gram", "application/srgs"}, _
            {"grxml", "application/srgs+xml"}, _
            {"gtar", "application/x-gtar"}, _
            {"hdf", "application/x-hdf"}, _
            {"hqx", "application/mac-binhex40"}, _
            {"htm", "text/html"}, _
            {"html", "text/html"}, _
            {"ice", "x-conference/x-cooltalk"}, _
            {"ico", "image/x-icon"}, _
            {"ics", "text/calendar"}, _
            {"ief", "image/ief"}, _
            {"ifb", "text/calendar"}, _
            {"iges", "model/iges"}, _
            {"igs", "model/iges"}, _
            {"jnlp", "application/x-java-jnlp-file"}, _
            {"jp2", "image/jp2"}, _
            {"jpe", "image/jpeg"}, _
            {"jpeg", "image/jpeg"}, _
            {"jpg", "image/jpeg"}, _
            {"js", "application/x-javascript"}, _
            {"kar", "audio/midi"}, _
            {"latex", "application/x-latex"}, _
            {"lha", "application/octet-stream"}, _
            {"lzh", "application/octet-stream"}, _
            {"m3u", "audio/x-mpegurl"}, _
            {"m4a", "audio/mp4a-latm"}, _
            {"m4b", "audio/mp4a-latm"}, _
            {"m4p", "audio/mp4a-latm"}, _
            {"m4u", "video/vnd.mpegurl"}, _
            {"m4v", "video/x-m4v"}, _
            {"mac", "image/x-macpaint"}, _
            {"man", "application/x-troff-man"}, _
            {"mathml", "application/mathml+xml"}, _
            {"me", "application/x-troff-me"}, _
            {"mesh", "model/mesh"}, _
            {"mid", "audio/midi"}, _
            {"midi", "audio/midi"}, _
            {"mif", "application/vnd.mif"}, _
            {"mov", "video/quicktime"}, _
            {"movie", "video/x-sgi-movie"}, _
            {"mp2", "audio/mpeg"}, _
            {"mp3", "audio/mpeg"}, _
            {"mp4", "video/mp4"}, _
            {"mpe", "video/mpeg"}, _
            {"mpeg", "video/mpeg"}, _
            {"mpg", "video/mpeg"}, _
            {"mpga", "audio/mpeg"}, _
            {"ms", "application/x-troff-ms"}, _
            {"msh", "model/mesh"}, _
            {"mxu", "video/vnd.mpegurl"}, _
            {"nc", "application/x-netcdf"}, _
            {"oda", "application/oda"}, _
            {"ogg", "application/ogg"}, _
            {"pbm", "image/x-portable-bitmap"}, _
            {"pct", "image/pict"}, _
            {"pdb", "chemical/x-pdb"}, _
            {"pdf", "application/pdf"}, _
            {"pgm", "image/x-portable-graymap"}, _
            {"pgn", "application/x-chess-pgn"}, _
            {"pic", "image/pict"}, _
            {"pict", "image/pict"}, _
            {"png", "image/png"}, _
            {"pnm", "image/x-portable-anymap"}, _
            {"pnt", "image/x-macpaint"}, _
            {"pntg", "image/x-macpaint"}, _
            {"ppm", "image/x-portable-pixmap"}, _
            {"ppt", "application/vnd.ms-powerpoint"}, _
            {"ps", "application/postscript"}, _
            {"qt", "video/quicktime"}, _
            {"qti", "image/x-quicktime"}, _
            {"qtif", "image/x-quicktime"}, _
            {"ra", "audio/x-pn-realaudio"}, _
            {"ram", "audio/x-pn-realaudio"}, _
            {"ras", "image/x-cmu-raster"}, _
            {"rdf", "application/rdf+xml"}, _
            {"rgb", "image/x-rgb"}, _
            {"rm", "application/vnd.rn-realmedia"}, _
            {"roff", "application/x-troff"}, _
            {"rtf", "text/rtf"}, _
            {"rtx", "text/richtext"}, _
            {"sgm", "text/sgml"}, _
            {"sgml", "text/sgml"}, _
            {"sh", "application/x-sh"}, _
            {"shar", "application/x-shar"}, _
            {"silo", "model/mesh"}, _
            {"sit", "application/x-stuffit"}, _
            {"skd", "application/x-koan"}, _
            {"skm", "application/x-koan"}, _
            {"skp", "application/x-koan"}, _
            {"skt", "application/x-koan"}, _
            {"smi", "application/smil"}, _
            {"smil", "application/smil"}, _
            {"snd", "audio/basic"}, _
            {"so", "application/octet-stream"}, _
            {"spl", "application/x-futuresplash"}, _
            {"src", "application/x-wais-source"}, _
            {"sv4cpio", "application/x-sv4cpio"}, _
            {"sv4crc", "application/x-sv4crc"}, _
            {"svg", "image/svg+xml"}, _
            {"swf", "application/x-shockwave-flash"}, _
            {"t", "application/x-troff"}, _
            {"tar", "application/x-tar"}, _
            {"tcl", "application/x-tcl"}, _
            {"tex", "application/x-tex"}, _
            {"texi", "application/x-texinfo"}, _
            {"texinfo", "application/x-texinfo"}, _
            {"tif", "image/tiff"}, _
            {"tiff", "image/tiff"}, _
            {"tr", "application/x-troff"}, _
            {"tsv", "text/tab-separated-values"}, _
            {"txt", "text/plain"}, _
            {"ustar", "application/x-ustar"}, _
            {"vcd", "application/x-cdlink"}, _
            {"vrml", "model/vrml"}, _
            {"vxml", "application/voicexml+xml"}, _
            {"wav", "audio/x-wav"}, _
            {"wbmp", "image/vnd.wap.wbmp"}, _
            {"wbxml", "application/vnd.wap.wbxml"}, _
            {"wml", "text/vnd.wap.wml"}, _
            {"wmlc", "application/vnd.wap.wmlc"}, _
            {"wmls", "text/vnd.wap.wmlscript"}, _
            {"wmlsc", "application/vnd.wap.wmlscriptc"}, _
            {"wrl", "model/vrml"}, _
            {"xbm", "image/x-xbitmap"}, _
            {"xht", "application/xhtml+xml"}, _
            {"xhtml", "application/xhtml+xml"}, _
            {"xls", "application/vnd.ms-excel"}, _
            {"xml", "application/xml"}, _
            {"xpm", "image/x-xpixmap"}, _
            {"xsl", "application/xml"}, _
            {"xslt", "application/xslt+xml"}, _
            {"xul", "application/vnd.mozilla.xul+xml"}, _
            {"xwd", "image/x-xwindowdump"}, _
            {"xyz", "chemical/x-xyz"}, _
            {"zip", "application/zip"} _
        }

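[Comments]:

Looks like this may be an older list... there's no .docx, .xlsx, etc.

Is there an online list somewhere? The list above looks a bit more complete, and I found some of the missing ones here: ***.com/questions/4212861/… -- but it seems there ought to be a web service where you could send a file name and some bytes and have it make a best guess at the rest…

I would use configuration for this, so that I can pick the mime types I need and modify them without changing a single line of code.

[Answer 20]:

You could also look in the registry.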

    using System.IO;
    using Microsoft.Win32;

    string GetMimeType(FileInfo fileInfo)
    {
        string mimeType = "application/unknown";

        RegistryKey regKey = Registry.ClassesRoot.OpenSubKey(
            fileInfo.Extension.ToLower()
            );

        if(regKey != null)
        {
            object contentType = regKey.GetValue("Content Type");

            if(contentType != null)
                mimeType = contentType.ToString();
        }

        return mimeType;
    }

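One way or another you have to tap into a database of MIME types. Whether the mapping is keyed on extensions or on magic numbers is a secondary detail, and the Windows registry is one such database. For a platform-independent solution, though, you would have to ship that database with the code (or as a standalone library).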
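One way to ship such a database with the code is a plain text file parsed at startup. The sketch below assumes the Apache-style mime.types layout (a MIME type followed by whitespace-separated extensions, with # starting a comment line); the class name MimeDatabase is hypothetical, and this loader still maps extensions rather than file content.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MimeDatabase
{
    // Parses lines such as "image/jpeg jpeg jpg jpe"; a leading '#' marks a comment line.
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0 || line[0] == '#') continue;

            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            // parts[0] is the MIME type; every following token is an extension mapped to it.
            for (int i = 1; i < parts.Length; i++)
            {
                map[parts[i]] = parts[0]; // later lines override earlier ones
            }
        }
        return map;
    }
}
```

Keeping the table in a data file means entries can be added or corrected without recompiling.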

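[Comments]:

@Rabbi Even though this question is about file content rather than extension, this answer may still be useful to others passing by (like myself). Even if an answer like this is unlikely to be accepted, it's still good to have the information.

Doesn't this simply get the mime type from the extension of the file name? What if the file is a .docx and some joker decides to rename it to .doc? You'd certainly get the MIME type wrong.

@kolin, you're absolutely right, but as the saying goes, "make something idiot-proof and someone will make a better idiot." :)

When using Windows Azure Web Roles or any other host that runs your app in limited trust, don't forget that you will not be allowed to access the registry. A combination of try-catch around the registry and an in-memory dictionary (like Anykey's answer) looks like a good solution that has a bit of both.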
