Read a JPG file until certain bytes occur

Posted 2023-02-16


[Title]: Read a JPG file until certain bytes occur [Posted]: 2014-10-09 00:08:53 [Question]:

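So I'm reading a JPG file. I've finished reading the "header" data, and now I'm reading the actual image data. The problem is that I don't know the size of the image beforehand, so I can't create an array to read into. What I can do, though, is read from the end of the "header" to the end of the image (two bytes: FF and D9), with a ByteArrayOutputStream saving each value as it is read, until I encounter the byte FF followed by the byte D9. How would I go about doing this?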

My code so far, including the JPG recognition, just to give you context:

    // check header data, assign header data to important fields

    // Start Of Image (SOI) must be FFD8 and the next marker must begin with FF
    if (!(this.bData[0] == (byte) 0xFF && this.bData[1] == (byte) 0xD8
            && this.bData[2] == (byte) 0xFF)) {
        this.isValid = false;
    }

    // check if file is not valid
    if (!this.isValid) {
        System.err.printf("ERROR: File %s is not"
                        + " registered as a bitmap!\n", filename);
        Logger.getLogger(Bitmap.class.getName()).log(Level.SEVERE, null, new IllegalArgumentException());
    }

    // If the next values are correct, then the data stream starts at SOI
    // If not, the data stream is raw
    this.isRawDataStream = !(bData[3] == (byte) 0xE0
            && bData[6]  == (byte) 0x4A
            && bData[7]  == (byte) 0x46
            && bData[8]  == (byte) 0x49
            && bData[9]  == (byte) 0x46
            && bData[10] == (byte) 0x00);

    // get size of image
    ByteArrayOutputStream iData = new ByteArrayOutputStream();

    // start at index 20 of the file (end of 'header')
    // read until End of Image
    /* while(!(iData at i is FF and iData at i+1 is D9)) 
        ???
    
    */

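Edit: I'm doing this to better understand file formats and the like, and I may be seriously misunderstanding JFIF. If I am, please don't hesitate to tell me.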

[Comments]:

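I'd suggest using a ByteArrayOutputStream or ByteBuffer or something similar. An ArrayList<Byte> will take up a lot of memory, because each byte gets boxed as a Byte object.

Also, byteData[0].equals("FF") should be byteData[0] == (byte) 0xFF, assuming byteData is a byte[] array. Comparing bytes with Strings won't work.

Then it would be best to switch to a byte[] array. Using Strings causes unnecessary inefficiency.

I didn't realize you could cast hex values to bytes, thanks! This even makes my Bitmap class more efficient too! (:

@JohnKugelman Let us continue this discussion in chat.

[Answer 1]: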

The size of the image is in the SOF (Start of Frame) marker.
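As a rough illustration of that point, here is a minimal Java sketch that walks the marker segments after SOI and reads the height and width from the first SOF segment. The class and method names (`JpegDimensions`, `readDimensions`) are made up for this example. It assumes a well-formed stream in which every marker before the first SOF carries a two-byte big-endian length, and it is not a full JPEG parser.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class JpegDimensions {
    /** Returns {width, height} taken from the first SOF segment. */
    static int[] readDimensions(InputStream in) throws IOException {
        // DataInputStream reads big-endian values, which matches JPEG.
        DataInputStream data = new DataInputStream(in);
        if (data.readUnsignedShort() != 0xFFD8) {
            throw new IOException("Missing SOI marker (FF D8)");
        }
        while (true) {
            if (data.readUnsignedByte() != 0xFF) {
                throw new IOException("Expected marker prefix FF");
            }
            int marker = data.readUnsignedByte();
            while (marker == 0xFF) {
                marker = data.readUnsignedByte(); // skip optional fill bytes
            }
            if (marker == 0xDA || marker == 0xD9) {
                throw new IOException("Reached SOS/EOI without seeing an SOF marker");
            }
            int length = data.readUnsignedShort(); // counts the two length bytes
            if (length < 2) {
                throw new IOException("Invalid segment length " + length);
            }
            if (isSof(marker)) {
                data.readUnsignedByte();               // sample precision
                int height = data.readUnsignedShort(); // number of lines
                int width = data.readUnsignedShort();  // samples per line
                return new int[] {width, height};
            }
            data.readFully(new byte[length - 2]); // skip this segment's payload
        }
    }

    /** SOF markers are C0..CF, except C4 (DHT), C8 (reserved) and CC (DAC). */
    static boolean isSof(int marker) {
        return marker >= 0xC0 && marker <= 0xCF
                && marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
    }
}
```

Calling `JpegDimensions.readDimensions(new FileInputStream(path))` would then give the dimensions without reading any image data. A truncated file surfaces as an `EOFException` from the `DataInputStream` reads.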

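On rereading, I think the original poster has the structure of a JPEG stream wrong.

It must start with an SOI marker and end with an EOI marker. Other than that, the order of the markers can vary.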

There are a few other restrictions: the SOF marker must come before the SOS marker, and DHT and DQT markers must appear before any SOS marker that uses them.
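To see which markers a particular file actually contains, and in what order, one option is to walk the segments by their length fields rather than assume fixed offsets. The sketch below is only an illustration: `JpegMarkerWalk`, `headerMarkers` and `hasSofBeforeSos` are hypothetical names, the whole file is assumed to be in a `byte[]` already, and fill bytes between segments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class JpegMarkerWalk {
    /** Lists the marker codes from SOI up to and including the first SOS. */
    static List<Integer> headerMarkers(byte[] data) {
        if (data.length < 2 || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Stream does not start with SOI (FF D8)");
        }
        List<Integer> markers = new ArrayList<>();
        markers.add(0xD8);
        int pos = 2;
        while (pos + 1 < data.length) {
            if ((data[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("Expected FF at offset " + pos);
            }
            int marker = data[pos + 1] & 0xFF;
            markers.add(marker);
            if (marker == 0xDA) {
                return markers; // SOS: entropy-coded data follows its header
            }
            if (marker == 0xD9) {
                throw new IllegalArgumentException("EOI reached before any SOS marker");
            }
            if (pos + 3 >= data.length) {
                throw new IllegalArgumentException("Truncated segment at offset " + pos);
            }
            // Two-byte big-endian length that includes the length bytes themselves.
            int length = ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            if (length < 2) {
                throw new IllegalArgumentException("Invalid segment length at offset " + pos);
            }
            pos += 2 + length;
        }
        throw new IllegalArgumentException("Reached end of data before an SOS marker");
    }

    /** True if the list (which ends at the first SOS) contains an SOF marker. */
    static boolean hasSofBeforeSos(List<Integer> markers) {
        for (int m : markers) {
            if (m >= 0xC0 && m <= 0xCF && m != 0xC4 && m != 0xC8 && m != 0xCC) {
                return true;
            }
        }
        return false;
    }
}
```

Printing the returned list for a few different files is a quick way to see how much the header layout varies from one encoder to another.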

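On top of that, there are multiple JPEG file formats that require an APPn marker at the start of the stream.

The code and the question above do not reflect the variable nature of a JPEG stream.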
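For the literal question of copying bytes into a ByteArrayOutputStream until FF D9 appears, a minimal sketch might look like the following. `JpegScanReader` and `readUntilEoi` are illustrative names. The sketch assumes the stream is already positioned at the start of the entropy-coded data (just after the SOS header), where a literal FF data byte is always followed by a stuffed 00, so FF D9 should not occur by accident. Scanning from an arbitrary earlier offset could stop too soon, for example on an embedded thumbnail inside an APPn segment.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class JpegScanReader {
    /** Copies bytes from the stream up to, but not including, the first FF D9 pair. */
    static byte[] readUntilEoi(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = -1; // previous byte, held back until we know it is not part of EOI
        int cur;
        while ((cur = in.read()) != -1) {
            if (prev == 0xFF && cur == 0xD9) {
                return out.toByteArray(); // EOI found; the held-back FF is dropped
            }
            if (prev != -1) {
                out.write(prev);
            }
            prev = cur;
        }
        throw new EOFException("Stream ended before an EOI marker (FF D9)");
    }
}
```

Reading one byte at a time from a raw FileInputStream is slow, so wrapping the source in a BufferedInputStream is advisable. Files with several scans (progressive JPEGs, for instance) have further marker segments between scans, which this simple byte search does not treat specially.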

