Reading and writing byte-by-byte for compression
Posted: 2016-04-28 14:04:53
Question:
I am trying to implement the Lempel-Ziv-Welch algorithm in Python, but I am having trouble writing the output file in binary format.
import os
import sys

action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0,256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb+')
    output_file = open(sys.argv[2], 'wb')
    data = input_file.read()
    # current_data is one byte
    current_data = input_file.read(1)
    i = 0
    j = 1
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary.keys():
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary.keys():
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1
        # then write to the output file the found string - one character from the end (the longest string that is in the dictionary)
        mylist = []
        thing_to_write = format(thing_to_write,'x')
        thing_to_write = thing_to_write
        # this is the part in question: each hex digit is written as text, not as raw bytes
        for char in thing_to_write:
            mylist.append(char.encode('hex'))
        for elem in mylist:
            output_file.write(elem)
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
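I have tried writing in many different formats, such as hex and binary, but I think I am just writing them out as 8-bit characters. How do I write raw binary?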
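Comments:
What do you mean by "I am having trouble"? Are you getting an error message? If so, add the full message to the question. See How to create a Minimal, Complete, and Verifiable example.

Answer 1:
It is not clear what you are trying to write. The values you produce can reach 256 or more, which no longer fits in one byte, so I assume you want to write 2-byte unsigned integers to the output file?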
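If that is the case, I suggest you look into Python's struct.pack function, which is designed to convert data from Python types into a binary representation. If your data is byte-sized, you can simply use output_file.write(chr(x)) to write each character.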
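To make the difference concrete, here is a minimal sketch comparing the two ways of writing one code. It is written for Python 3, whereas the code in this thread is Python 2. The value 300 and all variable names are illustrative assumptions. The "double hex" step mirrors what the question's loop appears to do: format the number as hex text, then hex-encode each character again.

```python
import binascii
import struct

code = 300  # hypothetical LZW code; too large for a single byte

# Roughly what the question's loop does: turn the code into hex text,
# then hex-encode each of those text characters a second time.
hex_digits = format(code, "x")  # the text '12c'
double_hex = b"".join(binascii.hexlify(ch.encode("ascii")) for ch in hex_digits)

# Raw binary: the same code as a fixed-width 2-byte unsigned integer.
# "<" pins little-endian byte order; a bare "H" uses the machine's native order.
packed = struct.pack("<H", code)

# A value in 0..255 can be written as one byte (Python 3 spelling of chr(x)).
single = bytes([65])

print(len(double_hex), double_hex)
print(len(packed), packed)

# Reading the value back from the two raw bytes.
(decoded,) = struct.unpack("<H", packed)
print(decoded)
```

The text route spends six bytes on this one code, while the packed route spends two. That would explain an output file that grows instead of shrinking.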
The following uses Python's struct.pack():
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
import sys
import struct

action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0,256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb')
    output_file = open(sys.argv[2], 'wb')
    data = input_file.read()
    # current_data is one byte
    current_data = input_file.read(1)
    i = 0
    j = 1
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary.keys():
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary.keys():
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1
        # then write to the output file the found string - one character from the end (the longest string that is in the dictionary)
        output_file.write(struct.pack('H', thing_to_write)) # Convert each thing into 2 byte binary
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
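For readers on Python 3, the same idea can be sketched as small functions. This is a textbook LZW encoder, not a line-for-line port of the loop above. The names lzw_compress, pack_codes and unpack_codes are hypothetical helpers introduced here, and the little-endian "<H" format is an assumption chosen so the file reads the same on any machine.

```python
import struct


def lzw_compress(data):
    """Return the list of integer LZW codes for a bytes object."""
    # Codes 0..255 stand for the single bytes themselves.
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate  # keep extending the current match
        else:
            codes.append(dictionary[current])  # emit the longest known prefix
            dictionary[candidate] = len(dictionary)  # learn the new string
            current = bytes([value])
    if current:
        codes.append(dictionary[current])  # flush the final match
    return codes


def pack_codes(codes):
    """Serialize codes as little-endian unsigned 16-bit integers."""
    # struct.pack raises struct.error if any code is above 65535.
    return struct.pack("<%dH" % len(codes), *codes)


def unpack_codes(blob):
    """Inverse of pack_codes: read back the 2-byte codes."""
    return [code for (code,) in struct.iter_unpack("<H", blob)]


codes = lzw_compress(b"ABABABA")
blob = pack_codes(codes)
print(codes, len(blob))
```

A fixed 2-byte width caps the dictionary at 65536 entries. A real compressor would have to stop adding entries, reset the dictionary, or move to wider codes once that limit is reached.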