References:
http://blog.csdn.net/meanong/article/details/53942116
https://github.com/invernizzi/scapy-http
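Contents:
1. Problems with scapy and their solutions
1.1 Memory leak
1.2 Solution
1.3 No HTTP protocol parsing
1.4 Solution
2. A pcap parsing example
1. Problems with scapy and their solutions
1.1 Memory leak
Reading packets from a pcap file with rdpcap() leaks memory; after reading a handful of files, memory may fill up.
The cause is that scapy opens the file when it reads a pcap but never closes it. Because rdpcap() discards the reader object, the caller has no way to close the file either, so the handle and the resources behind it are never released.
The source of rdpcap() is: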
def rdpcap(filename, count=-1):
    """Read a pcap file and return a packet list
    count: read only <count> packets"""
    return PcapReader(filename).read_all(count=count)
The read_all() method of PcapReader comes from its parent class RawPcapReader:
def read_all(self,count=-1):
    """return a list of all packets in the pcap file
    """
    res=[]
    while count != 0:
        count -= 1
        p = self.read_packet()
        if p is None:
            break
        res.append(p)
    return res
Looking further at its __init__() method, you can see that it opens the file but never closes it.
def __init__(self, filename):
    self.filename = filename
    try:
        self.f = gzip.open(filename,"rb")
        magic = self.f.read(4)
    except IOError:
        self.f = open(filename,"rb")
        magic = self.f.read(4)
    if magic == "\xa1\xb2\xc3\xd4": #big endian
        self.endian = ">"
    elif magic == "\xd4\xc3\xb2\xa1": #little endian
        self.endian = "<"
    else:
        raise Scapy_Exception("Not a pcap capture file (bad magic)")
    hdr = self.f.read(20)
    if len(hdr)<20:
        raise Scapy_Exception("Invalid pcap file (too short)")
    vermaj,vermin,tz,sig,snaplen,linktype = struct.unpack(self.endian+"HHIIII",hdr)
    self.linktype = linktype
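For reference, the 24 bytes this constructor consumes are the pcap global header: a 4-byte magic number followed by the six fields unpacked with "HHIIII". The following is a minimal standalone sketch of that parsing using only the standard library. The function name parse_pcap_header and the keys of the returned dictionary are illustrative and are not part of scapy.

```python
import struct

# Magic numbers as they appear in the first four bytes of a classic pcap file.
MAGIC_BIG_ENDIAN = b"\xa1\xb2\xc3\xd4"
MAGIC_LITTLE_ENDIAN = b"\xd4\xc3\xb2\xa1"


def parse_pcap_header(data):
    """Parse the 24-byte pcap global header from a bytes object.

    Hypothetical helper for illustration; it mirrors the checks in the
    __init__() shown above but is not scapy code.
    """
    magic = data[:4]
    if magic == MAGIC_BIG_ENDIAN:
        endian = ">"
    elif magic == MAGIC_LITTLE_ENDIAN:
        endian = "<"
    else:
        raise ValueError("Not a pcap capture file (bad magic)")

    hdr = data[4:24]
    if len(hdr) < 20:
        raise ValueError("Invalid pcap file (too short)")

    # Same field layout as the struct.unpack() call in the constructor.
    vermaj, vermin, tz, sig, snaplen, linktype = struct.unpack(endian + "HHIIII", hdr)
    return {
        "endian": endian,
        "version": (vermaj, vermin),
        "snaplen": snaplen,
        "linktype": linktype,
    }
```

If the header is read inside a `with open(path, "rb") as f:` block and passed in as `f.read(24)`, the file is closed as soon as the block exits, which is exactly what the constructor above does not do.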
1.2 Solution
1.2.1 Patch the source
The relevant code lives in scapy/utils.py, and only the rdpcap() function needs to change. The patched version is:
def rdpcap(filename, count=-1):
    """Read a pcap or pcapng file and return a packet list
    count: read only <count> packets
    """
    pcap = PcapReader(filename)
    data = pcap.read_all(count=count)
    pcap.close()
    return data
Closing the pcap instance also closes the file it opened.
1.2.2 Read the pcap file without rdpcap()
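One caveat with the patched rdpcap(): if read_all() raises, close() is never reached. A more defensive variant can use contextlib.closing, which calls close() on the way out whether or not an exception occurred. This is only a sketch: read_all_and_close and its reader_factory parameter are hypothetical names, and the only assumption is that the reader exposes read_all(count=...) and close(), as PcapReader does in the source shown above.

```python
from contextlib import closing


def read_all_and_close(reader_factory, filename, count=-1):
    """Build a reader, read up to `count` packets, and always close it."""
    # closing() guarantees reader.close() runs even if read_all() raises.
    with closing(reader_factory(filename)) as reader:
        return reader.read_all(count=count)
```

With scapy this would be called as read_all_and_close(PcapReader, 'capture.pcap'), where the file name is a placeholder.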
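from scapy.all import PcapReader

pr = PcapReader('E:/HTTP/Code/data/group1.pcap')
while True:
    package = pr.read_packet()
    if package is None:
        break
    # TODO: process the packet here
pr.close()
Here we instantiate a PcapReader directly, read the packets one at a time, and close the instance at the end.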
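The same loop can be wrapped in a generator so that callers simply iterate and the reader is closed for them. This is a sketch under the assumption, taken from the loop above, that read_packet() returns None at the end of the file; the name iter_packets is hypothetical.

```python
def iter_packets(reader):
    """Yield packets from `reader` one at a time, then close it."""
    try:
        while True:
            packet = reader.read_packet()
            if packet is None:  # end of file, as in the loop above
                break
            yield packet
    finally:
        # Runs when iteration finishes, fails, or the generator is closed.
        reader.close()
```

A caller would then write `for p in iter_packets(PcapReader(path)):` and never hold more than one packet at a time.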
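1.3 No HTTP protocol parsing
A look at scapy/layers shows that the scapy version used here has no HTTP layer. I tried dissecting packets that carry HTTP, and the HTTP data always came back as a Raw layer.
For example: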
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=260 id=38843 flags=DF frag=0L ttl=63 proto=tcp chksum=0xce48 src=172.16.113.84 dst=206.161.232.233 options=[] |<TCP sport=13002 dport=http seq=138591001 ack=2983383564L dataofs=5L reserved=0L flags=PA window=32120 chksum=0xa1c5 urgptr=0 options=[] |<Raw load='GET /autoplus/autoplus.gif HTTP/1.0\r\nReferer: http://www.bostonian.com/\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)\r\nHost: www.bostonian.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\n\r\n' |>>>>
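The Raw load above is just the HTTP request as text, so the missing step is splitting it into a request line and headers. As a rough illustration of what an HTTP layer has to do (this is not how scapy-http is implemented), a minimal parser for such a payload might look like the following. parse_http_request is a hypothetical helper, and the sample bytes are a shortened copy of the load shown above.

```python
def parse_http_request(load):
    """Split a raw HTTP request (bytes) into method, path, version, headers."""
    # Everything before the first blank line is the request head.
    head = load.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so URLs in values stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "path": path, "version": version, "headers": headers}


request = parse_http_request(
    b"GET /autoplus/autoplus.gif HTTP/1.0\r\n"
    b"Referer: http://www.bostonian.com/\r\n"
    b"Host: www.bostonian.com\r\n\r\n"
)
print(request["method"], request["path"], request["headers"]["Host"])
```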
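1.4 Solution
Someone has written an extension that adds this to scapy. The project is at https://github.com/invernizzi/scapy-http . Newer scapy releases (2.4.3 and later) ship their own scapy.layers.http, so the extension is only needed on older versions.
Install it with pip:
pip install scapy-http
Then import the http layer in your program and HTTP will be dissected: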
import scapy.all as scapy
from scapy.layers import http
2. A pcap parsing example
# -*- coding:utf-8 -*-
import scapy.all as scapy
from scapy.layers import http
# Read every packet in the pcap file
packages = scapy.rdpcap('E:/HTTP/Code/data/group5.pcap')
print(packages)
Printing packages directly gives a summary of the packet types in the file:
<group5.pcap: TCP:1323 UDP:0 ICMP:0 Other:0>
for p in packages:
    print(repr(p))
This prints the details of every packet, as follows:
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=290 id=42742 flags=DF frag=0L ttl=63 proto=tcp chksum=0xb263 src=172.16.113.84 dst=167.8.29.15 options=[] |<TCP sport=31009 dport=http seq=1946692237 ack=672469499 dataofs=5L reserved=0L flags=PA window=32120 chksum=0xf91c urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/leadpage/credit/credrib.gif‘ Http-Version=u‘HTTP/1.0‘ Host=u‘www.usatoday.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Referer=u‘http://www.usatoday.com/leadpage/credit/credit.htm‘ Headers=u‘Host: www.usatoday.com\r\nReferer: http://www.usatoday.com/leadpage/credit/credit.htm\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=281 id=42838 flags=DF frag=0L ttl=63 proto=tcp chksum=0xb20c src=172.16.113.84 dst=167.8.29.15 options=[] |<TCP sport=31011 dport=http seq=4290014843L ack=3755501580L dataofs=5L reserved=0L flags=PA window=32120 chksum=0xe139 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/feedback/buyus.htm‘ Http-Version=u‘HTTP/1.0‘ Host=u‘www.usatoday.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Referer=u‘http://www.usatoday.com/leadpage/credit/credit.htm‘ Headers=u‘Host: www.usatoday.com\r\nReferer: http://www.usatoday.com/leadpage/credit/credit.htm\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=273 id=42847 flags=DF frag=0L ttl=63 proto=tcp chksum=0xb20b src=172.16.113.84 dst=167.8.29.15 options=[] |<TCP sport=31018 dport=http seq=1370418497 ack=2248894642L dataofs=5L reserved=0L flags=PA window=32120 chksum=0xf812 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/inetart/scribe.gif‘ Http-Version=u‘HTTP/1.0‘ Host=u‘www.usatoday.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Referer=u‘http://www.usatoday.com/feedback/buyus.htm‘ Headers=u‘Host: www.usatoday.com\r\nReferer: http://www.usatoday.com/feedback/buyus.htm\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
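Each packet p is layered as follows:
Link layer (Ether) --> Network layer (IP) --> Transport layer (TCP) --> Application layer (HTTP)
Each layer can be fetched by its protocol name, and each field of that layer by its field name:
for p in packages:
    print(repr(p))
    print(p['Ether'].name)
    print(p['Ether'].dst)
    print(p['Ether'].src)
    print(p['IP'].name)
    print(p['IP'].dst)
    print(p['IP'].src)
    print(p['TCP'].name)
    print(p['TCP'].sport)
    print(p['TCP'].dport)
The output is: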
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=204 id=7838 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7ceb src=172.16.113.84 dst=207.46.179.15 options=[] |<TCP sport=1751 dport=http seq=2094829820 ack=3501118724L dataofs=5L reserved=0L flags=PA window=32120 chksum=0x91d6 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/‘ Http-Version=u‘HTTP/1.0‘ Host=u‘home.microsoft.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: home.microsoft.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
00:60:97:de:54:36
00:00:0c:04:41:bc
IP
207.46.179.15
172.16.113.84
TCP
1751
80
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=256 id=7878 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7f6b src=172.16.113.84 dst=207.46.176.51 options=[] |<TCP sport=1814 dport=http seq=1495749711 ack=1496378949 dataofs=5L reserved=0L flags=PA window=32120 chksum=0xfd2b urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/mps_id_sharing/redirect.asp?home.microsoft.com/Default.asp‘ Http-Version=u‘HTTP/1.0‘ Host=u‘msid.msn.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: msid.msn.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
00:60:97:de:54:36
00:00:0c:04:41:bc
IP
207.46.176.51
172.16.113.84
TCP
1814
80
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=256 id=7888 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7c85 src=172.16.113.84 dst=207.46.179.15 options=[] |<TCP sport=1876 dport=http seq=3929480773L ack=300743146 dataofs=5L reserved=0L flags=PA window=32120 chksum=0x7f03 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/Default.asp?newguid=1e5bf4633b9f11d2a26600805fb7e334‘ Http-Version=u‘HTTP/1.0‘ Host=u‘home.microsoft.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: home.microsoft.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
00:60:97:de:54:36
00:00:0c:04:41:bc
IP
207.46.179.15
172.16.113.84
TCP
1876
80
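Another important attribute is payload, which returns the next layer up, that is, the layer carried inside the current one. For example:
for p in packages:
    print(repr(p))
    print(p.name)
    print(p.payload.name)
    print(p.payload.payload.name)
This prints the protocol names of the first three layers: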
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=204 id=7838 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7ceb src=172.16.113.84 dst=207.46.179.15 options=[] |<TCP sport=1751 dport=http seq=2094829820 ack=3501118724L dataofs=5L reserved=0L flags=PA window=32120 chksum=0x91d6 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/‘ Http-Version=u‘HTTP/1.0‘ Host=u‘home.microsoft.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: home.microsoft.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
IP
TCP
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=256 id=7878 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7f6b src=172.16.113.84 dst=207.46.176.51 options=[] |<TCP sport=1814 dport=http seq=1495749711 ack=1496378949 dataofs=5L reserved=0L flags=PA window=32120 chksum=0xfd2b urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/mps_id_sharing/redirect.asp?home.microsoft.com/Default.asp‘ Http-Version=u‘HTTP/1.0‘ Host=u‘msid.msn.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: msid.msn.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
IP
TCP
<Ether dst=00:60:97:de:54:36 src=00:00:0c:04:41:bc type=0x800 |<IP version=4L ihl=5L tos=0x0 len=256 id=7888 flags=DF frag=0L ttl=63 proto=tcp chksum=0x7c85 src=172.16.113.84 dst=207.46.179.15 options=[] |<TCP sport=1876 dport=http seq=3929480773L ack=300743146 dataofs=5L reserved=0L flags=PA window=32120 chksum=0x7f03 urgptr=0 options=[] |<HTTP |<HTTPRequest Method=u‘GET‘ Path=u‘/Default.asp?newguid=1e5bf4633b9f11d2a26600805fb7e334‘ Http-Version=u‘HTTP/1.0‘ Host=u‘home.microsoft.com‘ User-Agent=u‘Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ Accept=u‘image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*‘ Headers=u‘Host: home.microsoft.com\r\nAccept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*\r\nUser-Agent: Mozilla/3.01 (X11; I; SunOS 4.1.4 sun4u)‘ |>>>>>
Ethernet
IP
TCP
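Chaining .payload by hand only works when you know how many layers there are. A small loop can walk the whole stack instead. The sketch below is duck-typed: it only assumes that each layer has name and payload attributes, as in the output above, and that the chain ends at None or at a layer named "NoPayload" (my understanding of how scapy terminates a packet, stated here as an assumption). The Layer class is a stand-in so the example runs without scapy or a capture file.

```python
class Layer(object):
    """Stand-in for a scapy layer: a name plus the layer it carries."""

    def __init__(self, name, payload=None):
        self.name = name
        self.payload = payload


def layer_names(packet):
    """Return the names of all layers, outermost first."""
    names = []
    layer = packet
    # Stop at the end of the chain (None, or a terminator named "NoPayload").
    while layer is not None and layer.name != "NoPayload":
        names.append(layer.name)
        layer = layer.payload
    return names


packet = Layer("Ethernet", Layer("IP", Layer("TCP", Layer("HTTP"))))
print(layer_names(packet))
```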