How can I check with Python whether the strings in a CSV column exist in a log file?
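How can I pull certain columns from a log file when a value in it matches the first column of a CSV file? If there is no match, it should print "undetected"; if there is a match, it should return the chosen columns from the log file. I have been trying to solve this for 2 days, please help.
This is the structure of my log file (full content: trendx.log):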
1537761898 0 1 1 1537733097 1537733098 1537733097 8224 74 215552 06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload TROJ.Win32.TRX.XXPE50FFF026 c:\users\administrator\desktop\downloader\download\ TRENDX 172.20.4.179 Administrator c1f387a6f45414366755b0a1874b36ff9596d8ad AABACACCBIiAgXWACAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= TSPY_HPDYRE.SM;TSPY_GOLROTED.ACS;TROJ_GEN.R021C0FFO15;TROJ_GEN.R021C0DLO15;Ransom_HPCRYPTESLA.SM2;
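To see which position each field occupies in a record like the one above, one option is to split the line and print the indices. This is only a sketch: it assumes the fields are separated by whitespace and that no field is empty or contains spaces, which the real trendx.log may not guarantee. The trailing fields of the sample record are left out for brevity.

```python
# One record copied from the trendx.log sample (trailing fields omitted).
sample = (
    "1537761898 0 1 1 1537733097 1537733098 1537733097 8224 74 215552 "
    "06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload "
    "TROJ.Win32.TRX.XXPE50FFF026 "
    "c:\\users\\administrator\\desktop\\downloader\\download\\ "
    "TRENDX 172.20.4.179 Administrator "
    "c1f387a6f45414366755b0a1874b36ff9596d8ad"
)

# Split on any run of whitespace and show the index of every field.
fields = sample.split()
for i, value in enumerate(fields):
    print(i, value)
```

For this sample, index 10 holds the file name (a 40-character hex string followed by `.crdownload`), index 14 the IP address, and index 15 the user name.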
And a sample of my CSV file (full content: sha1_vsdt.csv):
SHA-1,VSDT,
0191a23ee122bdb0c69008971e365ec530bf03f5,MIME 6010-0,
02b809d4edee752d9286677ea30e8a76114aa324,Microsoft RTF 6008-0
0349e0101d8458b6d05860fbee2b4a6d7fa2038d,Adobe Portable Document Format(PDF)
I used the code below, but I don't know why every row comes out as undetected, and rows 30 to 552 are not displayed at all:
import numpy as np
import pandas as pd
import csv
#Log data into dataframe using genfromtxt
logdata = np.genfromtxt("trendx.log",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
logframe = pd.DataFrame(logdata)
#Dataframe trimmed to use only SHA1, PRG and IP
df2 = logframe[[10, 14, 15]].rename(columns={10: 'SHA1', 14: 'PRG', 15: 'IP'})
#sha1_vsdt data into dataframe using read_csv
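# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
df1 = pd.read_csv("sha1_vsdt.csv", delimiter=",", on_bad_lines='skip', engine='python', quoting=csv.QUOTE_NONE)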
#Using merge to compare the two CSV
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').replace(np.nan, 'undetected', regex=True)
print(df[['SHA-1','VSDT','PRG','IP']])
Output:
0 0191a23ee122bdb0c69008971e365ec530bf03f5 ... undetected
1 02b809d4edee752d9286677ea30e8a76114aa324 ... undetected
2 0349e0101d8458b6d05860fbee2b4a6d7fa2038d ... undetected
3 035a7afca8b72cf1c05f6062814836ee31091559 ... undetected
4 042065bec5a655f3daec1442addf5acb8f1aa824 ... undetected
5 04939e040d9e85f84d2e2eb28343d94a50ed46ac ... undetected
6 04a1876724b53a016cd9e9c93735985938c91fa4 ... undetected
7 06109df23f7d5deadf0b2c158af1f71c2997d245 ... undetected
8 06194c240c12c51b55d2961ae287fd9628e05751 ... undetected
9 0665de1ad83715cc6e68d00ed700c469944a5925 ... undetected
10 067b448f4c9782489e5ff60c31c62b7059e500b2 ... undetected
11 0688e6966b0e4a1f58d2f3de48f960fce5b42292 ... undetected
12 0689f6f99d10dd8bf396f2d2c73ce9dcb6dcad23 ... undetected
13 06a60c6018a42b1db22e3bf8620861711401c4bb ... undetected
14 0723a895a5f8b2d5d25b4303e9f04d16551791b6 ... undetected
15 07344621cf4480c430f8931af2b2b056775af7e3 ... undetected
16 07831df482f1a34310fc4f5a092c333eeaff4380 ... undetected
17 08386105057cd5867480095696a5ca6701fdb8ad ... undetected
18 0ad5f62b4ec10397b7d13433a8dc794dc6d4f273 ... undetected
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3 ... undetected
20 0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb ... undetected
21 0c793b4f4e0be7f24f93786d7d4a719a7a002a0d ... undetected
22 0c7c2b2d05a5c712f4b9302b82fb54007210937f ... undetected
23 0d03da55b246252fb5b440a23943426bda965bcd ... undetected
24 0d592f948a4f7bfa95c7cb09faf067ce9fbc9375 ... undetected
25 0df65d8a57c8349e044f98deda17d70d0c4f926a ... undetected
26 0e13d281af08954102e7caf95864ef553c7277bd ... undetected
27 0ede12d9c17564e803f51de4d279e84623c5a8a6 ... undetected
28 0fc4f3a30684bb17cbcbf4e3def2ac3528a2f04c ... undetected
29 0fcb475fcadd8d8e3b8dd5f4376feda48c73fd24 ... undetected
.. ... ... ...
553 ef90b17c18c3c5960726964cff12b6d6ef22f3f4 ... undetected
554 effbed4e7e619009def1c4322f68092eb9cc197f ... undetected
555 f081c8a737f87167fef83d03405c1fbe55a46986 ... undetected
556 f1304ad198045ebb93e70252f0dda9d68acd83f1 ... undetected
557 f14762b5ce92f2713c584140d694ce25f7beb9c2 ... undetected
558 f187959d6afa483d18c69b9e334575781009cd31 ... undetected
559 f1ae32a92f89f54e542973a98eb3dcbe05fe9c58 ... undetected
560 f28217b5928e4d2fbbc5ca45bd815b1c3963bed2 ... undetected
561 f36687584c4bc38f2aed5511930b50eea378c1bf ... undetected
562 f4846b38f52805ffa2d0ae392df05bbeb8fee2b5 ... undetected
563 f4b8b762feb426de46a0d19b86f31173e0e77c2e ... undetected
564 f4d0cc44a8018c807b9b1865ce2dd70f027d2ceb ... undetected
565 f4fcbbdf8c797c96dd1a3e76baf666c319f52aa8 ... undetected
566 f6c9b393b5148e45138f724cebf5b1e2fd8d9bc7 ... undetected
567 f8910d7869be647d2ec6c49ddf6fef49ed0f09d0 ... undetected
568 f90c38a3d623ea47b129b386d841614d9a290f0a ... undetected
569 f99c069d5ababc7001aa46a494a0400a913a109c ... undetected
570 f9d2c6e2438fc4571f7ea4f639b2950ddd1307e5 ... undetected
571 fa2229ef95b9e45e881ac27004c2a90f6c6e0947 ... undetected
572 fac66887402b4ac4a39696f3f8830a6ec34585be ... undetected
573 fb2086d390c1755b53580013c727398d9fb5c01b ... undetected
574 fb59aa51fec66f8caf409b1ca2b80e7fdaf33c61 ... undetected
575 fc39dfde0 -X=0/0 -X=0/0 -... ... undetected
576 fcb12edabdb2e59916f2f84f204c3e8ec13d1135 ... undetected
577 fcbbfeb67cd2902de545fb159b0eed7343aeb502 ... undetected
578 fced05723f49b6d0836e065a436e8c3b8df2bc12 ... undetected
579 fd1cada68f4a9452275d292fe4b9f76a4bd8bd8b ... undetected
580 fe5babc1e4f11e205457f2ec616f117fd4f4e326 ... undetected
581 fe8c341de79168a1254154f4e4403857c6e79c46 ... undetected
582 fe91021461e48fe82449d2ad73bcc66f6c508152 ... undetected
This is the output I expect:
18 0ad5f62b4ec10397b7d13433a8dc794dc6d4f273 ... undetected
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3 ... Administrator
20 0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb ... undetected
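One possible reason for every row being undetected is that the log value used as the merge key is not a bare SHA-1: in the sample record, field 10 is the hash followed by `.crdownload`, so it can never equal a CSV entry exactly. The sketch below assumes that is the cause and extracts the 40 hex characters before merging. The two small frames are hypothetical stand-ins shaped like the samples, and the VSDT value for the matching row is a placeholder.

```python
import io
import pandas as pd

# Stand-in for sha1_vsdt.csv (second VSDT value is a placeholder).
csv_text = """SHA-1,VSDT
0191a23ee122bdb0c69008971e365ec530bf03f5,MIME 6010-0
06a60c6018a42b1db22e3bf8620861711401c4bb,placeholder type
"""
df1 = pd.read_csv(io.StringIO(csv_text))

# Stand-in for the trimmed log frame (columns 10, 14, 15 of the log).
df2 = pd.DataFrame({
    "SHA1": ["06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload"],
    "PRG": ["172.20.4.179"],
    "IP": ["Administrator"],
})

# Normalise both keys: keep only the 40 hex characters, lower-cased.
df1["SHA-1"] = df1["SHA-1"].str.strip().str.lower()
df2["SHA1"] = df2["SHA1"].str.extract(r"([0-9a-fA-F]{40})", expand=False).str.lower()

# Left merge keeps every CSV row; rows without a log match get "undetected".
merged = df1.merge(df2, left_on="SHA-1", right_on="SHA1", how="left")
merged[["PRG", "IP"]] = merged[["PRG", "IP"]].fillna("undetected")

print(merged[["SHA-1", "VSDT", "PRG", "IP"]].to_string())
```

If the real log stores the hash in a different column, the same normalisation can be applied to that column instead.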