How can I check whether the strings in a CSV column exist in a log file using Python?


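How can I look up a column in the log file when it matches the first column of the CSV file? If there is no match, print "undetected"; if there is a match, get certain columns from the log file. I have been trying to solve this for two days, please help.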

This is the structure of my log file (full content: trendx.log):

1537761898  0   1   1   1537733097  1537733098  1537733097  8224    74  215552  06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload TROJ.Win32.TRX.XXPE50FFF026 c:\users\administrator\desktop\downloader\download\     TRENDX  172.20.4.179    Administrator           c1f387a6f45414366755b0a1874b36ff9596d8ad        AABACACCBIiAgXWACAAAAAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=        TSPY_HPDYRE.SM;TSPY_GOLROTED.ACS;TROJ_GEN.R021C0FFO15;TROJ_GEN.R021C0DLO15;Ransom_HPCRYPTESLA.SM2;

And a sample of my CSV file (full content: sha1_vsdt.csv):

SHA-1,VSDT,
0191a23ee122bdb0c69008971e365ec530bf03f5,MIME 6010-0,
02b809d4edee752d9286677ea30e8a76114aa324,Microsoft RTF 6008-0
0349e0101d8458b6d05860fbee2b4a6d7fa2038d,Adobe Portable Document Format(PDF)
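
Comparing the two samples above: the file-name field in the log line carries a suffix after the hash (.crdownload), while the CSV holds bare SHA-1 values, so an exact string comparison between them would likely never match. A minimal sketch of normalizing the log field first, assuming the fields are whitespace-separated with no embedded spaces and sit at positions 10, 14 and 15 as in the question's code; the helper names extract_sha1 and parse_log_line are hypothetical:

```python
import re

# A SHA-1 digest is 40 hexadecimal characters; anchor at the start of the field.
SHA1_PREFIX = re.compile(r"^[0-9a-fA-F]{40}")

def extract_sha1(field):
    """Return the leading 40-hex-digit SHA-1 of a field, lowercased, or None."""
    match = SHA1_PREFIX.match(field.strip())
    return match.group(0).lower() if match else None

def parse_log_line(line):
    """Split one log line on whitespace and pick the fields used in the question.

    Assumption: no field contains embedded spaces, so positions stay stable.
    """
    fields = line.split()
    if len(fields) < 16:
        return None  # line too short to contain the expected fields
    return {"SHA1": extract_sha1(fields[10]), "PRG": fields[14], "IP": fields[15]}

# Shortened copy of the sample log line from the question.
sample = ("1537761898 0 1 1 1537733097 1537733098 1537733097 8224 74 215552 "
          "06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload TROJ.Win32.TRX.XXPE50FFF026 "
          r"c:\users\administrator\desktop\downloader\download\ TRENDX 172.20.4.179 Administrator")
print(parse_log_line(sample))
```

If real log lines contain paths with spaces, splitting on whitespace would shift the positions, and splitting on the file's actual delimiter would be the safer choice.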

I used the following, but I don't know why it prints "undetected" for every row, and from row 30 to row 552 it does not show anything:

import numpy as np
import pandas as pd
import csv

#Log data into dataframe using genfromtxt
logdata = np.genfromtxt("trendx.log",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
logframe = pd.DataFrame(logdata)
#Dataframe trimmed to use only SHA1, PRG and IP
df2 = logframe[[10, 14, 15]].rename(columns={10: 'SHA1', 14: 'PRG', 15: 'IP'})


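#sha1_vsdt data into dataframe using read_csv
#Note: error_bad_lines was removed in pandas 2.0; on pandas 1.3 or later use on_bad_lines='skip' instead
df1=pd.read_csv("sha1_vsdt.csv",delimiter=",",error_bad_lines=False,engine = 'python',quoting=3)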
#Using a left merge to compare the CSV data against the log data
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').replace(np.nan, 'undetected', regex=True)
print(df[['SHA-1','VSDT','PRG','IP']])

Output:

0             0191a23ee122bdb0c69008971e365ec530bf03f5     ...      undetected
1             02b809d4edee752d9286677ea30e8a76114aa324     ...      undetected
2             0349e0101d8458b6d05860fbee2b4a6d7fa2038d     ...      undetected
3             035a7afca8b72cf1c05f6062814836ee31091559     ...      undetected
4             042065bec5a655f3daec1442addf5acb8f1aa824     ...      undetected
5             04939e040d9e85f84d2e2eb28343d94a50ed46ac     ...      undetected
6             04a1876724b53a016cd9e9c93735985938c91fa4     ...      undetected
7             06109df23f7d5deadf0b2c158af1f71c2997d245     ...      undetected
8             06194c240c12c51b55d2961ae287fd9628e05751     ...      undetected
9             0665de1ad83715cc6e68d00ed700c469944a5925     ...      undetected
10            067b448f4c9782489e5ff60c31c62b7059e500b2     ...      undetected
11            0688e6966b0e4a1f58d2f3de48f960fce5b42292     ...      undetected
12            0689f6f99d10dd8bf396f2d2c73ce9dcb6dcad23     ...      undetected
13            06a60c6018a42b1db22e3bf8620861711401c4bb     ...      undetected
14            0723a895a5f8b2d5d25b4303e9f04d16551791b6     ...      undetected
15            07344621cf4480c430f8931af2b2b056775af7e3     ...      undetected
16            07831df482f1a34310fc4f5a092c333eeaff4380     ...      undetected
17            08386105057cd5867480095696a5ca6701fdb8ad     ...      undetected
18            0ad5f62b4ec10397b7d13433a8dc794dc6d4f273     ...      undetected
19            0bed7d032d5c51f606befd2f10b94e5c75a6a1e3     ...      undetected
20            0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb     ...      undetected
21            0c793b4f4e0be7f24f93786d7d4a719a7a002a0d     ...      undetected
22            0c7c2b2d05a5c712f4b9302b82fb54007210937f     ...      undetected
23            0d03da55b246252fb5b440a23943426bda965bcd     ...      undetected
24            0d592f948a4f7bfa95c7cb09faf067ce9fbc9375     ...      undetected
25            0df65d8a57c8349e044f98deda17d70d0c4f926a     ...      undetected
26            0e13d281af08954102e7caf95864ef553c7277bd     ...      undetected
27            0ede12d9c17564e803f51de4d279e84623c5a8a6     ...      undetected
28            0fc4f3a30684bb17cbcbf4e3def2ac3528a2f04c     ...      undetected
29            0fcb475fcadd8d8e3b8dd5f4376feda48c73fd24     ...      undetected
..                                                 ...     ...             ...
553           ef90b17c18c3c5960726964cff12b6d6ef22f3f4     ...      undetected
554           effbed4e7e619009def1c4322f68092eb9cc197f     ...      undetected
555           f081c8a737f87167fef83d03405c1fbe55a46986     ...      undetected
556           f1304ad198045ebb93e70252f0dda9d68acd83f1     ...      undetected
557           f14762b5ce92f2713c584140d694ce25f7beb9c2     ...      undetected
558           f187959d6afa483d18c69b9e334575781009cd31     ...      undetected
559           f1ae32a92f89f54e542973a98eb3dcbe05fe9c58     ...      undetected
560           f28217b5928e4d2fbbc5ca45bd815b1c3963bed2     ...      undetected
561           f36687584c4bc38f2aed5511930b50eea378c1bf     ...      undetected
562           f4846b38f52805ffa2d0ae392df05bbeb8fee2b5     ...      undetected
563           f4b8b762feb426de46a0d19b86f31173e0e77c2e     ...      undetected
564           f4d0cc44a8018c807b9b1865ce2dd70f027d2ceb     ...      undetected
565           f4fcbbdf8c797c96dd1a3e76baf666c319f52aa8     ...      undetected
566           f6c9b393b5148e45138f724cebf5b1e2fd8d9bc7     ...      undetected
567           f8910d7869be647d2ec6c49ddf6fef49ed0f09d0     ...      undetected
568           f90c38a3d623ea47b129b386d841614d9a290f0a     ...      undetected
569           f99c069d5ababc7001aa46a494a0400a913a109c     ...      undetected
570           f9d2c6e2438fc4571f7ea4f639b2950ddd1307e5     ...      undetected
571           fa2229ef95b9e45e881ac27004c2a90f6c6e0947     ...      undetected
572           fac66887402b4ac4a39696f3f8830a6ec34585be     ...      undetected
573           fb2086d390c1755b53580013c727398d9fb5c01b     ...      undetected
574           fb59aa51fec66f8caf409b1ca2b80e7fdaf33c61     ...      undetected
575  fc39dfde0 -X=0/0 -X=0/0 -...     ...      undetected
576           fcb12edabdb2e59916f2f84f204c3e8ec13d1135     ...      undetected
577           fcbbfeb67cd2902de545fb159b0eed7343aeb502     ...      undetected
578           fced05723f49b6d0836e065a436e8c3b8df2bc12     ...      undetected
579           fd1cada68f4a9452275d292fe4b9f76a4bd8bd8b     ...      undetected
580           fe5babc1e4f11e205457f2ec616f117fd4f4e326     ...      undetected
581           fe8c341de79168a1254154f4e4403857c6e79c46     ...      undetected
582           fe91021461e48fe82449d2ad73bcc66f6c508152     ...      undetected

This is my expected output:

18            0ad5f62b4ec10397b7d13433a8dc794dc6d4f273      ...           undetected
19            0bed7d032d5c51f606befd2f10b94e5c75a6a1e3      ...        Administrator
20            0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb      ...           undetected
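
One possible way to get output of this shape is sketched below. It rests on two assumptions. First, the log's hash column carries a file-name suffix such as .crdownload (as in the sample log line), so it is trimmed to its leading 40 hex characters before the left merge. Second, the ".." row in the printed output is pandas truncating a long frame, not missing data, so the display limits are lifted while printing. The function name match_log_to_csv, the small in-memory frames, and the "hypothetical type" value are illustrative only and stand in for the real files:

```python
import pandas as pd

def match_log_to_csv(csv_df, log_df):
    """Left-join CSV hashes against log rows; unmatched rows get 'undetected'."""
    log_df = log_df.copy()
    # Keep only the leading 40 hex characters, dropping suffixes like ".crdownload".
    log_df["SHA1"] = (
        log_df["SHA1"].str.extract(r"^([0-9a-fA-F]{40})", expand=False).str.lower()
    )
    csv_df = csv_df.copy()
    # Normalize whitespace and case so the join keys compare equal.
    csv_df["SHA-1"] = csv_df["SHA-1"].str.strip().str.lower()
    merged = csv_df.merge(log_df, left_on="SHA-1", right_on="SHA1", how="left")
    return merged[["SHA-1", "VSDT", "PRG", "IP"]].fillna("undetected")

# Illustrative stand-ins for sha1_vsdt.csv and the trimmed log frame.
csv_df = pd.DataFrame({
    "SHA-1": ["0191a23ee122bdb0c69008971e365ec530bf03f5",
              "06a60c6018a42b1db22e3bf8620861711401c4bb"],
    "VSDT": ["MIME 6010-0", "hypothetical type"],
})
log_df = pd.DataFrame({
    "SHA1": ["06a60c6018a42b1db22e3bf8620861711401c4bb.crdownload"],
    "PRG": ["172.20.4.179"],
    "IP": ["Administrator"],
})

result = match_log_to_csv(csv_df, log_df)
# Lift the row and column limits so pandas prints every row instead of "..".
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(result)
```

The column labels PRG and IP follow the question's own naming, even though the values at those positions in the sample line look like an IP address and a user name.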
