How to cluster multiple CSV files in a directory
I have multiple CSV files with the same data structure, and I want to cluster every CSV file in one run.
import os
import pandas as pd
import numpy as np
from sklearn import metrics
import glob
from sklearn.cluster import DBSCAN

# read one file and drop the non-feature column
df = pd.read_csv('File 000rejoice-19.csv')
can = df.drop(columns=['pat'])

# cluster columns 1-4 with DBSCAN
dbscan = DBSCAN(eps=3, min_samples=4)
X = can.iloc[:, [1, 2, 3, 4]].values
X.shape
model = dbscan.fit(X)
labels = model.labels_

# mark the core samples
sample_cores = np.zeros_like(labels, dtype=bool)
sample_cores[dbscan.core_sample_indices_] = True

# number of clusters, excluding the noise label (-1)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_clusters
This code only works for a single CSV file; I want to cluster multiple CSV files in one run.
from os import listdir
import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN

for file in listdir('.'):
    # read each file in the current directory and cluster it
    df = pd.read_csv(file)
    can = df.drop(columns=['pat'])
    dbscan = DBSCAN(eps=3, min_samples=4)
    X = can.iloc[:, [1, 2, 3, 4]].values
    model = dbscan.fit(X)
    labels = model.labels_
    sample_cores = np.zeros_like(labels, dtype=bool)
    sample_cores[dbscan.core_sample_indices_] = True
    # number of clusters, excluding the noise label (-1)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(file, n_clusters)
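One caveat: listdir('.') returns every entry in the directory, not just CSV files, so pd.read_csv will fail on anything else. A minimal sketch, assuming the CSVs sit in the current directory, that uses glob (already imported in the question) to restrict the loop to *.csv and collect the cluster count per file:

import glob
import pandas as pd
from sklearn.cluster import DBSCAN

results = {}
for path in glob.glob('*.csv'):  # only pick up CSV files
    df = pd.read_csv(path)
    X = df.drop(columns=['pat']).iloc[:, [1, 2, 3, 4]].values
    labels = DBSCAN(eps=3, min_samples=4).fit(X).labels_
    # cluster count for this file, excluding the noise label (-1)
    results[path] = len(set(labels)) - (1 if -1 in labels else 0)

print(results)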
1. read the multiple files into one DataFrame
2. apply the clustering to the DataFrame built in step 1
Since @denyce already provided a local-directory example, I can give a step 1 example that reads the files from AWS S3:
import io

import boto3
import pandas as pd
from sklearn.cluster import DBSCAN

def f(bucket, key, region_name, access_mode):
    s3_resource = boto3.resource('s3', region_name=region_name)
    s3_bucket = s3_resource.Bucket(bucket)
    df_list = []
    s3_objs = s3_bucket.objects.filter(Prefix=key)
    for s3_prefix_obj in s3_objs:
        # some medium work: parse each object body into a DataFrame
        body = s3_prefix_obj.get()['Body'].read()
        df_list.append(pd.read_csv(io.BytesIO(body)))
    # combine data together
    df = pd.concat(df_list)
    # step2, do cluster as you described, now df contains all files in the s3 folder
    can = df.drop(columns=['pat'])
    dbscan = DBSCAN(eps=3, min_samples=4)
    X = can.iloc[:, [1, 2, 3, 4]].values
    ....
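A minimal usage sketch, assuming the CSVs live under a single S3 prefix; the bucket name, prefix, and region below are placeholders rather than values from the original answer, and the function would need a return statement (for df or the cluster count) before the call yields anything:

# hypothetical call; bucket, prefix, and region are placeholders
f(bucket='my-bucket', key='csv-folder/', region_name='us-east-1', access_mode=None)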