How to convert strings into a DataFrame
【Title】: How to convert strings into dataframe
【Posted】: 2022-01-13 17:59:39
【Question】: I have a bytes object, and this is the one I want to work with.
bytes_1 = "granularity":"Monthly","main_domain_only":false,"mtd":false,"show_verified":false,"state":null,"page":null,"format":"json","domain":"website.com","start_date":"2019-10-01","end_date":"2021-10-31","country":"uk","status":"Success","last_updated":"2021-10-31","visits":["date":"2019-10-01","visits":42641.106198532005,"date":"2019-11-01","visits":39084.75203858769,"date":"2019-12-01","visits":75293.20188556636,"date":"2020-01-01","visits":74846.32665257844,"date":"2020-02-01","visits":53411.33746849558,"date":"2020-03-01","visits":50202.09919672746,"date":"2020-04-01","visits":83135.4077079868,"date":"2020-05-01","visits":128402.42646177398,"date":"2020-06-01","visits":142254.20581500718,"date":"2020-07-01","visits":97795.90976634984,"date":"2020-08-01","visits":153057.2435480025,"date":"2020-09-01","visits":174668.65913280132,"date":"2020-10-01","visits":128082.17226849863,"date":"2020-11-01","visits":117737.94226795572,"date":"2020-12-01","visits":139259.13459326507,"date":"2021-01-01","visits":129572.35638477515,"date":"2021-02-01","visits":104814.00413267144,"date":"2021-03-01","visits":48927.388186319484,"date":"2021-04-01","visits":30901.658623907377,"date":"2021-05-01","visits":34564.981543265196,"date":"2021-06-01","visits":51215.85515078678,"date":"2021-07-01","visits":23632.959350567497,"date":"2021-08-01","visits":32988.756167336134,"date":"2021-09-01","visits":214154.73499697837,"date":"2021-10-01","visits":22844.79558982703]'
I can specify start_date and end_date parameters, and it pulls the daily visits data for the dates I enter, returned as a string in the format shown above.
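For analysis purposes, I would like to convert the string above into a DataFrame, but I really cannot figure out a way to do it.
I first tried converting it to a string and reading it through io.StringIO, but that did not work properly.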
【Comments】:
Is that the entire bytes object in the string? I see some ] and ' characters in your example that do not match up.
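One way to locate the spot where a payload stops being valid JSON is to let json.loads report the failure position. This is only a minimal sketch: the helper name find_json_error and the short sample string are made up for illustration and are not the asker's data.

```python
import json


def find_json_error(text, context=20):
    """Return None if text parses as JSON, else describe the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset where parsing failed
        start = max(err.pos - context, 0)
        snippet = text[start:err.pos + context]
        return f"{err.msg} at char {err.pos}: ...{snippet}..."
    return None


# Hypothetical sample: valid JSON followed by a stray trailing quote
sample = '{"visits":[{"date":"2019-10-01","visits":1.0}]}\''
print(find_json_error(sample))
```

Running the helper on the real string should point at the first character that breaks the parse, which makes mismatched brackets or quotes easier to spot.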
【Answer 1】:
The bytes_1 data can almost be parsed as JSON. With a little string editing, we can create a DataFrame as follows:
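import json
import pandas as pd

# Touch up the input data so that it parses as JSON
bdata_fix = (
    bytes_1
    .replace('"uk",', '"uk",')
    .replace('"last_updated":"2021-10-31",', '"last_updated":"2021-10-31",')
    .rstrip("'")  # drop the stray trailing quote
)
info = json.loads(bdata_fix)       # parse the repaired text into a dict
df = pd.DataFrame(info['visits'])  # one row per entry in the visits list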
which gives us the DataFrame
          date         visits
0   2019-10-01   42641.106199
1   2019-11-01   39084.752039
2   2019-12-01   75293.201886
3   2020-01-01   74846.326653
4   2020-02-01   53411.337468
5   2020-03-01   50202.099197
6   2020-04-01   83135.407708
7   2020-05-01  128402.426462
8   2020-06-01  142254.205815
9   2020-07-01   97795.909766
10  2020-08-01  153057.243548
11  2020-09-01  174668.659133
12  2020-10-01  128082.172268
13  2020-11-01  117737.942268
14  2020-12-01  139259.134593
15  2021-01-01  129572.356385
16  2021-02-01  104814.004133
17  2021-03-01   48927.388186
18  2021-04-01   30901.658624
19  2021-05-01   34564.981543
20  2021-06-01   51215.855151
21  2021-07-01   23632.959351
22  2021-08-01   32988.756167
23  2021-09-01  214154.734997
24  2021-10-01   22844.795590
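If the response arrives as a real bytes object holding well-formed JSON, no string editing should be needed, since json.loads accepts bytes directly (Python 3.6 and later). The sketch below assumes the records sit in a list of objects under a visits key. The raw payload is a shortened, hypothetical stand-in built from the first three values above, not the asker's actual response. It also turns date into a datetime index, which tends to be convenient for time-series analysis.

```python
import json

import pandas as pd

# Hypothetical, well-formed stand-in for the API response (assumption)
raw = (
    b'{"status":"Success","visits":['
    b'{"date":"2019-10-01","visits":42641.106198532005},'
    b'{"date":"2019-11-01","visits":39084.75203858769},'
    b'{"date":"2019-12-01","visits":75293.20188556636}]}'
)

info = json.loads(raw)             # bytes input is decoded automatically
df = pd.DataFrame(info["visits"])  # one row per record in the list

# Parse the date strings and use them as a sorted index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date").sort_index()

print(df)
```

With a datetime index in place, operations such as df.loc["2019-11":] or df.resample(...) become available without further conversion.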
【Comments】:
Great stuff!! Thank you very much.