How to convert strings into a DataFrame

Posted: 2022-01-13 17:59:39

Question:

I have a bytes object that I want to work with.

bytes_1 = "granularity":"Monthly","main_domain_only":false,"mtd":false,"show_verified":false,"state":null,"page":null,"format":"json","domain":"website.com","start_date":"2019-10-01","end_date":"2021-10-31","country":"uk","status":"Success","last_updated":"2021-10-31","visits":["date":"2019-10-01","visits":42641.106198532005,"date":"2019-11-01","visits":39084.75203858769,"date":"2019-12-01","visits":75293.20188556636,"date":"2020-01-01","visits":74846.32665257844,"date":"2020-02-01","visits":53411.33746849558,"date":"2020-03-01","visits":50202.09919672746,"date":"2020-04-01","visits":83135.4077079868,"date":"2020-05-01","visits":128402.42646177398,"date":"2020-06-01","visits":142254.20581500718,"date":"2020-07-01","visits":97795.90976634984,"date":"2020-08-01","visits":153057.2435480025,"date":"2020-09-01","visits":174668.65913280132,"date":"2020-10-01","visits":128082.17226849863,"date":"2020-11-01","visits":117737.94226795572,"date":"2020-12-01","visits":139259.13459326507,"date":"2021-01-01","visits":129572.35638477515,"date":"2021-02-01","visits":104814.00413267144,"date":"2021-03-01","visits":48927.388186319484,"date":"2021-04-01","visits":30901.658623907377,"date":"2021-05-01","visits":34564.981543265196,"date":"2021-06-01","visits":51215.85515078678,"date":"2021-07-01","visits":23632.959350567497,"date":"2021-08-01","visits":32988.756167336134,"date":"2021-09-01","visits":214154.73499697837,"date":"2021-10-01","visits":22844.79558982703]'

I can specify start_date and end_date parameters, and the request returns the visits data for the date range I enter, as a string in the format shown above.

For analysis, I'd like to convert the string above into a DataFrame, but I can't figure out how to do it.
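For reference, here is a minimal sketch of that conversion with pd.json_normalize. It assumes the response is valid JSON with its fields at the top level, as they appear above; the sample payload is a hypothetical, shortened stand-in for the real one. record_path selects the visits list, and meta copies the listed top-level fields onto every row.

```python
import json

import pandas as pd

# Hypothetical, shortened payload: assumes well-formed JSON with the
# fields at the top level, as they appear in the question.
raw = (
    b'{"domain":"website.com","country":"uk","status":"Success",'
    b'"visits":[{"date":"2019-10-01","visits":42641.106198532005},'
    b'{"date":"2019-11-01","visits":39084.75203858769}]}'
)

# json.loads accepts bytes directly (Python 3.6+), no decoding step needed
info = json.loads(raw)

# One row per element of info["visits"]; domain and country are
# repeated on every row as extra columns.
df = pd.json_normalize(info, record_path="visits", meta=["domain", "country"])
print(df)
```

If the top-level fields are not needed, pd.DataFrame(info["visits"]) gives the same date and visits columns without them.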

I first tried converting it to a string and reading it through io.StringIO, but that didn't work.
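A likely reason the string conversion goes wrong (an assumption, since the question does not show the code that was tried): str() on a bytes object returns its repr, wrapped in b'...', and that wrapper is not JSON. Wrapping the result in io.StringIO does not help, because StringIO only turns text into a file-like object without changing it. bytes.decode() returns the actual text, and json.loads also accepts the bytes as-is. The payload below is a hypothetical, minimal example.

```python
import json

payload = b'{"visits":[{"date":"2019-10-01","visits":42641.1}]}'

# str() returns the repr of the bytes, b' prefix and closing quote included
as_repr = str(payload)
print(as_repr[:3])  # b'{

# .decode() returns the underlying text, which is valid JSON
as_text = payload.decode("utf-8")

try:
    json.loads(as_repr)
except json.JSONDecodeError as exc:
    print("repr is not JSON:", exc.msg)

print(json.loads(as_text)["visits"][0]["date"])  # 2019-10-01
print(json.loads(payload) == json.loads(as_text))  # True
```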

Comments:

Is that the entire bytes object in the string? I see some ]' in your example that don't match up.

Answer 1:

The bytes_1 data can almost be parsed as JSON. With some string editing, we can create a DataFrame as follows:
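To work out which string edits are needed, json.JSONDecodeError reports where parsing stopped through its msg and pos attributes. The helper below (locate_json_error is a hypothetical name, not part of any library) is a minimal sketch, run on a hypothetical input that never closes its outer object and ends with a stray quote, much like the ]' noted in the comment above.

```python
import json
from typing import Optional


def locate_json_error(text: str, context: int = 20) -> Optional[str]:
    """Describe the first JSON syntax error in text, or return None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # Show a few characters on either side of the failure position
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        return f"{exc.msg} at char {exc.pos}: ...{snippet}..."
    return None


# Hypothetical input: outer object never closed, stray quote at the end
broken = '{"status":"Success","visits":[{"date":"2019-10-01","visits":1.5}]\''
print(locate_json_error(broken))

# Dropping the trailing quote and closing the object makes it parse
fixed = broken.rstrip("'") + "}"
print(locate_json_error(fixed))  # None
```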

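import json
import pandas as pd

# bytes_1 may be a real bytes object; decode it so the str methods below work
text = bytes_1.decode("utf-8") if isinstance(bytes_1, bytes) else bytes_1

# touch up the input data: add the closing braces missing after "uk" and
# after last_updated, and drop the stray trailing quote
bdata_fix = (
    text.replace('"uk",', '"uk"},')
        .replace('"last_updated":"2021-10-31",', '"last_updated":"2021-10-31"},')
        .rstrip("'")
)

info = json.loads(bdata_fix)
df = pd.DataFrame(info['visits'])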

This gives us the DataFrame:

          date         visits
0   2019-10-01   42641.106199
1   2019-11-01   39084.752039
2   2019-12-01   75293.201886
3   2020-01-01   74846.326653
4   2020-02-01   53411.337468
5   2020-03-01   50202.099197
6   2020-04-01   83135.407708
7   2020-05-01  128402.426462
8   2020-06-01  142254.205815
9   2020-07-01   97795.909766
10  2020-08-01  153057.243548
11  2020-09-01  174668.659133
12  2020-10-01  128082.172268
13  2020-11-01  117737.942268
14  2020-12-01  139259.134593
15  2021-01-01  129572.356385
16  2021-02-01  104814.004133
17  2021-03-01   48927.388186
18  2021-04-01   30901.658624
19  2021-05-01   34564.981543
20  2021-06-01   51215.855151
21  2021-07-01   23632.959351
22  2021-08-01   32988.756167
23  2021-09-01  214154.734997
24  2021-10-01   22844.795590
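After this conversion the date column holds plain strings. For time-series analysis it is usually convenient to parse it with pd.to_datetime and use it as the index. A sketch on the first three rows of the output above:

```python
import pandas as pd

# First three rows of the DataFrame shown above
df = pd.DataFrame(
    {
        "date": ["2019-10-01", "2019-11-01", "2019-12-01"],
        "visits": [42641.106198532005, 39084.75203858769, 75293.20188556636],
    }
)

# Parse the ISO date strings and index the visits by month
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
monthly = df.set_index("date")["visits"]

# Date-based selection now works: a single month by Timestamp,
# or an inclusive range by string labels
print(monthly.loc[pd.Timestamp("2019-11-01")])
print(monthly.loc["2019-11-01":"2019-12-01"].round(1))
```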

Comments:

Great stuff!! Thanks a lot.
