Replacing values in a dataframe column based on a dictionary does not work [duplicate]

Posted 2023-03-11


【Title】: Replace values in columns of dataframe based on dictionary not working [duplicate] 【Posted】: 2021-11-13 06:13:47 【Question】:

You can read the exact problem below, but this is basically what I am trying to do:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

newVals = {'A0': 0,
           'A1': 1,
           'A2': 2,
           'A3': 3}

# try to swap each key in column A for its mapped value
for key, value in newVals.items():
    df1['A'].replace(key, value)

When I do this, the resulting dataframe is unchanged.
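A likely explanation for the behavior described above: `Series.replace` returns a new Series by default rather than modifying the column in place, so a call whose result is never assigned has no visible effect. The short sketch below reproduces this with a trimmed copy of the example frame. The names `new_vals`, `unchanged` and `changed` are introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                    "B": ["B0", "B1", "B2", "B3"]})
new_vals = {"A0": 0, "A1": 1, "A2": 2, "A3": 3}

# The returned Series is discarded here, so df1 keeps its original values.
df1["A"].replace(new_vals)
unchanged = df1["A"].tolist()

# Assigning the returned Series back to the column makes the change stick.
df1["A"] = df1["A"].replace(new_vals)
changed = df1["A"].tolist()
```

Passing the whole dictionary to `replace` also removes the need for the explicit loop over `items()`.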

Original post:

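OK, I am analyzing workplace accident data from OSHA (osha_accident_injury.csv). Each row is a specific person injured in an accident. Each column is a characteristic of the person or of the accident itself, and each characteristic is encoded as an integer with a corresponding string value. I want to replace each integer with its string definition. The number-to-string mappings are listed in osha_accident_lookup.csv. The mapping from accident codes to column names can be found in osha_accident_dictionary.csv, but I typed those into a map by hand.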

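However, some integers map to more than one string, so the result also depends on the accident code in osha_accident_lookup.csv. I therefore created a list containing one dictionary (mapping integers to string values) for each accident code. But when I try to replace each column using its own dictionary, I get the original dataframe back instead of one with string values. Can anyone see what I am doing wrong?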

# create list of all distinct accident codes
code_list = []
for index in osha_accident_lookup.index:
    if osha_accident_lookup['accident_code'][index] not in code_list:
        code_list.append(osha_accident_lookup['accident_code'][index])

# remove values not found in actual data
code_list.remove('PTYP')
code_list.remove('COST')
code_list.remove('ENDU')

# create list of dictionaries, s.t. each item maps accident number to accident value
# there is a unique map for each unique accident code
mapList = []
for code in code_list:
    temp_df = pd.DataFrame(osha_accident_lookup[osha_accident_lookup['accident_code'] == code])
    temp_map = dict(zip(temp_df['accident_number'], temp_df['accident_value']))
    mapList.append(temp_map)

# create dictionary that maps code from osha_accident_lookup to column name in osha_accident_injury.csv
code_to_column = {"OCC": "occ_code", "CAUS": "fat_cause", "DEGR": "degree_of_inj",
                  "OPER": "const_op_cause", "EN": "evn_factor", "FT": "event_type",
                  "HU": "hum_factor", "IN": "nature_of_inj", "BD": "part_of_body",
                  "SO": "src_of_injury", "TASK": "task_assigned"}

# replace numbers in injury data with string values of what the #'s represent
iterator = 0
for item in mapList:
    code = code_list[iterator]
    col_name = code_to_column[code]
    for key, value in item.items():
        osha_accident_injury[col_name].replace({key: value})
    iterator += 1
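The loop above has the same shape as the short example: the Series returned by `replace` is never assigned, so `osha_accident_injury` stays as it was. Here is a minimal sketch of one way to do the per-column replacement, assuming the lookup and injury data look like the samples in this post. The two small frames are stand-ins with made-up cell values (only the column names and lookup labels come from the post), and `lookup`, `injury` and `maps` are illustrative names.

```python
import pandas as pd

# Small stand-in for osha_accident_lookup.csv (labels taken from the sample rows).
lookup = pd.DataFrame({
    "accident_code": ["OPER", "OPER", "OPER", "SO", "SO", "SO"],
    "accident_number": [1, 2, 3, 1, 2, 3],
    "accident_value": [
        "Backfilling and compacting",
        "Bituminous concrete placement",
        "Construction of playing fields, tennis courts",
        "AIRCRAFT",
        "AIR PRESSURE",
        "ANIMAL/INS/REPT/ETC.",
    ],
})

# Stand-in for osha_accident_injury.csv; the numeric cells are invented.
injury = pd.DataFrame({
    "src_of_injury": [1.0, 2.0, 1.0],
    "const_op_cause": [2.0, 1.0, 3.0],
})

code_to_column = {"SO": "src_of_injury", "OPER": "const_op_cause"}

# Build one number -> label dictionary per accident code.
maps = {
    code: dict(zip(group["accident_number"], group["accident_value"]))
    for code, group in lookup.groupby("accident_code")
}

# Replace each coded column and assign the result back to the frame.
for code, col_name in code_to_column.items():
    injury[col_name] = injury[col_name].replace(maps[code])
```

Keying the per-code dictionaries by accident code, rather than keeping them in a parallel list, avoids the manual `iterator` counter.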

osha_accident_injury.csv (first 10 rows):

FIELD1	summary_nr	rel_insp_nr	nature_of_inj	part_of_body	src_of_injury	event_type	evn_factor	hum_factor	degree_of_inj	task_assigned	hazsub	injury_line_nr	load_dt
0	18	10006732	10.0	12.0	15.0	13.0	18.0	1.0	1.0	1.0		1	2017-03-20 01:00:11 EDT
1	26	159996	21.0	19.0	42.0	5.0	13.0	9.0	1.0	1.0		1	2017-03-20 01:00:11 EDT
2	34	10013225	21.0	4.0	19.0	8.0	18.0	1.0	1.0	1.0	0270	1	2017-03-20 01:00:11 EDT
3	42	10014439	1.0	10.0	24.0	2.0	3.0	1.0	2.0	2.0		1	2017-03-20 01:00:11 EDT
4	59	19523588	5.0	4.0	16.0	10.0	9.0	1.0	2.0	1.0		1	2017-03-20 01:00:11 EDT
5	59	19523588	21.0	5.0	16.0	8.0	9.0	14.0	2.0	2.0		2	2017-03-20 01:00:11 EDT
6	59	19523588	21.0	5.0	16.0	6.0	9.0	14.0	2.0	2.0		3	2017-03-20 01:00:11 EDT
7	59	19523588	21.0	5.0	16.0	8.0	9.0	14.0	2.0	2.0		4	2017-03-20 01:00:11 EDT
8	59	19523588	21.0	5.0	16.0	8.0	9.0	14.0	2.0	2.0		5	2017-03-20 01:00:11 EDT
9	59	19523588	21.0	5.0	16.0	8.0	9.0	14.0	2.0	2.0		6	2017-03-20 01:00:11 EDT

osha_accident_lookup.csv (first 10 rows):

accident_code	accident_number	accident_value	load_date
OPER	1	Backfilling and compacting	2018-11-09 20:56:02 EST
OPER	2	Bituminous concrete placement	2018-11-09 20:56:02 EST
OPER	3	Construction of playing fields, tennis courts	2018-11-09 20:56:02 EST
SO	1	AIRCRAFT	2018-11-09 20:56:02 EST
SO	2	AIR PRESSURE	2018-11-09 20:56:02 EST
SO	3	ANIMAL/INS/REPT/ETC.	2018-11-09 20:56:02 EST
OCC	757	Separating, filtering & clarifying mach. operators	2018-11-09 20:56:02 EST
OCC	758	Compressing and compacting machine operators	2018-11-09 20:56:02 EST
OCC	759	Painting and paint spraying machine operators	2018-11-09 20:56:02 EST
OCC	763	Roasting and baking machine operators, food	2018-11-09 20:56:02 EST

osha_data_dictionary.csv (first 10 rows):

table_name	column_name	attribute_name	definition	column_datatype	display_name
osha_accident	nonbuild_ht	Non Building Height	Construction - height in feet when not a building	Numeric, Length=4	Height for Non-Building
osha_accident	project_type	Project Type	Construction - project type (code table PTYP)	Alphanumeric, Length:1	Project Type
osha_accident	event_date	Event Date	Date of accident (yyyymmdd)	Numeric, Length=8	Event Date
osha_accident	event_keyword	Event Keyword	Contains comma separated keywords entered by ERG during the review process.	Alphanumeric, Length:200	Event Keyword
osha_accident	report_id	Report ID	Identifies the OSHA federal or state reporting jurisdiction	Numeric, Length=7	Reporting ID
osha_accident	event_desc	Event Description	Short description of event	Alphanumeric, Length:60	Event Description
osha_accident	load_dt	Load Date Timestamp	The date the load was completed.	date	No Label
osha_accident	summary_nr	Summary NR	Identifies the accident OSHA-170 form	Numeric, Length=9	Summary NR
osha_accident	fatality	Fatality	X=Fatality is associated with accident	Alphanumeric, Length:1	Fatality

【Comments】:

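Try using merge. Also, you could give more information by telling us which columns in which csv should be mapped to which columns in the other csv.

I just added an abstracted version of my problem that has the same issue. Does that make the question easier to answer?

【Answer 1】: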

Based on your example, try this approach.

df1['A'] = df1['A'].map(newVals)
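One thing worth knowing before choosing `map` over `replace`: the two treat values that are missing from the dictionary differently. The sketch below illustrates this with a dictionary that deliberately omits one key. The names `s`, `mapped` and `replaced` are made up for the illustration.

```python
import pandas as pd

new_vals = {"A0": 0, "A1": 1, "A2": 2}  # "A3" is deliberately left out
s = pd.Series(["A0", "A1", "A2", "A3"])

# map: values without a dictionary entry become NaN
mapped = s.map(new_vals)

# replace: values without a dictionary entry are kept unchanged
replaced = s.replace(new_vals)
```

If the lookup table might not cover every code in the data, `replace` keeps the unmatched codes visible, whereas `map` turns them into missing values.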
