Python - Move rows to a new DataFrame as BadRecords if DOB, Address1, Address2 and PostCode are all NULL

Posted 2023-04-17


Originally posted: 2019-08-21 11:04:34

Question:

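I am trying to move the rows that have NULL values in all 4 columns (DOB, Address1, Address2 and PostCode) to a new DataFrame, and keep only the clean records in the original DataFrame.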

I have tried to solve it with the following code:

import numpy as np
import pandas as pd
BadRecords = Data.dropna(subset=['DOB','Address1','Address2','PostCode'], how='any') 
print(BadRecords)

The current code prints the entire dataset. It should only select the records where all 4 columns (DOB, Address1, Address2 and PostCode) are NULL.
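The question imports pandas, so if `Data` is a pandas DataFrame (an assumption; the answer below treats it as a PySpark DataFrame), a minimal sketch of the split builds one boolean mask with `isna().all(axis=1)` and uses it in both directions. The sample rows and the `Name` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the four checked column names follow the question.
Data = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "DOB": ["1990-01-01", None, None, "1985-05-05"],
    "Address1": ["1 High St", None, "2 Low Rd", None],
    "Address2": ["Flat 1", None, None, None],
    "PostCode": ["AB1 2CD", None, None, "EF3 4GH"],
})

cols = ["DOB", "Address1", "Address2", "PostCode"]

# True only where every one of the four columns is null (None/NaN).
all_null = Data[cols].isna().all(axis=1)

# Rows with all four fields missing become the bad records...
BadRecords = Data[all_null].copy()
# ...and the original frame keeps only the remaining (clean) rows.
Data = Data[~all_null].copy()

print(BadRecords)
print(Data)
```

Because both halves come from the same mask, every row ends up in exactly one of the two frames.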


Answer 1:

To get the records in which all four columns are null, you can filter the original DataFrame like this:

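from pyspark.sql.functions import col, isnull

# Keep only the rows in which all four columns are null
badRecords = Data.filter(isnull(col('DOB')) & isnull(col('Address1')) & isnull(col('Address2')) & isnull(col('PostCode')))
display(badRecords)  # display() is Databricks-specific; use badRecords.show() elsewhere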

The dropna function returns a new DataFrame that omits the rows with null values, so what you get back are only the "good" records:

goodRecords = Data.dropna(subset=['DOB','Address1','Address2','PostCode'], how='all')
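For comparison, here is the same good/bad split sketched in pandas (an assumption; the answer itself is PySpark, whose `DataFrame.dropna` takes the same `how` and `subset` parameters). It checks that `dropna(..., how='all')` keeps exactly the rows that the all-null filter leaves out. The `ID` column and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample frame; only the four checked columns matter here.
Data = pd.DataFrame({
    "ID": [1, 2, 3],
    "DOB": [None, None, "1970-03-03"],
    "Address1": [None, "9 Elm St", None],
    "Address2": [None, None, None],
    "PostCode": [None, None, "ZZ9 9ZZ"],
})
cols = ["DOB", "Address1", "Address2", "PostCode"]

# Bad records: all four columns null (the same condition as the filter above).
badRecords = Data[Data[cols].isna().all(axis=1)]

# Good records: how='all' drops only rows where every column in `subset`
# is null, so it keeps exactly the complement of badRecords.
goodRecords = Data.dropna(subset=cols, how="all")

print(badRecords["ID"].tolist())   # [1]
print(goodRecords["ID"].tolist())  # [2, 3]
```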

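Also note that how='any' drops every row in which at least one of the subset columns is null, so if you only want to drop rows where all of those columns are null, you need to use how='all'.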
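A small illustration of the difference, using pandas `dropna` as a stand-in (its `how` and `subset` parameters mean the same thing as in PySpark; the data is hypothetical): a row with only some of the four columns null is dropped by how='any' but kept by how='all'.

```python
import pandas as pd

# Hypothetical rows: one complete, one partially null, one fully null.
Data = pd.DataFrame({
    "ID": [1, 2, 3],
    "DOB": ["2000-01-01", None, None],
    "Address1": ["1 A St", "2 B St", None],
    "Address2": ["Unit 1", None, None],
    "PostCode": ["AA1 1AA", "BB2 2BB", None],
})
cols = ["DOB", "Address1", "Address2", "PostCode"]

# how='any' (as in the question's attempt): drop a row if at least one
# subset column is null, so only the fully populated row survives.
print(Data.dropna(subset=cols, how="any")["ID"].tolist())  # [1]

# how='all': drop a row only if every subset column is null.
print(Data.dropna(subset=cols, how="all")["ID"].tolist())  # [1, 2]
```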

