Define the start of a pandas DataFrame, skipping NaN values

Posted 2023-03-11

【Title】: define start of a pandas dataframe skipping nan values 【Posted】: 2020-05-29 10:54:22 【Question】:

I am reading an Excel file and want the DataFrame to start a few rows down, after the leading NaN values:

NaN
NaN
NaN
NaN
Code

This is how I do it:

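for data in range(len(df)):
    try:
        # NaN cells are floats, so the `in` test raises TypeError on them
        if 'Code' in df.iloc[data, 0]:
            df = df.iloc[data:, :]
            break  # stop once the frame has been trimmed
    except:
        pass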

But the bare except also hides any other parsing errors.

So I am trying this instead:

if pd.isna(df.iloc[data,0]):
    pass
if 'Code' in str(df.iloc[data,0]):
    df = df.iloc[data:,:]

But I get:

TypeError: argument of type 'float' is not iterable, raised on the 'Code' line

Any help solving this more efficiently would be appreciated.

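【Question comments】:

Is this what you want? df[df['col'].notnull()]

Is the Code value in the same row in every spreadsheet? If so, you can just use the skiprows parameter.

Can you clarify your question? You really should read the pandas documentation.

【Answer 1】: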

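I don't have much experience with pandas, but when I looked at the read_excel documentation I found the following, which may help you skip the NaN values.

You can pass the following parameters when reading the Excel file:

na_values, keep_default_na, na_filter

You can find more information in the pandas read_excel documentation.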
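As a rough illustration of what these three parameters do, here is a minimal sketch. It uses pd.read_csv on an in-memory string so it runs without an Excel file; read_excel accepts parameters with the same names. The sample data and the "missing" marker are made up for the example.

```python
import io
import pandas as pd

# Hypothetical sample data: "missing" is a custom marker, "NA" is a pandas default NaN marker
raw = "id,value\n1,Code\n2,missing\n3,NA\n"

# Default behaviour: only the built-in markers (such as "NA") become NaN
default = pd.read_csv(io.StringIO(raw))

# na_values adds extra strings that should be parsed as NaN
custom = pd.read_csv(io.StringIO(raw), na_values=["missing"])

# keep_default_na=False drops the built-in markers, so only "missing" becomes NaN
strict = pd.read_csv(io.StringIO(raw), na_values=["missing"], keep_default_na=False)

# na_filter=False turns NaN detection off entirely; every cell stays a string
no_filter = pd.read_csv(io.StringIO(raw), na_filter=False)

for name, frame in [("default", default), ("custom", custom),
                    ("strict", strict), ("no_filter", no_filter)]:
    print(name, int(frame["value"].isna().sum()))
```

These options change which cells are read as NaN. They do not remove rows, so trimming the leading NaN rows still has to happen after the read or through skiprows.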

【Comments】:

【Answer 2】:

The skiprows parameter lets you skip a number of rows while reading data from an Excel file:

df_can = pd.read_excel('https://....Canada.xlsx',
                   sheet_name='Canada by Citizenship',
                   skiprows=range(20),
                   skipfooter=2)
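A fixed skiprows only works when the number of leading rows is known. When it is not, one possible approach is to read the sheet without a header, find the first row whose first cell contains 'Code', and slice from there. The sketch below builds a small DataFrame by hand as a stand-in for pd.read_excel(path, header=None); the sample values and the names raw, start and table are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for: raw = pd.read_excel(path, header=None)
raw = pd.DataFrame({
    0: [np.nan, np.nan, np.nan, np.nan, "Code", "A1", "B2"],
    1: [np.nan, np.nan, np.nan, np.nan, "Name", "alpha", "beta"],
})

# Test only real strings, so NaN (a float) never reaches the `in` operator
is_header = raw.iloc[:, 0].map(lambda v: isinstance(v, str) and "Code" in v)

if not is_header.any():
    raise ValueError("no row containing 'Code' was found in the first column")

# Position of the first matching row
start = int(is_header.to_numpy().argmax())

# Use that row as the column labels and keep everything below it
table = raw.iloc[start + 1:].reset_index(drop=True)
table.columns = raw.iloc[start].tolist()

print(start)
print(table)
```

Because the check is explicit, no try/except is needed and unrelated errors are not silenced. With a real file, the position found this way could also be passed back to read_excel as header=start on a second read.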

See this gist for how to read an Excel file into a pandas DataFrame after skipping some rows:

https://gist.github.com/dhamayanthim80/b0d861d7cffe48094f89fd8e05609e17

Sorry if my answer is not relevant to your question.

reading excel to a python data frame starting from row 5 and including headers

Please check whether this helps.

【Comments】:
