Creating a DataFrame from an org-mode table

Is there a way to create a Pandas DataFrame from an org-mode (ASCII) table?

So I have this:

data = """
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| binance         | DNT   | district0x        |            1998 | Buy | 0 |
| binance         | TNT   | Tierion           |        1855.143 | Buy | 0 |
| binance         | VIB   | Viberate          |             999 | Buy | 0 |
| Coinexchange.io | BUZZ  | BuzzCoin          |          500000 | Buy | 0 |
| Coinexchange.io | ECC   | ECC               |       81094.078 | Buy | 0 |
| Coinexchange.io | ESP   | Espers            | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON  | Mooncoin          |       1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS  | FedoraCoin        |         4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise             |            5000 | Buy | 0 |
| Coinexchange.io | VSX   | Vsync             |            5000 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
| Cryptopia       | DGB   | DigiByte          |           10000 | Buy | 0 |
| Cryptopia       | XBY   | XTRABYTES         |  17458.51615734 | Buy | 0 |
"""

and I create a Pandas DataFrame from it like this:

import io
import pandas as pd
from tabulate import tabulate  # <- just for demo purpose (printing out df)

data = """
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| binance         | DNT   | district0x        |            1998 | Buy | 0 |
| binance         | TNT   | Tierion           |        1855.143 | Buy | 0 |
| binance         | VIB   | Viberate          |             999 | Buy | 0 |
| Coinexchange.io | BUZZ  | BuzzCoin          |          500000 | Buy | 0 |
| Coinexchange.io | ECC   | ECC               |       81094.078 | Buy | 0 |
| Coinexchange.io | ESP   | Espers            | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON  | Mooncoin          |       1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS  | FedoraCoin        |         4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise             |            5000 | Buy | 0 |
| Coinexchange.io | VSX   | Vsync             |            5000 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
| Cryptopia       | DGB   | DigiByte          |           10000 | Buy | 0 |
| Cryptopia       | XBY   | XTRABYTES         |  17458.51615734 | Buy | 0 |
"""

raw_data = io.StringIO(data)
df = pd.read_csv(raw_data, sep='|', header=None)   # << Relevant line
print(tabulate(df))

This is what I get:

 0  nan  binance          BTC    Bitcoin                 3.86e-06   Buy  0  nan
 1  nan  binance          DNT    district0x           1998          Buy  0  nan
 2  nan  binance          TNT    Tierion              1855.14       Buy  0  nan
 3  nan  binance          VIB    Viberate              999          Buy  0  nan
 4  nan  Coinexchange.io  BUZZ   BuzzCoin           500000          Buy  0  nan
 5  nan  Coinexchange.io  ECC    ECC                 81094.1        Buy  0  nan
 6  nan  Coinexchange.io  ESP    Espers             509080          Buy  0  nan
 7  nan  Coinexchange.io  MOON   Mooncoin                1.497e+06  Buy  0  nan
 8  nan  Coinexchange.io  TIPS   FedoraCoin              4.99e+06   Buy  0  nan
 9  nan  Coinexchange.io  VOISE  Voise                5000          Buy  0  nan
10  nan  Coinexchange.io  VSX    Vsync                5000          Buy  0  nan
11  nan  Coinexchange.io  XP     Experience Points  100000          Buy  0  nan
12  nan  Cryptopia        BTC    Bitcoin                 1e-08      Buy  0  nan
13  nan  Cryptopia        DGB    DigiByte            10000          Buy  0  nan
14  nan  Cryptopia        XBY    XTRABYTES           17458.5        Buy  0  nan

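But this is not ideal, because I have to strip all the extra whitespace from the string columns. I also have to drop the first and last columns, which are empty.

So is there a more convenient way?

Answer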

You can pass a regular expression to the sep parameter. Because the C parser cannot handle separators longer than one character, use engine='python':

df = pd.read_csv(raw_data, sep=r'\s*\|\s*', header=None, engine='python')
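
As a fuller illustration, here is a minimal sketch of how the regex separator could be combined with a cleanup step. The regex takes care of the padding around each cell, but the leading and trailing pipes on every row still produce an empty first and last column, so the sketch drops columns that are entirely NaN. The shortened sample table and the column names (exchange, symbol, name, amount, side, flag) are assumptions made for the example, not part of the original question.

```python
import io

import pandas as pd

# Shortened sample in the same org-mode layout as the question.
data = """
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
"""

# The regex consumes the pipe together with any whitespace around it,
# so the cell values arrive without padding. Regex separators need
# the Python parsing engine.
df = pd.read_csv(
    io.StringIO(data.strip()),
    sep=r"\s*\|\s*",
    header=None,
    engine="python",
)

# The outer pipes of each row yield an empty first and last column;
# drop every column that contains only NaN.
df = df.dropna(axis=1, how="all")

# Hypothetical column names, chosen only for this illustration.
df.columns = ["exchange", "symbol", "name", "amount", "side", "flag"]

print(df)
```

Using dropna(axis=1, how="all") avoids hard-coding column positions, though it would also remove a genuine data column that happens to be empty in every row; slicing with df.iloc[:, 1:-1] is an alternative when that matters.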
