Creating a DataFrame from an orgmode table
Is there a way to create a Pandas DataFrame from an orgmode (ASCII) table?
So I have this:
data = """
| binance | BTC | Bitcoin | 0.00000386 | Buy | 0 |
| binance | DNT | district0x | 1998 | Buy | 0 |
| binance | TNT | Tierion | 1855.143 | Buy | 0 |
| binance | VIB | Viberate | 999 | Buy | 0 |
| Coinexchange.io | BUZZ | BuzzCoin | 500000 | Buy | 0 |
| Coinexchange.io | ECC | ECC | 81094.078 | Buy | 0 |
| Coinexchange.io | ESP | Espers | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON | Mooncoin | 1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS | FedoraCoin | 4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise | 5000 | Buy | 0 |
| Coinexchange.io | VSX | Vsync | 5000 | Buy | 0 |
| Coinexchange.io | XP | Experience Points | 100000 | Buy | 0 |
| Cryptopia | BTC | Bitcoin | 1e-8 | Buy | 0 |
| Cryptopia | DGB | DigiByte | 10000 | Buy | 0 |
| Cryptopia | XBY | XTRABYTES | 17458.51615734 | Buy | 0 |
"""
and created a Pandas DataFrame from it like this:
import io
import pandas as pd
from tabulate import tabulate # <- just for demo purpose (printing out df)
data = """
| binance | BTC | Bitcoin | 0.00000386 | Buy | 0 |
| binance | DNT | district0x | 1998 | Buy | 0 |
| binance | TNT | Tierion | 1855.143 | Buy | 0 |
| binance | VIB | Viberate | 999 | Buy | 0 |
| Coinexchange.io | BUZZ | BuzzCoin | 500000 | Buy | 0 |
| Coinexchange.io | ECC | ECC | 81094.078 | Buy | 0 |
| Coinexchange.io | ESP | Espers | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON | Mooncoin | 1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS | FedoraCoin | 4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise | 5000 | Buy | 0 |
| Coinexchange.io | VSX | Vsync | 5000 | Buy | 0 |
| Coinexchange.io | XP | Experience Points | 100000 | Buy | 0 |
| Cryptopia | BTC | Bitcoin | 1e-8 | Buy | 0 |
| Cryptopia | DGB | DigiByte | 10000 | Buy | 0 |
| Cryptopia | XBY | XTRABYTES | 17458.51615734 | Buy | 0 |
"""
raw_data = io.StringIO(data)
df = pd.read_csv(raw_data, sep='|', header=None) # << Relevant line
print(tabulate(df))
This is what I get:
0 nan binance BTC Bitcoin 3.86e-06 Buy 0 nan
1 nan binance DNT district0x 1998 Buy 0 nan
2 nan binance TNT Tierion 1855.14 Buy 0 nan
3 nan binance VIB Viberate 999 Buy 0 nan
4 nan Coinexchange.io BUZZ BuzzCoin 500000 Buy 0 nan
5 nan Coinexchange.io ECC ECC 81094.1 Buy 0 nan
6 nan Coinexchange.io ESP Espers 509080 Buy 0 nan
7 nan Coinexchange.io MOON Mooncoin 1.497e+06 Buy 0 nan
8 nan Coinexchange.io TIPS FedoraCoin 4.99e+06 Buy 0 nan
9 nan Coinexchange.io VOISE Voise 5000 Buy 0 nan
10 nan Coinexchange.io VSX Vsync 5000 Buy 0 nan
11 nan Coinexchange.io XP Experience Points 100000 Buy 0 nan
12 nan Cryptopia BTC Bitcoin 1e-08 Buy 0 nan
13 nan Cryptopia DGB DigiByte 10000 Buy 0 nan
14 nan Cryptopia XBY XTRABYTES 17458.5 Buy 0 nan
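But this is not ideal, because I have to strip all the extra whitespace from the string columns. I also have to drop the first and last columns, which are empty.
So is there a more convenient way?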
Answer
You can pass a regular expression to the sep parameter. Because the C parser cannot handle multi-character (regex) separators, use engine='python':
df = pd.read_csv(raw_data, sep=r'\s*\|\s*', header=None, engine='python')
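A regex separator of the form `\s*\|\s*` takes care of the padding around each cell, but the pipe that opens and closes every row still yields an empty first and last column. The following is a minimal sketch of one way to finish the cleanup, using a shortened three-row sample of the table above. Slicing with `iloc` and renumbering the columns are choices made for this illustration, not part of the original answer.

```python
import io

import pandas as pd

# Shortened sample of the org-mode table from the question.
data = """
| binance | BTC | Bitcoin | 0.00000386 | Buy | 0 |
| Coinexchange.io | XP | Experience Points | 100000 | Buy | 0 |
| Cryptopia | BTC | Bitcoin | 1e-8 | Buy | 0 |
"""

# The regex consumes each pipe together with the whitespace around it,
# so cell values arrive already stripped. Regex separators require the
# Python parsing engine.
df = pd.read_csv(
    io.StringIO(data),
    sep=r"\s*\|\s*",
    header=None,
    engine="python",
)

# The leading and trailing pipes produce an all-NaN first and last column;
# drop them by position and renumber the remaining columns from 0.
df = df.iloc[:, 1:-1]
df.columns = range(df.shape[1])

print(df)
```

Dropping by position rather than with `dropna(axis=1, how='all')` keeps a genuinely empty data column in the middle of the table from being removed by accident.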