How to convert a Python list into a pandas DataFrame or Excel file output with the specific requirements below

Posted 2023-02-15


[Title]: How to convert Python list into pandas DataFrame or excel file output with specific requirements as below [Posted]: 2022-01-22 00:17:00 [Question]:

I have the following list, which comes from JSON:

list = ['select', 'name1 = a.column1', 'name2 = a.column2', 'name3 = a.[column3]',
        'from', 'xyz.[Table1$Name] c',
        'select', 'name2 = b.othercolumn1', 'name2 = b.[othercolumn2]', 'name3 = b.othercolumn3',
        'from', 'abc.[Table2$Name] d',
        'where', "x.[TableX] = '123'",  # ++++ and so on.....
        ]

The output I want is a two-column DataFrame that I can export to Excel, like this:

Table Name         Column Name
Table1$Name        column1
Table1$Name        column2
Table1$Name        column3
Table2$Name        othercolumn1
Table2$Name        othercolumn2
Table2$Name        othercolumn3

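I have tried many approaches but cannot get the desired output. I just want every column in this ".bim" file listed against its table in an Excel file. Everything else, such as the where clauses, [], c, a. and b., should be removed from the final output.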

[Comments]:

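This looks like a SQL parsing problem. A similar question was answered before at ***.com/questions/68880439/…

Your list appears to contain many split SQL SELECT statements. You have shown two (incorrect) trivial single-table selects. Do you only have trivial selects like these, or does your list also contain other SQL statements or multi-table selects?

@SergeBallesta My list contains many "select" statements, each followed by a "from" statement. For example ["select", "column1", "from", "table1", "where", "abc=xyz", "select", "abc1", "abc2", "abc3", "from", "table2", ++++ ... ... ]. The output I want is just the table names and column names as a DataFrame.

@RobRaymond I have already looked at your answer in the thread you pointed to, but my case is different and cannot be solved with the same method. It would be great if you could help me, as I have been looking for a solution since last week. Thank you.. :)

There are two possible approaches here. One is to join the elements of the list to rebuild real select statements and run a SQL parser on them. The other is to work directly on the elements themselves, using 'select', 'from' and 'where' as delimiter tokens. Which way to go depends on what your list actually contains: if the SQL statements are correct, the first approach is more complex but probably more robust; the second is simpler, but it is only an option if the list is limited to trivial select statements. Since you have only shown garbage, I cannot say more...

[Answer 1]:

Per the comments, you are really asking how to parse SQL into its components. This approach uses lark to do the parsing. It is a complex topic, so I have demonstrated a different way of integrating with pandas. Your sample SQL is invalid according to the grammar I used. IMHO it is a decent SQL grammar. I have worked with many database engines, but none of them allows assignments in the select clause or $ in column or table names.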

from lark import Lark, Token, UnexpectedInput, Visitor
import requests
import pandas as pd

grammar = requests.get(
    "https://raw.githubusercontent.com/z***le/sql_to_ibis/main/sql_to_ibis/grammar/sql.lark"
).text
parser = Lark(grammar, start="query_expr")


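class sqlparts(Visitor):
    """Collect column and table names while visiting the parse tree."""

    def __init__(self):
        # Per-instance lists, so separate parses do not share state.
        self.__columns = []
        self.__tables = []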

    def column_name(self, tree):
        self.__columns += [
            tok.value
            for tok in tree.scan_values(
                lambda v: isinstance(v, Token) and v.type == "CNAME"
            )
        ]

    def table(self, tree):
        self.__tables += [
            tok.value
            for tok in tree.scan_values(
                lambda v: isinstance(v, Token) and v.type == "CNAME"
            )
        ]

    def data(self):
        return {"columns": self.__columns, "tables": self.__tables}


df = pd.DataFrame(
    {
        "sql": [
            "select col1, col2, col3 from table1, table2 where col7=8",
            'select "hello" from a_long_table where col7=8',
            'select "hello" from a_long_table where col7=8 groupby col8',
            "select col$ from tables",
        ]
    }
)


def applyparse(sql):
    d = sqlparts()
    try:
        t = parser.parse(sql, start="query_expr")
        d.visit(t)
        return d.data()
    except UnexpectedInput as e:
        return {"error": e}


df.join(df["sql"].apply(applyparse).apply(pd.Series))

sql	columns	tables	error
select col1, col2, col3 from table1, table2 where col7=8	['col7', 'col1', 'col2', 'col3']	['table1', 'table2']	nan
select "hello" from a_long_table where col7=8	['col7']	['a_long_table']	nan
select "hello" from a_long_table where col7=8 groupby col8	['col7', 'col8']	['a_long_table']	nan
select col$ from tables	nan	nan	No terminal matches '$' in the current parser context, at line 1 col 11
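
The second approach mentioned in the comments, working directly on the list elements with 'select', 'from' and 'where' as delimiter tokens, could be sketched roughly as follows. This is not part of the answer above. It assumes every 'select' is followed by exactly one 'from' item naming a single table, that column items look like `alias = prefix.column`, and that anything after 'where' can be ignored. The names `items`, `clean_column`, `clean_table` and `extract_pairs` are made up for illustration.

```python
import re

import pandas as pd

# Sample input modelled on the question; the real list is longer.
items = [
    "select", "name1 = a.column1", "name2 = a.column2", "name3 = a.[column3]",
    "from", "xyz.[Table1$Name] c",
    "select", "name2 = b.othercolumn1", "name2 = b.[othercolumn2]",
    "name3 = b.othercolumn3",
    "from", "abc.[Table2$Name] d",
    "where", "x.[TableX] = '123'",
]

KEYWORDS = {"select", "from", "where"}


def clean_column(item):
    """'name3 = a.[column3]' -> 'column3'."""
    expr = item.split("=", 1)[-1].strip()  # right-hand side of 'alias = expr'
    name = expr.rsplit(".", 1)[-1]         # drop the 'a.' / 'b.' prefix
    return name.strip("[] ")               # drop surrounding brackets


def clean_table(item):
    """'xyz.[Table1$Name] c' -> 'Table1$Name'."""
    match = re.search(r"\[([^\]]+)\]", item)
    if match:
        return match.group(1)
    # Assumption: unbracketed tables look like 'schema.table alias'.
    return item.split()[0].rsplit(".", 1)[-1]


def extract_pairs(items):
    rows, pending, mode = [], [], None
    for item in items:
        key = item.strip().lower()
        if key in KEYWORDS:
            mode = key
            if key == "select":
                pending = []  # start collecting columns for a new statement
            continue
        if mode == "select":
            pending.append(clean_column(item))
        elif mode == "from":
            table = clean_table(item)
            rows.extend((table, col) for col in pending)
            pending = []  # only the first table after 'from' is paired
        # Items that follow 'where' are skipped.
    return pd.DataFrame(rows, columns=["Table Name", "Column Name"])


result = extract_pairs(items)
print(result)
```

The resulting frame can then be written out with `result.to_excel("columns.xlsx", index=False)`, where the file name is only a placeholder and an Excel writer engine such as openpyxl has to be installed. As the comment above warns, this only holds up for trivial single-table selects; joins, subqueries or expressions in the select list would need a real parser.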
