How to read a CSV file into a list of lists in SWI-Prolog, where each inner list represents one line of the CSV?

Posted 2023-02-25

【Posted】: 2020-07-01 22:01:23 【Question】:

I have a CSV file that looks like this, i.e. it is not in Prolog format:

james,facebook,intel,samsung
rebecca,intel,samsung,facebook
Ian,samsung,facebook,intel

I am trying to write a Prolog predicate that reads the file and returns a list that looks like

[[james,facebook,intel,samsung],[rebecca,intel,samsung,facebook],[Ian,samsung,facebook,intel]]

for further use in other predicates.

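I am still a beginner. I found some good information on SO and modified it to see whether I could get this working, but I am stuck, because the only thing I manage to produce is a list that looks like this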

[[(james,facebook,intel,samsung)],[(rebecca,intel,samsung,facebook)],[(Ian,samsung,facebook,intel)]]

This means that when I take the head of an inner list, I get (james,facebook,intel,samsung) instead of james.

Here is the code I am using (seen on SO and modified):

stream_representations(Input,Lines) :-
    read_line_to_codes(Input,Line),
    (   Line == end_of_file 
    ->  Lines = []
    ;   atom_codes(FinalLine, Line), 
        term_to_atom(LineTerm,FinalLine), 
        Lines = [[LineTerm] | FurtherLines],
        stream_representations(Input,FurtherLines) 
    ).

main(Lines) :- 
    open('file.txt', read, Input), 
    stream_representations(Input, Lines), 
    close(Input).

【Answer 1】:

The problem lies in term_to_atom(LineTerm,FinalLine).

First, we read one line of the CSV file into a list of character codes with read_line_to_codes(Input,Line).

Let's simulate the input with atom_codes/2:

?- atom_codes('james,facebook,intel,samsung',Line).
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...].

Then we rebuild the atom that was originally read, as FinalLine (this seems wasteful; there must be a way to slurp a line directly into an atom):

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line). 

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung'.
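On the wasteful round trip: in SWI-Prolog 7 and later, read_line_to_string/2 from library(readutil) reads a line as a string, which atom_string/2 can turn into an atom. A minimal sketch follows; the helper name line_atom/2 is hypothetical and not part of the original post.

```prolog
:- use_module(library(readutil)).

% line_atom(+Stream, -Atom): hypothetical helper, not from the original post.
% Reads one line as a string and converts it to an atom.
% Atom is end_of_file when the stream is exhausted.
line_atom(Stream, Atom) :-
    read_line_to_string(Stream, Line),
    (   Line == end_of_file
    ->  Atom = end_of_file
    ;   atom_string(Atom, Line)
    ).
```

It can be tried without a file by opening a string as a stream, for example: open_string("james,facebook,intel,samsung\n", S), line_atom(S, A), close(S).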

We then try to map the atom in FinalLine to a term, LineTerm, using term_to_atom/2:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   term_to_atom(LineTerm,FinalLine).

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm =  (james, facebook, intel, samsung).

Here you can see the problem: LineTerm is not a list at all, but a nested term that uses the functor , to separate the elements:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   term_to_atom(LineTerm,FinalLine),
   write_canonical(LineTerm).

','(james,','(facebook,','(intel,samsung)))

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm =  (james, facebook, intel, samsung).

So this ','(james,','(facebook,','(intel,samsung))) term also ends up in the final result, only written differently, as (james,facebook,intel,samsung), and wrapped in a list: [(james,facebook,intel,samsung)]
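To make the shape of that term concrete, here is a small sketch that walks the nested ','/2 structure and collects its elements into a proper list. The predicate name comma_term_list/2 is hypothetical and chosen for illustration only.

```prolog
% comma_term_list(+Term, -List): hypothetical helper for illustration.
% Flattens a right-nested comma term such as (a,b,c) into [a,b,c].
% The nonvar/1 guard stops an unbound variable from being unified
% with a comma term by accident.
comma_term_list(Term, [A|Rest]) :-
    nonvar(Term),
    Term = (A, B),
    !,
    comma_term_list(B, Rest).
comma_term_list(Term, [Term]).
```

This only repairs the result after the fact. The line is still parsed as Prolog source text by term_to_atom/2, so it inherits all the restrictions of Prolog syntax.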

You don't want that term, you want a list. You can use atomic_list_concat/2 to build a new atom that can be read as a list:

?- atom_codes('james,facebook,intel,samsung',Line), 
   atom_codes(FinalLine, Line),
   atomic_list_concat(['[',FinalLine,']'],ListyAtom),
   term_to_atom(LineTerm,ListyAtom),
   LineTerm = [V1,V2,V3,V4].

Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
ListyAtom = '[james,facebook,intel,samsung]',
LineTerm = [james, facebook, intel, samsung],
V1 = james,
V2 = facebook,
V3 = intel,
V4 = samsung.

But this is a rather brutish approach.
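One concrete reason for caution, going slightly beyond the original answer: term_to_atom/2 parses its input using Prolog syntax, so a field that starts with an uppercase letter, such as Ian in the sample file, is read as a variable rather than as an atom. The sketch below checks this; the predicate name first_field_is_var/1 is hypothetical.

```prolog
% first_field_is_var(+LineAtom): hypothetical check for illustration.
% Succeeds if the first comma-separated field of LineAtom is read back
% as an unbound variable when the line is parsed as a Prolog list.
first_field_is_var(LineAtom) :-
    atomic_list_concat(['[', LineAtom, ']'], ListAtom),
    term_to_atom([First|_], ListAtom),
    var(First).
```

With the sample data, first_field_is_var('Ian,samsung,facebook,intel') succeeds, while the same call on the james line fails.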

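We need to do the whole processing in fewer steps:

Read one line of comma-separated text from the input. Convert it directly into a list of atoms or strings.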
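These two steps could be sketched as follows, assuming SWI-Prolog 7 or later and a simple file with no quoted fields or embedded commas. The predicate names csv_rows/2, stream_rows/2 and string_atom/2 are hypothetical; read_line_to_string/2, split_string/4 and atom_string/2 are SWI-Prolog built-ins or library predicates.

```prolog
:- use_module(library(readutil)).

% csv_rows(+File, -Rows): hypothetical entry point.
% Rows is a list of lists of atoms, one inner list per line of File.
% setup_call_cleanup/3 closes the stream even if reading fails.
csv_rows(File, Rows) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_rows(In, Rows),
        close(In)).

% stream_rows(+Stream, -Rows): read lines until end of file.
stream_rows(In, Rows) :-
    read_line_to_string(In, Line),
    (   Line == end_of_file
    ->  Rows = []
    ;   % Split on commas and strip spaces around each field.
        split_string(Line, ",", " ", Fields),
        maplist(string_atom, Fields, Row),
        Rows = [Row|Rest],
        stream_rows(In, Rest)
    ).

% string_atom(+String, -Atom): argument-swapped atom_string/2 for maplist/3.
string_atom(String, Atom) :-
    atom_string(Atom, String).
```

Because the text is never parsed as Prolog source, Ian stays the atom 'Ian'. Dropping the maplist/3 step keeps the fields as strings instead. Note that an empty line in the file yields the row [''] in this sketch.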

A DCG seems like the right solution. Maybe someone can add a two-liner.
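A possible DCG along those lines is sketched below. It is not from the original answer; it assumes SWI-Prolog with library(dcg/basics), fields without quoting or embedded commas, and the nonterminal names fields//1, more_fields//1 and field//1 are hypothetical.

```prolog
:- use_module(library(dcg/basics)).

% fields(-Atoms)//: one or more comma-separated fields.
fields([F|Fs]) --> field(F), more_fields(Fs).

% more_fields(-Atoms)//: either a comma followed by more fields, or nothing.
more_fields(Fs) --> ",", !, fields(Fs).
more_fields([]) --> [].

% field(-Atom)//: the longest run of codes that contains no comma.
field(Atom) -->
    string_without(`,`, Codes),
    { atom_codes(Atom, Codes) }.
```

It applies directly to the code list that read_line_to_codes/2 returns, for example: atom_codes('james,facebook,intel,samsung', Cs), phrase(fields(Row), Cs).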

【Comments】:

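By the way, text processing in Prolog seems to make heavy use of "atoms" (used not for their atomicity but as pieces of text) rather than "strings" (a dedicated data type in SWI-Prolog, but traditionally a list of character codes, i.e. integers). I would rather use strings...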
