How do I remove a block of text from multiple repetitive JSON files where there is a small change between the files?
Posted: 2022-01-07 08:04:59

I have a JSON file that contains repeated sections, and I'm trying to write a script that removes a particular block of text from multiple files. A Python script would be ideal; otherwise, from my searching, sed could also work, although I know nothing about it. Here is an example of my JSON file format:
{
    "Animal": {
        "Type_species": "Reptile"
    },
    "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
    "Description": "Most are cold blooded."
},
{
    "Animal": {
        "Type_species": "Mammal"
    },
    "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
    "Description": "There Are Approximately 5,000 Mammal Species."
},
{
    "Animal": {
        "Type_species": "Amphibian"
    },
    "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
    "Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
1. How do I remove the following block from the JSON file?
{
    "Animal": {
        "Type_species": "Mammal"
    },
    "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
    "Description": "There Are Approximately 5,000 Mammal Species."
},
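2. My other question: how can I adapt the script to handle a "FindMe" URL that differs from file to file? For example, the second file might contain the following, and so on: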
{
    "Animal": {
        "Type_species": "Mammal"
    },
    "FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
    "Description": "There Are Approximately 5,000 Mammal Species."
},
I think regular expressions would help, but I can't get my head around them or work out how to use them in a script.
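Once a file is parsed as JSON, a regex over the raw text is usually unnecessary: each record becomes a dict, and the filter can key on `Type_species`, which stays the same even when the URL changes. If matching on the URL is really wanted, a prefix pattern covers all the variants. Below is a minimal sketch, assuming each file holds a JSON array of records shaped like the ones above; `MAMMAL_URL` and `is_unwanted` are hypothetical names chosen for illustration.

```python
import json
import re

# Hypothetical pattern: any URL under the National Geographic "mammals" section,
# e.g. .../animals/mammals/ or .../animals/mammals/facts/arctic-fox
MAMMAL_URL = re.compile(r"https://kids\.nationalgeographic\.com/animals/mammals/")


def is_unwanted(record):
    """Return True if the record's FindMe URL falls under the mammals section."""
    # re.match only matches at the start of the string, so this is a prefix test
    return bool(MAMMAL_URL.match(record.get("FindMe", "")))


sample = json.loads("""
[
    {"Animal": {"Type_species": "Reptile"},
     "FindMe": "https://www.merriam-webster.com/dictionary/amphibian"},
    {"Animal": {"Type_species": "Mammal"},
     "FindMe": "https://kids.nationalgeographic.com/animals/mammals/"},
    {"Animal": {"Type_species": "Mammal"},
     "FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox"}
]
""")

# Keep every record whose URL does not match; both mammal URL variants are dropped
kept = [r for r in sample if not is_unwanted(r)]
print([r["Animal"]["Type_species"] for r in kept])
```

Filtering on `record["Animal"]["Type_species"] != "Mammal"` instead would be simpler and independent of the URL entirely; the URL pattern is only worth it if the species field is unreliable.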
Thanks for your help.
Update: I want the final result to look like this:
{
    "Animal": {
        "Type_species": "Reptile"
    },
    "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
    "Description": "Most are cold blooded."
},
{
    "Animal": {
        "Type_species": "Amphibian"
    },
    "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
    "Description": "Most amphibians have thin, moist skin that helps them to breathe"
},
Comments:
So you want to remove all `Mammal` types?
Yes. I want to remove the whole mammal object from the file. I've added the final result for reference.
Answer 1:
Assuming your full JSON contains a list of dictionaries (as your example suggests), then:
JSON = {"data": [
    {
        "Animal": {
            "Type_species": "Reptile"
        },
        "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
        "Description": "Most are cold blooded."
    },
    {
        "Animal": {
            "Type_species": "Mammal"
        },
        "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
        "Description": "There Are Approximately 5,000 Mammal Species."
    },
    {
        "Animal": {
            "Type_species": "Amphibian"
        },
        "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
        "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    }
]}

# Rebuild the list, keeping only records whose Type_species is not "Mammal"
JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']
print(JSON)
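To run the same filter over files on disk and write them back in an indented, readable layout, a minimal sketch follows. It assumes each file holds a top-level `{"data": [...]}` object like `JSON` above; the `animals/*.json` glob and the `remove_species` helper are hypothetical names. It overwrites each file in place, so keep a backup.

```python
import glob
import json


def remove_species(path, species="Mammal"):
    """Drop every record whose Type_species equals `species` and rewrite the file.

    Assumes the file contains {"data": [...]} shaped like the example above.
    """
    with open(path, encoding="utf-8") as f:
        doc = json.load(f)

    # Same list comprehension as the answer: keep everything except `species`
    doc["data"] = [d for d in doc["data"] if d["Animal"]["Type_species"] != species]

    # indent=4 writes the JSON back with one key per line instead of one long line
    with open(path, "w", encoding="utf-8") as f:
        json.dump(doc, f, indent=4)
        f.write("\n")


# Hypothetical location of the input files; adjust the pattern to your layout
for path in glob.glob("animals/*.json"):
    remove_species(path)
```

Because the filter keys on `Type_species`, a `FindMe` URL that differs between files does not affect which records are removed.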
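Comments:
Can you explain the syntax and what it's doing? I'd like to understand it better.
I recommend section 5.1.3.
Thanks, DarkKnight. How do I keep the output in the original structured JSON format? I need to read in a formatted JSON file, remove the text, and then save the file with that text removed.
See this answer.

Answer 2:
This might work for you (GNU sed):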
sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file
This collects each animal's details into one block and deletes the block if it contains "Type_species": "Mammal".
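If the files are not valid JSON (for example, fragments ending in `},` as in the question), the sed idea of working on indented text blocks can also be sketched in Python. This assumes every record starts on a line containing only `{` and ends on the next line at the same indentation that begins with `}`; `drop_blocks` is a hypothetical helper, and the marker string must match the file's spacing exactly.

```python
import re

# A record opens on a line holding only "{" (plus optional indentation)
OPEN = re.compile(r"^(\s*)\{\s*$")


def drop_blocks(text, marker='"Type_species": "Mammal"'):
    """Remove each {...} block whose lines contain `marker`; keep all other text."""
    out, block, indent = [], None, None
    for line in text.splitlines(keepends=True):
        if block is None:
            m = OPEN.match(line)
            if m:
                # Start collecting a new block and remember its indentation
                block, indent = [line], m.group(1)
            else:
                out.append(line)
            continue
        block.append(line)
        # The block ends at a "}" with the same indentation as its "{",
        # so nested objects such as "Animal": { ... } do not end it early
        if line.startswith(indent + "}"):
            if not any(marker in ln for ln in block):
                out.extend(block)
            block = None
    if block:  # unterminated block at end of input: keep it unchanged
        out.extend(block)
    return "".join(out)


sample = """{
    "Animal": {
        "Type_species": "Reptile"
    },
    "Description": "Most are cold blooded."
},
{
    "Animal": {
        "Type_species": "Mammal"
    },
    "Description": "There Are Approximately 5,000 Mammal Species."
},
"""

result = drop_blocks(sample)
print(result)
```

Like the sed command, this is purely textual, so it works on files that a JSON parser would reject, at the cost of depending on consistent indentation.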