How do I remove a block of text from multiple repetitive JSON files where there is a small change between the files?

Posted 2023-03-15


[Asked]: 2022-01-07 08:04:59

[Question]:

I have a JSON file with repetitive sections, and I am trying to write a script that removes a certain block of text from multiple files. A Python script would be preferred; otherwise, from what I have found, sed could also work, although I know nothing about it. Here is an example of my JSON file format:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },

How do I remove the following from the JSON file?

    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },

My second question: how can the script be adapted to handle a different "FindMe" URL in each file? For example, the second file would contain the following, and so on for the other files:

    {
      "Animal": {
        "Type_species": "Mammal"
      },
      "FindMe": "https://kids.nationalgeographic.com/animals/mammals/facts/arctic-fox",
      "Description": "There Are Approximately 5,000 Mammal Species."
    },

I think regular expressions would help, but I cannot get my head around them or work out how to use them in a script.

Thanks for your help.

Update: I would like the final result to look like this:

    {
      "Animal": {
        "Type_species": "Reptile"
      },
      "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
      "Description": "Most are cold blooded."
    },
    {
      "Animal": {
        "Type_species": "Amphibian"
      },
      "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
      "Description": "Most amphibians have thin, moist skin that helps them to breathe"
    },

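[Comments]:

So you want to remove every entry of type Mammal?

Yes. I want to remove the whole Mammal object from the file. I have added the desired final result for reference.

[Answer 1]: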

Assuming your full JSON contains a list of dictionaries (as your sample suggests), then:

    # The sample data as a Python dict holding a list of dicts
    JSON = {"data": [
        {
            "Animal": {
                "Type_species": "Reptile"
            },
            "FindMe": "https://www.merriam-webster.com/dictionary/amphibian",
            "Description": "Most are cold blooded."
        },
        {
            "Animal": {
                "Type_species": "Mammal"
            },
            "FindMe": "https://kids.nationalgeographic.com/animals/mammals/",
            "Description": "There Are Approximately 5,000 Mammal Species."
        },
        {
            "Animal": {
                "Type_species": "Amphibian"
            },
            "FindMe": "https://en.wikipedia.org/wiki/Amphibian",
            "Description": "Most amphibians have thin, moist skin that helps them to breathe"
        }
    ]}

    # Keep only the entries whose Type_species is not 'Mammal'
    JSON['data'] = [d for d in JSON['data'] if d['Animal']['Type_species'] != 'Mammal']

    print(JSON)
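
To apply the same filter to files on disk and keep them formatted, something like the following sketch may help. It assumes each file is valid JSON with the list stored under a top-level "data" key (the same assumption as above); the helper names `remove_species` and `clean_directory`, the folder name, and the 2-space indent are hypothetical choices, not part of the question. Because the filter looks only at "Type_species", a different "FindMe" URL in each file makes no difference.

```python
import json
from pathlib import Path


def remove_species(records, species):
    """Return the records whose Animal.Type_species differs from `species`."""
    # The comprehension walks the list once and keeps an entry only
    # when its species is not the one being removed.
    return [
        r for r in records
        if r.get("Animal", {}).get("Type_species") != species
    ]


def clean_directory(folder, species="Mammal", key="data", indent=2):
    """Rewrite every *.json file in `folder` without the matching objects.

    Assumption: each file is a JSON object holding the list under `key`.
    """
    for path in sorted(Path(folder).glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        doc[key] = remove_species(doc[key], species)
        # indent= pretty-prints the output so the file stays readable
        text = json.dumps(doc, indent=indent, ensure_ascii=False)
        path.write_text(text + "\n", encoding="utf-8")
```

Calling `clean_directory("json_files")` would then process every `.json` file in a folder named `json_files` (a placeholder name). The files are overwritten in place, so it is worth trying this on copies first.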

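[Comments]:

Can you explain the syntax and what it is doing? I would like to understand it better.

I recommend section 5.1.3.

Thanks, DarkKnight. How can I keep the output in the previously structured JSON format? I need to read in a formatted JSON file, remove the text, and then save the file with the text removed.

See this answer.

[Answer 2]: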

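This might work for you (GNU sed):

    sed '/^\s*{/{:a;N;/^\(\s*\){.*\n\1},/!ba;/"Type_species": "Mammal"/d}' file

Starting at a line that opens with `{`, it appends lines until it reaches the `},` with the same indentation, so that each animal's details are gathered in one piece. If the gathered block contains "Type_species": "Mammal", it is deleted.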
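
For comparison, here is a rough Python sketch of the same line-based idea, which may be useful when the files are fragments like the sample rather than valid JSON. It assumes every object opens on a line containing only `{` and closes on a line containing `}` or `},` at the same indentation; the function name `drop_blocks` and its `needle` parameter are hypothetical. Like the sed command, it matches on the species line, so the "FindMe" URL can differ from file to file.

```python
def drop_blocks(text, needle='"Type_species": "Mammal"'):
    """Remove every {...} block (opened by a lone '{' line) containing `needle`."""
    kept = []      # lines that survive
    block = []     # lines of the object currently being collected
    indent = None  # indentation of that object's opening brace

    for line in text.splitlines(keepends=True):
        if indent is None:
            if line.strip() == "{":
                # Start collecting a new object and remember its indentation
                indent = line[: len(line) - len(line.lstrip())]
                block = [line]
            else:
                kept.append(line)
            continue

        block.append(line)
        # The object ends at a closing brace with the same indentation
        if line.rstrip().rstrip(",") == indent + "}":
            if needle not in "".join(block):
                kept.extend(block)
            indent, block = None, []

    kept.extend(block)  # an unterminated block is left untouched
    return "".join(kept)
```

Reading a file's text, passing it through `drop_blocks`, and writing the result back would mirror what the sed command does. If the removed object was the last one in a list, the preceding object may be left with a trailing comma that needs tidying by hand.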
