Getting specific strings from a very long string


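I am using regular expressions to get the data I need for some comparison operations. I scraped it from https://play.pokemonshowdown.com/data/pokedex.js?fe67f5ac, which lists every possible Pokemon. I use a regex to split the Pokemon into tiers, like this: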

LC = re.findall(r'name:(.+?)tier:"LC"', data)

Then I collect the data like this:

self.LC = s.join(LC)
self.names = re.findall(r'name:"(.+?)"', self.LC)
self.stats = re.findall(r'baseStats:\{(.+?)\}', self.LC)
self.types = re.findall(r'types:\[(.+?)\]', self.LC)

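Unfortunately, when I use the first line of code, it collects almost every Pokemon in the list, which produces an extremely inefficient and useless string. Any help on how to fix this would be appreciated.

This is what I currently get when I print the first item in the list: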

"Bulbasaur",types:["Grass","Poison"],genderRatio:{M:0.875,F:0.125},
baseStats:{hp:45,atk:49,def:49,spa:65,spd:65,spe:45},
abilities:{"0":"Overgrow",H:"Chlorophyll"},heightm:0.7,weightkg:6.9,color:"Green",
evos:["Ivysaur"],eggGroups:["Monster","Grass"],tier:"LC"},ivysaur:{num:2,name:"Ivysaur",types:["Grass","Poison"],..........

What I want is:

"Bulbasaur",types:["Grass","Poison"],genderRatio:{M:0.875,F:0.125},
baseStats:{hp:45,atk:49,def:49,spa:65,spd:65,spe:45},
abilities:{"0":"Overgrow",H:"Chlorophyll"},heightm:0.7,weightkg:6.9,color:"Green",
evos:["Ivysaur"],eggGroups:["Monster","Grass"],

The string should end just before whichever tier I am searching for, LC in this case.
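
The behaviour described above can be reproduced on a tiny input. The sketch below uses two made-up entries (the names "Alpha" and "Beta" and their tiers are placeholders, not data from pokedex.js). It shows that a lazy `.+?` still begins at the earliest `name:` and therefore runs across entry boundaries, and it shows one possible tightening that forbids the match from crossing into the next `name:`. The tightened pattern is only an illustration under the assumption that every entry contains exactly one `name:` key.

```python
import re

# Two made-up entries in the same shape as the scraped data (placeholders only).
data = ('a:{num:1,name:"Alpha",types:["Grass"],tier:"OU"},'
        'b:{num:2,name:"Beta",types:["Fire"],tier:"LC"},')

# Pattern from the question: lazy, but it starts at the FIRST "name:" and
# keeps going until it finally reaches a tier:"LC", swallowing other entries.
loose = re.findall(r'name:(.+?)tier:"LC"', data)

# Possible tightening: each consumed character must not begin another "name:",
# so a match can never span more than one entry.
strict = re.findall(r'name:((?:(?!name:).)+?)tier:"LC"', data)

print(loose)
print(strict)
```

With the loose pattern the single match contains both entries; with the tightened one only the entry whose tier is LC is returned.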

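Answer

As an alternative, you can use one of the suggestions from "How to convert raw javascript object to python dictionary?" to convert your JavaScript object into proper JSON (only after removing the "exports.BattlePokedex = " prefix at the start of the string and the ";" at the end). From there you can use Python dictionary notation to access and filter the data as needed.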
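
As a rough illustration of that conversion without any third-party package, the sketch below quotes bare object keys with a regex and then hands the result to `json.loads`. This is a swapped-in technique, not the one used in this answer, and it assumes that no string value in the data contains text that looks like `,word:` or `{word:`. The sample string is hand-written in the shape of pokedex.js, and `js_object_to_dict` is a hypothetical helper name.

```python
import json
import re

# Tiny hand-written sample in the same shape as pokedex.js (not the real file).
raw = ('exports.BattlePokedex = {bulbasaur:{num:1,name:"Bulbasaur",'
       'types:["Grass","Poison"],baseStats:{hp:45,atk:49},tier:"LC"},'
       'ivysaur:{num:2,name:"Ivysaur",types:["Grass","Poison"],'
       'baseStats:{hp:60,atk:62}}};')


def js_object_to_dict(source):
    """Hypothetical helper: turn a simple JS object literal into a dict."""
    # Keep only the outermost {...}, dropping the export prefix and the ";".
    body = source[source.index("{"):source.rindex("}") + 1]
    # Wrap bare keys (after "{" or ",") in double quotes so the text is JSON.
    quoted = re.sub(r'([{,])\s*([A-Za-z0-9_]+)\s*:', r'\1"\2":', body)
    return json.loads(quoted)


pokedex = js_object_to_dict(raw)
print(pokedex["bulbasaur"]["tier"])
print(pokedex["ivysaur"]["baseStats"])
```

For the real file a proper parser is the safer choice, since a regex cannot tell a key from similar-looking text inside a string value.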

Raw JavaScript object

exports.BattlePokedex = {bulbasaur:{num:1,name:"Bulbasaur",types:["Grass","Poison"],genderRatio:{M:0.875,
F:0.125},baseStats:{hp:45,atk:49,def:49,spa:65,spd:65,spe:45},abilities:{"0":"Overgrow",H:"Chlorophyll"},
heightm:0.7,weightkg:6.9,color:"Green",evos:["Ivysaur"],eggGroups:["Monster","Grass"],tier:"LC"},
ivysaur:{num:2,name:"Ivysaur",types:["Grass","Poison"],genderRatio:{M:0.875,F:0.125},
baseStats:{hp:60,atk:62,def:63,spa:80,spd:80,spe:60},abilities:... ...

JavaScript Object Notation (JSON)

{"bulbasaur":{"num":1,"name":"Bulbasaur","types":["Grass","Poison"],"genderRatio":{"M":0.875,"F":0.125},
"baseStats":{"hp":45,"atk":49,"def":49,"spa":65,"spd":65,"spe":45},"abilities":{"0":"Overgrow",
"H":"Chlorophyll"},"heightm":0.7,"weightkg":6.9,"color":"Green","evos":["Ivysaur"],
"eggGroups":["Monster","Grass"],"tier":"LC"},"ivysaur":{"num":2,"name":"Ivysaur","types":["Grass","Poison"],
"genderRatio":{"M":0.875,"F":0.125},"baseStats":{"hp":60,"atk":62,"def":63,"spa":80,"spd":80,"spe":60},
"abilities":... ...

Python code

import collections
import json

import _jsonnet  # provided by the "jsonnet" package
import requests

r = requests.get("https://play.pokemonshowdown.com/data/pokedex.js?fe67f5ac")
d = r.content.decode('utf-8')

# remove the JS export prefix ("exports.BattlePokedex = ", 24 characters) and the ";" at the end
json_str = d[24:-1]
print(json_str)

# evaluate the JS object literal as Jsonnet, which returns valid JSON text
json_obj = _jsonnet.evaluate_snippet("snippet", json_str)
pyDict = json.loads(json_obj)

print(pyDict)
print("Total: {}".format(len(pyDict)))

# group the Pokemon by tier; entries without a tier are skipped
tier_dict = collections.defaultdict(list)

for pkm in pyDict:
    tier = pyDict[pkm].get("tier")
    if tier:
        tier_dict[tier].append({
            pyDict[pkm].get("name"): {
                "baseStats": pyDict[pkm].get("baseStats"),
                "types": pyDict[pkm].get("types")
                # add desired stat here
            }
        })

print(tier_dict)

Output

Total: 1203

Output of tier_dict

{
    'NU': [
        {'Abomasnow': {
            'baseStats': {
                'atk': 92,
                ...
    ],
    'Illegal': [
        {'Abomasnow-Mega': {
            'baseStats': {
                'atk': 132,
                ...
    ],
    'RU': [
        {'Accelgor': {
            'baseStats': {
                'atk': 70,
                ...
    ],
    'OU': [
        {'Aegislash': {
            'baseStats': {
                'atk': 50,
                ...
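
Once the data is grouped by tier, the three lists the question was building (names, stats, types) can be read straight from the dictionary. The sketch below assumes the `tier_dict` layout produced above, a list of one-key dicts per tier, and uses a small hand-written sample instead of the downloaded data. `collect_tier` is a hypothetical helper name.

```python
# Hand-written sample in the same layout as tier_dict (truncated, illustrative).
tier_dict = {
    "LC": [
        {"Bulbasaur": {
            "baseStats": {"hp": 45, "atk": 49, "def": 49,
                          "spa": 65, "spd": 65, "spe": 45},
            "types": ["Grass", "Poison"],
        }},
    ],
    "NU": [
        {"Abomasnow": {"baseStats": {"atk": 92}, "types": ["Grass", "Ice"]}},
    ],
}


def collect_tier(tier_dict, tier):
    """Hypothetical helper: return parallel lists of names, stats and types."""
    names, stats, types = [], [], []
    # .get avoids creating an empty entry when tier_dict is a defaultdict.
    for entry in tier_dict.get(tier, []):
        for name, info in entry.items():
            names.append(name)
            stats.append(info["baseStats"])
            types.append(info["types"])
    return names, stats, types


names, stats, types = collect_tier(tier_dict, "LC")
print(names)
print(stats)
print(types)
```

An unknown tier simply yields three empty lists, so the caller does not need a separate existence check.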
