What are some ways to take one character from each of several idioms and form a sentence in Python?


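Reference answer A: Look at what the tags of the idioms you want to scrape have in common. Viewing the page source shows that each idiom points to a different link, and the links differ only in the numbers in "/cy0/93.html", so a regular expression that matches digits twice is enough.
There are three common approaches to scraping data in Python: regular expressions (the re library), BeautifulSoup (bs4), and lxml.
Data acquisition is the first step of quantitative financial analysis; without reliable, real data, quantitative analysis cannot begin.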
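
The regular-expression approach described above can be sketched as follows. This is a minimal illustration, not the original author's code: the HTML fragment, the idioms in it, and the helper names extract_idioms and take_characters are all made up for the example, and the real page markup may differ.

```python
import re

# Hypothetical HTML fragment; the markup of the real page is an assumption.
SAMPLE_HTML = """
<a href="/cy0/93.html">一心一意</a>
<a href="/cy1/205.html">马到成功</a>
<a href="/about.html">关于我们</a>
"""

# Two runs of digits: one after "cy" and one before ".html".
IDIOM_LINK = re.compile(r'<a href="(/cy\d+/\d+\.html)">([^<]+)</a>')


def extract_idioms(html):
    """Return (link, idiom) pairs for every anchor that matches the pattern."""
    return IDIOM_LINK.findall(html)


def take_characters(idioms, positions):
    """Join one character from each idiom, chosen by a 0-based position."""
    return "".join(idiom[pos] for idiom, pos in zip(idioms, positions))


pairs = extract_idioms(SAMPLE_HTML)
idioms = [idiom for _link, idiom in pairs]

# Take the first character of the first idiom and the last of the second.
sentence = take_characters(idioms, [0, 3])
print(pairs)
print(sentence)
```

The "/about.html" link is skipped because it lacks the two digit groups. Which character to take from each idiom is left to the caller through the positions list.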

JSON ID extraction from Array in Python

[Title]: JSON ID extraction from Array in Python [Posted]: 2021-02-06 02:55:51 [Question]:

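I wrote a script that pulls data from Verizon's Connectivity Management API using the code below. This is the part that requests line information for a search term, in this case a SIM, i.e. an ICCID. I left out the earlier parts because they only connect to the API and obtain credentials.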

import requests

# session_token and bearer_token come from the earlier authentication steps
header = {
    'accept': 'application/json',
    'VZ-M2M-Token': session_token,
    'Authorization': 'Bearer ' + bearer_token,
    'Content-Type': 'application/json',
}

# Search by SIM (ICCID)
data = '{ "deviceId": { "id": ' + SIM + ', "kind": "ICCID" } }'

response = requests.post('https://thingspace.verizon.com/api/m2m/v1/devices/actions/list', headers=header, data=data)
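
Building the request body by string concatenation is easy to get wrong, since one misplaced quote or brace makes the JSON invalid. A minimal sketch of an alternative is to build a dict and serialize it with json.dumps. The helper name build_device_query and the sample ICCID are hypothetical, and sending the id as a quoted string is an assumption about what the API accepts.

```python
import json


def build_device_query(iccid):
    """Serialize the search body; json.dumps takes care of quoting and escaping."""
    body = {"deviceId": {"id": iccid, "kind": "ICCID"}}
    return json.dumps(body)


# Made-up ICCID, used only for illustration.
payload = build_device_query("89148000000000000000")
print(payload)
```

The resulting string can be passed as data= to requests.post exactly like the hand-built one.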

The response I get back is a JSON array that looks like this:

{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "123456789-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "3rrrrx48wwwwrjgjtyjtyjtyjtyj",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-11-04T11:06:28-04:00",
      "deviceIds": [
        {
          "id": "5256694405",
          "kind": "mdn"
        },
        {
          "id": "3114949302094150",
          "kind": "imsi"
        },
        {
          "id": "35922505468230",
          "kind": "imei"
        },
        {
          "id": "891480000054957290575",
          "kind": "iccId"
        },
        {
          "id": "15256694405",
          "kind": "msisdn"
        },
        {
          "id": "5256694405",
          "kind": "min"
        }
      ],
      "extendedAttributes": [
        {
          "key": "PrimaryPlaceOfUseTitle"
        },
        {
          "key": "PrimaryPlaceOfUseFirstName",
          "value": "5256694405"
        },
        {
          "key": "PrimaryPlaceOfUseMiddleName"
        },
        {
          "key": "PrimaryPlaceOfUseLastName",
          "value": "ESN"
        },
        {
          "key": "PrimaryPlaceOfUseSuffix"
        },
        {
          "key": "PrimaryPlaceOfUseAddressLine1"
        },
        {
          "key": "PrimaryPlaceOfUseAddressLine2"
        },
        {
          "key": "PrimaryPlaceOfUseCity"
        },
        {
          "key": "PrimaryPlaceOfUseState"
        },
        {
          "key": "PrimaryPlaceOfUseCountry"
        },
        {
          "key": "PrimaryPlaceOfUseZipCode"
        },
        {
          "key": "PrimaryPlaceOfUseZipCode4"
        },
        {
          "key": "PrimaryPlaceOfUseCBRPhone"
        },
        {
          "key": "PrimaryPlaceOfUseCBRPhoneType"
        },
        {
          "key": "PrimaryPlaceOfUseEmailAddress"
        },
        {
          "key": "AccountNumber",
          "value": "5256694405-00001"
        },
        {
          "key": "SmsrOid"
        },
        {
          "key": "ProfileStatus"
        },
        {
          "key": "PromoCodes",
          "value": ""
        },
        {
          "key": "PromotionStartDate",
          "value": ""
        },
        {
          "key": "PromotionScheduledEndDate",
          "value": ""
        },
        {
          "key": "LeadId",
          "value": ""
        },
        {
          "key": "CustomerName",
          "value": ""
        },
        {
          "key": "CustomerAddressLine1",
          "value": ""
        },
        {
          "key": "CustomerAddressLine2",
          "value": ""
        },
        {
          "key": "CustomerAddressCity",
          "value": ""
        },
        {
          "key": "CustomerAddressState",
          "value": ""
        },
        {
          "key": "CustomerAddressZipCode",
          "value": ""
        },
        {
          "key": "ServiceZipCode",
          "value": ""
        },
        {
          "key": "SkuNumber",
          "value": "VZW080000460053"
        },
        {
          "key": "CostCenterCode"
        },
        {
          "key": "PreIMEI",
          "value": "3592254564568445"
        },
        {
          "key": "PreSKU",
          "value": "VZW080000100037"
        },
        {
          "key": "SIMOTADate",
          "value": "4/30/2020 1:22:18 PM"
        },
        {
          "key": "RoamingStatus",
          "value": "NotRoaming"
        },
        {
          "key": "LastRoamingStatusUpdate",
          "value": "9/24/2020 5:40:26 PM"
        }
      ],
      "groupNames": [
        "Default: 0220433754-00001"
      ],
      "ipAddress": "100.100.100.100",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-11-04T11:06:28-04:00",
      "lastConnectionDate": "2020-09-24T13:40:26-04:00"
    }
  ]
}

I added a section to the script that pulls the mdn, iccid, and imei out of the array with the code below.

import json

def puller(line_json):
    # Parse the raw response body, then read each id by its position in deviceIds
    line_data = json.loads(line_json)
    mdn = line_data['devices'][0]['deviceIds'][0]['id']
    iccid = line_data['devices'][0]['deviceIds'][3]['id']
    imei = line_data['devices'][0]['deviceIds'][2]['id']

    print('phone =', mdn)
    print('SIM =', iccid)
    print('IMEI =', imei)

I tested this code with one test ID and it worked as it should. When I tested with a second ID, I learned that the array structure is not always the same. The second JSON array is below. I would like to know whether there is a better way to find the specific values I want without telling the script exactly where each item sits in the structure, as I did above.

{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "02234234234-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "37776xdsfewsfwe576193",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-05-24T15:55:06-04:00",
      "deviceIds": [
        {
          "id": "0945437676404",
          "kind": "esn"
        },
        {
          "id": "1234565799",
          "kind": "mdn"
        },
        {
          "id": "31148454545458767",
          "kind": "imsi"
        },
        {
          "id": "01426786678211",
          "kind": "imei"
        },
        {
          "id": "89148000006456456454",
          "kind": "iccId"
        },
        {
          "id": "1234565799",
          "kind": "min"
        }
      ],
      "extendedAttributes": [
        {
          "key": "PrimaryPlaceOfUseTitle"
        },
        {
          "key": "PrimaryPlaceOfUseFirstName",
          "value": "096114564506772"
        },
        {
          "key": "PrimaryPlaceOfUseMiddleName"
        },
        {
          "key": "PrimaryPlaceOfUseLastName",
          "value": "096546454806772"
        },
        {
          "key": "PrimaryPlaceOfUseSuffix"
        },
        {
          "key": "PrimaryPlaceOfUseAddressLine1"
        },
        {
          "key": "PrimaryPlaceOfUseAddressLine2"
        },
        {
          "key": "PrimaryPlaceOfUseCity"
        },
        {
          "key": "PrimaryPlaceOfUseState"
        },
        {
          "key": "PrimaryPlaceOfUseCountry"
        },
        {
          "key": "PrimaryPlaceOfUseZipCode"
        },
        {
          "key": "PrimaryPlaceOfUseZipCode4"
        },
        {
          "key": "PrimaryPlaceOfUseCBRPhone"
        },
        {
          "key": "PrimaryPlaceOfUseCBRPhoneType"
        },
        {
          "key": "PrimaryPlaceOfUseEmailAddress"
        },
        {
          "key": "AccountNumber",
          "value": "02242342354-00001"
        },
        {
          "key": "SmsrOid"
        },
        {
          "key": "ProfileStatus"
        },
        {
          "key": "PromoCodes",
          "value": ""
        },
        {
          "key": "PromotionStartDate",
          "value": ""
        },
        {
          "key": "PromotionScheduledEndDate",
          "value": ""
        },
        {
          "key": "LeadId",
          "value": ""
        },
        {
          "key": "CustomerName",
          "value": ""
        },
        {
          "key": "CustomerAddressLine1",
          "value": ""
        },
        {
          "key": "CustomerAddressLine2",
          "value": ""
        },
        {
          "key": "CustomerAddressCity",
          "value": ""
        },
        {
          "key": "CustomerAddressState",
          "value": ""
        },
        {
          "key": "CustomerAddressZipCode",
          "value": ""
        },
        {
          "key": "ServiceZipCode",
          "value": ""
        },
        {
          "key": "SkuNumber",
          "value": "VZW12000364343005"
        },
        {
          "key": "CostCenterCode"
        },
        {
          "key": "PreIMEI"
        },
        {
          "key": "PreSKU",
          "value": "VZW12000334340005"
        },
        {
          "key": "SIMOTADate",
          "value": "3/13/2020 10:52:07 AM"
        },
        {
          "key": "RoamingStatus",
          "value": "NotRoaming"
        },
        {
          "key": "LastRoamingStatusUpdate",
          "value": "10/20/2020 6:14:20 PM"
        }
      ],
      "groupNames": [
        "Default: 02342343754-00001"
      ],
      "ipAddress": "101.101.101.101",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-05-24T15:55:16-04:00",
      "lastConnectionDate": "2020-10-20T14:14:20-04:00"
    }
  ]
}

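I tried the code below, which I found while researching, to locate the value I am looking for, in this case the mdn. The problem is that the result comes back as a pair of empty brackets with nothing in them, so I know I am probably doing something wrong.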

def json_extract(obj, kind):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, kind):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, kind)
                elif k == kind:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, kind)
        return arr

    values = extract(obj, arr, kind)
    return values


names = json_extract(response , 'mdn')
print(names)

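[Comments on the question]:

[Answer 1]:

As I understand it, you are trying to find the ids for mdn, iccid, and imei in the JSON object above. Rather than recursion and the more involved code you wrote, Python's built-ins can do this more simply:

You can use the next function for this: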

import json

# Load the JSON body of the response
line_data = json.loads(response.text)

# Narrow the focus to the array in question
device_ids = line_data['devices'][0]['deviceIds']

# Take the id of the first entry whose kind matches, or None if no entry matches.
# Kinds are compared in lower case because the response spells "iccId" with a capital I.
mdn = next((x['id'] for x in device_ids if x['kind'].lower() == "mdn"), None)
iccid = next((x['id'] for x in device_ids if x['kind'].lower() == "iccid"), None)
imei = next((x['id'] for x in device_ids if x['kind'].lower() == "imei"), None)

You will need to handle the case where next returns None because no such element exists in the array.
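
One way to handle the missing case is to index the entries by kind once and fail loudly when a required kind is absent. The sketch below is only an illustration: the helper names ids_by_kind and require are hypothetical, and the sample data is a trimmed copy of the second response in the question. Kinds are lower-cased because the sample responses spell the kind as "iccId", so a case-sensitive comparison against "iccid" would find nothing.

```python
def ids_by_kind(device):
    """Map each lower-cased kind to its id for one device record."""
    return {entry["kind"].lower(): entry["id"] for entry in device.get("deviceIds", [])}


def require(ids, kind):
    """Return the id for a kind, or raise KeyError if the device lacks it."""
    value = ids.get(kind.lower())
    if value is None:
        raise KeyError(f"device has no id of kind {kind!r}")
    return value


# Trimmed copy of the second sample response from the question.
line_data = {
    "devices": [
        {
            "deviceIds": [
                {"id": "0945437676404", "kind": "esn"},
                {"id": "1234565799", "kind": "mdn"},
                {"id": "01426786678211", "kind": "imei"},
                {"id": "89148000006456456454", "kind": "iccId"},
            ]
        }
    ]
}

ids = ids_by_kind(line_data["devices"][0])
print("phone =", require(ids, "mdn"))
print("SIM =", require(ids, "iccid"))
print("IMEI =", require(ids, "imei"))
```

Because the lookup is by kind rather than by position, the extra esn entry at the front of the list no longer shifts the results.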

Reference: Find object in list that has attribute equal to some value (that meets any condition)
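
As for the empty list returned by json_extract in the question, there are two likely causes. The function compares its second argument against dictionary keys, but "mdn" never appears as a key, only as the value stored under "kind". It was also passed response itself, which is neither a dict nor a list, rather than the parsed body. If a recursive search is still preferred, a variant that matches on the "kind" value could look like the sketch below. The function name find_ids is hypothetical and the sample data is a trimmed copy of the second response.

```python
def find_ids(obj, kind):
    """Recursively collect the "id" of every dict whose "kind" equals kind."""
    found = []
    if isinstance(obj, dict):
        # Compare in lower case so "iccId" and "iccid" are treated alike.
        if str(obj.get("kind", "")).lower() == kind.lower() and "id" in obj:
            found.append(obj["id"])
        for value in obj.values():
            found.extend(find_ids(value, kind))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_ids(item, kind))
    return found


# Trimmed copy of the second sample response from the question.
parsed = {
    "hasMoreData": False,
    "devices": [
        {
            "deviceIds": [
                {"id": "0945437676404", "kind": "esn"},
                {"id": "1234565799", "kind": "mdn"},
                {"id": "89148000006456456454", "kind": "iccId"},
            ]
        }
    ],
}

print(find_ids(parsed, "mdn"))
print(find_ids(parsed, "iccid"))
```

With a real response, the parsed body (for example json.loads(response.text)) would be passed in place of the sample dict.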

[Comments]:
