Ways in Python to take one character from each of several idioms to form a sentence

Posted 2023-05-05


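Introduction: this article was compiled by the editors of 小常识网 (cha138.com). It mainly covers ways in Python to take one character from each of several idioms to form a sentence, and we hope it is a useful reference.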

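Reference answer A: Look at what distinguishes the tags of the idioms you want to scrape. Viewing the page source shows that each idiom points to a different link, and the links differ only in the numbers in "/cy0/93.html", so the regular expression just needs to match digits twice.
Three ways to scrape data in Python: regular expressions (the re library), BeautifulSoup (bs4), and lxml.
Data acquisition is the first step in quantitative financial analysis; without reliable, authentic data, quantitative analysis cannot even begin.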
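
A minimal sketch of the regular-expression approach described above. The HTML fragment, the idioms in it, and the exact link layout are made up for illustration (only the "/cy0/93.html" shape comes from the answer), so the pattern would need adjusting to the real page source. The last step, joining the first character of each idiom, is just one possible reading of the title question.

```python
import re

# Hypothetical fragment of page source; real markup will differ.
html = '''
<a href="/cy0/93.html">一心一意</a>
<a href="/cy1/1024.html">马到成功</a>
<a href="/about.html">关于</a>
'''

# Two digit groups: one after "cy", one for the page number.
pattern = re.compile(r'href="(/cy(\d+)/(\d+)\.html)">([^<]+)</a>')

# Collect (link, idiom text) pairs for every idiom link found.
idioms = [(m.group(1), m.group(4)) for m in pattern.finditer(html)]
for link, text in idioms:
    print(link, text)

# Take the first character of each idiom and join them into one string.
first_chars = ''.join(text[0] for _, text in idioms)
print(first_chars)
```

For messier pages, BeautifulSoup or lxml would probably be more robust than a regular expression, since they parse the markup instead of matching raw text.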

JSON ID extraction from Array in Python

【Posted】2021-02-06 02:55:51 【Question】:

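I wrote a script that pulls data from Verizon's Connectivity Management API using the following. Below is the part that requests line information based on a search item, in this case the SIM or iccid. I have left out the earlier parts because they only connect to the API and obtain credentials.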

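import requests

# session_token, bearer_token and SIM come from the earlier (omitted) part of the script
header = {
    'accept': 'application/json',
    'VZ-M2M-Token': session_token,
    'Authorization': 'Bearer ' + bearer_token,  # note the space after "Bearer"
    'Content-Type': 'application/json',
}

data = '{ "deviceId": { "id": ' + SIM + ', "kind": "ICCID" } }'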

response = requests.post('https://thingspace.verizon.com/api/m2m/v1/devices/actions/list', headers=header, data=data)
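
A hedged alternative for building the request body: instead of concatenating strings, the payload can be assembled as a dict and serialized with json.dumps, which takes care of quoting. The SIM value below is a placeholder, and the payload shape is taken from the snippet above rather than from Verizon's documentation.

```python
import json

SIM = "89148000005495729057"  # placeholder ICCID, not a real one

# Build the body as a dict so the quoting is handled by the serializer.
payload = {"deviceId": {"id": SIM, "kind": "ICCID"}}
data = json.dumps(payload)
print(data)
```

With requests, passing json=payload to requests.post instead of data=data should have the same effect, since the library then serializes the dict itself.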

The response I get is JSON that looks like this:


{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "123456789-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "3rrrrx48wwwwrjgjtyjtyjtyjtyj",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-11-04T11:06:28-04:00",
      "deviceIds": [
        { "id": "5256694405", "kind": "mdn" },
        { "id": "3114949302094150", "kind": "imsi" },
        { "id": "35922505468230", "kind": "imei" },
        { "id": "891480000054957290575", "kind": "iccId" },
        { "id": "15256694405", "kind": "msisdn" },
        { "id": "5256694405", "kind": "min" }
      ],
      "extendedAttributes": [
        { "key": "PrimaryPlaceOfUseTitle" },
        { "key": "PrimaryPlaceOfUseFirstName", "value": "5256694405" },
        { "key": "PrimaryPlaceOfUseMiddleName" },
        { "key": "PrimaryPlaceOfUseLastName", "value": "ESN" },
        { "key": "PrimaryPlaceOfUseSuffix" },
        { "key": "PrimaryPlaceOfUseAddressLine1" },
        { "key": "PrimaryPlaceOfUseAddressLine2" },
        { "key": "PrimaryPlaceOfUseCity" },
        { "key": "PrimaryPlaceOfUseState" },
        { "key": "PrimaryPlaceOfUseCountry" },
        { "key": "PrimaryPlaceOfUseZipCode" },
        { "key": "PrimaryPlaceOfUseZipCode4" },
        { "key": "PrimaryPlaceOfUseCBRPhone" },
        { "key": "PrimaryPlaceOfUseCBRPhoneType" },
        { "key": "PrimaryPlaceOfUseEmailAddress" },
        { "key": "AccountNumber", "value": "5256694405-00001" },
        { "key": "SmsrOid" },
        { "key": "ProfileStatus" },
        { "key": "PromoCodes", "value": "" },
        { "key": "PromotionStartDate", "value": "" },
        { "key": "PromotionScheduledEndDate", "value": "" },
        { "key": "LeadId", "value": "" },
        { "key": "CustomerName", "value": "" },
        { "key": "CustomerAddressLine1", "value": "" },
        { "key": "CustomerAddressLine2", "value": "" },
        { "key": "CustomerAddressCity", "value": "" },
        { "key": "CustomerAddressState", "value": "" },
        { "key": "CustomerAddressZipCode", "value": "" },
        { "key": "ServiceZipCode", "value": "" },
        { "key": "SkuNumber", "value": "VZW080000460053" },
        { "key": "CostCenterCode" },
        { "key": "PreIMEI", "value": "3592254564568445" },
        { "key": "PreSKU", "value": "VZW080000100037" },
        { "key": "SIMOTADate", "value": "4/30/2020 1:22:18 PM" },
        { "key": "RoamingStatus", "value": "NotRoaming" },
        { "key": "LastRoamingStatusUpdate", "value": "9/24/2020 5:40:26 PM" }
      ],
      "groupNames": [
        "Default: 0220433754-00001"
      ],
      "ipAddress": "100.100.100.100",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-11-04T11:06:28-04:00",
      "lastConnectionDate": "2020-09-24T13:40:26-04:00"
    }
  ]
}

I added a section to the script that uses the code below to pull the mdn, iccid, and imei out of the array.

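import json

def puller(line_json):
    """Print the phone number (mdn), SIM (iccid), and IMEI of the first device."""
    line_data = json.loads(line_json)
    # Positions are hard-coded, so this only works when deviceIds has this exact order
    mdn = line_data['devices'][0]['deviceIds'][0]['id']
    iccid = line_data['devices'][0]['deviceIds'][3]['id']
    imei = line_data['devices'][0]['deviceIds'][2]['id']

    print('phone = ', mdn)
    print('SIM = ', iccid)
    print('IMEI = ', imei)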

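I tested this code with one test ID and it worked as it should. I then tested with another test ID and learned that the array structure is not always the same. The second JSON response is below. I would like to know whether there is a better way to find the specific values I want without telling the script exactly where each item sits in the structure, as I did above.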


{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "02234234234-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "37776xdsfewsfwe576193",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-05-24T15:55:06-04:00",
      "deviceIds": [
        { "id": "0945437676404", "kind": "esn" },
        { "id": "1234565799", "kind": "mdn" },
        { "id": "31148454545458767", "kind": "imsi" },
        { "id": "01426786678211", "kind": "imei" },
        { "id": "89148000006456456454", "kind": "iccId" },
        { "id": "1234565799", "kind": "min" }
      ],
      "extendedAttributes": [
        { "key": "PrimaryPlaceOfUseTitle" },
        { "key": "PrimaryPlaceOfUseFirstName", "value": "096114564506772" },
        { "key": "PrimaryPlaceOfUseMiddleName" },
        { "key": "PrimaryPlaceOfUseLastName", "value": "096546454806772" },
        { "key": "PrimaryPlaceOfUseSuffix" },
        { "key": "PrimaryPlaceOfUseAddressLine1" },
        { "key": "PrimaryPlaceOfUseAddressLine2" },
        { "key": "PrimaryPlaceOfUseCity" },
        { "key": "PrimaryPlaceOfUseState" },
        { "key": "PrimaryPlaceOfUseCountry" },
        { "key": "PrimaryPlaceOfUseZipCode" },
        { "key": "PrimaryPlaceOfUseZipCode4" },
        { "key": "PrimaryPlaceOfUseCBRPhone" },
        { "key": "PrimaryPlaceOfUseCBRPhoneType" },
        { "key": "PrimaryPlaceOfUseEmailAddress" },
        { "key": "AccountNumber", "value": "02242342354-00001" },
        { "key": "SmsrOid" },
        { "key": "ProfileStatus" },
        { "key": "PromoCodes", "value": "" },
        { "key": "PromotionStartDate", "value": "" },
        { "key": "PromotionScheduledEndDate", "value": "" },
        { "key": "LeadId", "value": "" },
        { "key": "CustomerName", "value": "" },
        { "key": "CustomerAddressLine1", "value": "" },
        { "key": "CustomerAddressLine2", "value": "" },
        { "key": "CustomerAddressCity", "value": "" },
        { "key": "CustomerAddressState", "value": "" },
        { "key": "CustomerAddressZipCode", "value": "" },
        { "key": "ServiceZipCode", "value": "" },
        { "key": "SkuNumber", "value": "VZW12000364343005" },
        { "key": "CostCenterCode" },
        { "key": "PreIMEI" },
        { "key": "PreSKU", "value": "VZW12000334340005" },
        { "key": "SIMOTADate", "value": "3/13/2020 10:52:07 AM" },
        { "key": "RoamingStatus", "value": "NotRoaming" },
        { "key": "LastRoamingStatusUpdate", "value": "10/20/2020 6:14:20 PM" }
      ],
      "groupNames": [
        "Default: 02342343754-00001"
      ],
      "ipAddress": "101.101.101.101",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-05-24T15:55:16-04:00",
      "lastConnectionDate": "2020-10-20T14:14:20-04:00"
    }
  ]
}

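I tried the following code, found during my research, to find the value I am looking for; in this case, the mdn. The problem is that it returns a set of empty brackets with nothing in them, so I know I am probably doing something wrong.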

def json_extract(obj, kind):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, kind):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, kind)
                elif k == kind:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, kind)
        return arr

    values = extract(obj, arr, kind)
    return values


names = json_extract(response , 'mdn')
print(names)


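【Answer 1】:

As I understand it, you are trying to find the ids for mdn, iccid, and imei in the JSON object above. Rather than using recursion and the complex code you have there, it is easier to let Python's built-ins help you out.

You can use the next function for this: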

import json

# load your JSON data (here, the body of the response from the question)
line_data = json.loads(response.text)

# narrow your focus to the array in question
device_ids = line_data['devices'][0]['deviceIds']

# Each call gets the id of the first item whose kind matches, or None if no item matches.
# The kind is compared in lowercase because the API returns "iccId", not "iccid".
mdn = next((x['id'] for x in device_ids if x['kind'].lower() == "mdn"), None)
iccid = next((x['id'] for x in device_ids if x['kind'].lower() == "iccid"), None)
imei = next((x['id'] for x in device_ids if x['kind'].lower() == "imei"), None)

You will need to handle None for the case where no such element is found in the array.
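
One way to handle the missing case, as a rough sketch: build a dict keyed by the lowercased kind once, then look up each identifier with dict.get, which returns None for absent kinds. The helper name ids_by_kind and the trimmed sample data are assumptions for illustration, not part of the original answer.

```python
import json

# Trimmed-down stand-in for the API response shown in the question.
sample = '''
{"devices": [{"deviceIds": [
    {"id": "0945437676404", "kind": "esn"},
    {"id": "1234565799", "kind": "mdn"},
    {"id": "01426786678211", "kind": "imei"},
    {"id": "89148000006456456454", "kind": "iccId"}
]}]}
'''

def ids_by_kind(line_json):
    """Map each lowercased kind to its id for the first device (hypothetical helper)."""
    line_data = json.loads(line_json)
    device_ids = line_data["devices"][0]["deviceIds"]
    return {entry["kind"].lower(): entry["id"] for entry in device_ids}

ids = ids_by_kind(sample)
for kind in ("mdn", "iccid", "imei", "msisdn"):
    value = ids.get(kind)  # None when the device has no such identifier
    print(kind, "=", value if value is not None else "not present")
```

If a kind could appear more than once for a device, the dict would keep only the last occurrence, whereas the next-based lookup keeps the first.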

Reference: Find object in list that has attribute equal to some value (that meets any condition)
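
On why the recursive json_extract in the question printed empty brackets: as far as the posted code shows, it was given the requests response object rather than parsed JSON, and it looks for a key named 'mdn', whereas in this data "mdn" is a value stored under the "kind" key. If a recursive search is still wanted, a variant that matches on that value might look like the sketch below; find_ids_by_kind is a hypothetical name and the sample data is trimmed.

```python
def find_ids_by_kind(obj, kind):
    """Recursively collect 'id' values from dicts whose 'kind' matches (case-insensitive)."""
    found = []
    if isinstance(obj, dict):
        # A dict is a match when its "kind" value equals the requested kind.
        if str(obj.get("kind", "")).lower() == kind.lower() and "id" in obj:
            found.append(obj["id"])
        for value in obj.values():
            found.extend(find_ids_by_kind(value, kind))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_ids_by_kind(item, kind))
    return found

# Already-parsed data, e.g. the result of json.loads(response.text).
parsed = {"devices": [{"deviceIds": [
    {"id": "0945437676404", "kind": "esn"},
    {"id": "1234565799", "kind": "mdn"},
]}]}

print(find_ids_by_kind(parsed, "mdn"))
```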
