In Python, how can you take one character from each of several idioms (chengyu) to form a sentence?
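Reference answer A: Look at what the tags of the idioms you want to scrape have in common. Viewing the page source shows that each idiom links to a different URL, and the URLs differ only in the numbers in "/cy0/93.html", so the regular expression just needs to match digits twice. There are three common ways to scrape data in Python: regular expressions (the re library), BeautifulSoup (bs4), and lxml.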
Data acquisition is the first step of quantitative financial analysis: without reliable, real data, there is nothing to analyze.
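The regex approach described in the answer above can be sketched roughly as follows. This is only an illustration under stated assumptions: the HTML fragment, the sample idioms, and the helper name pick_chars are hypothetical, and the real page markup may differ from the anchor pattern used here.

```python
import re

# Hypothetical HTML fragment; the real page markup may differ.
html = '''
<a href="/cy0/93.html">一心一意</a>
<a href="/cy1/208.html">马到成功</a>
<a href="/cy12/4571.html">功成名就</a>
'''

# Match the two numeric parts of links such as "/cy0/93.html"
# and capture the link text (the idiom itself) in group 3.
pattern = re.compile(r'<a href="/cy(\d+)/(\d+)\.html">([^<]+)</a>')
idioms = [m.group(3) for m in pattern.finditer(html)]


def pick_chars(words, positions):
    """Take the character at positions[i] from words[i] and join them."""
    return ''.join(word[pos] for word, pos in zip(words, positions))


# Take the first character of each idiom.
sentence = pick_chars(idioms, [0, 0, 0])
print(idioms)    # ['一心一意', '马到成功', '功成名就']
print(sentence)  # 一马功
```

Which character to take from each idiom is a separate decision; here the positions are simply passed in as a list.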
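JSON ID extraction from Array in Python
【Posted】: 2021-02-06 02:55:51 【Question】: I wrote a script that pulls data from Verizon's Connectivity Management API using the code below. This is the part that requests line information for a search item, in this case a SIM, i.e. an ICCID. I have left out the earlier parts because they only connect to the API and obtain credentials.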
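import json
import requests

# session_token, bearer_token and SIM come from the earlier (omitted) parts of the script.
header = {
    'accept': 'application/json',
    'VZ-M2M-Token': session_token,
    'Authorization': 'Bearer ' + bearer_token,  # a space separates the scheme from the token
    'Content-Type': 'application/json',
}
# Build the request body with json.dumps instead of concatenating strings by hand.
data = json.dumps({"deviceId": {"id": SIM, "kind": "ICCID"}})
response = requests.post('https://thingspace.verizon.com/api/m2m/v1/devices/actions/list', headers=header, data=data)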
The response I get back is JSON that looks like this:
{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "123456789-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "3rrrrx48wwwwrjgjtyjtyjtyjtyj",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-11-04T11:06:28-04:00",
      "deviceIds": [
        {"id": "5256694405", "kind": "mdn"},
        {"id": "3114949302094150", "kind": "imsi"},
        {"id": "35922505468230", "kind": "imei"},
        {"id": "891480000054957290575", "kind": "iccId"},
        {"id": "15256694405", "kind": "msisdn"},
        {"id": "5256694405", "kind": "min"}
      ],
      "extendedAttributes": [
        {"key": "PrimaryPlaceOfUseTitle"},
        {"key": "PrimaryPlaceOfUseFirstName", "value": "5256694405"},
        {"key": "PrimaryPlaceOfUseMiddleName"},
        {"key": "PrimaryPlaceOfUseLastName", "value": "ESN"},
        {"key": "PrimaryPlaceOfUseSuffix"},
        {"key": "PrimaryPlaceOfUseAddressLine1"},
        {"key": "PrimaryPlaceOfUseAddressLine2"},
        {"key": "PrimaryPlaceOfUseCity"},
        {"key": "PrimaryPlaceOfUseState"},
        {"key": "PrimaryPlaceOfUseCountry"},
        {"key": "PrimaryPlaceOfUseZipCode"},
        {"key": "PrimaryPlaceOfUseZipCode4"},
        {"key": "PrimaryPlaceOfUseCBRPhone"},
        {"key": "PrimaryPlaceOfUseCBRPhoneType"},
        {"key": "PrimaryPlaceOfUseEmailAddress"},
        {"key": "AccountNumber", "value": "5256694405-00001"},
        {"key": "SmsrOid"},
        {"key": "ProfileStatus"},
        {"key": "PromoCodes", "value": ""},
        {"key": "PromotionStartDate", "value": ""},
        {"key": "PromotionScheduledEndDate", "value": ""},
        {"key": "LeadId", "value": ""},
        {"key": "CustomerName", "value": ""},
        {"key": "CustomerAddressLine1", "value": ""},
        {"key": "CustomerAddressLine2", "value": ""},
        {"key": "CustomerAddressCity", "value": ""},
        {"key": "CustomerAddressState", "value": ""},
        {"key": "CustomerAddressZipCode", "value": ""},
        {"key": "ServiceZipCode", "value": ""},
        {"key": "SkuNumber", "value": "VZW080000460053"},
        {"key": "CostCenterCode"},
        {"key": "PreIMEI", "value": "3592254564568445"},
        {"key": "PreSKU", "value": "VZW080000100037"},
        {"key": "SIMOTADate", "value": "4/30/2020 1:22:18 PM"},
        {"key": "RoamingStatus", "value": "NotRoaming"},
        {"key": "LastRoamingStatusUpdate", "value": "9/24/2020 5:40:26 PM"}
      ],
      "groupNames": [
        "Default: 0220433754-00001"
      ],
      "ipAddress": "100.100.100.100",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-11-04T11:06:28-04:00",
      "lastConnectionDate": "2020-09-24T13:40:26-04:00"
    }
  ]
}
I added a section to the script that pulls the mdn, iccid and imei out of the array with the code below.
import json

def puller(line_json):
    # Parse the raw JSON text returned by the API.
    line_data = json.loads(line_json)
    # Positional lookups: these assume a fixed order inside deviceIds.
    mdn = line_data['devices'][0]['deviceIds'][0]['id']
    iccid = line_data['devices'][0]['deviceIds'][3]['id']
    imei = line_data['devices'][0]['deviceIds'][2]['id']
    print('phone = ', mdn)
    print('SIM = ', iccid)
    print('IMEI = ', imei)
I tested this code with one test ID and it worked as it should. When I tested with another ID, I learned that the array structure is not always the same. The second JSON response is below. I would like to know whether there is a better way to find the specific values I want without telling the script exactly where each item sits in the structure, as I did above.
{
  "hasMoreData": false,
  "devices": [
    {
      "accountName": "02234234234-00001",
      "billingCycleEndDate": "2020-10-31T20:00:00-04:00",
      "carrierInformations": [
        {
          "carrierName": "Verizon Wireless",
          "servicePlan": "37776xdsfewsfwe576193",
          "state": "active"
        }
      ],
      "connected": true,
      "createdAt": "2016-05-24T15:55:06-04:00",
      "deviceIds": [
        {"id": "0945437676404", "kind": "esn"},
        {"id": "1234565799", "kind": "mdn"},
        {"id": "31148454545458767", "kind": "imsi"},
        {"id": "01426786678211", "kind": "imei"},
        {"id": "89148000006456456454", "kind": "iccId"},
        {"id": "1234565799", "kind": "min"}
      ],
      "extendedAttributes": [
        {"key": "PrimaryPlaceOfUseTitle"},
        {"key": "PrimaryPlaceOfUseFirstName", "value": "096114564506772"},
        {"key": "PrimaryPlaceOfUseMiddleName"},
        {"key": "PrimaryPlaceOfUseLastName", "value": "096546454806772"},
        {"key": "PrimaryPlaceOfUseSuffix"},
        {"key": "PrimaryPlaceOfUseAddressLine1"},
        {"key": "PrimaryPlaceOfUseAddressLine2"},
        {"key": "PrimaryPlaceOfUseCity"},
        {"key": "PrimaryPlaceOfUseState"},
        {"key": "PrimaryPlaceOfUseCountry"},
        {"key": "PrimaryPlaceOfUseZipCode"},
        {"key": "PrimaryPlaceOfUseZipCode4"},
        {"key": "PrimaryPlaceOfUseCBRPhone"},
        {"key": "PrimaryPlaceOfUseCBRPhoneType"},
        {"key": "PrimaryPlaceOfUseEmailAddress"},
        {"key": "AccountNumber", "value": "02242342354-00001"},
        {"key": "SmsrOid"},
        {"key": "ProfileStatus"},
        {"key": "PromoCodes", "value": ""},
        {"key": "PromotionStartDate", "value": ""},
        {"key": "PromotionScheduledEndDate", "value": ""},
        {"key": "LeadId", "value": ""},
        {"key": "CustomerName", "value": ""},
        {"key": "CustomerAddressLine1", "value": ""},
        {"key": "CustomerAddressLine2", "value": ""},
        {"key": "CustomerAddressCity", "value": ""},
        {"key": "CustomerAddressState", "value": ""},
        {"key": "CustomerAddressZipCode", "value": ""},
        {"key": "ServiceZipCode", "value": ""},
        {"key": "SkuNumber", "value": "VZW12000364343005"},
        {"key": "CostCenterCode"},
        {"key": "PreIMEI"},
        {"key": "PreSKU", "value": "VZW12000334340005"},
        {"key": "SIMOTADate", "value": "3/13/2020 10:52:07 AM"},
        {"key": "RoamingStatus", "value": "NotRoaming"},
        {"key": "LastRoamingStatusUpdate", "value": "10/20/2020 6:14:20 PM"}
      ],
      "groupNames": [
        "Default: 02342343754-00001"
      ],
      "ipAddress": "101.101.101.101",
      "lastActivationBy": "User Verizon",
      "lastActivationDate": "2016-05-24T15:55:16-04:00",
      "lastConnectionDate": "2020-10-20T14:14:20-04:00"
    }
  ]
}
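I tried the code below, taken from some research I did, to find the value I am looking for, in this case the mdn. The problem is that it returns a pair of empty brackets with nothing inside, so I know I am probably doing something wrong.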
def json_extract(obj, kind):
    """Recursively fetch values from nested JSON."""
    arr = []

    def extract(obj, arr, kind):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, kind)
                elif k == kind:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, kind)
        return arr

    values = extract(obj, arr, kind)
    return values

names = json_extract(response, 'mdn')
print(names)
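A likely reason for the empty list, judging only from the code as posted: json_extract compares dictionary keys against 'mdn', but in these responses 'mdn' is a value stored under "kind", never a key. It is also handed the requests response object, which is neither a dict nor a list, so nothing is traversed. A minimal sketch of a recursive search that matches on the "kind" value instead is shown here; the name find_ids_by_kind and the trimmed sample are illustrative assumptions, not part of the original script.

```python
import json


def find_ids_by_kind(obj, kind):
    """Collect the "id" of every dict whose "kind" equals `kind` (case-insensitive)."""
    found = []
    if isinstance(obj, dict):
        if str(obj.get("kind", "")).lower() == kind.lower() and "id" in obj:
            found.append(obj["id"])
        # Keep descending: matching entries may be nested at any depth.
        for value in obj.values():
            found.extend(find_ids_by_kind(value, kind))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_ids_by_kind(item, kind))
    return found


# Trimmed-down sample in the shape of the responses above.
sample = json.loads('''
{"devices": [{"deviceIds": [
    {"id": "0945437676404", "kind": "esn"},
    {"id": "1234565799", "kind": "mdn"},
    {"id": "89148000006456456454", "kind": "iccId"}
]}]}
''')

print(find_ids_by_kind(sample, "mdn"))    # ['1234565799']
print(find_ids_by_kind(sample, "iccid"))  # ['89148000006456456454']
```

With a live response, the parsed body (for example json.loads(response.text)) would be passed in rather than the response object itself.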
【Comments】:
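【Answer 1】: As I understand it, you are trying to find the ids for mdn, iccid and imei in the JSON objects above. Rather than the recursion and fairly complex code you have there, Python's built-ins can make this easier. You can use the next function for this: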
import json

# load your json data (here: the body of the API response)
line_data = json.loads(response.text)
# narrow your focus on the array in question
device_ids = line_data['devices'][0]['deviceIds']
# This gets the first item's id attribute from the list that matches the condition, and returns None if no item matches.
# The comparison is lower-cased because the API reports the kind as "iccId", not "iccid".
mdn = next((x['id'] for x in device_ids if x['kind'].lower() == "mdn"), None)
iccid = next((x['id'] for x in device_ids if x['kind'].lower() == "iccid"), None)
imei = next((x['id'] for x in device_ids if x['kind'].lower() == "imei"), None)
If no such element is found in the array, next returns None, and you will need to handle that case.
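As an alternative to calling next once per value, the device ids can be collected into a dictionary keyed by kind, which also makes the missing-value case explicit. This is a sketch under assumptions: the function name ids_by_kind and the trimmed sample are hypothetical, and only the first device in the response is inspected, as in the code above.

```python
import json


def ids_by_kind(line_json):
    """Map each lower-cased "kind" to its "id" for the first device in the response."""
    line_data = json.loads(line_json)
    devices = line_data.get("devices") or []
    if not devices:
        return {}
    return {d["kind"].lower(): d["id"] for d in devices[0].get("deviceIds", [])}


# Trimmed-down sample in the shape of the first response above (no "esn" entry).
sample = '''
{"hasMoreData": false, "devices": [{"deviceIds": [
    {"id": "5256694405", "kind": "mdn"},
    {"id": "35922505468230", "kind": "imei"},
    {"id": "891480000054957290575", "kind": "iccId"}
]}]}
'''

ids = ids_by_kind(sample)
mdn = ids.get("mdn")
iccid = ids.get("iccid")
esn = ids.get("esn")  # None here, because this device has no ESN entry

print('phone =', mdn)
print('SIM =', iccid)
if esn is None:
    print('ESN not present for this device')
```

If a kind appears more than once in deviceIds, the last entry wins in this sketch, whereas next returns the first match.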
Reference: Find object in list that has attribute equal to some value (that meets any condition)
【Comments】: