How can I extract subdomains from a JSON file?
Posted: 2021-12-13 22:55:54
【Question】I have a long list of JSON files. Using bash, I want to extract the harvard.edu subdomain stored in the "host" field, e.g. "host": "ceonlineb2b.hms.harvard.edu". I'd be glad if anyone could help. Below is just a snippet of one JSON file.
```json
{
  "data": {
    "total_items": 3,
    "offset": 0,
    "limit": 1,
    "items": [
      {
        "name": "ceonlineb2b.hms.harvard.edu",
        "alexa": null,
        "cert_summary": null,
        "dns_records": {
          "A": [
            "3.221.168.206",
            "54.174.253.3"
          ],
          "AAAA": null,
          "CAA": null,
          "CNAME": [
            "hms-moodleb2b-prod.cabem.com"
          ],
          "MX": null,
          "NS": null,
          "SOA": null,
          "TXT": null,
          "SPF": null,
          "updated_at": "2021-05-14T23:12:43.332816923Z"
        },
        "hosts_enrichment": [
          {
            "ip": "3.221.168.206",
            "as_num": 14618,
            "as_org": "amazon-aes",
            "isp": "amazon.com",
            "city_name": "ashburn",
            "country": "united states",
            "country_iso_code": "us",
            "location": {
              "lat": 39.0481,
              "lon": -77.4728
            }
          },
          {
            "ip": "54.174.253.3",
            "as_num": 14618,
            "as_org": "amazon-aes",
            "isp": "amazon.com",
            "city_name": "ashburn",
            "country": "united states",
            "country_iso_code": "us",
            "location": {
              "lat": 39.0481,
              "lon": -77.4728
            }
          }
        ],
        "http_extract": {
          "cookies": [
            {
              "domain": "",
              "expire": "0001-01-01T00:00:00Z",
              "http_only": true,
              "key": "MoodleSession",
              "max_age": 0,
              "path": "/",
              "security": true,
              "value": "tqhmqc4muk513sad1bmnl3kocj"
            }
          ],
          "description": "",
          "emails": null,
          "final_redirect_url": {
            "full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
            "host": "ceonlineb2b.hms.harvard.edu",
            "path": "/login/index.php"
          },
          "extracted_at": "2020-10-04T20:55:26.043777194Z",
          "favicon_sha256": "",
          "http_headers": [
            { "name": "date", "value": "Sun, 04 Oct 2020 20:55:25 GMT" },
            { "name": "content-type", "value": "text/html; charset=utf-8" },
            { "name": "server", "value": "Apache/2.4.46 () OpenSSL/1.0.2k-fips" },
            { "name": "x-powered-by", "value": "PHP/7.2.24" },
            { "name": "content-language", "value": "en" },
            { "name": "content-script-type", "value": "text/javascript" },
            { "name": "content-style-type", "value": "text/css" },
            { "name": "x-ua-compatible", "value": "IE=edge" },
            { "name": "cache-control", "value": "private, pre-check=0, post-check=0, max-age=0, no-transform" },
            { "name": "pragma", "value": "no-cache" },
            { "name": "expires", "value": "" },
            { "name": "accept-ranges", "value": "none" },
            { "name": "set-cookie", "value": "MoodleSession=tqhmqc4muk513sad1bmnl3kocj; path=/; secure;HttpOnly;Secure;SameSite=None" }
          ],
          "http_status_code": 200,
          "links": [
            {
              "anchor": "Forgotten your username or password?",
              "url": "https://ceonlineb2b.hms.harvard.edu/login/forgot_password.php",
              "url_host": "ceonlineb2b.hms.harvard.edu"
            },
            {
              "anchor": "Privacy Statement",
              "url": "/local/staticpage/view.php?page=privacy-statement",
              "url_host": ""
            },
            {
              "anchor": "Terms of Service",
              "url": "/local/staticpage/view.php?page=terms-of-service",
              "url_host": ""
            },
            {
              "anchor": "Copyright Information",
              "url": "/local/staticpage/view.php?page=copyright-information",
              "url_host": ""
            }
          ],
          "meta_tags": [
            { "name": "keywords", "value": "moodle, HMS Postgraduate Courses: Log in to the site" },
            { "name": "format-detection", "value": "telephone=no" },
            { "name": "robots", "value": "noindex" },
            { "name": "viewport", "value": "width=device-width, initial-scale=1.0" }
          ],
          "robots_txt": "",
          "scripts": [
            "https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.js",
            "https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/javascript-static.js",
            "https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/requirejs/require.min.js",
            "https://ceonlineb2b.hms.harvard.edu/theme/javascript.php/hms/1589465013/footer"
          ],
          "styles": [
            "https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.css",
            "https://ceonlineb2b.hms.harvard.edu/theme/styles.php/hms/1589465013_1/all"
          ],
          "title": "HMS Postgraduate Courses: Log in to the site"
        },
        "is_CNAME": null,
        "is_MX": null,
        "is_NS": null,
        "is_PTR": null,
        "is_subdomain": true,
        "name_without_suffix": "ceonlineb2b.hms.harvard",
        "updated_at": "2021-05-16T10:25:01.59086376Z",
        "user_scan_at": null,
        "whois_parsed": null,
        "security_score": {
          "score": 100
        },
        "cve_list": null,
        "technologies": [
          { "name": "Moodle", "version": "" },
          { "name": "RequireJS", "version": "" }
        ],
        "trackers": null,
        "organizations": null
      }
    ]
  }
}
```
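【Comments】
Please add to your question (not as a comment): what have you searched for, and what did you find? What have you tried, and how did it fail?
Use JSON.parse() to convert it to an object, then extract the emails with a regular expression: ***.com/a/43913430/16775704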
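【Answer 1】
For JSON parsing in bash, I'd suggest looking at jq. It's lightweight and versatile.
Use the -r flag so that only the values are printed, without JSON quoting. From the jq manual:
--raw-output / -r:
With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes.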
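To make the effect of -r concrete, here is a minimal sketch, assuming jq is installed; the one-line document is a hypothetical, trimmed-down stand-in for the question's JSON. It runs the same filter with and without -r:

```shell
#!/usr/bin/env bash
# Hypothetical, trimmed-down stand-in for the question's JSON.
doc='{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"ceonlineb2b.hms.harvard.edu"}}}]}}'

filter='.data.items[].http_extract.final_redirect_url.host'

# Without -r the result is printed as a JSON string, quotes included.
quoted=$(printf '%s' "$doc" | jq "$filter")
# With -r a string result is written as-is, without the surrounding quotes.
raw=$(printf '%s' "$doc" | jq -r "$filter")

echo "$quoted"   # "ceonlineb2b.hms.harvard.edu"
echo "$raw"      # ceonlineb2b.hms.harvard.edu
```

The unquoted form is what you want when the result is piped into other shell tools such as sort or grep.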
In the JSON structure you provided, the subdomain (the host name) is at .data.items[].http_extract.final_redirect_url.host:
```json
{
  "data": {
    "items": [
      {
        "http_extract": {
          "final_redirect_url": {
            "full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
            "host": "ceonlineb2b.hms.harvard.edu",
            "path": "/login/index.php"
          },
          ...
```
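If the path to a key is not obvious from reading a large document, jq can list it. This is a sketch, assuming jq 1.5 or later; the sample file is a hypothetical, trimmed-down version of the question's JSON. It prints every path whose last key is exactly "host":

```shell
#!/usr/bin/env bash
# Hypothetical, trimmed-down version of the question's JSON.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
{"data": {"total_items": 3, "items": [
  {"name": "ceonlineb2b.hms.harvard.edu",
   "http_extract": {
     "final_redirect_url": {"host": "ceonlineb2b.hms.harvard.edu", "path": "/login/index.php"},
     "links": [{"url_host": "ceonlineb2b.hms.harvard.edu"}]}}]}}
EOF

# paths emits each path as an array of keys and array indices; -c prints one per line.
# .[-1] is the last component, so keys such as "url_host" do not match.
host_paths=$(jq -c 'paths | select(.[-1] == "host")' "$sample")
echo "$host_paths"   # ["data","items",0,"http_extract","final_redirect_url","host"]
```

Reading the array left to right, and replacing the index 0 with [], gives the filter used in this answer.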
I saved your JSON to a file, se.json.
Example: extracting the full domain with jq:
```shell
jq -r '.data.items[].http_extract.final_redirect_url.host' se.json
```
Output:
```
ceonlineb2b.hms.harvard.edu
```
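Since the question mentions a long list of JSON files, note that jq accepts several input files in one call. The following is a sketch, assuming all files share the layout shown above; the temporary directory and the files a.json, b.json and c.json are hypothetical. It keeps only harvard.edu hosts and removes duplicates:

```shell
#!/usr/bin/env bash
# Hypothetical input files; real files are assumed to share the question's layout.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s' '{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"ceonlineb2b.hms.harvard.edu"}}}]}}' > "$dir/a.json"
printf '%s' '{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"www.example.com"}}}]}}' > "$dir/b.json"
printf '%s' '{"data":{"items":[{"http_extract":null}]}}' > "$dir/c.json"

# Field access on null yields null, and "// empty" drops null hosts (as in c.json).
# select() keeps only names ending in .harvard.edu; sort -u removes duplicates.
hosts=$(jq -r '.data.items[]?.http_extract.final_redirect_url.host // empty
               | select(endswith(".harvard.edu"))' "$dir"/*.json | sort -u)
echo "$hosts"   # ceonlineb2b.hms.harvard.edu
```

Filtering inside jq rather than with grep afterwards avoids false matches in other fields and keeps the whole job to a single jq invocation.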
To extract the subdomain, just do a search/replace with sub(). From the jq manual:
sub(regex; tostring), sub(regex; string; flags)
Emit the string obtained by replacing the first match of regex in the input string with tostring, after interpolation. tostring should be a jq string, and may contain references to named captures. The named captures are, in effect, presented as a JSON object (as constructed by capture) to tostring, so a reference to a captured variable named "x" would take the form: "\(.x)".
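As an illustration of the named-capture form described above, here is a sketch, assuming jq 1.5 or later. It captures everything before the .harvard.edu suffix into a group; the group name prefix and the pattern are illustrative choices, not part of the original answer:

```shell
#!/usr/bin/env bash
# (?<prefix>...) becomes .prefix inside the replacement string,
# which jq interpolates via \(.prefix). The name "prefix" is an illustrative choice.
# --arg binds the host to the jq variable $h; -n means no JSON input is read.
prefix=$(jq -rn --arg h 'ceonlineb2b.hms.harvard.edu' \
  '$h | sub("^(?<prefix>.+)\\.harvard\\.edu$"; "\(.prefix)")')
echo "$prefix"   # ceonlineb2b.hms
```

The backslashes are doubled because the regex is written inside a jq string literal: jq turns "\\." into the regex \., which matches a literal dot.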
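Extracting the subdomain with jq (the dots are escaped and the pattern is anchored with $, so only the literal suffix is removed):
```shell
jq -r '.data.items[].http_extract.final_redirect_url.host | sub("\\.hms\\.harvard\\.edu$"; "")' se.json
```
Output:
```
ceonlineb2b
```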
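The sub() call above only strips the literal .hms.harvard.edu suffix. If hosts under other parts of harvard.edu may appear, one alternative is to keep just the first DNS label. This is a sketch, assuming jq is installed; the second host, www.law.harvard.edu, is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample; www.law.harvard.edu is an invented host for illustration.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
{"data": {"items": [
  {"http_extract": {"final_redirect_url": {"host": "ceonlineb2b.hms.harvard.edu"}}},
  {"http_extract": {"final_redirect_url": {"host": "www.law.harvard.edu"}}}
]}}
EOF

# Keep harvard.edu hosts, split each name on "." and take the first label.
labels=$(jq -r '.data.items[].http_extract.final_redirect_url.host
                | select(endswith(".harvard.edu"))
                | split(".") | .[0]' "$sample")
echo "$labels"
# ceonlineb2b
# www
```

Taking the first label drops intermediate labels such as hms; if those matter, stripping only "\\.harvard\\.edu$" with sub() keeps them.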
【Comments】
Thanks a lot, Jason