How can I extract subdomains from a JSON file?

【Title】: How can I extract subdomains from a json file? 【Posted】: 2021-12-13 22:55:54 【Question】:

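I have a long list of JSON files. I want to extract the subdomains of harvard.edu, which are stored in the host field ("host": "ceonlineb2b.hms.harvard.edu"), using bash. I would be glad if anyone could help. Below is just a snippet of one JSON file.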


{
  "data": {
    "total_items": 3,
    "offset": 0,
    "limit": 1,
    "items": [
      {
        "name": "ceonlineb2b.hms.harvard.edu",
        "alexa": null,
        "cert_summary": null,
        "dns_records": {
          "A": [
            "3.221.168.206",
            "54.174.253.3"
          ],
          "AAAA": null,
          "CAA": null,
          "CNAME": [
            "hms-moodleb2b-prod.cabem.com"
          ],
          "MX": null,
          "NS": null,
          "SOA": null,
          "TXT": null,
          "SPF": null,
          "updated_at": "2021-05-14T23:12:43.332816923Z"
        },
        "hosts_enrichment": [
          {
            "ip": "3.221.168.206",
            "as_num": 14618,
            "as_org": "amazon-aes",
            "isp": "amazon.com",
            "city_name": "ashburn",
            "country": "united states",
            "country_iso_code": "us",
            "location": {
              "lat": 39.0481,
              "lon": -77.4728
            }
          },
          {
            "ip": "54.174.253.3",
            "as_num": 14618,
            "as_org": "amazon-aes",
            "isp": "amazon.com",
            "city_name": "ashburn",
            "country": "united states",
            "country_iso_code": "us",
            "location": {
              "lat": 39.0481,
              "lon": -77.4728
            }
          }
        ],
        "http_extract": {
          "cookies": [
            {
              "domain": "",
              "expire": "0001-01-01T00:00:00Z",
              "http_only": true,
              "key": "MoodleSession",
              "max_age": 0,
              "path": "/",
              "security": true,
              "value": "tqhmqc4muk513sad1bmnl3kocj"
            }
          ],
          "description": "",
          "emails": null,
          "final_redirect_url": {
            "full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
            "host": "ceonlineb2b.hms.harvard.edu",
            "path": "/login/index.php"
          },
          "extracted_at": "2020-10-04T20:55:26.043777194Z",
          "favicon_sha256": "",
          "http_headers": [
            {
              "name": "date",
              "value": "Sun, 04 Oct 2020 20:55:25 GMT"
            },
            {
              "name": "content-type",
              "value": "text/html; charset=utf-8"
            },
            {
              "name": "server",
              "value": "Apache/2.4.46 () OpenSSL/1.0.2k-fips"
            },
            {
              "name": "x-powered-by",
              "value": "PHP/7.2.24"
            },
            {
              "name": "content-language",
              "value": "en"
            },
            {
              "name": "content-script-type",
              "value": "text/javascript"
            },
            {
              "name": "content-style-type",
              "value": "text/css"
            },
            {
              "name": "x-ua-compatible",
              "value": "IE=edge"
            },
            {
              "name": "cache-control",
              "value": "private, pre-check=0, post-check=0, max-age=0, no-transform"
            },
            {
              "name": "pragma",
              "value": "no-cache"
            },
            {
              "name": "expires",
              "value": ""
            },
            {
              "name": "accept-ranges",
              "value": "none"
            },
            {
              "name": "set-cookie",
              "value": "MoodleSession=tqhmqc4muk513sad1bmnl3kocj; path=/; secure;HttpOnly;Secure;SameSite=None"
            }
          ],
          "http_status_code": 200,
          "links": [
            {
              "anchor": "Forgotten your username or password?",
              "url": "https://ceonlineb2b.hms.harvard.edu/login/forgot_password.php",
              "url_host": "ceonlineb2b.hms.harvard.edu"
            },
            {
              "anchor": "Privacy Statement",
              "url": "/local/staticpage/view.php?page=privacy-statement",
              "url_host": ""
            },
            {
              "anchor": "Terms of Service",
              "url": "/local/staticpage/view.php?page=terms-of-service",
              "url_host": ""
            },
            {
              "anchor": "Copyright Information",
              "url": "/local/staticpage/view.php?page=copyright-information",
              "url_host": ""
            }
          ],
          "meta_tags": [
            {
              "name": "keywords",
              "value": "moodle, HMS Postgraduate Courses: Log in to the site"
            },
            {
              "name": "format-detection",
              "value": "telephone=no"
            },
            {
              "name": "robots",
              "value": "noindex"
            },
            {
              "name": "viewport",
              "value": "width=device-width, initial-scale=1.0"
            }
          ],
          "robots_txt": "",
          "scripts": [
            "https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.js",
            "https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/javascript-static.js",
            "https://ceonlineb2b.hms.harvard.edu/lib/javascript.php/1589465014/lib/requirejs/require.min.js",
            "https://ceonlineb2b.hms.harvard.edu/theme/javascript.php/hms/1589465013/footer"
          ],
          "styles": [
            "https://ceonlineb2b.hms.harvard.edu/theme/yui_combo.php?rollup/3.17.2/yui-moodlesimple-min.css",
            "https://ceonlineb2b.hms.harvard.edu/theme/styles.php/hms/1589465013_1/all"
          ],
          "title": "HMS Postgraduate Courses: Log in to the site"
        },
        "is_CNAME": null,
        "is_MX": null,
        "is_NS": null,
        "is_PTR": null,
        "is_subdomain": true,
        "name_without_suffix": "ceonlineb2b.hms.harvard",
        "updated_at": "2021-05-16T10:25:01.59086376Z",
        "user_scan_at": null,
        "whois_parsed": null,
        "security_score": {
          "score": 100
        },
        "cve_list": null,
        "technologies": [
          {
            "name": "Moodle",
            "version": ""
          },
          {
            "name": "RequireJS",
            "version": ""
          }
        ],
        "trackers": null,
        "organizations": null
      }
    ]
  }
}

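【Comments】:

Please add to your question (not in a comment): what did you search for, and what did you find? What have you tried, and how did it fail?

Use JSON.parse() to convert it to an object, then use a regular expression to extract the emails: ***.com/a/43913430/16775704

【Answer 1】: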

For JSON parsing in bash, I suggest taking a look at jq. It is lightweight and versatile.

We can use the -r flag to output only the raw values.

--raw-output / -r:

With this option, if the filter's result is a string, it is written directly to standard output rather than being formatted as a JSON string with quotes.

In the JSON structure you provided, the host name is located at .data.items[].http_extract.final_redirect_url.host


{
  "data": {
    "items": [
      {
        "http_extract": {
          "final_redirect_url": {
            "full_uri": "https://ceonlineb2b.hms.harvard.edu/login/index.php",
            "host": "ceonlineb2b.hms.harvard.edu",
            "path": "/login/index.php"
          },
        ...

I have saved your JSON to a file, se.json

Example of extracting the full domain using jq

jq -r '.data.items[].http_extract.final_redirect_url.host' se.json

Output

ceonlineb2b.hms.harvard.edu
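
Since the question mentions a long list of JSON files, the same filter can be run over all of them at once. The following is only a sketch under stated assumptions: the temporary directory, the file names a.json and b.json, and the second host name example.fas.harvard.edu are made up for illustration, and every file is assumed to share the structure shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical directory holding the JSON files; point this at your own data.
dir=$(mktemp -d)

# Two small sample files with the same shape as the snippet in the question.
# The host in b.json is invented for illustration only.
cat > "$dir/a.json" <<'EOF'
{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"ceonlineb2b.hms.harvard.edu"}}}]}}
EOF
cat > "$dir/b.json" <<'EOF'
{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"example.fas.harvard.edu"}}},{"http_extract":null}]}}
EOF

# jq accepts several input files. "// empty" drops items that have no host,
# and sort -u removes duplicates across files.
hosts=$(jq -r '.data.items[].http_extract.final_redirect_url.host // empty' "$dir"/*.json | sort -u)
echo "$hosts"
```

Items whose http_extract is null would otherwise print the literal string null, which is why the filter ends with "// empty".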

To extract the subdomain, simply do a search/replace with sub().

sub(regex; tostring), sub(regex; tostring; flags)

Emit the string obtained by replacing the first match of regex in the input string with tostring, after interpolation. tostring should be a jq string, and may contain references to named captures. The named captures are, in effect, presented as a JSON object (as constructed by capture) to tostring, so a reference to a captured variable named "x" would take the form: "\(.x)".
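
To make the named-capture form concrete, here is a minimal sketch. The capture name label and the shell variable host are illustrative choices, not part of the original data; the regex keeps everything before the first dot.

```shell
#!/usr/bin/env bash
set -euo pipefail

host="ceonlineb2b.hms.harvard.edu"

# (?<label>...) is a named capture; "\(.label)" refers to it in the replacement.
# -n means jq reads no input file; --arg passes the shell value in as $h.
label=$(jq -rn --arg h "$host" '$h | sub("^(?<label>[^.]+)\\..*$"; "\(.label)")')
echo "$label"
```

Because the pattern matches the whole string, the replacement leaves only the captured first label.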

Extracting the subdomain using jq

jq -r '.data.items[].http_extract.final_redirect_url.host | sub("\\.hms\\.harvard\\.edu$";"")' se.json

Output

ceonlineb2b
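
The sub() call hard-codes the hms part, so it only fits hosts under hms.harvard.edu. If the goal is whatever sits to the left of harvard.edu for any host, one possible variant (a sketch, not the only way) filters on the suffix and strips it with rtrimstr, or keeps just the first label with split. The temporary sample file below stands in for se.json.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample input with the same shape as se.json, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"data":{"items":[{"http_extract":{"final_redirect_url":{"host":"ceonlineb2b.hms.harvard.edu"}}}]}}
EOF

# Keep only hosts under harvard.edu, then strip that suffix.
labels=$(jq -r '.data.items[].http_extract.final_redirect_url.host // empty
                | select(endswith(".harvard.edu"))
                | rtrimstr(".harvard.edu")' "$sample")
echo "$labels"   # ceonlineb2b.hms

# First label only, matching the output shown above.
first=$(jq -r '.data.items[].http_extract.final_redirect_url.host // empty
               | split(".")[0]' "$sample")
echo "$first"    # ceonlineb2b
```

Which of the two is wanted depends on whether "subdomain" means the full prefix (ceonlineb2b.hms) or only the leftmost label.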

【Comments】:

Thank you very much, Jason
