How to merge Schema With Rows when querying a full row from bigquery - java


Posted: 2016-12-17 21:29:29

Question:

I uploaded the following record to BigQuery:


  {
    insertId: "1234",
    executionId: "1111",
    jobs: [
      { name: "aaaa", version: "0.0.0" },
      { name: "bbbb", version: "0.0.0" },
      { name: "cccc", version: "0.0.0" }
    ]
  }

This is my schema:

[
  { "name": "insertId", "type": "STRING" },
  { "name": "executionId", "type": "STRING" },
  {
    "name": "jobs",
    "type": "record",
    "mode": "repeated",
    "fields": [
      {
        "name": "name",
        "type": "STRING"
      },
      {
        "name": "version",
        "type": "STRING"
      }
    ]
  }
]

Now I run this query from Java:

"SELECT * FROM `myDataset.myTable` where executionId=\"1111\" ;"

This is the code I use, taken from here:

    String projectId = "myProjectId";
    String queryString = "SELECT * FROM `myDataset.myTable` where executionId=\"1111\" ;";
    long waitTime = 10000;
    boolean useLegacySql = false;

    Iterator<GetQueryResultsResponse> pages = run(projectId, queryString, waitTime, useLegacySql);
    // Only the first page of results is read here.
    List<TableRow> tableRow = pages.next().getRows();
    for (TableRow row : tableRow) {
        System.out.println(row);
    }

This is the output I get:

    {
      "f": [
        { "v": "1234" },
        { "v": "1111" },
        {
          "v": [
            { "v": { "f": [ { "v": "aaaa" }, { "v": "0.0.0" } ] } },
            { "v": { "f": [ { "v": "bbbb" }, { "v": "0.0.0" } ] } },
            { "v": { "f": [ { "v": "cccc" }, { "v": "0.0.0" } ] } }
          ]
        }
      ]
    }
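
The output above is positional: each row carries an "f" array of cells, each cell carries its value under "v", and the field names live only in the schema, at the same index. To make that pairing concrete, here is a minimal sketch for the top level only. It uses plain java.util collections as a stand-in for the parsed response, and the names PositionalCells, label and cell are made up for illustration; they are not part of any BigQuery library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionalCells {

    /** Labels the cells of one row with the schema field names at the same index. */
    static Map<String, Object> label(List<String> fieldNames, Map<String, Object> row) {
        @SuppressWarnings("unchecked")
        List<Map<String, Object>> cells = (List<Map<String, Object>>) row.get("f");
        Map<String, Object> labeled = new LinkedHashMap<>();
        for (int i = 0; i < cells.size(); i++) {
            // Cell i belongs to schema field i; the value is stored under "v".
            labeled.put(fieldNames.get(i), cells.get(i).get("v"));
        }
        return labeled;
    }

    /** Builds a single {"v": value} cell (helper for the demo only). */
    static Map<String, Object> cell(Object value) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("v", value);
        return c;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> cells = new ArrayList<>();
        cells.add(cell("1234"));
        cells.add(cell("1111"));
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("f", cells);

        // prints {insertId=1234, executionId=1111}
        System.out.println(label(Arrays.asList("insertId", "executionId"), row));
    }
}
```

Nested and repeated cells such as jobs keep the same f/v shape one level down, so they would need the same treatment applied recursively against the nested "fields" of the schema.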

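My schema is dynamic and may contain nested and repeated fields, some of which are null. How can I merge the schema with the rows and recover my original data dynamically, based on the schema?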

(Something like mergeSchemaWithRows(schema, rows) in the google-cloud npm package.)


Answer 1:

This is what I did:

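    import java.text.DateFormat;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    import org.json.JSONArray;
    import org.json.JSONObject;

    public class SchemaMerger { // the class name is arbitrary

        /**
         * Merge one row returned from the API with a table schema.
         *
         * @param schemaArr the "fields" array of the table schema
         * @param row       one row in the API's {"f": [{"v": ...}, ...]} form
         * @return the row's values keyed by the matching field names from the schema
         */
        public static JSONObject mergeRowsWithSchema(JSONArray schemaArr, JSONObject row) {
            JSONObject convertedJson = new JSONObject();
            JSONArray jsonFields = row.getJSONArray("f");
            for (int i = 0; i < jsonFields.length(); i++) {
                JSONObject fieldObj = jsonFields.getJSONObject(i);
                if (fieldObj.isNull("v")) continue; // null cells are left out of the result
                Object value = fieldObj.get("v");
                // Cell i corresponds to schema field i.
                JSONObject schemaField = schemaArr.getJSONObject(i);
                Object convertedValue;
                if (schemaField.has("mode") && schemaField.getString("mode").equalsIgnoreCase("REPEATED")) {
                    // A repeated field is an array of {"v": ...} cells; convert each element.
                    JSONArray convertedArray = new JSONArray();
                    for (Object val : (JSONArray) value) {
                        convertedArray.put(convert(schemaField, ((JSONObject) val).get("v")));
                    }
                    convertedValue = convertedArray;
                } else {
                    convertedValue = convert(schemaField, value);
                }
                convertedJson.put(schemaField.getString("name"), convertedValue);
            }
            return convertedJson;
        }

        private static Object convert(JSONObject schemaField, Object value) {
            if (value == null || JSONObject.NULL.equals(value)) return value;
            switch (schemaField.getString("type").toUpperCase()) {
                case "STRING":
                    return value;
                case "BOOLEAN":
                    return value.equals("true");
                case "FLOAT":
                    // BigQuery FLOAT is 64-bit, so parse as double to keep precision.
                    return Double.parseDouble((String) value);
                case "INTEGER":
                    // BigQuery INTEGER is 64-bit, so int could overflow.
                    return Long.parseLong((String) value);
                case "RECORD":
                    // Nested record: recurse with the nested "fields" schema.
                    return mergeRowsWithSchema(schemaField.getJSONArray("fields"), (JSONObject) value);
                case "TIMESTAMP":
                    // The value is seconds since the epoch as a decimal string.
                    long millis = (long) (Double.parseDouble((String) value) * 1000);
                    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
                    dateFormat.setTimeZone(TimeZone.getTimeZone("UTC")); // the literal 'Z' requires UTC
                    return dateFormat.format(new Date(millis));
                default:
                    // Types not handled above are passed through unchanged.
                    return value;
            }
        }
    }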
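
The TIMESTAMP branch is the easiest one to get subtly wrong, so here is a small standalone sketch of just that conversion. It assumes, as the answer's code does, that the cell value arrives as a decimal string of seconds since the epoch, possibly in scientific notation. It swaps in java.time and BigDecimal instead of SimpleDateFormat, which avoids both the time zone pitfall and rounding through a double. The names TimestampCell and toInstant are hypothetical.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class TimestampCell {

    /** Converts a "seconds since epoch" string such as "1.481923769123E9" to an Instant. */
    static Instant toInstant(String secondsSinceEpoch) {
        // BigDecimal accepts both plain and exponent notation without losing digits.
        BigDecimal seconds = new BigDecimal(secondsSinceEpoch);
        // Shift to whole microseconds; anything finer is truncated.
        long micros = seconds.movePointRight(6).longValue();
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        // prints 2016-12-16T21:29:29.123Z
        System.out.println(toInstant("1.481923769123E9"));
    }
}
```

Instant.toString() already produces an ISO-8601 string in UTC, so no separate formatter is needed if that is the desired output.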
