Python GAE - How to export data from a backup to Big Query programmatically?

Posted: 2016-05-18 19:34:35

Question:

I've been googling for a long time, but I haven't found a way to export my backups (stored in a bucket) to Big Query without doing it manually...

Is there a way to do this?

Thanks a lot!

Comments:

Answer 1:

You should be able to do this through the python-bigquery api.

First, you need to connect to the BigQuery service. This is the code I use to do so:

# Requires: google-api-python-client, oauth2client (pre-2.0), httplib2
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials


class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key and build an
        # authorized BigQuery v2 client.
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(self._account_email,
                                                    key,
                                                    ('https://www.googleapis' +
                                                     '.com/auth/bigquery'))
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)

After that, you can use self.connector to run jobs (in this answer you'll find some examples).

To load a backup from Google Cloud Storage, you have to define a configuration like this:

body = {"configuration": {
  "load": {
    "sourceFormat": ...,  # Either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
    "fieldDelimiter": ",",  # (if it's comma separated)
    "destinationTable": {
      "projectId": ...,  # your_project_id
      "tableId": ...,  # your_table_to_save_the_data
      "datasetId": ...  # your_dataset_id
    },
    "writeDisposition": ...,  # "WRITE_TRUNCATE" or "WRITE_APPEND"
    "sourceUris": [
        # the path to your backup in Google Cloud Storage. It could be
        # something like 'gs://bucket_name/filename*'. Notice you can
        # use the '*' operator.
    ],
    "schema": {  # [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
      "fields": [  # Describes the fields in a table.
        {
          "fields": [  # [Optional] Describes the nested schema fields if the type property is set to RECORD.
              # Object with schema name: TableFieldSchema
          ],
          "type": "A String",  # [Required] The field data type. Possible values include STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates that the field contains a nested schema).
          "description": "A String",  # [Optional] The field description. The maximum length is 16K characters.
          "name": "A String",  # [Required] The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters.
          "mode": "A String",  # [Optional] The field mode. Possible values include NULLABLE, REQUIRED and REPEATED. The default value is NULLABLE.
        },
      ],
    },
  },
}}
Then run:

self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
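Note that jobs().insert() returns immediately with a job resource; the load itself runs asynchronously, so you usually poll jobs().get() until the job reaches the DONE state. A minimal sketch, where the wait_for_load_job name is my own and the connector argument is assumed to be the authorized client built above:

```python
import time


def wait_for_load_job(connector, project_id, job_id,
                      poll_interval=5, timeout=600):
    """Poll a BigQuery job until it reaches the DONE state.

    Raises RuntimeError if the job reports an error or the
    timeout is exceeded.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        job = connector.jobs().get(projectId=project_id,
                                   jobId=job_id).execute()
        status = job.get("status", {})
        if status.get("state") == "DONE":
            if status.get("errorResult"):
                raise RuntimeError(status["errorResult"])
            return job
        time.sleep(poll_interval)
    raise RuntimeError("timed out waiting for job %s" % job_id)
```

The job id to poll comes from the response of jobs().insert(), under jobReference.jobId.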

Hope this is what you were looking for. Let us know if you run into any problems.

Comments:

以上是关于Python GAE - 如何以编程方式将数据从备份导出到 Big Query?的主要内容,如果未能解决你的问题,请参考以下文章

如何在 Datastore GAE Python 中定义键名?

如何以编程方式将 PDF 数据发送到打印机?

如何以编程方式(Python)抓取流式实时股票图表代码数据及其指标

如何以编程方式 (Python/JS/C++) 将矢量图形 (SVG) 插入 JPG/TIF 等光栅图像?

如何以编程方式更改/更新 Python PyQt4 TableView 中的数据?

python GAE中的动态下拉列表