Python GAE - How to export data from a backup to Big Query programmatically?

Posted: 2016-05-18

Question: I have been googling for quite a while now, but I couldn't find a way to export my backups (inside a bucket) to Big Query without doing it manually...

Is there any way to do this?

Thanks a lot!

Comments:
Answer 1: You should be able to do this via the python-bigquery API.

First, you need to connect to the BigQuery service. This is the code I use to do so:
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials


class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        key = None
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(
            self._account_email,
            key,
            'https://www.googleapis.com/auth/bigquery')
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)
After that, you can use self.connector to run jobs (you'll find some examples in this answer).
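Load jobs run asynchronously, so after inserting one you usually poll its status until it finishes. A minimal sketch of such a polling loop (the function and parameter names here are mine, not from any library; `connector` is the authorized service object built above):

```python
import time


def wait_for_job(connector, project_id, job_id, poll_seconds=5):
    """Poll a BigQuery job until it reaches the DONE state.

    Returns True if the job finished without errors, False otherwise.
    """
    while True:
        # jobs().get() returns the current job resource, including status.
        job = connector.jobs().get(projectId=project_id,
                                   jobId=job_id).execute()
        state = job['status']['state']
        if state == 'DONE':
            # A DONE job may still have failed: check errorResult.
            return job['status'].get('errorResult') is None
        time.sleep(poll_seconds)
```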
To fetch your backup from Google Cloud Storage, you have to define the configuration like this:
body = {
    "configuration": {
        "load": {
            "sourceFormat": "",  # either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO"
            "fieldDelimiter": ",",  # (if it's comma separated)
            "destinationTable": {
                "projectId": "",  # your_project_id
                "datasetId": "",  # your_dataset_id
                "tableId": "",    # your_table_to_save_the_data
            },
            "writeDisposition": "",  # "WRITE_TRUNCATE" or "WRITE_APPEND"
            "sourceUris": [
                # The path to your backup in Google Cloud Storage. It could be
                # something like "gs://bucket_name/filename*". Notice you can
                # use the "*" wildcard.
            ],
            "schema": {  # [Optional] The schema for the destination table. It can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
                "fields": [  # Describes the fields in a table.
                    {
                        "name": "A String",  # [Required] The field name: only letters (a-z, A-Z), numbers (0-9) or underscores (_), starting with a letter or underscore. Maximum length is 128 characters.
                        "type": "A String",  # [Required] The field data type: STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates the field contains a nested schema).
                        "mode": "A String",  # [Optional] NULLABLE, REQUIRED or REPEATED. The default is NULLABLE.
                        "description": "A String",  # [Optional] The field description. Maximum length is 16K characters.
                        "fields": [  # [Optional] The nested schema fields if the type property is set to RECORD.
                            # Object with schema name: TableFieldSchema
                        ],
                    },
                ],
            },
        },
    },
}
Then run (note that jobs().insert() requires the projectId parameter as well as the body):

self.connector.jobs().insert(projectId=self._project_id, body=body).execute()

Hope this is what you were looking for. Let us know if you run into any problems.
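Putting it together for the backup case specifically: a minimal sketch of a helper that builds the load-job body for a Datastore backup (the function name is mine, not from any library). With "DATASTORE_BACKUP" you can omit "schema" and "fieldDelimiter", since BigQuery reads the schema from the backup files themselves:

```python
def datastore_backup_load_body(project_id, dataset_id, table_id, gcs_uri):
    """Build a jobs().insert() body that loads a Datastore backup.

    gcs_uri should point at the .backup_info file of the entity kind
    you want to load, e.g. "gs://bucket/....MyKind.backup_info".
    """
    return {
        "configuration": {
            "load": {
                "sourceFormat": "DATASTORE_BACKUP",
                "writeDisposition": "WRITE_TRUNCATE",
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                "sourceUris": [gcs_uri],
            }
        }
    }
```

You would then pass the result to connector.jobs().insert(projectId=project_id, body=body).execute() as above.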
Discussion: