How do I store nested data in Cassandra
【Posted】: 2015-08-12 22:36:39
【Question】: Consider the following "documents". How would these two documents be stored in a collection?
// collection posts:
{
  id: 1,
  name: "kingsbounty",
  fields: {
    "title": {
      "title": "Game Title",
      "value": "Kings Bounty"
    },
    "body": {
      "title": "Game Description",
      "value": "Kings Bounty is a turn-based fantasy..."
    }
  }
}
// collection posts:
{
  id: 2,
  name: "outrun",
  fields: {
    "vehicle": {
      "title": "Vehicle",
      "value": "Ferrari Testarossa"
    },
    "color": {
      "title": "Vehicle Color",
      "value": "Red"
    },
    "driver": {
      "title": "Driver",
      "value": "David Hasselhoff"
    }
  }
}
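Note that fields is a map whose size varies from document to document.
Cassandra does not allow a type like fields <map <map, text>> to be defined.
I would like to learn the "Cassandra" way of doing this, i.e. the denormalized way. The approach below is not denormalized, but it can store and retrieve nested data of arbitrary length.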
CREATE TABLE posts (
id uuid,
name text,
fields list<text>,
PRIMARY KEY (id)
);
CREATE INDEX post_name_key ON posts (name);
CREATE TABLE post_fields (
post_name text,
field_name text,
title text,
value text,
PRIMARY KEY (post_name, field_name)
);
INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'kingsbounty', [ 'title', 'body' ] );
INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'outrun', [ 'vehicle', 'color', 'driver' ] );
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');
SELECT fields FROM posts WHERE name = 'kingsbounty';
fields
-------------------
['title', 'body']
SELECT * FROM post_fields WHERE post_name = 'kingsbounty';
post_name | field_name | title | value
-------------+------------+------------------+-----------------------------------------
kingsbounty | body | Game Description | Kings Bounty is a turn-based fantasy...
kingsbounty | title | Game Title | Kings Bounty
SELECT fields FROM posts WHERE name = 'outrun';
fields
--------------------------------
['vehicle', 'color', 'driver']
SELECT * FROM post_fields WHERE post_name = 'outrun';
post_name | field_name | title | value
-----------+------------+---------------+--------------------
outrun | color | Vehicle Color | Red
outrun | driver | Driver | David Hasselhoff
outrun | vehicle | Vehicle | Ferrari Testarossa
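What would be a better, denormalized way to store this kind of data?
【Comments】:
【Answer 1】: jeffj from #cassandra on IRC suggested that I do not even need the first table.
I am starting to get it now.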
CREATE TABLE posts (
name text,
field text,
title text,
value text,
PRIMARY KEY (name, field)
);
INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');
SELECT field FROM posts WHERE name = 'kingsbounty';
field
-------
body
title
SELECT * FROM posts WHERE name = 'kingsbounty';
name | field | title | value
-------------+-------+------------------+-----------------------------------------
kingsbounty | body | Game Description | Kings Bounty is a turn-based fantasy...
kingsbounty | title | Game Title | Kings Bounty
SELECT field FROM posts WHERE name = 'outrun';
field
---------
color
driver
vehicle
SELECT * FROM posts WHERE name = 'outrun';
name | field | title | value
--------+---------+---------------+--------------------
outrun | color | Vehicle Color | Red
outrun | driver | Driver | David Hasselhoff
outrun | vehicle | Vehicle | Ferrari Testarossa
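On the client, the rows returned by such a partition query can be folded back into the nested shape from the question. The following is a minimal sketch in Python, assuming the rows have already been fetched as (name, field, title, value) tuples. The helper name rows_to_documents is hypothetical and no Cassandra driver is involved; the sample rows are hard-coded from the SELECT output above.

```python
from collections import defaultdict


def rows_to_documents(rows):
    """Group (name, field, title, value) rows into one nested dict per post name."""
    documents = defaultdict(dict)
    for name, field, title, value in rows:
        # Each clustering row becomes one entry of the post's "fields" map.
        documents[name][field] = {"title": title, "value": value}
    return dict(documents)


# Sample rows copied by hand from the query output (assumption: not fetched live).
rows = [
    ("outrun", "color", "Vehicle Color", "Red"),
    ("outrun", "driver", "Driver", "David Hasselhoff"),
    ("outrun", "vehicle", "Vehicle", "Ferrari Testarossa"),
]

documents = rows_to_documents(rows)
print(documents["outrun"]["driver"]["value"])  # prints: David Hasselhoff
```

Since name is the partition key, all fields of one post arrive from a single partition, so the grouping step is cheap.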
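【Comments】:
【Answer 2】: Design the table around whatever information you want to return. Assuming you need to return all of the information, store it in a single table like the one below and do any necessary processing on the client side.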
CREATE TABLE posts (
id uuid,
name text,
fields map<text,text>,
PRIMARY KEY (id)
);
insert into posts (id,name,fields) values (uuid(),'kingsbounty',{'title':'"title": "Game Title","value": "Kings Bounty"','body':'"title": "Game Description","value": "Kings Bounty is a turn-based fantasy..."'});
insert into posts (id,name,fields) values (uuid(),'outrun',{'vehicle':'"title": "Vehicle","value": "Ferrari Testarossa"','color':'"title": "Vehicle Color","value": "Red"','driver':'"title": "Driver","value": "David Hasselhoff"'});
cqlsh> select id,name,fields from posts;
id | name | fields
--------------------------------------+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
dd31393d-2654-42ec-a5fb-73ab13c12932 | kingsbounty | {'body': '"title": "Game Description","value": "Kings Bounty is a turn-based fantasy..."', 'title': '"title": "Game Title","value": "Kings Bounty"'}
a1e2b512-7177-4a2d-8da3-528b9d5097c0 | outrun | {'color': '"title": "Vehicle Color","value": "Red"', 'driver': '"title": "Driver","value": "David Hasselhoff"', 'vehicle': '"title": "Vehicle","value": "Ferrari Testarossa"'}
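Because the values of a map<text,text> column are plain strings, the client-side processing amounts to serializing each nested field before the insert and parsing it again after the read. A minimal sketch in Python, assuming the client picks JSON as the string encoding (the answer does not prescribe one); encode_fields and decode_fields are hypothetical helper names, and no driver calls are shown.

```python
import json


def encode_fields(fields):
    """Serialize each nested field dict to a JSON string, giving a text -> text map."""
    return {key: json.dumps(inner, sort_keys=True) for key, inner in fields.items()}


def decode_fields(stored):
    """Parse the JSON strings read back from the map column into nested dicts."""
    return {key: json.loads(text) for key, text in stored.items()}


fields = {
    "title": {"title": "Game Title", "value": "Kings Bounty"},
    "body": {
        "title": "Game Description",
        "value": "Kings Bounty is a turn-based fantasy...",
    },
}

stored = encode_fields(fields)    # shape suitable for a map<text,text> column
restored = decode_fields(stored)  # nested structure recovered on the client
print(restored == fields)
```

The trade-off of this design is that Cassandra sees each value as an opaque string, so the inner title and value cannot be queried or updated individually on the server.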
【Comments】: