How do I store nested data in Cassandra

Posted: 2015-08-12 22:36:39

Question:

Consider the following "documents". How would these two documents be stored in a collection?

// collection posts:
{
  id: 1,
  name: "kingsbounty",
  fields: {
    "title": {
      "title": "Game Title",
      "value": "Kings Bounty"
    },
    "body": {
      "title": "Game Description",
      "value": "Kings Bounty is a turn-based fantasy..."
    }
  }
}

// collection posts:
{
  id: 2,
  name: "outrun",
  fields: {
    "vehicle": {
      "title": "Vehicle",
      "value": "Ferrari Testarossa"
    },
    "color": {
      "title": "Vehicle Color",
      "value": "Red"
    },
    "driver": {
      "title": "Driver",
      "value": "David Hasselhoff"
    }
  }
}

Notice how fields is a map whose size varies from one document to the next.

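Since Cassandra does not allow a type like fields <map <map, text>> to be defined, I would like to learn the "Cassandra" way of doing this, that is, the denormalized way. The approach below is not denormalized, but it can store and retrieve nested data of arbitrary length.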

CREATE TABLE posts (
  id uuid,
  name text,
  fields list<text>,
  PRIMARY KEY (id)
);
CREATE INDEX post_name_key ON posts (name);

CREATE TABLE post_fields (
  post_name text,
  field_name text,
  title text,
  value text,
  PRIMARY KEY (post_name, field_name)
);

INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'kingsbounty', [ 'title', 'body' ] );
INSERT INTO posts (id, name, fields) VALUES ( uuid(), 'outrun', [ 'vehicle', 'color', 'driver' ] );

INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO post_fields (post_name, field_name, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');

SELECT fields FROM posts WHERE name = 'kingsbounty';

     fields
    -------------------
     ['title', 'body']

SELECT * FROM post_fields WHERE post_name = 'kingsbounty';

     post_name   | field_name | title            | value
    -------------+------------+------------------+-----------------------------------------
     kingsbounty |       body | Game Description | Kings Bounty is a turn-based fantasy...
     kingsbounty |      title |       Game Title |                            Kings Bounty

SELECT fields FROM posts WHERE name = 'outrun';

     fields
    --------------------------------
     ['vehicle', 'color', 'driver']

SELECT * FROM post_fields WHERE post_name = 'outrun';

     post_name | field_name | title         | value
    -----------+------------+---------------+--------------------
        outrun |      color | Vehicle Color |                Red
        outrun |     driver |        Driver |   David Hasselhoff
        outrun |    vehicle |       Vehicle | Ferrari Testarossa

Is there a better, denormalized way to store this kind of data?


Answer 1:

jeffj from #cassandra on IRC suggested that I do not even need the first table.

I am starting to get it now.

CREATE TABLE posts (
  name text,
  field text,
  title text,
  value text,
  PRIMARY KEY (name, field)
);

INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'title', 'Game Title', 'Kings Bounty');
INSERT INTO posts (name, field, title, value) VALUES ( 'kingsbounty', 'body', 'Game Description', 'Kings Bounty is a turn-based fantasy...');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'vehicle', 'Vehicle', 'Ferrari Testarossa');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'color', 'Vehicle Color', 'Red');
INSERT INTO posts (name, field, title, value) VALUES ( 'outrun', 'driver', 'Driver', 'David Hasselhoff');

SELECT field FROM posts WHERE name = 'kingsbounty';

 field
-------
  body
 title

SELECT * FROM posts WHERE name = 'kingsbounty';

 name        | field | title            | value
-------------+-------+------------------+-----------------------------------------
 kingsbounty |  body | Game Description | Kings Bounty is a turn-based fantasy...
 kingsbounty | title |       Game Title |                            Kings Bounty

SELECT field FROM posts WHERE name = 'outrun';

 field
---------
   color
  driver
 vehicle


SELECT * FROM posts WHERE name = 'outrun';

 name   | field   | title         | value
--------+---------+---------------+--------------------
 outrun |   color | Vehicle Color |                Red
 outrun |  driver |        Driver |   David Hasselhoff
 outrun | vehicle |       Vehicle | Ferrari Testarossa
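With one row per field, the nested document from the question has to be reassembled on the client. The following is a minimal sketch of that step in Python, not part of the original answer. The `rows` list is hypothetical sample data shaped like the `SELECT * FROM posts` output above, and `rows_to_documents` is an illustrative helper name; a real client would get the rows from a Cassandra driver instead.

```python
from collections import defaultdict

# Hypothetical rows shaped like the SELECT * FROM posts output:
# (name, field, title, value). A real driver would supply these.
rows = [
    ("kingsbounty", "body", "Game Description", "Kings Bounty is a turn-based fantasy..."),
    ("kingsbounty", "title", "Game Title", "Kings Bounty"),
    ("outrun", "color", "Vehicle Color", "Red"),
    ("outrun", "driver", "Driver", "David Hasselhoff"),
    ("outrun", "vehicle", "Vehicle", "Ferrari Testarossa"),
]


def rows_to_documents(rows):
    """Group flat (name, field, title, value) rows into nested documents keyed by post name."""
    fields_by_post = defaultdict(dict)
    for name, field, title, value in rows:
        fields_by_post[name][field] = {"title": title, "value": value}
    return {
        name: {"name": name, "fields": fields}
        for name, fields in fields_by_post.items()
    }


documents = rows_to_documents(rows)
print(sorted(documents["outrun"]["fields"]))  # ['color', 'driver', 'vehicle']
```

Because name is the partition key, all fields of one post live in a single partition, so a query filtered on name returns everything needed to rebuild that post.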


Answer 2:

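Create the table with whatever information you want returned. Assuming you need to return all of the information, store it in a single table as shown below and do the necessary manipulation on the client side.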

CREATE TABLE posts (
  id uuid,
  name text,
  fields map<text,text>,
  PRIMARY KEY (id)
);

insert into posts (id,name,fields) values (uuid(),'kingsbounty',{'title':'{"title": "Game Title","value": "Kings Bounty"}','body':'{"title": "Game Description","value": "Kings Bounty is a turn-based fantasy..."}'});
insert into posts (id,name,fields) values (uuid(),'outrun',{'vehicle':'{"title": "Vehicle","value": "Ferrari Testarossa"}','color':'{"title": "Vehicle Color","value": "Red"}','driver':'{"title": "Driver","value": "David Hasselhoff"}'});

   cqlsh> select id,name,fields from posts;

 id                                   | name        | fields
--------------------------------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 dd31393d-2654-42ec-a5fb-73ab13c12932 | kingsbounty | {'body': '{"title": "Game Description","value": "Kings Bounty is a turn-based fantasy..."}', 'title': '{"title": "Game Title","value": "Kings Bounty"}'}
 a1e2b512-7177-4a2d-8da3-528b9d5097c0 |      outrun | {'color': '{"title": "Vehicle Color","value": "Red"}', 'driver': '{"title": "Driver","value": "David Hasselhoff"}', 'vehicle': '{"title": "Vehicle","value": "Ferrari Testarossa"}'}
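The "manipulation on the client side" mentioned above could look roughly like the following Python sketch. It assumes each map value holds a complete JSON object string, including its surrounding braces. The `post` dictionary is hypothetical sample data standing in for a row read from the table, and `decode_fields` is an illustrative helper name, not something from the original answer.

```python
import json

# Hypothetical row: what the map<text,text> column could look like
# once a driver has read it into a plain dict of strings.
post = {
    "name": "outrun",
    "fields": {
        "color": '{"title": "Vehicle Color","value": "Red"}',
        "driver": '{"title": "Driver","value": "David Hasselhoff"}',
        "vehicle": '{"title": "Vehicle","value": "Ferrari Testarossa"}',
    },
}


def decode_fields(fields):
    """Parse each JSON-encoded map value into a nested dict."""
    return {key: json.loads(raw) for key, raw in fields.items()}


decoded = decode_fields(post["fields"])
print(decoded["driver"]["value"])  # David Hasselhoff
```

The trade-off with this layout is that Cassandra sees each value as opaque text, so the inner title and value cannot be queried or updated individually on the server.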
