Cassandra: What Is a Tombstone?

Posted yuxiaohao


Preface: This article was compiled by the editors of cha138.com (小常识网). It introduces Cassandra tombstones, and we hope you find it a useful reference.

What is a tombstone?

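In the DataStax Distribution of Apache Cassandra™ (DDAC), deleting data creates a tombstone. The following examples are not exhaustive, but they illustrate some operations that generate tombstones: a CQL DELETE statement, data expiring through a TTL (time to live), an INSERT or UPDATE that writes null, and an INSERT or UPDATE that writes an entire collection column.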

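Once created, a tombstone can be marked on different parts of a partition. Depending on where the marker sits, a tombstone falls into one of the following categories: partition, row, range, ComplexColumn, cell, or TTL tombstones. Each category usually corresponds to a distinct type of delete operation.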

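Tombstones go through the write path and are written to SSTables on one or more nodes. What sets a tombstone apart is its built-in expiration period, called the grace period, which is set by gc_grace_seconds. When the grace period ends, the tombstone is removed as part of the normal compaction process.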
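The grace-period rule can be sketched in Python as a simple time check. This sketch assumes two things: the local_delete_time format that sstabledump prints in the examples below, and the 864000-second (10-day) default for gc_grace_seconds. The helper names are hypothetical. Real compaction applies further conditions, so this only shows when a tombstone becomes eligible for removal, not when it is actually removed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 864000 s (10 days) is Cassandra's default gc_grace_seconds;
# a table can override it with WITH gc_grace_seconds = ...
DEFAULT_GC_GRACE_SECONDS = 864_000


def parse_local_delete_time(value: str) -> datetime:
    """Parse an sstabledump local_delete_time such as '2018-05-16T19:40:06Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def grace_period_elapsed(local_delete_time: str, now: datetime,
                         gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Return True once the tombstone is older than gc_grace_seconds.

    Only the time condition is modelled: the tombstone is still dropped only
    when a compaction processes its SSTable, and not while data it may shadow
    can still exist in SSTables outside that compaction.
    """
    deleted_at = parse_local_delete_time(local_delete_time)
    return now > deleted_at + timedelta(seconds=gc_grace_seconds)


# Partition tombstone time taken from the sstabledump example in this article.
now = datetime(2018, 5, 27, tzinfo=timezone.utc)
print(grace_period_elapsed("2018-05-16T19:40:06Z", now))  # True: grace ended 2018-05-26T19:40:06Z
```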

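Too many tombstones in a table can hurt application performance, because reads have to scan past them. A large number of tombstones usually points to a problem in the data model or the application.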
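The performance cost shows up concretely in two cassandra.yaml settings. Cassandra counts the tombstones each read scans and compares the count with tombstone_warn_threshold and tombstone_failure_threshold (defaults 1000 and 100000). The sketch below is a hypothetical Python model of that check, not Cassandra code. Check the values configured on your own cluster.

```python
# Assumption: these are the default cassandra.yaml values of
# tombstone_warn_threshold and tombstone_failure_threshold.
TOMBSTONE_WARN_THRESHOLD = 1_000
TOMBSTONE_FAILURE_THRESHOLD = 100_000


def classify_read(tombstones_scanned: int,
                  warn: int = TOMBSTONE_WARN_THRESHOLD,
                  fail: int = TOMBSTONE_FAILURE_THRESHOLD) -> str:
    """Rough model of how a single read is treated, based on tombstones scanned."""
    if tombstones_scanned > fail:
        return "aborted"  # the query fails (TombstoneOverwhelmingException)
    if tombstones_scanned > warn:
        return "warning"  # the query succeeds, but a warning is logged
    return "ok"


for count in (10, 5_000, 250_000):
    print(count, classify_read(count))
```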

 

Creating the keyspace and tables

In the following examples, the cycling keyspace is used to illustrate the different tombstone categories. Two tables are used: rank_by_year_and_name and cyclist_career_teams.

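Tip: The following examples mix CQL statements run in cqlsh with shell commands such as nodetool and sstabledump, so it is convenient to keep two terminals open.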

Alternatively, use a single terminal and issue the CQL statements from DataStax Studio.

Before you begin, copy the following statements into the cqlsh prompt. They create the cycling keyspace, create both tables, and insert data into the rank_by_year_and_name table.

You will insert data into the cyclist_career_teams table later, in the ComplexColumn tombstones and TTL tombstones sections.

CREATE KEYSPACE cycling WITH replication = 
{'class': 'SimpleStrategy', 'replication_factor': 1} AND durable_writes = true;

CREATE TABLE cycling.rank_by_year_and_name (
    race_year int,
    race_name text,
    rank int,
    cyclist_name text,
    PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Benjamin PRADES', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Adam PHELAN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Thomas LEBAS', 3);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Ilnur ZAKARIN', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Carlos BETANCUR', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Phillippe GILBERT', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Daniel MARTIN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank) VALUES (2014, '4th Tour of Beijing', 'Johan Esteban CHAVES', 3);

CREATE TABLE cycling.cyclist_career_teams (
    id UUID PRIMARY KEY,
    lastname text,
    teams set<text>
);
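The primary key ((race_year, race_name), rank) decides how these rows are stored. race_year and race_name together form the partition key, and rank orders the rows inside each partition. A rough Python model, using hypothetical names and only the sample rows above, is a map from partition key to an ordered map of rows. In these terms:

- a partition tombstone covers one outer key;
- a row tombstone covers one inner entry;
- a range tombstone covers a contiguous slice of inner entries.

```python
from collections import defaultdict

# Hypothetical model: the sample rows inserted above as
# (race_year, race_name, rank, cyclist_name) tuples.
ROWS = [
    (2015, "Tour of Japan - Stage 4 - Minami > Shinshu", 1, "Benjamin PRADES"),
    (2015, "Tour of Japan - Stage 4 - Minami > Shinshu", 2, "Adam PHELAN"),
    (2015, "Tour of Japan - Stage 4 - Minami > Shinshu", 3, "Thomas LEBAS"),
    (2015, "Giro d'Italia - Stage 11 - Forli > Imola", 1, "Ilnur ZAKARIN"),
    (2015, "Giro d'Italia - Stage 11 - Forli > Imola", 2, "Carlos BETANCUR"),
    (2014, "4th Tour of Beijing", 1, "Phillippe GILBERT"),
    (2014, "4th Tour of Beijing", 2, "Daniel MARTIN"),
    (2014, "4th Tour of Beijing", 3, "Johan Esteban CHAVES"),
]


def group_into_partitions(rows):
    """Group rows by partition key (race_year, race_name); order rows by rank ASC."""
    partitions = defaultdict(dict)
    for race_year, race_name, rank, cyclist_name in rows:
        partitions[(race_year, race_name)][rank] = cyclist_name
    return {key: dict(sorted(by_rank.items())) for key, by_rank in partitions.items()}


for key, rows_by_rank in group_into_partitions(ROWS).items():
    print(key, rows_by_rank)
```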

 

Flushing to SSTables

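After every modification to a table, run nodetool flush on the cycling keyspace. This flushes the data from the memtables to SSTables on disk, and it must be done before you run sstabledump to view the output.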

nodetool flush cycling

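After flushing the cycling keyspace, run sstabledump on the SSTable, as shown in the following example. The directory suffix is the table ID, and the SSTable file name (here mc-2-big-Data.db) depends on the SSTable generation, so both will differ on your installation.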

cd /var/lib/cassandra/data/cycling/rank_by_year_and_name-bc05fba12baf11e8b4a8ad2b042f3e18
sstabledump mc-2-big-Data.db

Note: The sstabledump tool is available in Apache Cassandra™ 3.0 and later, DDAC, and DSE 5.0 and later. For earlier versions, use the sstable2json utility instead.
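sstabledump prints the whole SSTable as a JSON array whose elements are partition objects like the ones shown in the sections below. A minimal sketch, assuming Python 3 and a hypothetical helper named summarize_sstabledump, loads that output with the standard json module and lists each partition key with its number of row entries. To feed it real output, you could capture it from the SSTable directory with subprocess.run(["sstabledump", "mc-2-big-Data.db"], capture_output=True, text=True, check=True).stdout.

```python
import json

# One-partition sample, copied from the partition tombstone output in this article.
SAMPLE = """[
  {
    "partition" : {
      "key" : [ "2014", "4th Tour of Beijing" ],
      "position" : 0,
      "deletion_info" : { "marked_deleted" : "2018-05-16T19:40:06.454282Z", "local_delete_time" : "2018-05-16T19:40:06Z" }
    },
    "rows" : [ ]
  }
]"""


def summarize_sstabledump(json_text: str) -> list:
    """Return (partition key, number of row entries) for each partition in the dump."""
    partitions = json.loads(json_text)
    return [(entry["partition"]["key"], len(entry.get("rows", []))) for entry in partitions]


print(summarize_sstabledump(SAMPLE))  # [(['2014', '4th Tour of Beijing'], 0)]
```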

 

Partition tombstones

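A partition tombstone is generated when an entire partition is explicitly deleted. In the CQL DELETE statement, the WHERE clause is an equality condition on the partition key.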

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2014 AND race_name = '4th Tour of Beijing';

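In the sstabledump output for this partition, the deletion_info tombstone marker sits at the partition level and is not associated with any row or cell in the partition.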

{
    "partition" : {
      "key" : [ "2014", "4th Tour of Beijing" ],
      "position" : 0,
      "deletion_info" : { "marked_deleted" : "2018-05-16T19:40:06.454282Z", "local_delete_time" : "2018-05-16T19:40:06Z" }
    },
    "rows" : [ ]
  }
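A partition tombstone does not erase data by itself. During reads, Cassandra reconciles by timestamp, and the tombstone hides any write whose timestamp is less than or equal to its marked_deleted value. The Python sketch below models that rule using the marker shown above. The helper names are hypothetical, and the model ignores everything except timestamps. It also shows why re-inserting the partition later, with a newer timestamp, makes the data visible again.

```python
from datetime import datetime

# Partition-level marker copied from the sstabledump output above.
DELETION_INFO = {
    "marked_deleted": "2018-05-16T19:40:06.454282Z",
    "local_delete_time": "2018-05-16T19:40:06Z",
}


def parse_write_timestamp(value: str) -> datetime:
    """Parse a microsecond-precision timestamp such as 2018-05-16T19:40:06.454282Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def is_shadowed(write_tstamp: str, deletion_info: dict) -> bool:
    """True if a write with this timestamp is hidden by the tombstone (ts <= marked_deleted)."""
    return parse_write_timestamp(write_tstamp) <= parse_write_timestamp(deletion_info["marked_deleted"])


print(is_shadowed("2018-05-16T19:30:00.000000Z", DELETION_INFO))  # True: written before the DELETE
print(is_shadowed("2018-05-16T19:45:00.000000Z", DELETION_INFO))  # False: re-inserted afterwards
```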

 

Row tombstones

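A row tombstone is generated when a specific row in a partition is explicitly deleted. The schema has a compound primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition on the partition key columns and on the clustering key column.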

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2015 AND race_name = 'Giro d''Italia - Stage 11 - Forli > Imola' AND rank = 2;

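In the sstabledump output for this partition, the deletion_info tombstone marker sits at the row level and is identified by the clustering key within the partition. Neither the partition nor the row's cells carry a tombstone marker.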

{
    "partition" : {
      "key" : [ "2015", "Giro d'Italia - Stage 11 - Forli > Imola" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 74,
        "clustering" : [ 2 ],
        "deletion_info" : { "marked_deleted" : "2018-05-18T15:29:06.227148Z", "local_delete_time" : "2018-05-18T15:29:06Z" },
        "cells" : [ ]
      }
    ]
  }
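To find row tombstones like this one in a larger dump, look for row entries that carry their own deletion_info. The following is a minimal Python sketch, assuming a hypothetical helper named deleted_rows and the partition object shown above.

```python
import json

# The partition object from the sstabledump output above.
PARTITION = json.loads("""
{
  "partition" : { "key" : [ "2015", "Giro d'Italia - Stage 11 - Forli > Imola" ], "position" : 0 },
  "rows" : [
    {
      "type" : "row",
      "position" : 74,
      "clustering" : [ 2 ],
      "deletion_info" : { "marked_deleted" : "2018-05-18T15:29:06.227148Z", "local_delete_time" : "2018-05-18T15:29:06Z" },
      "cells" : [ ]
    }
  ]
}
""")


def deleted_rows(partition_entry: dict) -> list:
    """Return the clustering keys of rows that carry a row-level deletion_info marker."""
    return [
        row["clustering"]
        for row in partition_entry.get("rows", [])
        if row.get("type") == "row" and "deletion_info" in row
    ]


print(deleted_rows(PARTITION))  # [[2]]
```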

 

Range tombstones

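A range tombstone is generated when several rows of a partition that can be expressed as a range are explicitly deleted. The schema has a compound primary key that includes both a partition key and a clustering key. In the CQL DELETE statement, the WHERE clause is an equality condition on the partition key plus an inequality condition on the clustering key.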

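Tip: If you have followed this tutorial from the beginning, drop the rank_by_year_and_name table, then re-create it and re-run the INSERT statements so that it contains the necessary data.

DELETE from cycling.rank_by_year_and_name WHERE 
 race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami > Shinshu' AND rank > 1;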

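In the sstabledump output for this partition, the deletion_info tombstone markers sit at the row level. Special range_tombstone_bound markers delimit the range of deleted rows, which is identified by clustering key values.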

{
    "partition" : {
      "key" : [ "2015", "Tour of Japan - Stage 4 - Minami > Shinshu" ],
      "position" : 252
    },
    "rows" : [
      {
        "type" : "range_tombstone_bound",
        "start" : {
          "type" : "inclusive",
          "deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
        }
      },
      {
        "type" : "range_tombstone_bound",
        "end" : {
          "type" : "exclusive",
          "clustering" : [ 1 ],
          "deletion_info" : { "marked_deleted" : "2018-05-18T16:09:21.474713Z", "local_delete_time" : "2018-05-18T16:09:21Z" }
        }
      }
    ]
  }
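The start and end markers only make sense as a pair. The Python sketch below pairs them into (start, end) ranges, assuming a hypothetical helper named range_tombstones. It handles only the range_tombstone_bound entries shown here and ignores every other entry type. A bound printed without a clustering value comes back as None.

```python
# Row entries copied from the sstabledump output above.
DELETION = {"marked_deleted": "2018-05-18T16:09:21.474713Z",
            "local_delete_time": "2018-05-18T16:09:21Z"}
ROWS = [
    {"type": "range_tombstone_bound",
     "start": {"type": "inclusive", "deletion_info": DELETION}},
    {"type": "range_tombstone_bound",
     "end": {"type": "exclusive", "clustering": [1], "deletion_info": DELETION}},
]


def bound(marker: dict) -> tuple:
    """(bound type, clustering values); None means no clustering value was printed."""
    return marker["type"], marker.get("clustering")


def range_tombstones(rows: list) -> list:
    """Pair consecutive range_tombstone_bound entries into (start, end) tuples."""
    ranges, start = [], None
    for row in rows:
        if row.get("type") != "range_tombstone_bound":
            continue  # this sketch ignores all other entry types
        if "start" in row:
            start = bound(row["start"])
        elif "end" in row:
            ranges.append((start, bound(row["end"])))
            start = None
    return ranges


for start, end in range_tombstones(ROWS):
    print("from", start, "to", end)
```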

 

ComplexColumn tombstones

A ComplexColumn tombstone is generated when a collection-type column (a set, list, or map) is written as a whole by an INSERT or UPDATE.

Earlier we created the cyclist_career_teams table. Run the following command in cqlsh to insert data into it.

INSERT INTO cycling.cyclist_career_teams (
     id,
     lastname,
     teams)
     VALUES (cb07baad-eac8-4f65-b28a-bddc06a0de23, 'ARMITSTEAD', { 
     'Boels-Dolmans Cycling Team','AA Drink - Leontien.nl','Team Garmin - Cervelo' } );

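In the sstabledump output for this partition, no explicit delete was issued, yet a deletion_info marker appears at the cell level for the collection-type column teams. When a whole collection is written, Cassandra first writes a tombstone that clears any previous contents of the column and then writes the new elements. The tombstone's marked_deleted timestamp (16:26:23.779723Z) is one microsecond earlier than the write timestamp in liveness_info (16:26:23.779724Z).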

{
    "partition" : {
      "key" : [ "cb07baad-eac8-4f65-b28a-bddc06a0de23" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 130,
        "liveness_info" : { "tstamp" : "2018-05-18T16:26:23.779724Z" },
        "cells" : [
          { "name" : "lastname", "value" : "ARMITSTEAD" },
          { "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T16:26:23.779723Z", "local_delete_time" : "2018-05-18T16:26:23Z" } },
          { "name" : "teams", "path" : [ "AA Drink - Leontien.nl" ], "value" : "" },
          { "name" : "teams", "path" : [ "Boels-Dolmans Cycling Team" ], "value" : "" },
          { "name" : "teams", "path" : [ "Team Garmin - Cervelo" ], "value" : "" }
        ]
      }
    ]
  }
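This marker pattern can be checked mechanically. The Python sketch below uses hypothetical helper names and flags column-level deletion markers whose marked_deleted is exactly one microsecond before the row's liveness_info timestamp. That is how the full-collection write above shows up. If you only need to add elements, an UPDATE that appends, such as SET teams = teams + {'New Team'} (team name hypothetical), does not write this marker. Overwriting the whole collection does.

```python
from datetime import datetime, timedelta

# The row copied from the sstabledump output above.
ROW = {
    "type": "row",
    "liveness_info": {"tstamp": "2018-05-18T16:26:23.779724Z"},
    "cells": [
        {"name": "lastname", "value": "ARMITSTEAD"},
        {"name": "teams", "deletion_info": {"marked_deleted": "2018-05-18T16:26:23.779723Z",
                                             "local_delete_time": "2018-05-18T16:26:23Z"}},
        {"name": "teams", "path": ["AA Drink - Leontien.nl"], "value": ""},
        {"name": "teams", "path": ["Boels-Dolmans Cycling Team"], "value": ""},
        {"name": "teams", "path": ["Team Garmin - Cervelo"], "value": ""},
    ],
}


def parse_ts(value: str) -> datetime:
    """Parse a microsecond-precision sstabledump timestamp."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def overwrite_markers(row: dict) -> list:
    """Columns whose column-level deletion marker is 1 microsecond older than the row write."""
    liveness = row.get("liveness_info")
    if liveness is None:  # this sketch only covers rows that carry liveness_info (INSERT)
        return []
    expected = parse_ts(liveness["tstamp"]) - timedelta(microseconds=1)
    return [
        cell["name"]
        for cell in row.get("cells", [])
        if "deletion_info" in cell and "path" not in cell
        and parse_ts(cell["deletion_info"]["marked_deleted"]) == expected
    ]


print(overwrite_markers(ROW))  # ['teams']
```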

 

Cell tombstones

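A cell tombstone is generated when a value is explicitly deleted from a cell (such as a column of a specific row in a partition), or when a cell is inserted or updated with null, as in the following example.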

INSERT INTO cycling.rank_by_year_and_name (
     race_year,
     race_name,
     cyclist_name,
     rank)
     VALUES (2018, 'Giro d''Italia - Stage 11 - Osimo > Imola', null, 1);

In the sstabledump output for this partition, the deletion_info tombstone marker is attached to the specific cell.

{
    "partition" : {
      "key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 80,
        "clustering" : [ 1 ],
        "liveness_info" : { "tstamp" : "2018-05-18T17:13:42.602827Z" },
        "cells" : [
          { "name" : "cyclist_name", "deletion_info" : { "local_delete_time" : "2018-05-18T17:13:42Z" }
          }
        ]
      }
    ]
  }
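Writing null is enough to create a cell tombstone. For that reason, applications that insert rows with optional fields often leave unknown columns out of the statement instead of binding null. The Python sketch below shows the idea with a hypothetical build_insert helper. It produces a CQL string with ? bind markers, as used by prepared statements, and is not a driver API. Omitting a column leaves any existing value untouched, so it is not a substitute for a deliberate delete.

```python
def build_insert(table: str, values: dict) -> tuple:
    """Build a parameterized INSERT that leaves out columns whose value is None.

    Binding an explicit null would write a cell tombstone; omitting the column
    writes nothing for it.
    """
    columns = [name for name, value in values.items() if value is not None]
    placeholders = ", ".join("?" for _ in columns)
    cql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return cql, [values[name] for name in columns]


cql, params = build_insert("cycling.rank_by_year_and_name", {
    "race_year": 2018,
    "race_name": "Giro d'Italia - Stage 11 - Osimo > Imola",
    "cyclist_name": None,  # unknown, so left out instead of written as null
    "rank": 1,
})
print(cql)
print(params)
```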

 

TTL tombstones

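A TTL tombstone is generated when a TTL (time to live) expires. The TTL expiration marker can appear at the row level or at the cell level. Cassandra marks TTL-expired data differently from explicitly deleted data: instead of a deletion_info entry, the row or cell is flagged with "expired" : true. Even when a partition has only one row (no clustering key), the TTL marker is still placed at the row level.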
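The expires_at values in the outputs below can be reproduced approximately. With a TTL of t seconds, a write expires t seconds after it is applied, counted in whole seconds. The Python sketch below uses hypothetical helper names. It assumes the write timestamp matches the replica's clock, which is not guaranteed when clients supply their own timestamps.

```python
from datetime import datetime, timedelta, timezone


def parse_write_timestamp(value: str) -> datetime:
    """Parse a microsecond-precision sstabledump timestamp as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def approximate_expires_at(write_tstamp: str, ttl_seconds: int) -> datetime:
    """Write time truncated to whole seconds, plus the TTL."""
    written = parse_write_timestamp(write_tstamp).replace(microsecond=0)
    return written + timedelta(seconds=ttl_seconds)


def is_expired(write_tstamp: str, ttl_seconds: int, now: datetime) -> bool:
    """Data counts as expired from its expiration second onwards."""
    return now >= approximate_expires_at(write_tstamp, ttl_seconds)


# Row written with USING TTL 1 in the example below.
print(approximate_expires_at("2018-05-18T17:38:13.135226Z", 1).isoformat())  # 2018-05-18T17:38:14+00:00
```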

The following statement sets a TTL on the entire row.

INSERT INTO cycling.cyclist_career_teams (
     id,
     lastname,
     teams)
     VALUES (e7cd5752-bc0d-4157-a80f-7523add8dbcd, 'VAN DER BREGGEN', { 
     'Rabobank-Liv Woman Cycling Team','Sengers Ladies Cycling Team','Team Flexpoint' }) USING TTL 1;

The following statement sets a TTL on a single cell.

UPDATE cycling.rank_by_year_and_name USING TTL 1
  SET cyclist_name = 'Cloudy Archipelago' WHERE race_year = 2018 AND 
  race_name = 'Giro d''Italia - Stage 11 - Osimo > Imola' AND rank = 1;

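In the sstabledump output for these partitions, the first CQL statement has marked the row (partition key: e7cd5752-bc0d-4157-a80f-7523add8dbcd) with the TTL expiration marker "expired" : true in its liveness_info section. The deletion_info on teams is the ComplexColumn tombstone written by the full-collection INSERT, not part of the TTL expiry.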

{
    "partition" : {
      "key" : [ "e7cd5752-bc0d-4157-a80f-7523add8dbcd" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 134,
        "liveness_info" : { "tstamp" : "2018-05-18T17:38:13.135226Z", "ttl" : 1, "expires_at" : "2018-05-18T17:38:14Z", "expired" : true },
        "cells" : [
          { "name" : "lastname", "value" : "VAN DER BREGGEN" },
          { "name" : "teams", "deletion_info" : { "marked_deleted" : "2018-05-18T17:38:13.135225Z", "local_delete_time" : "2018-05-18T17:38:13Z" } },
          { "name" : "teams", "path" : [ "Rabobank-Liv Woman Cycling Team" ], "value" : "" },
          { "name" : "teams", "path" : [ "Sengers Ladies Cycling Team" ], "value" : "" },
          { "name" : "teams", "path" : [ "Team Flexpoint" ], "value" : "" }
        ]
      }
    ]
  }

 

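The second CQL statement has marked the cell (partition key: 2018, Giro d'Italia - Stage 11 - Osimo > Imola; clustering key: 1; column: cyclist_name) with the cell's own TTL expiration marker, "expired" : true.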

{
    "partition" : {
      "key" : [ "2018", "Giro d'Italia - Stage 11 - Osimo > Imola" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 95,
        "clustering" : [ 1 ],
        "cells" : [
          { "name" : "cyclist_name", "value" : "Cloudy Archipelago", "tstamp" : "2018-05-18T18:22:52.532855Z", "ttl" : 1, "expires_at" : "2018-05-18T18:22:53Z", "expired" : true }
        ]
      }
    ]
  }
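TTL expiry shows up as an "expired" : true flag rather than as a deletion_info entry. A scan for TTL markers therefore looks at liveness_info for rows and at the cell entries themselves. Below is a minimal Python sketch with a hypothetical helper, fed trimmed copies of the two outputs above.

```python
# Trimmed copies of the two partitions shown above (only the fields used here).
ROW_TTL_PARTITION = {
    "partition": {"key": ["e7cd5752-bc0d-4157-a80f-7523add8dbcd"]},
    "rows": [{
        "type": "row",
        "liveness_info": {"tstamp": "2018-05-18T17:38:13.135226Z", "ttl": 1,
                          "expires_at": "2018-05-18T17:38:14Z", "expired": True},
        "cells": [{"name": "lastname", "value": "VAN DER BREGGEN"}],
    }],
}
CELL_TTL_PARTITION = {
    "partition": {"key": ["2018", "Giro d'Italia - Stage 11 - Osimo > Imola"]},
    "rows": [{
        "type": "row",
        "clustering": [1],
        "cells": [{"name": "cyclist_name", "value": "Cloudy Archipelago", "ttl": 1,
                   "expires_at": "2018-05-18T18:22:53Z", "expired": True}],
    }],
}


def expired_markers(partition_entry: dict) -> list:
    """List TTL expiry markers as ('row', clustering) or ('cell', clustering, column)."""
    found = []
    for row in partition_entry.get("rows", []):
        clustering = row.get("clustering")  # None when the table has no clustering key
        if row.get("liveness_info", {}).get("expired"):
            found.append(("row", clustering))
        for cell in row.get("cells", []):
            if cell.get("expired"):
                found.append(("cell", clustering, cell["name"]))
    return found


print(expired_markers(ROW_TTL_PARTITION))   # [('row', None)]
print(expired_markers(CELL_TTL_PARTITION))  # [('cell', [1], 'cyclist_name')]
```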

 

 

 

 

 
