How to Store Time-Series Data in Cassandra

Posted 2020-09-07 一直问

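Cassandra is well suited to storing time-series data. This article uses a weather station as the running example: the station needs to store one temperature reading per minute.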

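I. Scheme 1: one row per device

The idea is to create one row per data source, so here the temperature readings of one weather station occupy a single row. Since a temperature is collected every minute, the timestamp of each reading serves as the column name and the temperature as the column value. (In CQL terms this "row" is a partition: weatherstation_id is the partition key and event_time is the clustering column.)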

(1) Create the table:

CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);

(2) Insert some data:

INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:04:00','74F');

(3) To query all data for this weather station:

SELECT event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

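(4) To query a time range (this query returns everything after the given time; a second condition on event_time would add an upper bound):

SELECT temperature FROM temperature WHERE weatherstation_id='1234ABCD' AND event_time > '2013-04-03 07:01:00';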
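
To make the layout of scheme 1 concrete, here is a toy in-memory model in Python. It is only a sketch of the idea (one entry per weather station, readings kept sorted by event_time) and not how Cassandra is implemented; the names `ToyPartition`, `table`, and `insert` are made up for illustration.

```python
import bisect
from datetime import datetime


class ToyPartition:
    """Readings of one weather station, kept sorted by event_time."""

    def __init__(self):
        self._times = []   # sorted clustering keys (event_time)
        self._values = []  # temperatures, aligned with _times

    def insert(self, event_time, temperature):
        i = bisect.bisect_left(self._times, event_time)
        if i < len(self._times) and self._times[i] == event_time:
            # Same primary key: the new value overwrites the old one.
            self._values[i] = temperature
        else:
            self._times.insert(i, event_time)
            self._values.insert(i, temperature)

    def after(self, lower):
        """Toy equivalent of `event_time > lower`."""
        i = bisect.bisect_right(self._times, lower)
        return list(zip(self._times[i:], self._values[i:]))


table = {}  # partition key (weatherstation_id) -> ToyPartition


def insert(station, ts, temperature):
    table.setdefault(station, ToyPartition()).insert(
        datetime.fromisoformat(ts), temperature
    )


# Insert out of order on purpose; the partition stays sorted.
for ts, temp in [
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:03:00", "73F"),
]:
    insert("1234ABCD", ts, temp)

rows = table["1234ABCD"].after(datetime.fromisoformat("2013-04-03 07:01:00"))
temps = [temp for _, temp in rows]
print(temps)
```

Because the readings are kept in time order, a range condition on event_time becomes a binary search followed by a contiguous slice, which is why this kind of query suits the layout.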

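II. Scheme 2: one row per device per day

Sometimes storing all of a device's data in a single row is impractical, for example because it does not fit (this should be rare). In that case we can split the previous scheme by adding an extra identifier to the row key. For example, we can put each device's data for each day in its own row, which keeps the size of any one row bounded.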
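
A rough back-of-the-envelope calculation shows how the row grows at one reading per minute. The sketch below assumes one cell per reading and uses the commonly documented hard limit of 2 billion cells per partition; practical sizing guidance is usually far lower than that limit, so treat the result as an upper bound only.

```python
READINGS_PER_DAY = 24 * 60                   # one reading per minute
READINGS_PER_YEAR = READINGS_PER_DAY * 365   # ignoring leap days

# Assumption: one cell per reading, measured against the documented
# hard limit of 2 billion cells per partition.
MAX_CELLS_PER_PARTITION = 2_000_000_000

years_to_limit = MAX_CELLS_PER_PARTITION / READINGS_PER_YEAR

print(READINGS_PER_DAY)       # readings in one per-day row (scheme 2)
print(READINGS_PER_YEAR)      # readings added per year to a per-device row (scheme 1)
print(round(years_to_limit))  # years until a per-device row hits the hard limit
```

With per-day rows, each row holds at most 1440 readings regardless of how long the station has been running.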

(1) Create the table:

CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id, date), event_time)
);

(2) Insert data:

INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2013-04-03','2013-04-03 07:02:00','73F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2013-04-04','2013-04-04 07:01:00','73F');
INSERT INTO temperature_by_day(weatherstation_id,date,event_time,temperature) VALUES ('1234ABCD','2013-04-04','2013-04-04 07:02:00','74F');

(3) Query one device's data for one day:

SELECT * FROM temperature_by_day WHERE weatherstation_id='1234ABCD' AND date='2013-04-03';
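
With this scheme the application has to supply the date value itself, both when writing and when reading. The Python sketch below shows one way to derive it from event_time and to list the day rows that a time range touches; the helper names `day_bucket` and `day_buckets` are hypothetical, and the sketch assumes all timestamps are in the same time zone.

```python
from datetime import datetime, timedelta


def day_bucket(event_time):
    """Value for the `date` column of a given reading."""
    return event_time.strftime("%Y-%m-%d")


def day_buckets(start, end):
    """All `date` values touched by the closed interval [start, end]."""
    day = start.date()
    buckets = []
    while day <= end.date():
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


start = datetime(2013, 4, 3, 7, 1)
end = datetime(2013, 4, 4, 7, 2)

first_bucket = day_bucket(start)
buckets = day_buckets(start, end)
print(first_bucket)
print(buckets)
```

A range that spans several days then turns into one query per returned date value, each with the same weatherstation_id.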

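III. Scheme 3: store time-limited data that is deleted automatically on expiry

Another typical use of time-series data is rolling storage. Suppose a dashboard only needs to show the 10 most recent temperature readings; older data is no longer useful and can be ignored. With other databases we usually have to set up a background job that purges historical data on a schedule, which is what we currently do with PostgreSQL. With Cassandra we can instead use a feature called expiring columns: once the specified time has elapsed, the column disappears automatically.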

(1) Create the table:

CREATE TABLE latest_temperatures (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

(2) Insert data (the TTL is given in seconds, so each of these rows expires 20 seconds after it is written):

INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:03:00','72F') USING TTL 20;
INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:02:00','73F') USING TTL 20;
INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:01:00','73F') USING TTL 20;
INSERT INTO latest_temperatures(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:04:00','74F') USING TTL 20;
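
The behavior of scheme 3 can be imitated with a small Python model: every write carries an expiry time, and reads return only live rows, newest first. This is a simplified sketch, not Cassandra's actual mechanism; the class `ExpiringStore` and the injected `clock` are hypothetical names used so that the example runs without waiting.

```python
import time


class ExpiringStore:
    """Toy model of one station's rows written with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._rows = {}  # event_time -> (temperature, expires_at)

    def insert(self, event_time, temperature, ttl_seconds):
        self._rows[event_time] = (temperature, self._clock() + ttl_seconds)

    def latest(self, limit):
        """Live rows only, newest event_time first (like the DESC order)."""
        now = self._clock()
        live = [(t, v) for t, (v, exp) in self._rows.items() if exp > now]
        live.sort(reverse=True)
        return live[:limit]


# A fake clock, advanced by hand.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])

store.insert("2013-04-03 07:03:00", "72F", 20)   # expires at t=20
now[0] = 15.0
store.insert("2013-04-03 07:04:00", "74F", 20)   # expires at t=35

at_15 = store.latest(10)   # both rows are still live
now[0] = 25.0
at_25 = store.latest(10)   # the first row has expired
print(at_15)
print(at_25)
```

No cleanup job appears anywhere in the model: expired rows simply stop being returned, which is the property the dashboard use case relies on.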