06 - LevelDB Implementation: sstable
Posted by anda0109
The sstable file format is as follows:
<beginning_of_file>
[data block 1]
[data block 2]
...
[data block N]
[meta block 1]
...
[meta block K]
[metaindex block]
[index block]
[Footer] (fixed size; starts at file_size - sizeof(Footer))
<end_of_file>
The file contains internal pointers. Each such pointer is called a BlockHandle and contains the following information:
offset: varint64
size: varint64
See varints for an explanation of varint64 format.
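The BlockHandle encoding described above can be sketched as follows. This is a minimal illustration, not LevelDB's own code: the names PutVarint64, GetVarint64 and SketchBlockHandle are chosen here for the example, and the varint layout assumed is the usual one (7 payload bits per byte, least-significant group first, high bit set on every byte except the last).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append v to dst as a varint64: 7 payload bits per byte, low group first,
// continuation bit (0x80) set on every byte except the last.
void PutVarint64(std::string* dst, uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Decode one varint64 starting at *pos and advance *pos past it.
// Returns false if the input is truncated or longer than 10 bytes.
bool GetVarint64(const std::string& src, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (unsigned shift = 0; shift <= 63 && *pos < src.size(); shift += 7) {
    const uint64_t byte = static_cast<unsigned char>(src[(*pos)++]);
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Hypothetical stand-in for a BlockHandle: offset then size, both varint64.
struct SketchBlockHandle {
  uint64_t offset = 0;
  uint64_t size = 0;

  void EncodeTo(std::string* dst) const {
    PutVarint64(dst, offset);
    PutVarint64(dst, size);
  }

  bool DecodeFrom(const std::string& src, size_t* pos) {
    return GetVarint64(src, pos, &offset) && GetVarint64(src, pos, &size);
  }
};
```

A varint64 never needs more than 10 bytes, so an encoded handle is at most 20 bytes long; small offsets and sizes take far less.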
- Key/value pairs are stored in sorted order in a sequence of data
  blocks. The data blocks start at the beginning of the file and follow
  one another. Each data block is formatted according to the code in
  block_builder.cc and then optionally compressed.
- After the data blocks come the meta blocks. The supported meta block
  types are described below; more may be added in the future. Each meta
  block is likewise formatted using block_builder.cc and then optionally
  compressed.
- A “metaindex” block. It contains one entry for every other meta
  block where the key is the name of the meta block and the value is a
  BlockHandle pointing to that meta block.
- An “index” block. This block contains one entry per data block,
  where the key is a string >= last key in that data block and before
  the first key in the successive data block. The value is the
  BlockHandle for the data block.
- At the very end of the file is a fixed length footer that contains
  the BlockHandle of the metaindex and index blocks as well as a magic
  number.
    metaindex_handle: char[p];      // Block handle for metaindex
    index_handle:     char[q];      // Block handle for index
    padding:          char[40-p-q]; // zeroed bytes to make fixed length
                                    // (40==2*BlockHandle::kMaxEncodedLength)
    magic:            fixed64;      // == 0xdb4775248b80fb57 (little-endian)
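The footer layout above (two handles, zero padding up to 40 bytes, then an 8-byte little-endian magic number) can be sketched like this. It is an illustration under stated assumptions rather than LevelDB's Footer class: Handle, EncodeFooter and DecodeFooter are names made up for this example, and the varint helpers are repeated so the block stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr uint64_t kTableMagicNumber = 0xdb4775248b80fb57ull;
constexpr size_t kHandlesLength = 40;  // 2 * BlockHandle::kMaxEncodedLength
constexpr size_t kFooterLength = kHandlesLength + 8;  // plus fixed64 magic

struct Handle {  // hypothetical stand-in for BlockHandle
  uint64_t offset;
  uint64_t size;
};

void PutVarint64(std::string* dst, uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

bool GetVarint64(const std::string& src, size_t* pos, uint64_t* out) {
  uint64_t result = 0;
  for (unsigned shift = 0; shift <= 63 && *pos < src.size(); shift += 7) {
    const uint64_t byte = static_cast<unsigned char>(src[(*pos)++]);
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Handles first, zero padding up to 40 bytes, then the magic number
// written least-significant byte first.
std::string EncodeFooter(const Handle& metaindex, const Handle& index) {
  std::string out;
  PutVarint64(&out, metaindex.offset);
  PutVarint64(&out, metaindex.size);
  PutVarint64(&out, index.offset);
  PutVarint64(&out, index.size);
  out.resize(kHandlesLength, '\0');
  for (int i = 0; i < 8; ++i) {
    out.push_back(static_cast<char>((kTableMagicNumber >> (8 * i)) & 0xff));
  }
  assert(out.size() == kFooterLength);
  return out;
}

// Checks the length and the magic number before trusting the handles.
bool DecodeFooter(const std::string& footer, Handle* metaindex, Handle* index) {
  if (footer.size() != kFooterLength) return false;
  uint64_t magic = 0;
  for (int i = 0; i < 8; ++i) {
    const uint64_t byte =
        static_cast<unsigned char>(footer[kHandlesLength + i]);
    magic |= byte << (8 * i);
  }
  if (magic != kTableMagicNumber) return false;
  size_t pos = 0;
  return GetVarint64(footer, &pos, &metaindex->offset) &&
         GetVarint64(footer, &pos, &metaindex->size) &&
         GetVarint64(footer, &pos, &index->offset) &&
         GetVarint64(footer, &pos, &index->size) && pos <= kHandlesLength;
}
```

Because the footer has a fixed length, a reader can fetch the last 48 bytes of the file without knowing anything else about it, verify the magic number, and then follow the two handles to the metaindex and index blocks.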
“filter” Meta Block
If a FilterPolicy was specified when the database was opened, a
filter block is stored in each table. The “metaindex” block contains
an entry that maps from filter.<N> to the BlockHandle for the filter
block, where <N> is the string returned by the filter policy’s Name()
method.
The filter block stores a sequence of filters, where filter i contains
the output of FilterPolicy::CreateFilter() on all keys that are stored
in a block whose file offset falls within the range
[ i*base ... (i+1)*base-1 ]
Currently, “base” is 2KB. So for example, if blocks X and Y start in
the range [ 0KB .. 2KB-1 ], all of the keys in X and Y will be
converted to a filter by calling FilterPolicy::CreateFilter(), and the
resulting filter will be stored as the first filter in the filter
block.
The filter block is formatted as follows:
[filter 0]
[filter 1]
[filter 2]
...
[filter N-1]
[offset of filter 0] : 4 bytes
[offset of filter 1] : 4 bytes
[offset of filter 2] : 4 bytes
...
[offset of filter N-1] : 4 bytes
[offset of beginning of offset array] : 4 bytes
lg(base) : 1 byte
The offset array at the end of the filter block allows efficient
mapping from a data block offset to the corresponding filter.
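The mapping from a data block offset to its filter can be sketched as below. This is a simplified illustration, not LevelDB's FilterBlockBuilder or FilterBlockReader: BuildFilterBlock and FindFilter are hypothetical names, the filters are opaque strings, and the 4-byte offsets are assumed to be stored little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr uint8_t kFilterBaseLg = 11;  // base = 1 << 11 = 2KB

void PutFixed32(std::string* dst, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }
}

uint32_t DecodeFixed32(const std::string& s, size_t pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    v |= static_cast<uint32_t>(static_cast<unsigned char>(s[pos + i]))
         << (8 * i);
  }
  return v;
}

// Lay out: filters, one 4-byte offset per filter, the 4-byte offset of the
// offset array itself, then one byte holding lg(base).
std::string BuildFilterBlock(const std::vector<std::string>& filters) {
  std::string block;
  std::vector<uint32_t> offsets;
  for (const std::string& f : filters) {
    offsets.push_back(static_cast<uint32_t>(block.size()));
    block += f;
  }
  const uint32_t array_start = static_cast<uint32_t>(block.size());
  for (uint32_t off : offsets) PutFixed32(&block, off);
  PutFixed32(&block, array_start);
  block.push_back(static_cast<char>(kFilterBaseLg));
  return block;
}

// Finds the filter covering a data block that starts at block_offset.
// Returns false if the block is malformed or no filter covers that offset.
bool FindFilter(const std::string& block, uint64_t block_offset,
                std::string* filter) {
  const size_t n = block.size();
  if (n < 5) return false;
  const uint8_t base_lg = static_cast<uint8_t>(block[n - 1]);
  if (base_lg >= 64) return false;
  const uint32_t array_start = DecodeFixed32(block, n - 5);
  if (array_start > n - 5) return false;
  const size_t num_filters = (n - 5 - array_start) / 4;
  const uint64_t index = block_offset >> base_lg;  // i such that offset is in
  if (index >= num_filters) return false;          // [i*base, (i+1)*base-1]
  const size_t entry = array_start + static_cast<size_t>(index) * 4;
  const uint32_t start = DecodeFixed32(block, entry);
  // For the last filter this reads the "offset of offset array" word, which
  // is exactly where the last filter ends.
  const uint32_t limit = DecodeFixed32(block, entry + 4);
  if (start > limit || limit > array_start) return false;
  filter->assign(block, start, limit - start);
  return true;
}
```

The lookup is a shift and two 4-byte reads, which is why the offset array makes the mapping cheap. A range with no data blocks simply gets an empty filter, so the array stays dense.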
“stats” Meta Block
This meta block contains a bunch of stats. The key is the name
of the statistic. The value contains the statistic.
TODO(postrelease): record following stats.
data size
index size
key size (uncompressed)
value size (uncompressed)
number of entries
number of data blocks
The main code for building an sstable is as follows:
class LEVELDB_EXPORT TableBuilder {
 public:
  // Create a builder that will store the contents of the table it is
  // building in *file. Does not close the file. It is up to the
  // caller to close the file after calling Finish().
  TableBuilder(const Options& options, WritableFile* file);

  TableBuilder(const TableBuilder&) = delete;
  TableBuilder& operator=(const TableBuilder&) = delete;

  // REQUIRES: Either Finish() or Abandon() has been called.
  ~TableBuilder();

  // Change the options used by this builder. Note: only some of the
  // option fields can be changed after construction. If a field is
  // not allowed to change dynamically and its value in the structure
  // passed to the constructor is different from its value in the
  // structure passed to this method, this method will return an error
  // without changing any fields.
  Status ChangeOptions(const Options& options);

  // Add key,value to the table being constructed.
  // REQUIRES: key is after any previously added key according to comparator.
  // REQUIRES: Finish(), Abandon() have not been called
  void Add(const Slice& key, const Slice& value);

  // Advanced operation: flush any buffered key/value pairs to file.
  // Can be used to ensure that two adjacent entries never live in
  // the same data block. Most clients should not need to use this method.
  // REQUIRES: Finish(), Abandon() have not been called
  void Flush();

  // Return non-ok iff some error has been detected.
  Status status() const;

  // Finish building the table. Stops using the file passed to the
  // constructor after this function returns.
  // REQUIRES: Finish(), Abandon() have not been called
  Status Finish();

  // Indicate that the contents of this builder should be abandoned. Stops
  // using the file passed to the constructor after this function returns.
  // If the caller is not going to call Finish(), it must call Abandon()
  // before destroying this builder.
  // REQUIRES: Finish(), Abandon() have not been called
  void Abandon();

  // Number of calls to Add() so far.
  uint64_t NumEntries() const;

  // Size of the file generated so far. If invoked after a successful
  // Finish() call, returns the size of the final generated file.
  uint64_t FileSize() const;

 private:
  bool ok() const { return status().ok(); }
  void WriteBlock(BlockBuilder* block, BlockHandle* handle);
  void WriteRawBlock(const Slice& data, CompressionType, BlockHandle* handle);

  struct Rep;
  Rep* rep_;
};
The void Add(const Slice& key, const Slice& value) method adds a key/value pair to the table. Once all entries have been added, Finish() is called to complete the sstable. Its code is as follows:
Status TableBuilder::Finish() {
  Rep* r = rep_;
  Flush();  // Write the last buffered data block
  assert(!r->closed);
  r->closed = true;

  BlockHandle filter_block_handle, metaindex_block_handle, index_block_handle;

  // Write filter meta block
  if (ok() && r->filter_block != nullptr) {
    WriteRawBlock(r->filter_block->Finish(), kNoCompression,
                  &filter_block_handle);
  }

  // Write metaindex block
  if (ok()) {
    BlockBuilder meta_index_block(&r->options);
    if (r->filter_block != nullptr) {
      // Add mapping from "filter.Name" to location of filter data
      std::string key = "filter.";
      key.append(r->options.filter_policy->Name());
      std::string handle_encoding;
      filter_block_handle.EncodeTo(&handle_encoding);
      meta_index_block.Add(key, handle_encoding);
    }

    // TODO(postrelease): Add stats and other meta blocks
    WriteBlock(&meta_index_block, &metaindex_block_handle);
  }

  // Write index block
  if (ok()) {
    if (r->pending_index_entry) {
      r->options.comparator->FindShortSuccessor(&r->last_key);
      std::string handle_encoding;
      r->pending_handle.EncodeTo(&handle_encoding);
      r->index_block.Add(r->last_key, Slice(handle_encoding));
      r->pending_index_entry = false;
    }
    WriteBlock(&r->index_block, &index_block_handle);
  }

  // Write footer
  if (ok()) {
    Footer footer;
    footer.set_metaindex_handle(metaindex_block_handle);
    footer.set_index_handle(index_block_handle);
    std::string footer_encoding;
    footer.EncodeTo(&footer_encoding);
    r->status = r->file->Append(footer_encoding);
    if (r->status.ok()) {
      r->offset += footer_encoding.size();
    }
  }
  return r->status;
}
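As the Finish function shows, the builder first writes out the last buffered data block (Add also writes a data block to the file whenever the current block reaches options.block_size, which is 4KB by default), then the filter meta block, then the metaindex block, then the index block, and finally the footer. At that point the sstable is complete.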
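The write order can be made concrete with a toy model. The sketch below is not LevelDB code: ToyTableBuilder and all of its members are hypothetical, entries are simply concatenated (no prefix compression, compression, or checksums), and the index key is the block's last key rather than a shortened separator. It only illustrates how Add cuts data blocks at a size threshold and how Finish appends the remaining sections.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of the sstable write order; every name here is hypothetical.
class ToyTableBuilder {
 public:
  struct IndexEntry {
    std::string last_key;  // last key stored in the data block
    uint64_t offset;       // where the data block starts in the file
    uint64_t size;         // length of the data block
  };

  ToyTableBuilder(size_t block_size, bool has_filter)
      : block_size_(block_size), has_filter_(has_filter) {}

  // Buffer one entry; cut a data block once the buffer reaches block_size.
  void Add(const std::string& key, const std::string& value) {
    buffer_ += key;
    buffer_ += value;
    last_key_ = key;
    if (buffer_.size() >= block_size_) Flush();
  }

  // "Write" the buffered data block and remember its handle for the index.
  void Flush() {
    if (buffer_.empty()) return;
    index_.push_back({last_key_, offset_, buffer_.size()});
    layout_.push_back("data");
    offset_ += buffer_.size();
    buffer_.clear();
  }

  // Same section order as in Finish(): last data block, filter,
  // metaindex, index, footer.
  void Finish() {
    Flush();
    if (has_filter_) layout_.push_back("filter");
    layout_.push_back("metaindex");
    layout_.push_back("index");
    layout_.push_back("footer");
  }

  const std::vector<std::string>& layout() const { return layout_; }
  const std::vector<IndexEntry>& index() const { return index_; }

 private:
  size_t block_size_;
  bool has_filter_;
  std::string buffer_;
  std::string last_key_;
  uint64_t offset_ = 0;
  std::vector<std::string> layout_;
  std::vector<IndexEntry> index_;
};
```

With a 16-byte threshold, adding ("a", 10 bytes), ("b", 10 bytes) and ("c", 1 byte) yields two data blocks: the first is cut inside Add after "b", the second by the Flush inside Finish.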