Oracle Data Warehouse Reference Architecture
Posted by dingdingfish
Basic Concepts
The article Introduction to Data Warehousing Concepts describes the operational data store (ODS):
Operational data stores exist to support daily operations. The ODS data is cleaned and validated, but it is not historically deep: it may be just the data for the current day. Rather than support the historically rich queries that a data warehouse can handle, the ODS gives data warehouses a place to get access to the most current data, which has not yet been loaded into the data warehouse. The ODS may also be used as a source to load the data warehouse. As data warehousing loading techniques have become more advanced, data warehouses may have less need for ODS as a source for loading data. Instead, constant trickle-feed systems can load the data warehouse in near real time.
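For the differences between an ODS and a data warehouse (DW), see here.
Data warehouse versus OLTP:
Minimal architecture:
Data warehouse architecture with a staging area:
A staging area is defined as: A place where data is processed before entering the warehouse. A staging area simplifies building summaries and general warehouse management.
Data warehouse with data marts: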
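Architecture Diagrams
Chinese version, from 通过实用数据仓库参考架构实现普及化 BI (Enabling Pervasive BI through a Practical Data Warehouse Reference Architecture):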
[Figure: Oracle Information Management, logical view. Image: https://obibb.files.wordpress.com/2014/06/oracle-information-management-e28093-logical-view1.png]
Architecture diagram from the paper Oracle Database In-Memory - a game changer for Data Warehousing?:
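Layers Explained
Staging Data Layer
This layer acts as a temporary storage area where data is manipulated before it enters the data warehouse.
The staging data layer is where business rules are applied to meet these goals (data cleansing, data quality, and so on). Rejected data is kept in this layer for manual or automatic correction.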
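The staging behaviour described above can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module rather than Oracle; the table names (stg_orders, dw_orders), the columns, and the validation rules are all hypothetical and do not come from the Oracle documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (   -- raw rows exactly as extracted from the source
        order_id TEXT, customer_id TEXT, amount TEXT, reject_reason TEXT
    );
    CREATE TABLE dw_orders (    -- typed target table inside the warehouse
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      REAL NOT NULL
    );
""")

raw_rows = [
    ("1", "100", "25.50"),
    ("2", "", "10.00"),       # missing customer
    ("3", "101", "-4.00"),    # negative amount
    ("4", "102", " 7.25 "),   # stray whitespace, still a valid number
]
conn.executemany(
    "INSERT INTO stg_orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
    raw_rows,
)

def reject_reason(customer_id, amount):
    """Return None when the row passes the (hypothetical) business rules."""
    if not customer_id.strip():
        return "missing customer_id"
    try:
        value = float(amount)
    except ValueError:
        return "amount is not numeric"
    if value < 0:
        return "negative amount"
    return None

staged = conn.execute(
    "SELECT rowid, order_id, customer_id, amount FROM stg_orders"
).fetchall()
for rowid, order_id, customer_id, amount in staged:
    reason = reject_reason(customer_id, amount)
    if reason is None:
        # Clean rows move on to the warehouse and leave the staging table.
        conn.execute(
            "INSERT INTO dw_orders VALUES (?, ?, ?)",
            (int(order_id), int(customer_id), float(amount)),
        )
        conn.execute("DELETE FROM stg_orders WHERE rowid = ?", (rowid,))
    else:
        # Rejected rows stay in staging, tagged for later correction.
        conn.execute(
            "UPDATE stg_orders SET reject_reason = ? WHERE rowid = ?",
            (reason, rowid),
        )

loaded = conn.execute(
    "SELECT order_id, amount FROM dw_orders ORDER BY order_id"
).fetchall()
rejected = conn.execute(
    "SELECT order_id, reject_reason FROM stg_orders ORDER BY order_id"
).fetchall()
```

After the loop, `loaded` holds orders 1 and 4, while orders 2 and 3 remain in `stg_orders` with a reason attached, so they can be fixed and reprocessed without touching the warehouse tables.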
An ODS (Operational Data Store) can also be placed in this layer; see Feed Data to the Foundation Layer of Oracle Communications Data Model:
The other use cases where this option should be chosen is also for Real-time feed (of Oracle Communications Data Model) for operational reporting and/or when the foundation layer of Oracle Communications Data Model is used as Operational Data Store.
And here:
In such cases, the staging area can be used as a real-time Operational Data Store, at least for the source concerned, and aggregation could run directly from the Operational Data Store (operational system) to the Access layer, or to the presentation layer in specific cases.
This layer is also called the data preparation area; see the Oracle® Database Data Warehousing Guide 21c:
In a typical data warehouse, data preparation consists of extracting the data from
one or more sources, cleansing, and formatting it for consistency, and
transforming into the data warehouse schema. The data preparation area is called
the staging area and the base tables in a data warehouse are loaded from the
tables in the staging area. The synchronous refresh method fits into this model
because it allows you to load change data into the staging logs.
The same document also notes:
As noted earlier, the modern approach to data warehousing does not pit star schemas and 3NF against each other. Rather, both techniques are used, with a foundation layer of 3NF - the Enterprise Data Warehouse of 3NF, acting as the bedrock data, and star schemas as a central part of an access and performance optimization layer.
In other words, 3NF and star schemas are complementary: the 3NF foundation layer holds the bedrock data, and star schemas form the core of the access and performance layer.
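Foundation Data Layer
The foundation data layer is sometimes also called the atomic data layer. As the name suggests, it records data at the lowest possible level of granularity. It is the heart of the data warehouse and the layer responsible for managing data over the long term.
The foundation data layer is modeled in a normalized form close to third normal form (3NF) for storage efficiency.
Data is also recorded in a business-neutral way. This removes the impact of business change and avoids any unnecessary restructuring of data.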
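A small sketch may make "normalized" and "lowest granularity" more concrete. It uses Python's sqlite3 module as a stand-in for Oracle, and the schema (customer, product, sales_order, order_line) is a made-up example, not a model taken from the Oracle papers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        category   TEXT NOT NULL
    );
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL
    );
    -- Atomic grain: one row per order line, not a pre-aggregated total.
    CREATE TABLE order_line (
        order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
        line_no    INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Berlin')")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(10, "Widget", "Hardware"), (11, "Gadget", "Hardware")],
)
conn.execute("INSERT INTO sales_order VALUES (500, 1, '2021-08-25')")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?, ?, ?)",
    [(500, 1, 10, 2, 5.0), (500, 2, 11, 1, 20.0)],
)

# Normalization stores each fact once: the customer's city lives in a single
# row, so correcting it touches exactly one row regardless of order volume.
changed = conn.execute(
    "UPDATE customer SET city = 'Munich' WHERE customer_id = 1"
).rowcount

# Coarser figures are derived from the atomic rows instead of being stored.
order_total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM order_line WHERE order_id = 500"
).fetchone()[0]
```

Because only atomic rows are stored, totals by order, customer, or category can all be computed later without reshaping the tables, which is one way to read the "avoids unnecessary restructuring" point above.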
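The term COTS (commercial off-the-shelf) comes up here; it refers to commercially available, ready-made products.
How should "business neutral" or "process neutral" be understood? See here.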
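Access and Performance Layer
The normalized 3NF model that improves data management in the foundation data layer is not necessarily the best way to give users access to data, because it is harder to navigate. The access and performance layer therefore exists to improve access to information in the architecture.
Most important, however, is recognizing that the tools used to access the data warehouse will change over time. You may have little control over which tools a particular business unit adopts, much as we might all wish otherwise. Because new tools may impose different requirements on how data is structured, it is essential that we can create (or re-create) these structures from the underlying pool of data. The ability to restructure data into different representations, whether logically or physically, is the fundamental purpose of the access and performance layer.
Conceptually, this is a subject-oriented representation of a subset of the data that simplifies business analysis while retaining common dimensions, and nothing more. Whether it is implemented as views or physically as second-tier aggregate structures is usually an implementation detail. These structures are created as needed to ease access for a particular module or toolset. All aggregate structures provided by Oracle are accessible from standard SQL and can be added as needed, bringing transparent performance improvements to end users and applications.
Populating and updating the access layer structures can be highly automated, either by using materialized views to track data changes in the source or by extending the scope of ETL to include intra-ETL processes.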
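The idea that access-layer structures can always be created or re-created from the foundation data can be sketched as follows. SQLite has no materialized views, so a summary table rebuilt by a function stands in for one here; in Oracle a materialized view would normally play this role. All table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified foundation tables (atomic, normalized).
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_id   INTEGER,
        product_id INTEGER REFERENCES product(product_id),
        order_date TEXT,
        quantity   INTEGER,
        unit_price REAL
    );
    INSERT INTO product VALUES (10, 'Hardware'), (20, 'Software');
    INSERT INTO order_line VALUES
        (1, 10, '2021-08-01', 2, 5.0),
        (2, 20, '2021-08-02', 1, 40.0),
        (3, 10, '2021-08-03', 1, 5.0);
""")

def refresh_sales_by_category(conn):
    """Rebuild the access-layer summary entirely from foundation data."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS acc_sales_by_category")
        conn.execute("""
            CREATE TABLE acc_sales_by_category AS
            SELECT p.category,
                   SUM(l.quantity * l.unit_price) AS revenue
            FROM order_line AS l
            JOIN product AS p USING (product_id)
            GROUP BY p.category
        """)

def revenue_by_category(conn):
    """Read the summary the way a reporting tool would."""
    return conn.execute(
        "SELECT category, revenue FROM acc_sales_by_category ORDER BY category"
    ).fetchall()

refresh_sales_by_category(conn)
before = revenue_by_category(conn)

# New atomic data arrives in the foundation layer; a refresh brings the
# access-layer structure back in line without any change to its consumers.
conn.execute("INSERT INTO order_line VALUES (4, 20, '2021-08-04', 1, 40.0)")
refresh_sales_by_category(conn)
after = revenue_by_category(conn)
```

Since the summary holds nothing that cannot be recomputed, it can be dropped, reshaped for a new tool, or replaced by a plain view at any time. A full rebuild is the simplest refresh strategy; incremental refresh is what materialized views add on top.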
Data Warehouse Architecture from the Oracle Architecture Center
Enterprise data warehousing - an integrated data lake example
[Figure: enterprise and streamed data analysis architecture. Image: https://docs.oracle.com/en/solutions/oci-curated-analysis/img/analysis-enterprise-and-streamed-data-architecture.png]
Third-Party Interpretations
INTRODUCTION TO ORACLE DATA WAREHOUSE REFERENCE ARCHITECTURE
The purpose of a data warehouse: load the data, manage the data, and provide access to the data.
It also mentions the two approaches to building a data warehouse: Kimball vs. Inmon Data Warehouse Architectures
See: https://www.geeksforgeeks.org/difference-between-kimball-and-inmon/
ORACLE DATA WAREHOUSE REFERENCE ARCHITECTURE – FOUNDATION LAYER
This layer is the core of the data warehouse. Its design principles are:
- Process Neutral
- 3NF or Normalized
- Enterprise View
- History
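Of these principles, "History" is perhaps the least obvious to implement. One common approach, shown here purely as an assumption and not as something the cited article prescribes, is to keep effective-dated versions of a row instead of overwriting it. The table and function names below are hypothetical, and sqlite3 again stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_address_hist (
        customer_id INTEGER NOT NULL,
        city        TEXT NOT NULL,
        valid_from  TEXT NOT NULL,   -- ISO dates compare correctly as text
        valid_to    TEXT,            -- NULL marks the current version
        PRIMARY KEY (customer_id, valid_from)
    )
""")

def record_city(conn, customer_id, city, as_of):
    """Close the current version (if any) and append a new one."""
    with conn:
        conn.execute(
            "UPDATE customer_address_hist SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (as_of, customer_id),
        )
        conn.execute(
            "INSERT INTO customer_address_hist VALUES (?, ?, ?, NULL)",
            (customer_id, city, as_of),
        )

def city_as_of(conn, customer_id, date):
    """Return the city that was valid on the given date, or None."""
    row = conn.execute(
        "SELECT city FROM customer_address_hist "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR ? < valid_to)",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None

record_city(conn, 1, "Berlin", "2020-01-01")
record_city(conn, 1, "Munich", "2021-06-01")

version_count = conn.execute(
    "SELECT COUNT(*) FROM customer_address_hist"
).fetchone()[0]
```

With both versions retained, `city_as_of(conn, 1, "2020-12-31")` returns "Berlin" and `city_as_of(conn, 1, "2021-08-25")` returns "Munich", so reports can be reproduced as of any past date.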
References
- https://www.oracle.com/assets/oracle-wp-big-data-refarch-2019930.pdf
- https://greatobi.wordpress.com/2011/04/11/oracle%E2%80%99s-data-warehouse-reference-architecture/
- https://docs.oracle.com/en/database/oracle/oracle-database/19/dwhsg/data-warehouse-optimizations-techniques.html#GUID-C98E4A18-20EE-4C95-A18E-3811BB714D01 (data warehouse optimization techniques; the sections on indexes and parallel execution are worth reading)
- Best Practices for Real-time Data Warehousing
- https://www.rittmanmead.com/blog/2009/07/drilling-down-in-the-oracle-next-generation-reference-dw-architecture/