Classic Big Data Papers We Have Read Over the Years
Posted @SmartSi
1. Stream
1.1 Fundamentals
- One SQL to Rule Them All: An Efficient and Syntactically Idiomatic Approach to Management of Streams and Tables
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- Monitoring Streams – A New Class of Data Management Applications
- Exploiting Punctuation Semantics in Continuous Data Streams
- STREAM: The Stanford Data Stream Management System
- The 8 Requirements of Real-Time Stream Processing
- The Design of the Borealis Stream Processing Engine
- High-Availability Algorithms for Distributed Stream Processing
- A Cooperative, Self-Configuring High-Availability Solution for Stream Processing
- Out-of-Order Processing: A New Architecture for High-Performance Stream Systems
- Fast and Highly-Available Stream Processing over Wide Area Networks
- S4: Distributed Stream Computing Platform
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale
- Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management
- Trill: A High-Performance Incremental Query Processor for Diverse Analytics
- Summingbird: A Framework for Integrating Batch and Online MapReduce Computations
- Drizzle: Fast and Adaptable Stream Processing at Scale
- Realtime Data Processing at Facebook
1.2 Spark
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Spark: Cluster Computing with Working Sets
1.3 Flink
- Apache Flink™: Stream and Batch Processing in a Single Engine
- Lightweight Asynchronous Snapshots for Distributed Dataflows
- State Management in Apache Flink
1.4 Storm
2. Hadoop
- MapReduce: Simplified Data Processing on Large Clusters
- Bigtable: A Distributed Storage System for Structured Data
- SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures
- Hive – A Petabyte Scale Data Warehouse Using Hadoop
3. OLAP
4. MQ
5. Storage
- The Log-Structured Merge-Tree
- The Google File System
- RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems
6. Data Lake
7. Other
- In Search of an Understandable Consensus Algorithm
- Distributed Snapshots: Determining Global States of Distributed Systems
- Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
- Encoded Bitmap Indexing for Data Warehouses