Data + AI Summit 2022 Slides Download

Posted by 过往记忆

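Data + AI Summit 2022 was held from June 27 to 30, 2022, in San Francisco, and attendees in China could follow it online. The event ran for four days: the first day was training, and the formal conference sessions took place on the remaining days. The conference had more than 200 sessions, with speakers from industry, research, and academia, organized into six tracks: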

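•Data Analytics, BI, and Visualization: the latest data analytics, BI, and visualization techniques, along with solutions from customers and the community.
•Data Engineering: an in-depth look at current data engineering practice, from implementing data pipelines to managing data quality, ETL and data quality frameworks, and DataOps.
•Data Lakes, Data Warehouses and Data Lakehouses: the concepts and best practices behind the evolution of data lakes and data warehouses into Data Lakehouses.
•Data Science, Machine Learning, and MLOps: techniques and best practices for production data science and machine learning pipelines.
•Data Security and Governance
•Academic Research: academic and advanced industrial research, including large-scale schedulers, graphs, data analytics, and machine learning systems.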

The full conference agenda is available at: https://databricks.com/dataaisummit/agenda

For timely articles on Spark, Hadoop, or HBase, follow the WeChat official account 过往记忆大数据.

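The HD videos of the conference were shared a few days ago; the download link is in the post "Data + AI Summit 2022 超清视频下载" (HD video download). This post collects the slides from the conference for anyone who needs them.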

How to Download the Slides

About 170 slide decks are currently available. Follow the WeChat official account 过往记忆大数据 or Java与大数据架构, then:

•Reply 10189 to get the Data + AI Summit 2022 slides.
•Reply 10187 to get the Data + AI Summit 2022 HD videos.

Recommended Sessions

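Data + AI Summit 2022 has a large number of sessions, and not all of them will interest everyone, so I have picked out a dozen or so of the more substantive ones that I recommend watching: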

•Apache Spark SQL Aggregate Improvement at Meta (Facebook)

•Recent Parquet Improvements in Apache Spark

•Spark Data Source V2 Performance Improvement: Aggregate Push Down

•Deep Dive into the New Features of Apache Spark 3.2 and 3.3

•Managing Straggler Executors at Apache Spark 3.3

•Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors

•PySpark in Apache Spark 3.3 and Beyond

•Delta Lake 2.0 Overview

•Improving Interactive Querying Experience on Spark SQL

•Moving from Apache Spark 2 to Apache Spark 3: Spark Version Upgrade at Scale in Pinterest

•Radical Speed on the Lakehouse: Photon Under the Hood

•Deep-Dive into Delta Lake

•Presto 101: An Introduction to Open Source Presto

•Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance

•Advanced Migrations: From Hive to SparkSQL

•Presto On Spark: A Unified SQL Experience

Sessions with Downloadable Slides

About 170 sessions have downloadable slides.

•A Modern Approach to Big Data for Finance
•A Practitioner's Guide to Unity Catalog A Technical Deep Dive
•AI Fueled Forecasting The Next Generation of Financial Planning
•AI powered Assortment Planning Solution
•ALaSpark Gousto Recipe for Building Scalable PySpark Pipelines
•Accelerating the Pace of Autism Diagnosis with Machine Learning Models
•Achieve Machine Learning Hyper Productivity with Transformers and Hugging Face
•Administrator Best Practices and Tips for Future Proofing your Databricks Account
•Advanced Migrations From Hive to SparkSQL
•Adversarial Drifts, Model Monitoring, and Feedback Loops Building Human in the Loop Machine Learning Systems for Content Moderation
•Agile Data Engineering Reliability and Continuous Delivery at Scale
•Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse
•An Advanced S3 Connector for Spark to Hunt for Cyber Attacks
•Apache Arrow Flight SQL High Performance, Simplicity, and Interoperability for Data Transfers
•Apache Spark SQL Aggregate Improvement at Meta (Facebook)
•Apache Spark on Kubernetes—Lessons Learned from Launching Millions of Spark Executors
•Apache Spark AQE SkewedJoin Optimization and Practice in ByteDance
•Applied Predictive Maintenance in Aviation Without Sensor Data
•Auto Encoder Decoder Based Anomaly Detection with the Lakehouse Paradigm
•Automate Your Delta Lake or Practical Insights on Building Distributed Data Mesh
•Automating Model Lifecycle Orchestration with Jenkins
•Automating Business Decisions Using Event Streams
•Backfill Streaming Data Pipelines in Kappa Architecture
•Best Practices of Maintaining High Quality Data
•Big Data in the Age of Moneyball
•Build an Enterprise Lakehouse for Free with Trino and Delta Lake
•Building Enterprise Scale Data and Analytics Platforms at Amgen
•Building Metadata and Lineage Driven Pipelines on Kubernetes
•Building Production Ready Recommender Systems with Feature Stores
•Building a Data Science as a Service platform in Azure with Databricks
•Building a Lakehouse for Data Science at DoorDash
•Building an Analytics Lakehouse at Grab
•Building and Scaling Machine Learning Based Products in the World's Largest Brewery
•Building Spatial Applications with Apache Spark and CARTO
•Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security
•Case Study in Rearchitecting an On Premises Pipeline in the Cloud
•Challenges in Time Series Forecasting
•Chaos Engineering in the World of Large Scale Complex Data Flow
•Cloud Native Geospatial Analytics at JLL
•Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks
•Computational Data Governance at Scale
•Connecting the Dots with DataHub Lakehouse and Beyond
•Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines
•Correlation Over Causation Cracking the Relationship Between User Engagement and User Happiness
•Customer centric Innovation to Scale Data AI Everywhere
•Cutting the Edge in Fighting Cybercrime Reverse Engineering a Search Language to Cross Compile it to PySpark
•DBA Perspective Optimizing Performance Table by Table
•Data Boards A Collaborative and Interactive Space for Data Science
•Data Centric Principles for AI Engineering
•Data Lakehouse and Data Mesh Two Sides of the Same Coin
•DataFusion and Arrow Supercharge Your Data Analytical Tool with a Rusty Query Engine
•Databricks Meets Power BI
•Databricks and Enterprise Observability with Overwatch
•Deep Dive into Delta Lake
•Deep Dive into the New Features of Apache Spark
•Delta Lake Overview
•Delta Sharing A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse
•Democratizing Metrics at Airbnb
•Designing Better MLOps Systems
•Destination Lakehouse All Your Data Analytics and AI on One Platform
•Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach
•Discover Data Lakehouse With End to End Lineage
•Disrupting the Prescription Drug Market with AI and Data
•Distributed Machine Learning at Lyft
•Doubling the Capacity of the Data Platform Without Doubling the Cost
•Elixir The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox
•Embedding Privacy by Design Into Data Infrastructure Through Open Source Extensible Tooling
•Enable Production ML with Databricks Feature Store
•Enabling BI in a Lakehouse Environment
•Enabling Learning on Confidential Data
•Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification
•Evolution of Data Architectures and How to Build a Lakehouse
•Fugue Tune Distributed Hybrid Hyperparameter Tuning
•FutureMetrics Using Deep Learning to Create a Multivariate Time Series Forecasting Platform for Economic Strategic Planning
•GIS Pipeline Acceleration with Apache Sedona
•Git for Data Lakes How lakeFS Scales Data Versioning to Billions of Objects
•Hassle Free Data Ingestion into the Lakehouse
•How to Automate the Modernization and Migration of Your Data Warehousing Workloads to Databricks Lakehouse
•How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities
•How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins
•How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances
•How To Use Databricks SQL for Analytics on Your Lakehouse
•How socat and UNIX Pipes Can Help Data Integration
•How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse
•How to Build a Complete Security and Governance Solution Using Unity Catalog
•How to Implement a Semantic Layer for Your Lakehouse
•Implementing Data Governance 3.0 for the Lakehouse Era Community Led and Bottom Up
•Implementing a Framework for Data Security and Policy at a Large Public Sector Agency
•Implementing an End to End Demand Forecasting Solution Through Databricks and MLflow
•Improving Apache Spark Structured Streaming Application Processing Time
•Improving Interactive Querying Experience on Spark SQL
•Improving patient care with Databricks
•Ingesting data into Lakehouse with COPY INTO
•Integrating Apache Superset into a B2B Platform Why and How
•Introducing Zipline An Open Source Feature Engineering Platform
•Learn to Efficiently Test ETL Pipelines
•Lessons Learned from Deidentifying 700 Million Patient Notes
•Low Code Machine Learning on Databricks with AutoML
•MLOps at DoorDash
•MLflow Pipelines Accelerating MLOps from Development to Production
•Mapping Data Quality Concerns to Data Lake Zones
•Meshing About with Databricks
•Migrate and Modernize your Data Platform with Confluent and Databricks
•Migrating Complex SAS Processes to Databricks Case Study
•Monitoring and Quality Assurance of Complex ML Deployments via Assertions
•Mosaic A Framework for Geospatial Analytics at Scale
•Multimodal Deep Learning Applied to E commerce Big Data
•Near Real Time Analytics with Event Streaming, Live Tables, and Delta Sharing
•Obfuscating Sensitive Information from Spark UI and Logs
•Open Source Powers the Modern Data Stack
•Opening the Floodgates Enabling Fast Unmediated End User Access to Trillion Row Datasets with SQL Data Warehouses
•Optimizing Speed and Scale of User Facing Analytics Using Apache Kafka and Pinot
•Polars Blazingly Fast DataFrames in Rust and Python
•Power to the SQL People Python UDFs in DBSQL
•Powering Up the Business with a Lakehouse
•Practical Data Governance in a Large Scale Databricks Environment
•Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning
•Presto On Spark A Unified SQL Experience
•Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark
•Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLFlow
•Protecting Personally Identifiable Information (PII) PHI Data in Data Lake via Column Level Encryption
•PySpark in Apache Spark 3.3 and Beyond
•Radical Speed on the Lakehouse Photon Under the Hood
•Real Time Search and Recommendation at Scale Using Embeddings and Hopsworks
•Real Time Cost Reduction Monitoring and Alerting
•Realize the Promise of Streaming with the Databricks Lakehouse Platform
•Recent Parquet Improvements in Apache Spark
•Rethinking Orchestration as Reconciliation Software Defined Assets in Dagster
•Running a Low Cost, Versatile Data Management Ecosystem with Apache Spark at Core
•Scalable XGBoost on GPU Clusters
•Scaling AI Workloads with the Ray Ecosystem
•Scaling Your Workloads with Databricks Serverless
•Scaling Deep Learning on Databricks
•Scaling ML at CashApp with Tecton
•Scaling Privacy Practical Architectures and Experiences
•Security Best Practices for Lakehouse
•Self Serve Automated and Robust CDC pipeline using AWS DMS DynamoDB Streams and Databricks Delta
•Serverless Kafka and Apache Spark in a Multi Cloud Data Lakehouse Architecture
•Serving Near Real Time Features at Scale
•Setting up On Shelf Availability Alerts at Scale with Databricks and Azure
•Simplify Global DataOps and MLOps Using Oktas FIG Automation Library
•Simplifying Migrations to Lakehouse—the Databricks Way
•Smart Manufacturing Real time Process Optimization with Databricks
•So Fresh and So Clean Learn How to Build Real Time Warehouses on Lakehouse
•Sound Data Engineering in Rust From Bits to DataFrames
•Spark Data Source V2 Performance Improvement Aggregate Push Down
•Spark Inception Exploiting the Apache Spark REPL to Build Streaming Notebooks
•Spline Central Data Lineage Tracking Not Only For Spark
•State of the Art Natural Language Processing with Apache Spark NLP
•Streaming ML Enrichment Framework Using Advanced Delta Table Features
•Survey of Production ML Tech Stacks
•Technical and Tactical Football Analysis Through Data
•The Databricks Notebook Front Door of the Lakehouse
•The Modern Metadata Platform What Why and How
•The Road to a Robust Data Lake 0
•The Semantics of Biology Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing on Apache Spark
•Time Series Forecasting with PyCaret
•Tools for Assisted Apache Spark Version Migrations, From 2.1 to 3.2+
•Towards Dynamic Microstructure The Role of Machine Learning in the Next Generation of Exchanges
•Turning Big Biology Data into Insights on Disease The Power of Circulating Biomarkers
•Turning Fan Data Into an Asset
•UIMeta A 10X Faster Cloud Native Spark History Server
•Unifying Data Science and Business
•Vision AI Animal Health Industry Use Cases Using Databricks on Azure
•What to Do When Your Job Goes OOM in the Night Flowcharts
•X FIPE eXtended Feature Impact for Prediction Explanation
•You Have BI Now What Activate Your Data
•dbt Machine Learning What Makes a Great Baton Pass
•dbt and Python Better Together
