Introduction
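1. Best Practices
- 1.1. Avoid GroupByKey
- 1.2. Don't copy all elements of a large RDD to the driver
2. General Troubleshooting
3. Performance & Optimization
- 3.1. How many partitions does an RDD have?
- 3.2. Data locality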
4. Spark Streaming
- 4.1. ERROR OneForOneStrategy

Databricks Spark Knowledge Base ZH-CN

Best Practices

Avoid GroupByKey
Don't call collect directly on a large RDD