Spark-Related Learning Links (Continuously Updated)
2017-08-20
分裂四人组
Spark
- Taking aim at Spark 1.6: a summary of problems and pitfalls: http://www.tuicool.com/articles/2U36Zb
- Spark Summit 2017 (February): https://spark-summit.org/east-2017/schedule/
- Trends for Big Data and Apache Spark in 2017 by Matei Zaharia: https://www.slideshare.net/SparkSummit/trends-for-big-data-and-apache-spark-in-2017-by-matei-zaharia
- Spark Streaming source code analysis: https://github.com/lw-lin/CoolplaySpark/tree/master/Spark%20Streaming%20源码解析系列
- When to use Spark RDD/Dataset/DataFrame: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html
- Spark configuration reference: http://spark.apache.org/docs/latest/configuration.html
- Why Spark sorting performs so well (covers several important issues): https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
- A very good article on Spark shuffle: https://0x0fff.com/spark-architecture-shuffle/
- Optimizations Yahoo made to Spark: https://spark-summit.org/2013/wp-content/uploads/2013/10/Li-AEX-Spark-yahoo.pdf
- Databricks Spark operations guide: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
- Spark memory management: https://wongxingjun.github.io/2016/05/26/Spark内存管理/
TODO:
- Ad click prediction with Spark: http://lxw1234.com/archives/2016/01/595.htm
- Accumulated notes on Spark issues: http://blog.leanote.com/post/anglema/总结-spark-问题
Flink
- Flink Scheduler: http://chenyuzhao.me/2017/02/09/flink-scheduler/
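- Apache Flink: features, concepts, component stack, architecture, and analysis of internals: http://shiyanjun.cn/archives/1508.html
- Flink internals and implementation: architecture and topology overview: http://wuchong.me/blog/2016/05/03/flink-internals-overview/
- ResourceManager source code analysis: http://zengzhaozheng.blog.51cto.com/8219051/1438204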
Java
- Java errors from compiling with a newer JDK and running on an older one: http://www.jianshu.com/p/f4996b1ccf2f
Hadoop
- ResourceManager high-availability configuration: http://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
- ResourceManager architecture and principles: http://www.jianshu.com/p/626b66fa65db
- How the HDFS ZKFailoverController works: http://blog.csdn.net/zkq_1986/article/details/54952738
- How Hadoop YARN implements memory resource isolation: http://blog.csdn.net/a860mhz/article/details/50618555
- CGroup configuration for YARN clusters: http://www.jianshu.com/p/e283ab7e2530
- HDFS Recovery Processes: http://blog.cloudera.com/blog/2015/03/understanding-hdfs-recovery-processes-part-2/
Git
Talks
- Talks by experts on InfoQ: http://www.infoq.com/cn/netease/presentations/
HBase
- HBase Snapshot workflow explained in detail: http://www.cnblogs.com/foxmailed/p/3914117.html
MySQL
- MySQL backup summary: http://www.cnblogs.com/liangshaoye/p/5464794.html
- MySQL binlog concepts: http://blog.csdn.net/wyzxg/article/details/7412777
- Paper on database isolation levels: http://www.cs.umb.edu/~poneil/iso.pdf
- MySQL redo log and recovery: http://www.cnblogs.com/liuhao/p/3714012.html
- Introduction to isolation levels: http://blog.csdn.net/qq_33290787/article/details/51924963
Maven
- Maven Plugins: http://maven.apache.org/plugins/index.html
Kafka
- Kafka configuration options: http://blog.csdn.net/vegetable_bird_001/article/details/51858915
- Kafka performance parameters and load tuning: http://blog.csdn.net/stark_summer/article/details/50203133
Hive
- How Hive executes queries: http://tech.meituan.com/hive-sql-to-mapreduce.html
- Learning Hive together: http://lxw1234.com/archives/2015/09/476.htm