[PYTHON] Apache Spark Starter Kits
Target audience
Those who don't know where to start with Apache Spark.
Here are some links related to Apache Spark. Most of them are in English. The edX courses are highly recommended: they are easy to follow because the material is explained in videos, and you learn by actually writing code in Python.
I will keep this list updated! Please comment if you know of any other good resources.
Official site
- http://spark.apache.org/
- Quick start
- https://spark.apache.org/docs/latest/quick-start.html
Overview
- Stanford CS347
- http://www.cs.berkeley.edu/~rxin/talks/2015-05-18_cs347-stanford.pdf
Compile and Run Example
- http://qiita.com/giwa/items/d701ad1f9bda42654093
This post covers Spark 1.4, but the steps should be the same for 1.5.
edX
Introduction to Big Data with Apache Spark
https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x
Scalable Machine Learning
https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x
Big Data University
- http://bigdatauniversity.com/bdu-wp/bdu-course/spark-fundamentals/
- http://bigdatauniversity.com/bdu-wp/bdu-course/spark-fundamentals-ii/
Papers
- RDD
- http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf
- Shark (Spark SQL)
- http://people.csail.mit.edu/matei/papers/2013/sigmod_shark.pdf
- Spark (1.4) performance profiling
- https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-ousterhout.pdf
- Spark Streaming
- http://people.csail.mit.edu/matei/papers/2012/hotcloud_spark_streaming.pdf
- http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf
SlideShare decks from Japanese companies (many are by NTT people)
- http://www.slideshare.net/hadoopxnttdata/apache-spark
- http://www.slideshare.net/hadoopxnttdata/apache-spark-spark
- http://www.slideshare.net/hadoopxnttdata/hadoop-14006572
- http://www.slideshare.net/hadoopxnttdata/hadoop-ecosystem-nttdata-osc15tk
- http://www.slideshare.net/hadoopxnttdata/hadoopsiliconvalleytechbusinessmeetup
- http://www.slideshare.net/taroleo/spark-internal-hadoop-source-code-reading-16-in-japan
Books
- Learning Spark
- http://shop.oreilly.com/product/0636920028512.do
- Advanced Analytics with Spark
- http://shop.oreilly.com/product/0636920035091.do
- Spark in Action (in progress)
- https://www.manning.com/books/spark-in-action
--Introduction to Apache Spark The latest parallel distributed processing framework to learn by moving (released on October 20, 2015 from NTT DATA)
- http://www.amazon.co.jp/Apache-Spark%E5%85%A5%E9%96%80-%E5%8B%95%E3%81%8B%E3%81%97%E3%81%A6%E5%AD%A6%E3%81%B6%E6%9C%80%E6%96%B0%E4%B8%A6%E5%88%97%E5%88%86%E6%95%A3%E5%87%A6%E7%90%86%E3%83%95%E3%83%AC%E3%83%BC%E3%83%A0%E3%83%AF%E3%83%BC%E3%82%AF-NEXT-ONE-%E6%A0%AA%E5%BC%8F%E4%BC%9A%E7%A4%BENTT%E3%83%87%E3%83%BC%E3%82%BF/dp/4798142662/ref=pd_rhf_ee_s_cp_30?ie=UTF8&refRID=18NG6YKBRET078VK7FRK
If you search, you will find plenty of other books, but I can't say how good they are. I haven't read the following yet.
- Spark Cookbook
- https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook
- Fast Data Processing with Spark
- https://www.packtpub.com/big-data-and-business-intelligence/fast-data-processing-spark-second-edition
- Machine Learning with Spark
- https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-spark
- Apache Spark Graph Processing
- https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing
- Mastering Apache Spark
- https://www.packtpub.com/big-data-and-business-intelligence/mastering-apache-spark
Spark Summit
- https://spark-summit.org/2013/
- https://spark-summit.org/2014/
- https://spark-summit.org/2015/
- https://spark-summit.org/east-2015/
- https://spark-summit.org/the-spark-spot/
Others
- AMPLab, where Spark was invented
- https://amplab.cs.berkeley.edu/
- Reynold Xin's personal page (Spark PMC)
- http://www.cs.berkeley.edu/~rxin/
- Matei Zaharia's personal page (Spark inventor, PMC, CTO of Databricks and Assistant Professor at MIT)
- http://people.csail.mit.edu/matei/
- MLbase, the basis of Spark MLlib
- http://mlbase.org/
Meetup in Japan
- http://connpass.com/event/8465/
- http://cloudera.connpass.com/event/18857/
- http://www.meetup.com/Tokyo-Spark-Meetup/
Commit email (thanks to kou for making it)
- http://www.commit-email.info
Diffs are color-highlighted, so these are easier to read than the original commit emails. You can subscribe by sending the following email:
To: [email protected]
Cc: [email protected]
Subject: Subscribe
--
subscribe
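For those who prefer to script it, the subscription email above could be composed roughly as follows. This is only a minimal sketch using Python's standard `email` library: `TO_ADDRESS`, `CC_ADDRESS`, the sender address, and the helper name `build_subscribe_message` are hypothetical placeholders, since the real addresses are not given here. It builds the message but does not send it.

```python
from email.message import EmailMessage

# Hypothetical placeholders: replace with the real To and Cc addresses.
TO_ADDRESS = "to-address@example.invalid"
CC_ADDRESS = "cc-address@example.invalid"


def build_subscribe_message(sender: str) -> EmailMessage:
    """Build (but do not send) the subscription request email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TO_ADDRESS
    msg["Cc"] = CC_ADDRESS
    msg["Subject"] = "Subscribe"
    # The message body is the single word "subscribe".
    msg.set_content("subscribe")
    return msg


# Hypothetical sender address, for illustration only.
message = build_subscribe_message("you@example.invalid")
print(message)
```

Sending is left out on purpose, because it depends on your own mail server settings; any ordinary mail client works just as well.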
Report bugs on JIRA here
- https://issues.apache.org/jira/browse/spark
For those who want to contribute
- https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
- http://www.slideshare.net/hadoopxnttdata/apache-spark-commnity-nttdata-sarutak