- An exclusive guide to getting up and running with fast data processing using Apache Spark
- Explore and exploit various possibilities with Apache Spark using real-world use cases in this book
- Want to perform efficient data processing in real time? This book will be your one-stop solution.
The Spark juggernaut keeps on rolling and gains more momentum each day. The core challenge is understanding the key capabilities in Spark (Spark SQL, Spark Streaming, Spark ML, SparkR, GraphX, and so on). Having understood the key capabilities, it is important to understand how Spark can be deployed: installed as a standalone framework, or as part of an existing Hadoop installation and configured with YARN or Mesos.
The next part of the journey after installation is using the key components and APIs: clustering, machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component matters, how widely it is used, how stable it is, and which use cases it suits.
Once we understand the individual components, we will work through a couple of real-life advanced analytics examples, such as:
- Building a Recommendation system
- Predicting customer churn
The objective of these real-life examples is to give the reader confidence in using Spark for real-world problems.
What you will learn
- Get an overview of Big Data analytics and its importance for organizations and data professionals.
- Delve into Spark to see how it is different from existing processing platforms
- Understand the intricacies of various file formats, and how to process them with Apache Spark.
- Learn how to deploy Spark with YARN, Mesos, or a standalone cluster manager.
- Learn the concepts of Spark SQL, SchemaRDD, Caching, Spark UDFs and working with Hive and Parquet file formats
- Understand the architecture of Spark MLlib while discussing some of the off-the-shelf algorithms that come with Spark.
- Introduce yourself to SparkR and walk through the details of data munging, including selecting, aggregating, and grouping data using RStudio.
- Walk through the importance of Graph computation and the graph processing systems available in the market
- Work through a real-world example by building a recommendation engine with Spark using collaborative filtering
- Use a telco data set to predict customer churn using regression
About the Author
Asif Abbasi has worked in the industry for over 15 years, in a variety of roles ranging from engineering solutions to selling solutions and everything in between. Asif is currently working with SAS, a market leader in analytic solutions, as a Principal Business Solutions Manager for the Global Technologies Practice.
Based in London, Asif has vast experience consulting for major organizations across the globe and running proof-of-concepts in industries including, but not limited to, Telecommunications, Manufacturing, Retail, Finance, Services, Utilities, and Government.
Asif has presented at various conferences and delivered workshops on topics such as Big Data, Hadoop, Teradata, and Analytics using Aster on Teradata and Hadoop. Asif is an Oracle Certified Java EE 5 Enterprise Architect, Teradata Certified Master, PMP, and Hortonworks Hadoop Certified Developer and Administrator. Asif also holds a Master's degree in Computer Science and Business Administration.
Table of Contents
- Chapter 1. Architecture and Installation
- Chapter 2. Transformations and Actions with Spark RDDs
- Chapter 3. ETL with Spark
- Chapter 4. Spark SQL
- Chapter 5. Spark Streaming
- Chapter 6. Machine Learning with Spark
- Chapter 7. GraphX
- Chapter 8. Operating in Clustered Mode
- Chapter 9. Building a Recommendation System
- Chapter 10. Customer Churn Prediction
- Title: Learning Apache Spark 2
- Author: Muhammad Asif Abbasi
- Length: 356 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2017-06-06
- ISBN-10: B01M7RO7US