Analyzing Big Data in R using Apache Spark
Apache Spark is a popular cluster computing framework for large-scale data analysis. SparkR provides a distributed data frame API that enables structured data processing with a syntax familiar to R users.
RP0105EN

Beginner course
Language
- English
Topic
- R Programming
Organization
- BDU
Estimated Effort
- 6 hours
About This Course
Master Apache Spark, a popular cluster computing framework for large-scale data analysis. SparkR provides a distributed data frame API that enables structured data processing with a syntax familiar to R users.
- Learn why R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks.
- Learn how SparkR, an R package that provides a lightweight frontend, lets you use Apache Spark from R.
Course Syllabus
- Module 1 - Introduction to SparkR
  - Learn what SparkR is
  - Understand why you would use SparkR
  - List the features of SparkR
  - Understand the interfaces into SparkR
- Module 2 - Data manipulation in SparkR
  - Understand how to use data frames
  - Learn to select data
  - Learn to filter data
  - Learn to aggregate data
  - Learn to operate on columns
  - Understand how to write SQL queries
- Lab 1 - Getting started with SparkR
- Lab 2 - Data manipulation in SparkR
- Module 3 - Machine learning in SparkR
  - Understand machine learning
  - Learn how to use a generalized linear model (GLM)
- Lab 3 - Linear models in SparkR
Recommended skills prior to taking this course
- None
Requirements
- None
Course Staff
Alan Barnes
Alan Barnes is a Senior IBM Information Management Course Developer / Consultant. He has worked in several companies as a Senior Technical Consultant, Database Team Manager, Application Programmer, Systems Programmer, Business Analyst, DB2 Team Lead and more. His career in IT spans more than 35 years.
