Data Science with Scala
Beginner Course
Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing. This course shows how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.

Language
- English
Topic
- Scala
Enrollment Count
- 7.48K
Skills You Will Learn
- Scala, Data Science, Apache Spark, Machine Learning
Offered By
- Lightbend
Estimated Effort
- 6 hours
Platform
- SkillsNetwork
Last Update
- April 3, 2025
About this Course
ABOUT THIS SCALA COURSE
In this course you will learn about basic statistics and data types, preparing data, feature engineering, fitting a model, and pipelines and grid search. Apache Spark™ is a fast and general engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning and graph processing. This course shows you how to use Spark’s machine learning pipelines to fit models and search for optimal hyperparameters using a Spark cluster.
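The pipeline-and-grid-search workflow described above can be sketched with Spark MLlib's Scala API roughly as follows. This is a minimal illustration, not the course's own code. It assumes Spark 2.2 or later with `spark-mllib` on the classpath and numeric input columns. The column names `amount`, `duration` and `label` and the object name `PipelineGridSearchSketch` are hypothetical placeholders. Building the stages and the parameter grid needs no running cluster; only fitting does.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{Imputer, StandardScaler, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
import org.apache.spark.sql.DataFrame

object PipelineGridSearchSketch {
  // Hypothetical numeric input columns; replace with the columns of your own DataFrame.
  val inputCols: Array[String] = Array("amount", "duration")
  val imputedCols: Array[String] = inputCols.map(_ + "_imputed")

  // Estimator: fills missing values in each input column with that column's mean.
  val imputer = new Imputer()
    .setInputCols(inputCols)
    .setOutputCols(imputedCols)
    .setStrategy("mean")

  // Transformer: combines the imputed columns into a single feature vector.
  val assembler = new VectorAssembler()
    .setInputCols(imputedCols)
    .setOutputCol("rawFeatures")

  // Estimator: scales each feature to unit standard deviation (withStd defaults to true).
  val scaler = new StandardScaler()
    .setInputCol("rawFeatures")
    .setOutputCol("features")

  // Estimator: the model itself; expects a 0/1 label column (hypothetical name "label").
  val lr = new LogisticRegression()
    .setFeaturesCol("features")
    .setLabelCol("label")

  // The pipeline chains the stages; fitting it fits every estimator in order.
  val pipeline = new Pipeline().setStages(Array(imputer, assembler, scaler, lr))

  // A 2 x 2 grid, i.e. 4 hyperparameter combinations to evaluate.
  val paramGrid = new ParamGridBuilder()
    .addGrid(lr.regParam, Array(0.01, 0.1))
    .addGrid(lr.elasticNetParam, Array(0.0, 0.5))
    .build()

  // 3-fold cross-validation, scoring each combination by area under the ROC curve.
  val cv = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(new BinaryClassificationEvaluator().setMetricName("areaUnderROC"))
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(3)

  // Fitting requires a SparkSession and a DataFrame containing the columns named above.
  def fitBest(training: DataFrame): CrossValidatorModel = cv.fit(training)
}
```

Calling `fitBest` on a suitable training DataFrame evaluates each of the four grid points with 3-fold cross-validation. The returned model's `bestModel` is the pipeline refit on the full training data with the best-scoring hyperparameters.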
COURSE SYLLABUS
Module 1 - Basic Statistics and Data Types
- Vectors and Labelled Points
- Local and Distributed Matrices
- Summary Statistics, Correlations, and Random Data
- Sampling
- Hypothesis Testing
Module 2 - Preparing Data
- Statistics, Random Data, and Sampling on DataFrames
- Handling Missing Data and Imputing Values
- Transformers and Estimators
- Data Normalization
- Identifying Outliers
Module 3 - Feature Engineering
- Feature Vectors
- Categorical Features
- Using Explode, User-Defined Functions, and Pivot
- Principal Component Analysis (PCA) in Feature Engineering
- RFormulas
Module 4 - Fitting a Model
- Decision Trees
- Random Forests
- Gradient-Boosting Trees
- Linear Methods
- Evaluation
Module 5 - Pipeline and Grid Search
- Predicting Grant Applications: Introduction
- Predicting Grant Applications: Creating Features
- Predicting Grant Applications: Building a Pipeline
- Predicting Grant Applications: Cross Validation and Model Tuning
- Predicting Grant Applications: Wrapping up
GENERAL INFORMATION
- This course is self-paced.
- It can be taken at any time.
- It can be audited as many times as you wish.
- There is only ONE chance to pass the course, but multiple attempts are allowed per question.
RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE
- General understanding of Scala
- Experience with Java (preferred), Python, or another object-oriented language
- General understanding of machine learning
COURSE STAFF
Petro Verkhogliad
Petro Verkhogliad is a Consulting Manager at Akka. He holds a master's degree in Computer Science with a specialization in Intelligent Systems. He is passionate about functional programming and applications of AI.

Dr Priya Dev is a lecturer in statistics at ANU and UNSW and a founder of Qhopper, a mobile commerce startup. She completed a PhD in probability theory at ANU and Columbia University and has been a data analytics consultant to ASX-listed companies and global banks. Qhopper is a massively scalable mobile commerce platform built on the Akka platform using Scala and Spark. It bridges the technology gap for hospitality businesses, helping them create better experiences and connect with new and existing customers through their own online ordering, CRM, and business intelligence suite.

Joseph has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos affect human cognition. He has worked at IBM since completing his PhD.
