Use Spark to process two months of user activity logs and build classification models with Spark’s MLlib to predict which users will churn.

This project explores how to build a churn-prediction model with Spark in the following steps:

  • explore and manipulate our dataset
  • engineer relevant features for our problem
  • split the data into train and test sets, sampling separately by churn label
  • build binary classifier models with Spark’s DataFrame-based MLlib
  • select and fine-tune the final model with Spark’s ML Pipelines and a StratifiedCrossValidator
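The project page shows no code. The plain-Python sketch below is a hedged illustration of the "sampling by churn label" idea. It assumes one churn label per user (1 = churned, 0 = stayed). Users are split into train and test sets class by class, so both sets keep the same churn ratio. Cross-validation folds are assigned the same way, which is the idea behind a stratified cross-validator. It does not use Spark and is not the author's implementation. In PySpark, per-class sampling is usually done with `DataFrame.sampleBy(col, fractions, seed)`, which samples each stratum with independent Bernoulli trials, so its fractions are only approximate. The function names, the `labels` dictionary and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def _group_by_label(labels):
    """Group user ids by their churn label (1 = churned, 0 = stayed)."""
    groups = defaultdict(list)
    for user_id, label in labels.items():
        groups[label].append(user_id)
    # Sort ids (and labels) so shuffling with a fixed seed is reproducible.
    return {label: sorted(ids) for label, ids in sorted(groups.items())}


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Split users into train/test sets, sampling each churn class separately."""
    rng = random.Random(seed)
    train, test = [], []
    for ids in _group_by_label(labels).values():
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)  # same fraction from every class
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


def stratified_folds(labels, k=3, seed=42):
    """Assign each user to one of k folds, dealing each churn class round-robin."""
    rng = random.Random(seed)
    fold_of = {}
    for ids in _group_by_label(labels).values():
        rng.shuffle(ids)
        for i, user_id in enumerate(ids):
            fold_of[user_id] = i % k  # each fold gets an equal share of this class
    return fold_of


def churn_rate(user_ids, labels):
    """Fraction of the given users who churned."""
    return sum(labels[u] for u in user_ids) / len(user_ids)


# Hypothetical toy data: 100 users, the first 20 of whom churned.
labels = {f"user{i}": int(i < 20) for i in range(100)}
train, test = stratified_split(labels)
print(len(train), len(test), churn_rate(train, labels), churn_rate(test, labels))
# prints: 80 20 0.2 0.2
```

Stratifying matters when churned users are a minority, as they often are. A purely random split could leave the test set, or a validation fold, with very few churners, which makes the evaluation metrics noisy.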

Author: @silviaclaire

Author's original post
AWS DevOps / Machine Learning Engineer / Full-Stack Developer / M.E. in Architecture, Tokyo Univ.