Regression Tabular Model for Kaggle Playground Series Season 3 Episode 1 Using Python and AutoKeras

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 1 dataset presents a regression problem in which we try to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the California Housing Dataset. Feature distributions are close to, but not exactly the same as, the original.

ANALYSIS: After 100 trials, the best AutoKeras model fit the training dataset with a loss of 0.6705. When we evaluated the final model on the test dataset, it achieved an RMSE score of 0.7341.
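
The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script. It assumes AutoKeras 1.x, where StructuredDataRegressor is available. The helper names (rmse, search_regressor), the mean-squared-error loss, and the 20% validation split are all assumptions made for the example. Only the 100-trial budget and the RMSE metric come from the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def search_regressor(x_train, y_train, max_trials=100):
    """Hypothetical helper: run an AutoKeras search over tabular data.

    Assumes AutoKeras 1.x is installed. The loss and validation split
    below are illustrative choices, not values taken from the report.
    """
    import autokeras as ak  # imported lazily because it is a heavy dependency

    reg = ak.StructuredDataRegressor(
        max_trials=max_trials,       # 100 trials, as stated in the analysis
        loss="mean_squared_error",   # assumption: the report does not name the loss
        overwrite=True,              # discard results of any earlier search
    )
    reg.fit(x_train, y_train, validation_split=0.2)
    return reg
```

With a fitted regressor, one would typically call reg.predict(x_test), flatten the result, and pass it to rmse together with the held-out target values to obtain a score comparable to the one reported above.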

CONCLUSION: In this iteration, AutoKeras appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 1

Dataset ML Model: Regression with numerical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e1

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e1/leaderboard

The HTML-formatted report can be found here on GitHub.