Author: David Lowe

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 2 Using Python and TensorFlow

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 2 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new in their data science journey. Since January 2021, Kaggle has hosted playground-style competitions to give the community a variety of reasonably lightweight challenges for learning and sharpening skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Stroke Prediction Dataset. Feature distributions are close to, but different from, the original.

ANALYSIS: The cross-validated TensorFlow models achieved an average ROC/AUC benchmark of 0.8641 after training. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8742.
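
As a rough illustration of this kind of setup, the sketch below builds a small Keras binary classifier and averages the validation ROC/AUC across stratified folds. The train.csv file name, the id and stroke columns, the network size, and the training settings are illustrative assumptions, not the exact configuration behind the scores above.

# Minimal sketch of a Keras binary classifier scored with cross-validated ROC AUC.
# Assumes the competition file train.csv with an "id" column and a binary "stroke"
# target; column names and hyperparameters are illustrative only.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("train.csv")
y = df["stroke"].values
X = pd.get_dummies(df.drop(columns=["id", "stroke"])).astype("float32").values

def build_model(n_features):
    # Small feed-forward network with a sigmoid output for binary classification.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model

# 5-fold cross-validation, averaging the validation ROC AUC across folds.
scores = []
for train_idx, valid_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                            random_state=42).split(X, y):
    scaler = StandardScaler().fit(X[train_idx])
    model = build_model(X.shape[1])
    model.fit(scaler.transform(X[train_idx]), y[train_idx],
              validation_data=(scaler.transform(X[valid_idx]), y[valid_idx]),
              epochs=20, batch_size=64, verbose=0)
    _, auc = model.evaluate(scaler.transform(X[valid_idx]), y[valid_idx], verbose=0)
    scores.append(auc)

print("Mean CV ROC AUC:", sum(scores) / len(scores))

Fitting the scaler only on each training fold keeps the validation folds out of the preprocessing step, so the averaged score is an honest estimate.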

CONCLUSION: In this iteration, TensorFlow appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 2

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e2

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e2/leaderboard

The HTML-formatted report can be found on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 2 Using Python and XGBoost

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 2 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new in their data science journey. Since January 2021, Kaggle has hosted playground-style competitions to give the community a variety of reasonably lightweight challenges for learning and sharpening skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Stroke Prediction Dataset. Feature distributions are close to, but different from, the original.

ANALYSIS: The preliminary XGBoost model achieved a ROC/AUC benchmark of 0.8772 after training. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8730.
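
For readers who want a starting point, here is a minimal sketch of an XGBoost classifier scored with ROC/AUC on a held-out split. The train.csv file name, the id and stroke columns, and the hyperparameters are assumptions for illustration; they are not the tuned values behind the 0.8772 benchmark.

# Minimal sketch of an XGBoost binary classifier evaluated with ROC AUC.
# Assumes train.csv with an "id" column and a binary "stroke" target.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

df = pd.read_csv("train.csv")
y = df["stroke"]
X = pd.get_dummies(df.drop(columns=["id", "stroke"])).astype(float)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
    eval_metric="auc",   # track ROC AUC on the evaluation set during training
    random_state=42)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)

# Score the held-out split with ROC AUC using the positive-class probabilities.
valid_pred = model.predict_proba(X_valid)[:, 1]
print("Validation ROC AUC:", roc_auc_score(y_valid, valid_pred))

Passing an eval_set lets the booster report AUC on the held-out split as trees are added, which is a convenient way to spot overfitting before submitting.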

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 2

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e2

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e2/leaderboard

The HTML-formatted report can be found on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 2 Using Python and TensorFlow Decision Forests

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 2 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new in their data science journey. Since January 2021, Kaggle has hosted playground-style competitions to give the community a variety of reasonably lightweight challenges for learning and sharpening skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Stroke Prediction Dataset. Feature distributions are close to, but different from, the original.

ANALYSIS: The Random Forest model performed best on the training dataset, achieving a ROC/AUC benchmark of 0.9914. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8731.
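
A minimal TensorFlow Decision Forests sketch along these lines is shown below; it trains a Random Forest and scores a held-out split with ROC/AUC. The train.csv file, the stroke label column, and the num_trees setting are illustrative assumptions rather than the exact configuration from the report.

# Minimal sketch of a TensorFlow Decision Forests random forest.
# Assumes train.csv with an "id" column and a binary "stroke" label; TF-DF
# consumes categorical columns natively, so no manual encoding is shown.
import pandas as pd
import tensorflow_decision_forests as tfdf
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("train.csv").drop(columns=["id"])
train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42)

# Convert the pandas frames into TensorFlow datasets keyed on the label column.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="stroke")
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(valid_df, label="stroke")

model = tfdf.keras.RandomForestModel(num_trees=300, random_seed=42)
model.fit(train_ds)

# Score the held-out split with ROC AUC on the predicted positive-class probabilities.
valid_pred = model.predict(valid_ds).ravel()
print("Validation ROC AUC:", roc_auc_score(valid_df["stroke"], valid_pred))

Because the forest can memorize much of the training data, the large gap between the 0.9914 training benchmark and the 0.8731 test score is expected; the held-out split above is the number to watch.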

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 2

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e2

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e2/leaderboard

The HTML-formatted report can be found on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 2 Using Python and Scikit-Learn

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 2 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new in their data science journey. Since January 2021, Kaggle has hosted playground-style competitions to give the community a variety of reasonably lightweight challenges for learning and sharpening skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Stroke Prediction Dataset. Feature distributions are close to, but different from, the original.

ANALYSIS: The machine learning algorithms achieved an average ROC/AUC benchmark of 0.7836 after training. We selected Logistic Regression as the final model because it processed the training dataset with a ROC/AUC score of 0.8735. When we processed the test dataset with the final model, it achieved a ROC/AUC score of 0.8662.
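
As a rough outline of this kind of Scikit-Learn workflow, the sketch below wraps preprocessing and Logistic Regression in a pipeline and reports the cross-validated ROC/AUC. The train.csv file name, the id and stroke columns, and the fold count are assumptions for illustration, not the exact setup behind the scores above.

# Minimal sketch of a scikit-learn logistic regression scored with
# cross-validated ROC AUC. Assumes train.csv with an "id" column, a binary
# "stroke" target, and a mix of numeric and categorical features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("train.csv")
y = df["stroke"]
X = df.drop(columns=["id", "stroke"])

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns

# Scale numeric features and one-hot encode categorical features.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 10-fold cross-validation with ROC AUC as the scoring metric.
scores = cross_val_score(pipeline, X, y, cv=10, scoring="roc_auc")
print("Mean CV ROC AUC:", scores.mean())

Putting the encoder and scaler inside the pipeline means they are refit on each training fold, which keeps validation information out of the cross-validated score.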

CONCLUSION: In this iteration, the Logistic Regression model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 2

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e2

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e2/leaderboard

The HTML-formatted report can be found on GitHub.

Annie Duke on Quitting, Part 4

In her book, Quit: The Power of Knowing When to Walk Away, Annie Duke shares her inspiration and recommendations to help us make better decisions.

These are some of my favorite recommendations from reading the book.

Chapter 4 Escalating Commitment

“When we are in the losses, we are not only more likely to stick to a losing course of action, but also to double down. This tendency is called escalation of commitment.”

“Escalation of commitment is robust and universal, occurring in individuals, organizations, and governmental entities. All of us tend to get stuck in courses of action once started, especially in the face of bad news.”

“Escalation of commitment doesn’t just occur in high-stakes situations. It also happens when the stakes are low, demonstrating the pervasiveness of the error.”