Author: David Lowe

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 4 Using Python and Scikit-Learn

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 4 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Credit Card Fraud Detection dataset. Feature distributions are close to but different from the original.

ANALYSIS: The machine learning algorithms achieved an average ROC/AUC benchmark of 0.8760 after training. We selected Extra Trees as the final model because it processed the training dataset with a ROC/AUC score of 0.9100. On the test dataset, the final model achieved a ROC/AUC score of 0.8072.
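The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from make_classification, the three candidate models, the 5-fold split, and every variable name are assumptions chosen only to show how an average ROC/AUC benchmark and a best-scoring model might be obtained with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption): imbalanced binary target, numeric features.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.9, 0.1], random_state=42,
)

# Hypothetical candidate list; the original post does not name every algorithm tried.
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=100, random_state=42),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Mean cross-validated ROC/AUC per candidate.
mean_auc = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
    mean_auc[name] = fold_scores.mean()

# The "benchmark" here is the average score across all candidates.
benchmark = sum(mean_auc.values()) / len(mean_auc)
best_name = max(mean_auc, key=mean_auc.get)

for name, score in mean_auc.items():
    print(f"{name}: {score:.4f}")
print(f"Average benchmark: {benchmark:.4f}")
print(f"Best candidate: {best_name}")
```

On synthetic data the ranking of the candidates carries no meaning for the competition dataset; the sketch only shows the mechanics of scoring several algorithms the same way before picking one to apply to the test set.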

CONCLUSION: In this iteration, the Extra Trees model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 4

Dataset ML Model: Binary-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e4

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e4/leaderboard

The HTML formatted report can be found here on GitHub.

Annie Duke on Quitting, Part 6

In her book, Quit: The Power of Knowing When to Walk Away, Annie Duke shares her inspiration and recommendations to help us make better decisions.

These are some of my favorite recommendations from reading the book.

Chapter 6 Monkeys and Pedestals

“Monkeys and pedestals is a mental model that helps you quit sooner.”

“Pedestals are the part of the problem you know you can already solve, like designing the perfect business card or logo. The hardest thing is training the monkey.”

“When faced with a complex, ambitious goal, (a) identify the hard thing first; (b) try to solve for that as quickly as possible; and (c) beware of false progress.”

“Building pedestals creates the illusion that you are making progress toward your goal, but doing the easy stuff is a waste of time if the hard stuff is actually impossible.”

“Tackling the monkey first gets you to no faster, limiting the time, effort, and money you sink into a project, making it easier to walk away.”

“When we butt up against a hard problem we can’t solve, we have a tendency to turn to pedestal-building rather than choosing to quit.”

“Advance planning and precommitment contracts increase the chances you will quit sooner.”

“When you enter into a course of action, create a set of kill criteria. This is a list of signals you might see in the future that would tell you it’s time to quit.”

“Kill criteria will help inoculate you against bad decision-making when you’re ‘in it’ by limiting the number of decisions you’ll have to make once you’re already in the gains or in the losses.”

“In organizations, kill criteria allow people a different way to get rewarded beyond dogged and blind pursuit of a project until the bitter end.”

“A common, simple way to develop kill criteria is with ‘states and dates:’ ‘If by (date), I have/haven’t (reached a particular state), I’ll quit.’”

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 3 Using Python and AutoKeras

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 3 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the IBM HR Analytics Employee Attrition & Performance dataset. Feature distributions are close to but different from the original.

ANALYSIS: After 74 trials, the best AutoKeras model processed the training dataset with a ROC/AUC score of 0.9078. On the test dataset, the final model achieved a ROC/AUC score of 0.8345.
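For readers less familiar with the metric, a ROC/AUC score can be read as the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The function name roc_auc and the toy labels and scores are illustrative assumptions, not values from this project.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and predicted probabilities).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations; library implementations compute the same quantity from sorted scores.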

CONCLUSION: In this iteration, the model found by AutoKeras appeared to be suitable for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 3

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e3

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e3/leaderboard

The HTML formatted report can be found here on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 3 Using Python and TensorFlow

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 3 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the IBM HR Analytics Employee Attrition & Performance dataset. Feature distributions are close to but different from the original.

ANALYSIS: The cross-validated TensorFlow models achieved an average ROC/AUC benchmark of 0.8179 after training. On the test dataset, the final model achieved a ROC/AUC score of 0.8754.

CONCLUSION: In this iteration, the TensorFlow model appeared to be suitable for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 3

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e3

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e3/leaderboard

The HTML formatted report can be found here on GitHub.