Author: David Lowe

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 3 Using Python and XGBoost

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 3 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle aims to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the IBM HR Analytics Employee Attrition & Performance dataset. Feature distributions are close to, but not the same as, the original.

ANALYSIS: The preliminary XGBoost model achieved an AUC/ROC benchmark of 0.8511 after training. When we processed the test dataset with the final model, the model achieved an AUC/ROC score of 0.8969.

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 3

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e3

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e3/leaderboard

The HTML-formatted report can be found on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 3 Using Python and TensorFlow Decision Forests

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 3 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle aims to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the IBM HR Analytics Employee Attrition & Performance dataset. Feature distributions are close to, but not the same as, the original.

ANALYSIS: The Random Forest model performed the best with the training dataset. The model achieved an AUC/ROC benchmark of 0.9992. When we processed the test dataset with the final model, the model achieved an AUC/ROC score of 0.8734.

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 3

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e3

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e3/leaderboard

The HTML-formatted report can be found on GitHub.
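The gap between the training AUC (0.9992) and the test AUC (0.8734) reported above is typical of random forests scored on their own training data. The project uses TensorFlow Decision Forests' `tfdf.keras.RandomForestModel`; the sketch below illustrates the same train-versus-holdout evaluation pattern with scikit-learn's `RandomForestClassifier` as a stand-in, on synthetic data, so it is not the project's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data; the real competition files are not bundled here.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.85], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=7)

model = RandomForestClassifier(n_estimators=300, random_state=7)
model.fit(X_train, y_train)

# Scoring the training set itself yields a near-perfect, optimistic AUC;
# the holdout score is the honest estimate of generalization.
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC: {train_auc:.4f}, holdout AUC: {test_auc:.4f}")
```

The take-away is that a training-set benchmark like 0.9992 mostly reflects the forest memorizing its training rows; only the holdout score should be compared against the leaderboard.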

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 3 Using Python and Scikit-Learn

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 3 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle aims to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the IBM HR Analytics Employee Attrition & Performance dataset. Feature distributions are close to, but not the same as, the original.

ANALYSIS: The average performance of the machine learning algorithms achieved an AUC/ROC benchmark of 0.7566 after training. Furthermore, we selected Random Forest as the final model as it processed the training dataset with an AUC/ROC score of 0.8380. When we processed the test dataset with the final model, the model achieved an AUC/ROC score of 0.8797.

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 3

Dataset ML Model: Binary-Class classification with numerical and categorical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e3

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e3/leaderboard

The HTML-formatted report can be found on GitHub.
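The spot-checking workflow described above (benchmark several algorithms on the training data, then select the best performer as the final model) can be sketched with scikit-learn's cross-validation utilities. The candidate models, dataset, and parameters below are illustrative choices, not the post's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the competition's training data.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.85], random_state=7)

# Spot-check several candidate algorithms with cross-validated AUC.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=7),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=7),
}
scores = {
    name: cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
    for name, est in candidates.items()
}
best = max(scores, key=scores.get)
for name, auc in scores.items():
    print(f"{name}: mean CV AUC {auc:.4f}")
print(f"selected final model: {best}")
```

Averaging the candidates' scores gives a baseline like the 0.7566 benchmark mentioned above, while the single best cross-validated score identifies the algorithm worth tuning and refitting on the full training set.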

Annie Duke on Quitting, Part 5

In her book, Quit: The Power of Knowing When to Walk Away, Annie Duke shares her inspiration and recommendations to help us make better decisions.

These are some of my favorite recommendations from reading the book.

Chapter 5: Sunk Cost and the Fear of Waste

“The sunk cost effect is a cognitive illusion where people take into account resources they have previously sunk into an endeavor when making decisions about whether to continue and spend more.”

“The sunk cost effect causes people to stick in situations that they ought to be quitting.”

“When deciding whether to stick or quit, we are worried that if we walk away, we will have wasted the resources we have spent in the trying.”

“You might be experiencing the sunk cost fallacy if you hear yourself thinking “If I don’t make this work I will have wasted years of my life!” or “We can’t fire her now, she’s been here for decades!””

“Sunk costs snowball, like a katamari. The resources you have already spent make it less likely you will quit, which makes it more likely you will accumulate additional sunk costs, which makes it again less likely you will quit, and so on. The growing debris of your prior commitment makes it increasingly harder to walk away.”

“We don’t like to close mental accounts in the losses.”

“Knowing about the sunk cost effect doesn’t keep you from falling prey to it.”

“You can’t trick yourself into not taking sunk costs into account by trying to view the situation as a new choice. Asking whether or not you would continue if the decision were a fresh one doesn’t mitigate the sunk cost effect the way you might intuitively think it would.”

Is it an event or a journey?

(From a writer I respect, Seth Godin)

The two are easily confused.

An event happens on a certain date and then it is over; there is nothing more to do afterward.

A journey may include an event, but it is bigger than that event, and it keeps going.

A wedding is an event; a marriage is a journey.

The week a book is published is an event, while the creation, publication, and life cycle of the ideas in the book are a journey.

The attention and energy we pour into an event can easily distract us from the journey we should actually care about.