Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 7 Using Python and TensorFlow Decision Forests

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 7 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Reservation Cancellation Prediction dataset. Feature distributions are close to but different from the original.

ANALYSIS: The Random Forest model performed the best with the training dataset. The model achieved a ROC/AUC benchmark of 0.9376. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8885.
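The ROC/AUC figures above measure how well the model ranks positive cases above negative ones. As a rough, library-free sketch (illustrative only, not the project's actual code), the metric can be computed directly from that pairwise-ranking definition:

```python
def roc_auc(y_true, y_score):
    """ROC/AUC as the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for s, label in zip(y_score, y_true) if label == 1]
    neg = [s for s, label in zip(y_score, y_true) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```

A score of 1.0 would mean the model ranks every positive above every negative, while 0.5 is no better than chance; the 0.9376 training benchmark sits comfortably above both.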

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 7

Dataset ML Model: Binary-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e7

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e7/leaderboard

The HTML formatted report can be found here on GitHub.

Binary Class Tabular Model for Kaggle Playground Series Season 3 Episode 7 Using Python and Scikit-Learn

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 7 dataset is a binary-class modeling situation where we attempt to predict one of two possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Reservation Cancellation Prediction dataset. Feature distributions are close to but different from the original.

ANALYSIS: The average performance of the machine learning algorithms achieved a ROC/AUC benchmark of 0.8470 after training. We then selected Random Forest as the final model because it processed the training dataset with a ROC/AUC score of 0.8891. When we processed the test dataset with the final model, the model achieved a ROC/AUC score of 0.8716.
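A benchmark of this kind could be set up roughly as follows with Scikit-Learn. This is a hedged sketch, not the project's actual code: the synthetic dataset stands in for the competition files, and the hyperparameters are illustrative defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the competition's numerical features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# Cross-validated ROC/AUC, mirroring the benchmark metric reported above.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean())
```

Averaging the fold scores gives a single ROC/AUC figure comparable to the per-algorithm benchmarks quoted in the ANALYSIS paragraph.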

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 7

Dataset ML Model: Binary-Class classification with numerical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e7

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e7/leaderboard

The HTML formatted report can be found here on GitHub.

Anders Ericsson and Robert Pool on Peak, Part 1

In their book, Peak: Secrets from the New Science of Expertise, Anders Ericsson and Robert Pool share their research findings and recommendations to help us achieve expert-level performance in whatever we would like to do.

These are some of my favorite recommendations from reading the book.

Chapter 1. The Power of Purposeful Practice

“The most effective and most powerful types of practice in any field work by harnessing the adaptability of the human body and brain to create, step by step, the ability to do things that were previously not possible.”

If we wish to develop a truly effective training method for anything, that method will need to consider what works and what doesn’t in driving changes in the body and brain.

For many skills, once we have reached a “generally acceptable” or satisfactory level and automated our performance, we have stopped improving. “Naïve practice” is essentially doing something repeatedly and expecting the repetition alone to improve our performance. To break through the plateau and reach a higher level of performance, we will need to engage in “purposeful practice.”

“Purposeful practice is all about putting a bunch of baby steps together to reach a longer-term goal.”

  • Purposeful practice is focused. We seldom improve much without giving the task our full attention.
  • Purposeful practice involves feedback. We need feedback to identify exactly where we are falling short. Without feedback, we have difficulty determining what we need to improve or how close we are to achieving our goals.
  • Purposeful practice requires getting out of one’s comfort zone. We will never improve if we never push ourselves beyond our comfort zone.

Getting out of our comfort zone means trying to do something we could not do before. Generally, the solution is not “try harder” but rather “try differently.” We need to pay attention to our practice techniques. The best way to overcome any barrier is to come from a different direction.

Although it is generally possible to improve to a certain degree with focused practice and by staying out of our comfort zone, more is needed. There are other equally important aspects of practicing and training ourselves to reach a higher level of performance.

Choose Your Problems

(From Seth Godin, a writer I respect)

Perhaps you only acknowledge and pay attention to the problems you are familiar with, and you feel satisfied with a suitable response. Denying that certain things exist can be easier than having to deal with them.

Or it may be that you choose the problems you want to see, even though they are simply facts of life that cannot be solved, which only deepens our sense of hopelessness.

It is also possible that you prefer quick, urgent, and simple problems, because solving them feels exciting.

Or it may be the long-term, difficult, and distant problems that appear on your radar, because then, after all, how could you be held fully responsible?

But problems do not really care whether we acknowledge their existence. They will continue to exist either way. What matters is how we choose to direct our energy toward solving them, because our tomorrow is the direct result of how we use our resources today.

Choosing your problems is also choosing your future.

Regression Tabular Model for Kaggle Playground Series Season 3 Episode 6 Using Python and Scikit-learn

NOTE: This notebook was produced using Amazon’s SageMaker Studio Lab. I re-created this notebook to compare it with an earlier notebook created using Google’s Colab.

SUMMARY: The project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Playground Series Season 3 Episode 6 Dataset is a regression modeling situation where we are trying to predict the value of a continuous variable.

INTRODUCTION: Kaggle wants to provide an approachable environment for people who are relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions to give the Kaggle community a variety of reasonably lightweight challenges that can be used to learn and sharpen skills in different aspects of machine learning and data science. The dataset for this competition was generated from a deep learning model trained on the Paris Housing Price Prediction dataset. Feature distributions are close to but different from the original.

ANALYSIS: The average performance of the machine learning algorithms achieved an RMSE benchmark of 454,623 after preliminary training runs. We then selected Random Forest Regressor as the final model because it processed the training dataset with an RMSE score of 133,540. When we tested the final model using the test dataset, the model achieved an RMSE score of 225,268.
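An RMSE benchmark of this kind could be set up roughly as follows. Again, this is a hedged sketch rather than the project's actual code: the synthetic regression data stands in for the competition files, and the split ratio and hyperparameters are illustrative.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing-price features; not the competition data.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# RMSE on the held-out split, mirroring the benchmark metric reported above.
rmse = sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(rmse)
```

Taking the square root by hand keeps the snippet compatible across Scikit-Learn versions, since the `squared=False` shortcut of `mean_squared_error` has changed between releases.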

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Playground Series Season 3, Episode 6

Dataset ML Model: Regression with numerical features

Dataset Reference: https://www.kaggle.com/competitions/playground-series-s3e6

One source of potential performance benchmarks: https://www.kaggle.com/competitions/playground-series-s3e6/leaderboard

The HTML formatted report can be found on GitHub: https://github.com/daines-analytics/tabular-data-projects/tree/master/py_regression_kaggle_playground_series_s3e06