Month: December 2021

Multi-Class Model for Kaggle Tabular Playground Series 2021 December Using XGBoost

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground December 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun, less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. It is based on the original Forest Cover Type Prediction competition.

ANALYSIS: The preliminary XGBoost model achieved an accuracy benchmark of 0.9590. After a series of tuning trials, the final model processed the training dataset with an accuracy score of 0.9613. When we processed the test dataset with the final model, the model achieved an accuracy score of 0.9546.
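For readers who want a starting point, below is a minimal sketch of an XGBoost multi-class baseline along the lines described above. The file name, column names, train/validation split, and hyperparameters are illustrative assumptions, not the project's exact settings.

```python
# Minimal sketch of a multi-class XGBoost baseline (illustrative, not the project's exact code).
# Assumptions: a local "train.csv" with an "Id" column and a "Cover_Type" target labeled 1-7.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

df = pd.read_csv("train.csv")
X = df.drop(columns=["Id", "Cover_Type"])
y = df["Cover_Type"] - 1  # shift labels to 0-6, as XGBoost expects zero-based classes

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    objective="multi:softmax",  # multi-class classification
    n_estimators=500,
    max_depth=8,
    learning_rate=0.1,
    eval_metric="mlogloss",
    n_jobs=-1,
)
model.fit(X_train, y_train)

print(f"Validation accuracy: {accuracy_score(y_val, model.predict(X_val)):.4f}")
```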

CONCLUSION: In this iteration, the XGBoost model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 December Data Set

Dataset ML Model: Multi-Class classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-dec-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-dec-2021/leaderboard

The HTML-formatted report can be found here on GitHub.

Multi-Class Model for Kaggle Tabular Playground Series 2021 December Using Scikit-learn

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: This project aims to construct a predictive model using various machine learning algorithms and document the end-to-end steps using a template. The Kaggle Tabular Playground December 2021 dataset is a multi-class modeling situation where we attempt to predict one of several (more than two) possible outcomes.

INTRODUCTION: Kaggle wants to provide an approachable environment for people relatively new to their data science journey. Since January 2021, they have hosted playground-style competitions on Kaggle with fun, less complex, tabular datasets. The dataset used for this competition is synthetic but based on a real dataset and generated using a CTGAN. It is based on the original Forest Cover Type Prediction competition.

ANALYSIS: On average, the machine learning algorithms achieved an accuracy benchmark of 0.9343 on the training dataset. We selected Random Forest as the final model, as it processed the training dataset with a final accuracy score of 0.9581. When we processed the test dataset with the final model, the model achieved an accuracy score of 0.9513.
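The sketch below illustrates the kind of algorithm spot-check described above: several scikit-learn classifiers are compared with cross-validated accuracy before settling on a final model such as Random Forest. The candidate list, column names, and settings are assumptions for illustration, not the project's exact code.

```python
# Illustrative spot-check of several scikit-learn classifiers (not the project's exact code).
# Assumptions: a local "train.csv" with an "Id" column and a "Cover_Type" target.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

df = pd.read_csv("train.csv")
X = df.drop(columns=["Id", "Cover_Type"])
y = df["Cover_Type"]

candidates = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=-1),
    "ExtraTrees": ExtraTreesClassifier(n_estimators=200, random_state=42, n_jobs=-1),
}

# 5-fold cross-validated accuracy for each candidate algorithm
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy", n_jobs=-1)
    print(f"{name}: mean accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```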

CONCLUSION: In this iteration, the Random Forest model appeared to be a suitable algorithm for modeling this dataset.

Dataset Used: Kaggle Tabular Playground 2021 December Data Set

Dataset ML Model: Multi-Class classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/c/tabular-playground-series-dec-2021

One potential source of performance benchmark: https://www.kaggle.com/c/tabular-playground-series-dec-2021/leaderboard

The HTML-formatted report can be found here on GitHub.

Seth Godin on Survival Is Not Enough, Part 4

In his book, Survival Is Not Enough: Why Smart Companies Abandon Worry and Embrace Change, Seth Godin discusses how innovative organizations and individuals can apply prudent strategies to adapt and position themselves for constant change.

These are some of my favorite concepts and takeaways from reading the book.

Chapter 4, Do You Zoom?

In this chapter, Seth discusses the concept of Zooming and why it is essential to start zooming before a crisis comes. He offers the following observations and recommendations for us to think about:

  • Seth stated that “Zooming is about stretching your limits without threatening your foundation. It’s about handling new ideas, new opportunities, and new challenges without triggering the change-avoidance reflex.”
  • With constant change, many companies are now stretched beyond their zoom width. As a result, they see everything new as a threat instead of an opportunity. If we can learn how to zoom and then hire people who want to zoom with the organization, the organization can grow, adapt, and perhaps even transform itself.
  • Zooming is different from change management. Change management is about making adjustments for a significant or urgent change with a specific purpose. It is a one-time event, followed by a period of healing.
  • On the other hand, Zooming is about developing the flexibility for constant change. Zooming prepares us to deal with changes for no particular reason or specific goals. We do not have to heal from Zooming any more than we need to recover from breathing.
  • Although every company zooms, some zoom more than others. Increasing our zoom width is a challenge, but the practice can build an asset that pays off for the organization.
  • More importantly, the best time to start zooming is before our company faces a significant, life-threatening change. We should get into the habit of making frequent, small changes first, then work our way up to bigger things.
  • Zooming is not the same as traditional re-engineering. A zooming organization is not worried about making today’s machine work better. It is more concerned with being flexible enough to put its assets to work building tomorrow’s machine.
  • Re-engineering is often a fancy term for layoffs and labor force reductions. Flexible organizations make better use of their assets, and the first asset they maximize is their people. Unfortunately, the reality is that we cannot always shrink our way to greatness.
  • Most organizations get attached to the winning strategy that made the organization’s assets valuable. As the world changes, those assets grow much less helpful, but the organization still hopes to reclaim its former glory by sticking with the old strategy because it offers certainty. Unfortunately, fear and inertia keep a company standing still.
  • However, if our new winning strategy is that nothing is certain and that change is inevitable and welcome, we will likely not be disappointed by reality.

The Freelancer’s Dilemma

(From an author I respect, Seth Godin)

What do you own?

What are you truly good at?

What do you enjoy doing?

To participate in the market, you need to create value for people who have choices.

However, deciding what to offer your customers is your choice.

If you own something (a patent, a building, a process, a set of relationships), you can create more value each time than if you simply had to start over with new work.

If you are truly good at something, with multiple skills and a reputation, you are more likely to benefit from new prospects and find it easier to create value.

If you focus on preparing for the work you truly enjoy, your days will be better, and it will be easier to do outstanding work.

The freelancer’s dilemma comes from having to figure out where you stand first, rather than saying, “You can pick anyone, and I am that anyone.”

The entrepreneur’s job is to build up enough assets to make every transaction easier and more profitable.

Start by being clear about what you own, what you are good at, and what work makes you happy to do.

Tabular Data Analytics Project Templates Using Python and XGBoost Version 3

As I work on practicing and solving machine learning (ML) problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a project template that I use to experiment with modeling ML problems using Python and the XGBoost library.

Version 3 of the XGBoost templates contains updated structures and code that build on the previous XGBoost templates. I designed the templates to address regression, binary classification, and multi-class classification modeling exercises from beginning to end.

You will find the Python templates on the Analytics Project Templates page.
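As a rough illustration of the end-to-end phases such a template covers, here is a skeleton that loads data, splits it, tunes a small hyperparameter grid, and scores the finalized model. The function names, file name, target column, and parameter grid are hypothetical placeholders, not the template itself.

```python
# Hypothetical skeleton of an end-to-end multi-class template (placeholders, not the actual template).
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def load_data(path: str) -> pd.DataFrame:
    """Phase 1: load the dataset for inspection and analysis."""
    return pd.read_csv(path)

def prepare_data(df: pd.DataFrame):
    """Phase 2: separate features and target, then hold out a test set.
    Assumes a 'target' column with zero-based integer class labels."""
    X = df.drop(columns=["target"])
    y = df["target"]
    return train_test_split(X, y, test_size=0.2, random_state=42)

def tune_model(X_train, y_train) -> XGBClassifier:
    """Phase 3: tune a small hyperparameter grid with cross-validation."""
    grid = {"max_depth": [6, 8], "learning_rate": [0.05, 0.1]}
    search = GridSearchCV(
        XGBClassifier(objective="multi:softmax", n_estimators=300, eval_metric="mlogloss"),
        grid,
        scoring="accuracy",
        cv=3,
        n_jobs=-1,
    )
    search.fit(X_train, y_train)
    return search.best_estimator_

def evaluate(model, X_test, y_test) -> float:
    """Phase 4: score the finalized model on the hold-out set."""
    return accuracy_score(y_test, model.predict(X_test))

if __name__ == "__main__":
    data = load_data("train.csv")
    X_train, X_test, y_train, y_test = prepare_data(data)
    final_model = tune_model(X_train, y_train)
    print(f"Hold-out accuracy: {evaluate(final_model, X_test, y_test):.4f}")
```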