Month: March 2020

Binary Classification Model for Rain in Australia Using TensorFlow Take 4

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Rain in Australia dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains daily weather observations from numerous Australian weather stations. The target variable RainTomorrow represents whether it rained the next day. We should also exclude the variable Risk-MM when training a binary classification model. Keeping the Risk-MM feature would risk leaking the answer into our model and reducing its effectiveness.
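
To illustrate the leakage concern, here is a minimal sketch that drops the column before any features are built; it assumes the Kaggle file weatherAUS.csv and the raw column name RISK_MM, not the project's actual preprocessing pipeline:

# A minimal sketch of excluding the leakage column before modeling.
# Assumes the Kaggle file weatherAUS.csv with a RISK_MM column and a RainTomorrow target.
import pandas as pd
df = pd.read_csv('weatherAUS.csv')
# RISK_MM records the amount of rain for the next day, so keeping it would
# hand the model the answer to RainTomorrow.
df = df.drop(columns=['RISK_MM'])
X = df.drop(columns=['RainTomorrow'])
y = df['RainTomorrow']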

In iteration Take1, we constructed several traditional machine learning models using the linear, non-linear, and ensemble techniques. We also observed the best accuracy score that we could obtain with each of these models.

In iteration Take2, we constructed and tuned an XGBoost machine learning model for this dataset. We also observed the best accuracy score that we could obtain with the XGBoost model.

In iteration Take3, we constructed several Multilayer Perceptron (MLP) models with one, two, and three hidden layers. The one-layer MLP model serves as the baseline model as we build more complex MLP models in future iterations.

In this Take4 iteration, we will tune the single-layer MLP model and see whether we can improve our accuracy score.

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 83.83%. Two algorithms (Extra Trees and Random Forest) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in a better overall result than Extra Trees with a lower variance. Random Forest achieved an accuracy metric of 85.44%. When configured with the optimized parameters, the Random Forest algorithm processed the test dataset with an accuracy of 85.52%, which was consistent with the accuracy score from the training phase.

In iteration Take2, the XGBoost algorithm achieved a baseline accuracy of 84.69% by setting n_estimators to the default value of 100. After a series of tuning trials, XGBoost turned in an overall accuracy result of 86.21% with the n_estimators value set to 1000. When we applied the tuned XGBoost model to the test dataset, we obtained an accuracy score of 86.27%, which was consistent with the model performance from the training phase.

In iteration Take3, all one-layer models achieved an accuracy performance of around 86%. The eight-node model appears to overfit the least when compared with the 12-, 16-, and 20-node models. The single-layer eight-node model also seems to work better than the two- and three-layer models, processing the test dataset with an accuracy score of 86.10% after 20 epochs.

In this Take4 iteration, all models achieved an accuracy performance of around 86%. The model trained with the RMSprop optimizer appears to have the best accuracy score on the test dataset, reaching 86.23% after 20 epochs.
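
A minimal sketch of this kind of optimizer comparison in TensorFlow appears below; the preprocessing is simplified, and the settings (batch size, candidate optimizers) are illustrative assumptions rather than the exact Take4 configuration:

# Rough sketch: compare optimizers for a single-hidden-layer MLP on this dataset.
# Preprocessing is simplified; column names assume the Kaggle weatherAUS.csv file.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('weatherAUS.csv').drop(columns=['RISK_MM', 'Date']).dropna()
X = pd.get_dummies(df.drop(columns=['RainTomorrow']))
y = (df['RainTomorrow'] == 'Yes').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
for optimizer in ['adam', 'rmsprop', 'sgd']:
    # Single hidden layer with eight nodes, matching the Take3 baseline architecture.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print(optimizer, round(acc, 4))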

CONCLUSION: For this iteration, the single-layer eight-node MLP model produced an accuracy score comparable to that of the XGBoost model. For this dataset, we should consider doing more tuning with both the XGBoost and MLP models.

Dataset Used: Rain in Australia Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

One potential source of performance benchmark: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/kernels

The HTML formatted report can be found here on GitHub.

Binary Classification Model for Rain in Australia Using TensorFlow Take 3

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Rain in Australia dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains daily weather observations from numerous Australian weather stations. The target variable RainTomorrow represents whether it rained the next day. We should also exclude the variable Risk-MM when training a binary classification model. Keeping the Risk-MM feature would risk leaking the answer into our model and reducing its effectiveness.

In iteration Take1, we constructed several traditional machine learning models using the linear, non-linear, and ensemble techniques. We also observed the best accuracy score that we could obtain with each of these models.

In iteration Take2, we constructed and tuned an XGBoost machine learning model for this dataset. We also observed the best accuracy score that we could obtain with the XGBoost model.

In this Take3 iteration, we will construct several Multilayer Perceptron (MLP) models with one, two, and three hidden layers. These simple MLP models will serve as the baseline models as we build more complex MLP models in future iterations.

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 83.83%. Two algorithms (Extra Trees and Random Forest) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in a better overall result than Extra Trees with a lower variance. Random Forest achieved an accuracy metric of 85.44%. When configured with the optimized parameters, the Random Forest algorithm processed the test dataset with an accuracy of 85.52%, which was consistent with the accuracy score from the training phase.

In iteration Take2, the XGBoost algorithm achieved a baseline accuracy of 84.69% by setting n_estimators to the default value of 100. After a series of tuning trials, XGBoost turned in an overall accuracy result of 86.21% with the n_estimators value set to 1000. When we applied the tuned XGBoost model to the test dataset, we obtained an accuracy score of 86.27%, which was consistent with the model performance from the training phase.

In this Take3 iteration, all one-layer models achieved an accuracy performance of around 86%. The eight-node model appears to overfit the least when compared with the 12-, 16-, and 20-node models. The single-layer eight-node model also seems to work better than the two- and three-layer models, processing the test dataset with an accuracy score of 86.10% after 20 epochs.
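
A minimal sketch of how these baseline architectures might be set up in TensorFlow appears below; the data preparation is simplified, and the training settings are illustrative assumptions rather than the exact Take3 configuration:

# Rough sketch: baseline MLPs with one, two, and three hidden layers of eight nodes each.
# Preprocessing is simplified; column names assume the Kaggle weatherAUS.csv file.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('weatherAUS.csv').drop(columns=['RISK_MM', 'Date']).dropna()
X = pd.get_dummies(df.drop(columns=['RainTomorrow']))
y = (df['RainTomorrow'] == 'Yes').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
for n_layers in [1, 2, 3]:
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(8, activation='relu', input_shape=(X_train.shape[1],)))
    for _ in range(n_layers - 1):
        model.add(tf.keras.layers.Dense(8, activation='relu'))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print(n_layers, 'hidden layer(s):', round(acc, 4))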

CONCLUSION: For this iteration, the single-layer eight-node MLP model produced an accuracy score comparable to that of the XGBoost model. For this dataset, we should consider doing more tuning with both the XGBoost and MLP models.

Dataset Used: Rain in Australia Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

One potential source of performance benchmark: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/kernels

The HTML formatted report can be found here on GitHub.

Kathy Sierra on Making Users Awesome, Part 7

In the book, Badass: Making Users Awesome, Kathy Sierra analyzed and discussed new ways of thinking about designing and sustaining successful products and services.

These are some of my takeaways from reading the book.

In this section, Kathy continues the discussion on how to help our users keep wanting to get better at a skill. We can help them move forward with two approaches.

The first approach is to remove the blocks to their progress. The second approach is to examine the elements that can pull the user forward.

To help users stay motivated, we need to give them two things: progress and payoff.

We know what to do about managing progress. What can we do about the payoff?

Kathy suggests that we need to provide ideas and tools to help users use their current skills early and often.

By asking the question, “What can they do within the first 30 minutes?” we seek to lower the initial threshold for “user-does-something-meaningful.”

However, fear can derail users before they start. If we want the users to feel powerful early, we need to anticipate and compensate for anything that keeps them from experimenting.

We can give users the ability to try things and provide them the information and tools to recover from their experiments without breaking anything.

The ideal user path is a continuous series of loops, each with a motivating “next superpower” goal, skill-building work with exposure to good examples, and a payoff.

The best payoff of all is the intrinsically rewarding experience, where users value the experience for its own sake. Two kinds of intrinsic motivation can be powerful.

The first kind is “High Resolution,” where users develop an appreciation for increasingly subtle details that others cannot perceive.

The second kind is “Flow,” where users are so fully absorbed in a stimulating and challenging activity that they lose their sense of time.

The users need to reach those high-payoff goals for themselves, but we can give them some tips and tricks for the domain to help them get there faster.

These tips and tricks are not convenient, corner-cutting shortcuts. They are about helping users bypass the unnecessarily long way. We do not want our users to spend too much time reinforcing beginner skills; we need to help them make continual progress on their paths.

In Times of Chaos

(From a writer I respect, Seth Godin)

We have two choices:

One path is to join in the stress, the noise, and the madness, and make things even more chaotic. That is how chaos spreads. It can feel like the thing we are supposed to do, to join in the anxiety, but it is not. In reality, that anxiety does not help anyone, and it may make things harder for the people who genuinely need help. If someone needs a hand, reach out and lend one. But if not, amplifying the chaos only makes things worse, and we should consider taking a different approach.

The other path is to take some time now to think strategically and figure out what to do next. Every time the market gets disrupted, someone starts building a new market. During career upheavals, new careers get built.

The magic of learning is that it is yours to decide. That remains true even after the chaos subsides.

Binary Classification Model for Rain in Australia Using Python Take 2

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The Rain in Australia dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: This dataset contains daily weather observations from numerous Australian weather stations. The target variable RainTomorrow represents whether it rained the next day. We should also exclude the variable Risk-MM when training a binary classification model. Keeping the Risk-MM feature would risk leaking the answer into our model and reducing its effectiveness.

In iteration Take1, we constructed several traditional machine learning models using the linear, non-linear, and ensemble techniques. We also observed the best accuracy score that we could obtain with each of these models.

In this Take2 iteration, we will construct and tune an XGBoost machine learning model for this dataset. We will observe the best accuracy score that we can obtain with the XGBoost model.

ANALYSIS: In iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 83.83%. Two algorithms (Extra Trees and Random Forest) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Random Forest turned in a better overall result than Extra Trees with a lower variance. Random Forest achieved an accuracy metric of 85.44%. When configured with the optimized parameters, the Random Forest algorithm processed the test dataset with an accuracy of 85.52%, which was consistent with the accuracy score from the training phase.
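
For reference, the kind of tuning run described above might look roughly like the sketch below, using scikit-learn's GridSearchCV; the parameter grid and preprocessing are illustrative assumptions, not the actual Take1 setup:

# Rough sketch: cross-validated tuning of a Random Forest classifier.
# The grid values and preprocessing are illustrative, not the Take1 configuration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
df = pd.read_csv('weatherAUS.csv').drop(columns=['RISK_MM', 'Date']).dropna()
X = pd.get_dummies(df.drop(columns=['RainTomorrow']))
y = (df['RainTomorrow'] == 'Yes').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
param_grid = {'n_estimators': [100, 300, 500], 'max_depth': [None, 10, 20]}
grid = GridSearchCV(RandomForestClassifier(random_state=7), param_grid,
                    scoring='accuracy', cv=10, n_jobs=-1)
grid.fit(X_train, y_train)
# Compare the cross-validated score with the hold-out test score, as in the write-up.
print(grid.best_params_, grid.best_score_, grid.score(X_test, y_test))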

In this Take2 iteration, the XGBoost algorithm achieved a baseline accuracy of 84.69% by setting n_estimators to the default value of 100. After a series of tuning trials, XGBoost turned in an overall accuracy result of 86.21% with the n_estimators value set to 1000. When we applied the tuned XGBoost model to the test dataset, we obtained an accuracy score of 86.27%, which was consistent with the model performance from the training phase.
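
A minimal sketch of the n_estimators tuning described above, using the xgboost scikit-learn wrapper, appears below; the preprocessing is simplified, and the candidate values are illustrative assumptions:

# Rough sketch: vary n_estimators for an XGBoost classifier and compare test accuracy.
# Preprocessing is simplified; column names assume the Kaggle weatherAUS.csv file.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
df = pd.read_csv('weatherAUS.csv').drop(columns=['RISK_MM', 'Date']).dropna()
X = pd.get_dummies(df.drop(columns=['RainTomorrow']))
y = (df['RainTomorrow'] == 'Yes').astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
for n in [100, 250, 500, 1000]:
    model = XGBClassifier(n_estimators=n, random_state=7, n_jobs=-1)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(n, round(acc, 4))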

CONCLUSION: For this iteration, the XGBoost algorithm achieved the best overall result using the training and test datasets. For this dataset, XGBoost should be considered for further modeling.

Dataset Used: Rain in Australia Data Set

Dataset ML Model: Binary classification with numerical and categorical attributes

Dataset Reference: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package

One potential source of performance benchmark: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package/kernels

The HTML formatted report can be found here on GitHub.