Month: November 2019

Binary Classification Model for MiniBooNE Particle Identification Using Python Take 6

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The data file is set up as follows. The first line is the number of signal events followed by the number of background events. The records with the signal events come first, followed by the background events. Each line, after the first line, has the 50 particle ID variables for one event.
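The file layout described above can be read with a few lines of plain Python. This is a minimal sketch, not the code from the original notebook: the function name `parse_miniboone` and the choice to label signal as 1 and background as 0 are assumptions made for illustration.

```python
def parse_miniboone(lines, n_features=50):
    """Split raw text lines into feature rows and 0/1 labels.

    Assumes the layout described in the write-up: a header line with the
    signal and background counts, then one event per line, signal first.
    """
    rows = [line.split() for line in lines if line.strip()]
    n_signal, n_background = (int(value) for value in rows[0])

    events = [[float(value) for value in row] for row in rows[1:]]
    if len(events) != n_signal + n_background:
        raise ValueError("event count does not match the header line")
    if any(len(event) != n_features for event in events):
        raise ValueError("an event does not have the expected number of variables")

    # Signal events come first, so the labels follow directly from the header.
    labels = [1] * n_signal + [0] * n_background
    return events, labels
```

Because the function only needs an iterable of lines, an open file handle for the downloaded data file can be passed in directly.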

For this iteration, we will leverage TPOT, an automated machine learning tool for Python that optimizes machine learning pipelines using genetic programming.
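A TPOT search of this kind might be set up roughly as follows. This is a sketch under stated assumptions, not the notebook's actual configuration: it assumes the classic TPOT 0.x interface (`TPOTClassifier` with `generations`, `population_size`, `scoring`, `cv`, `random_state`, and `verbosity`), and every value except the 20 generations mentioned in the analysis is hypothetical.

```python
# Hypothetical settings; only generations=20 comes from the write-up.
TPOT_SETTINGS = {
    "generations": 20,      # number of evolutionary rounds
    "population_size": 50,  # candidate pipelines kept per generation (assumed)
    "scoring": "accuracy",  # metric reported in the write-up
    "cv": 5,                # cross-validation folds (assumed)
    "random_state": 42,     # assumed seed for repeatability
    "verbosity": 2,         # print progress for each generation
}


def run_tpot_search(X_train, y_train, X_test, y_test, settings=None):
    """Evolve a pipeline on the training data and score it on the test data."""
    # Imported here so this module still loads where TPOT is not installed.
    from tpot import TPOTClassifier

    search = TPOTClassifier(**(settings or TPOT_SETTINGS))
    search.fit(X_train, y_train)
    return search, search.score(X_test, y_test)
```

Newer TPOT releases changed several parameter names, so the settings may need adjusting for the installed version.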

ANALYSIS: The baseline performance of the machine learning algorithms achieved the best accuracy of 91.11% after generation one. After generation 20, Gradient Boosting turned in the top overall result and achieved an accuracy metric of 92.44%. Furthermore, the Gradient Boosting algorithm processed the testing dataset with an accuracy of 92.83%, which was even better than the prediction result from the training data.

CONCLUSION: For this iteration, the Gradient Boosting algorithm achieved the best overall results using the training and test datasets. For this dataset, Gradient Boosting should be considered for further modeling.

Dataset Used: MiniBooNE Particle Identification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML formatted report can be found here on GitHub.

The Free Market Has an Enemy

In his Akimbo podcast [https://www.akimbo.me/], Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

In this podcast, Seth discusses the concept of a free market and how it differs from capitalism.

A free market is a place where buyers and sellers try to figure out what others want or can provide. We like the opportunities that a free market can provide. In our complex society, the free market is fragile and hardly stable.

Capitalism is the idea that capital (money) can be invested to build systems and to make things that improve our productivity. Many people use the terms free market and capitalism interchangeably. While capitalism can fuel the free market, it is not the free market.

Capitalism also can lead to a ratchet called progress. But capitalism also comes with three challenges.

The first challenge is that capitalism encourages monopolistic behaviors. A monopoly takes away people’s choices. When we do not have a choice, we must do what the capitalist wants us to do.

The second challenge is that capitalism encourages short-term thinking. The short-term thinking arises because capitalism measures return on investment, and return on investment is time-based. As a result, the short-term thinking of capitalists combined with the short-term thinking of consumers produces an environment where nobody is thinking about the long term.

The third challenge for capitalism is corruption. Left to their own devices and without boundaries, bad actors in the market will attempt to use any means necessary to get an advantage. The outcome is Gresham’s law, where “bad money drives out good.”

On the other hand, the free market dislikes monopoly because the free market works when people have choices. The free market also does not work well when we make it very difficult to build things for the long haul. It is already difficult to focus on quality and meaningful things that will last. The free market also does poorly under the weight of corruption. When a capitalist acts like a bully who is trying to power their way through the rules and structures of the free market, we all lose.

If we care about choice, investing for the long term, and making progress without the threat of corrupting influence, we must stand up and defend the free market. Defending the free market is not the same as defending capitalism. Crony capitalism is a selfish act that tries to make the market work for itself and walks away from the very idea of the free market. A free market is about making better things and creating a better future for everyone.

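Initiative

(From a writer I respect, Seth Godin.)

The only way to get initiative is to take it, because it is not something others can give you.

Some people hesitate to take it, perhaps because they worry that initiative will somehow run out.

Initiative does not run out. It is a self-renewing resource.

From a very young age, most of us are taught to avoid this. Finish your homework. Take out the trash. Wait to be picked. Wait for others to reach out to you. Be likable. Fit in. Maybe do things a little differently now and then, but do not stand out too much. Failing is always far worse than not trying.

The alternative is to take the initiative. Stand up for the people you seek to serve.

Go ahead and do it. There is plenty of initiative left.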

Web Scraping Templates using Python with Selenium

As I practice solving web scraping problems, I find myself repeating the same set of steps and activities.

Thanks to Dr. Jason Brownlee’s suggestions on creating a machine learning template, I have pulled together a set of project templates that can be used to support web scraping tasks using Python and Selenium.

The Python scripts leverage the Selenium module. You can find the web scraping templates on the Project Templates page.
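The templates themselves are not reproduced here. As a rough illustration of the kind of repeated steps such a template might capture (start a browser, load a page, collect elements, clean up), here is a hypothetical sketch. The function names are invented for this example, it assumes the Selenium 4 style `find_elements(by, value)` call, and it assumes Chrome with a matching driver is available.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def make_driver():
    """Start a headless Chrome session (assumes Chrome is installed)."""
    # Imported here so the helpers below can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def scrape_text(driver, url, selector):
    """Load the page and return the non-empty text of every matching element."""
    driver.get(url)
    elements = driver.find_elements(CSS_SELECTOR, selector)
    return [el.text.strip() for el in elements if el.text.strip()]


def run(url, selector):
    """Full cycle: open the browser, scrape, and always close the browser."""
    driver = make_driver()
    try:
        return scrape_text(driver, url, selector)
    finally:
        driver.quit()
```

Keeping the browser setup separate from the scraping step means the scraping logic can be exercised with a stand-in driver object, without launching a real browser.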

Binary Classification Model for MiniBooNE Particle Identification Using Python Take 5

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a predictive model using various machine learning algorithms and to document the end-to-end steps using a template. The MiniBooNE Particle Identification dataset is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). The data file is set up as follows. The first line is the number of signal events followed by the number of background events. The records with the signal events come first, followed by the background events. Each line, after the first line, has the 50 particle ID variables for one event.

ANALYSIS: The baseline performance of the machine learning algorithms achieved an average accuracy of 91.74%. Two algorithms, k-Nearest Neighbors and eXtreme Gradient Boosting (XGBoost), achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, XGBoost turned in the top overall result and achieved an accuracy metric of 94.22%. By using the optimized parameters, the XGBoost algorithm processed the test dataset with an accuracy of 94.31%, which was consistent with the prediction performance from the training dataset.
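The write-up does not say which XGBoost parameters were tuned or how. One common way to run a series of tuning trials is an exhaustive grid search, sketched below in plain Python. The `evaluate` callback is a hypothetical stand-in for whatever scoring step is used, for example cross-validated accuracy of an XGBoost model built with the given parameters.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every parameter combination and return (best_params, best_score).

    param_grid maps a parameter name to the list of values to try.
    evaluate takes a dict of parameters and returns a score (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A grid such as `{"max_depth": [3, 5, 7], "n_estimators": [100, 300, 500]}` would trigger nine trials; those values are illustrative and are not taken from the original project.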

CONCLUSION: For this iteration, the XGBoost algorithm achieved the best overall results using the training and test datasets. For this dataset, XGBoost should be considered for further modeling.

Dataset Used: MiniBooNE Particle Identification Data Set

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/MiniBooNE+particle+identification

The HTML formatted report can be found here on GitHub.