Month: January 2019

Drucker on Effective Decisions, Part 4

In his book, The Essential Drucker: The Best of Sixty Years of Peter Drucker’s Essential Writings on Management, Peter Drucker analyzed the ways that management practices and principles affect the performance of organizations, individuals, and society. The book covers the basic principles of management and gives professionals the tools to perform the tasks that the environment of tomorrow will require of them.

These are my takeaways from reading the book.

In the chapter “Effective Decisions,” Drucker discussed the five aspects of the effective decision-making process.

Step 1. The decision-maker reaches a clear realization that the problem is a generic situation and not a random exception.

Step 2. The decision-maker understands the specifications that the answer to the problem has to satisfy.

Step 3. The decision-maker thinks through what the “right” solution is.

Step 4. The decision-maker builds actions into the decision.

Step 5. The decision-maker gathers feedback that tests the validity and effectiveness of the decision.

After steps one through four, Drucker asserted that we must build feedback into the decision process. The purpose of the feedback process is to test our expectations that underlie the decision against actual events.

Feedback is necessary because decisions are made by human beings, and human beings are fallible. Even the best decision has a high probability of being wrong. Decisions can also remain in effect for a long time, yet even the most effective one eventually becomes obsolete.

Drucker saw that the feedback step would become even more critical in the information age. With computers aiding decision-making, we run the risk that decision-makers become removed from reality. Drucker suggested that we verify our abstractions with constant checks against the concrete. Otherwise, we run the danger of making decisions based on assumptions that are not aligned with reality. Computers can make the laborious work of feedback verification easier through automation.

Drucker encouraged us to go out and look for evidence that tests our assumptions about a decision, and the results of that decision, against reality. Reality never stands still for very long, so we all need organized information for the feedback.

In summary, Drucker believed that effective people do not make many decisions. Instead, they concentrate on important decisions. The important decisions will be strategic and generic, rather than tailored to solve one particular problem. Effective people also try to make the few important decisions on the highest level of conceptual understanding. They try to find the constants in a situation.

Most importantly, effective people know that the most time-consuming step in the process is not making the decision but putting it into action. Unless a decision has “degenerated into work,” it is not a decision; it is at best a good intention. While the effective decision is based on the highest level of conceptual understanding, the action to carry it out should be as close as possible to the working level and as simple as possible.

Time Series Model for Annual Water Usage in Baltimore Using Python

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

Code Credit: Adapted from a blog post made available by Dr. Jason Brownlee of Machine Learning Mastery.

PREFACE: This is a replication of Python code from Dr. Brownlee’s blog post on time series. I have combined all the code snippets into one script so that I can turn the whole process into a template. The comments and analysis were also part of the blog post and are annotated here to explain each code block.

SUMMARY: The purpose of this project is to construct a time series prediction model and document the end-to-end steps using a template. The Annual Water Usage in Baltimore dataset is a time series situation where we are trying to forecast future outcomes based on the past data points.

INTRODUCTION: The problem is to predict annual water usage. The dataset provides the annual water usage in Baltimore from 1885 to 1963, or 79 years of data. The dataset contains 79 observations in the units of liters per capita per day and is credited to Hipel and McLeod, 1994.

ANALYSIS: The baseline prediction (or persistence) for the dataset resulted in an RMSE of 21.975. The manually configured model was simplified to ARIMA(4,1,1) and produced an RMSE of 31.097, which was higher than the persistence model. After applying the grid search technique to the dataset, the final RMSE of the ARIMA(2,1,0) model was 21.733. This is only a slightly smaller error than the persistence model, and it may or may not be statistically different.
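
For readers who want to reproduce the general approach, the sketch below shows a walk-forward persistence baseline and a small grid search over ARIMA orders. It is a minimal illustration rather than Dr. Brownlee’s exact script: the file name water.csv, the validation split size, and the grid bounds are assumptions, and it uses the current statsmodels ARIMA API.

import warnings
from math import sqrt
import pandas as pd
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Load the series; 'water.csv' with a year index and one value column is an assumption.
series = pd.read_csv('water.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
values = series.values.astype('float64')

# Hold back the last 10 years for validation (the split size is an assumption).
train, test = values[:-10], values[-10:]

def walk_forward_rmse(train, test, order=None):
    # Walk-forward validation: refit on the growing history and forecast one step ahead.
    # With order=None, fall back to a persistence (naive) forecast of the last observation.
    history, predictions = list(train), []
    for t in range(len(test)):
        if order is None:
            yhat = history[-1]
        else:
            yhat = ARIMA(history, order=order).fit().forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    return sqrt(mean_squared_error(test, predictions))

print('Persistence RMSE: %.3f' % walk_forward_rmse(train, test))

# Small grid search over (p, d, q) orders; the grid bounds are assumptions.
best_order, best_rmse = None, float('inf')
for p in range(5):
    for d in range(3):
        for q in range(3):
            try:
                with warnings.catch_warnings():
                    warnings.simplefilter('ignore')
                    rmse = walk_forward_rmse(train, test, order=(p, d, q))
            except Exception:
                continue
            if rmse < best_rmse:
                best_order, best_rmse = (p, d, q), rmse
print('Best ARIMA%s RMSE: %.3f' % (str(best_order), best_rmse))

Walk-forward validation refits the model at every step, which mirrors how the forecast would actually be used but makes the grid search slow; shrinking the grid is the easiest way to keep the runtime manageable.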

CONCLUSION: The final RMSE for the validation period is about 16 liters per capita per day. This is not too different from the expected error of 21, but we would expect it to also not be too different from a simple persistence model, and indeed the forecast has the characteristics of a persistence forecast. This suggests that although the time series has an obvious trend, it is still a reasonably difficult problem.

Dataset Used: Annual Water Usage in Baltimore

Dataset ML Model: Time series forecast with numerical attributes

Dataset Reference: https://datamarket.com/data/set/22sl/baltmore-city-annual-water-use-liters-per-capita-per-day-1885-1968#!ds=22sl&display=line

One potential source of performance benchmark: https://machinelearningmastery.com/time-series-forecast-study-python-annual-water-usage-baltimore/

The HTML formatted report can be found here on GitHub.

The Chicken and the Egg, Part 1

In his podcast, Akimbo, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

Using the mechanics of genetic inheritance and the evolution of species, Seth explains how ideas and culture work in a very similar way.

For genes and species:

  • Two creatures get together and create a third creature. In other words, two sets of genes combine to create a third set. The baby creature is not a replica of either parent but randomly inherits many of the traits that the parents carry.
  • The traits that help a creature survive its environment get passed on to its offspring. The traits that do not help the species survive long enough to produce offspring eventually die off, because the new environment isn’t hospitable to those baby creatures.
  • Over time, the random changes in traits add up. The species and the traits they carry either evolve successfully to survive the environment, or they die off and become extinct.
  • Given enough time and trait changes, species can also evolve away from one branch to form an entirely different branch.
  • Often, we say that a species has adapted and survived. That sounds like a planned move on the species’ part, but that is not the case. The species is not responding to the outside world; Mother Nature operates on her own terms and timelines. The outside world determines whether those traits get passed on, and the species has little say in the evolution process.

Looking at our culture through the lens of genetics and evolution:

  • Ideas are like genes. They are often inherited from other ideas but with some mutation.
  • Some ideas spread and some do not – nothing is guaranteed. When a culture adopts an idea, the ideas that help the culture sustain itself or prosper stand a much better chance of surviving.
  • Ideas that are part of a dying or extinct culture eventually die off along with the culture.
  • The world keeps changing as ideas spread and cultures adapt or fail to adapt.
  • Changes in ideas and cultures can add up. Given enough time, a culture will spread and even evolve away from its original set of ideas and beliefs.

Along the way, a myth developed that great ideas appear out of nowhere as one solid, well-formed idea. “Oh yeah, a genius thought that one up.” But that is not actually what happens.

As a human society, we have built an incredibly fertile ground for ideas to replicate and spread. Ideas continue to change the culture and may end up making themselves extinct. Along the way, ideas can replicate, mutate, and become something completely unrecognizable to the person who originally put the idea into the world. That is what our culture is: the sum total of all the ideas we have intercepted and spread to others.

Multi-Class Classification Model for Human Activities and Postural Transitions Using Python Take 1

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Human Activities and Postural Transitions dataset is a classic multi-class classification situation where we are trying to predict one of the 12 possible outcomes.

INTRODUCTION: The research team carried out experiments with a group of 30 volunteers who performed a protocol composed of six basic activities: three static postures (standing, sitting, lying) and three dynamic activities (walking, walking downstairs, and walking upstairs). The experiment also included the postural transitions that occur between the static postures: stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie, and lie-to-stand. All participants wore a smartphone on the waist during the experiment. The research team also video-recorded the activities to label the data manually. The team randomly partitioned the obtained data into two sets, with 70% for training and 30% for testing.

In the current iteration Take1, the script will focus on evaluating various machine learning algorithms and identifying the model that produces the best overall metrics. Because the dataset has many attributes that are collinear with other attributes, we will eliminate the attributes that have a collinearity measurement of 99% or higher. Iteration Take1 will establish the baseline performance for accuracy and processing time.
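
A correlation filter of this kind can be expressed in a few lines of pandas. The sketch below drops one attribute from every pair whose absolute pairwise correlation is at least 0.99; it is a generic illustration of the idea rather than the project script, and the DataFrame names are assumptions.

import numpy as np
import pandas as pd

def drop_collinear(X_train, X_test, threshold=0.99):
    # Compute absolute pairwise correlations among the training attributes.
    corr = X_train.corr().abs()
    # Keep only the upper triangle so each attribute pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Flag any column correlated at or above the threshold with an earlier column.
    to_drop = [col for col in upper.columns if (upper[col] >= threshold).any()]
    return X_train.drop(columns=to_drop), X_test.drop(columns=to_drop), to_drop

# Example usage (X_train and X_test are assumed pandas DataFrames of the 561 features):
# X_train_reduced, X_test_reduced, dropped = drop_collinear(X_train, X_test)
# print('Dropped %d attributes; %d remain' % (len(dropped), X_train_reduced.shape[1]))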

ANALYSIS: In the current iteration Take1, the baseline performance of the machine learning algorithms achieved an average accuracy of 88.52%. Two algorithms (Linear Discriminant Analysis and Stochastic Gradient Boosting) achieved the top accuracy metrics after the first round of modeling. After a series of tuning trials, Linear Discriminant Analysis turned in the top overall result and achieved an accuracy metric of 94.19%. By using the optimized parameters, the Linear Discriminant Analysis algorithm processed the testing dataset with an accuracy of 94.71%, which was even better than the training data.
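
As a rough illustration of the evaluation step, the sketch below cross-validates a Linear Discriminant Analysis model on the training data and then scores the held-out test set with scikit-learn. The variable names and the 10-fold split are assumptions, and the tuning trials mentioned above are not reproduced.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate_lda(X_train, y_train, X_test, y_test):
    # 10-fold stratified cross-validation on the training set.
    model = LinearDiscriminantAnalysis()
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='accuracy')
    print('Cross-validation accuracy: %.4f (+/- %.4f)' % (scores.mean(), scores.std()))

    # Refit on the full training set and score the held-out test set.
    model.fit(X_train, y_train)
    print('Test accuracy: %.4f' % accuracy_score(y_test, model.predict(X_test)))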

From the model-building perspective, the number of attributes decreased by 108, from 561 down to 453.

CONCLUSION: For this iteration, the Linear Discriminant Analysis algorithm achieved the best overall results. For this dataset, we should consider using the Linear Discriminant Analysis algorithm for further modeling or production use.

Dataset Used: Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set

Dataset ML Model: Multi-class classification with numerical attributes

Dataset Reference: https://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions

The HTML formatted report can be found here on GitHub.

Web Scraping of Merely Do It Blog Entries Using Python and Scrapy

SUMMARY: The purpose of this project is to practice web scraping by gathering specific pieces of information from a website. The web scraping code was written in Python and leveraged the Scrapy framework.

INTRODUCTION: David Lowe hosts his blog at merelydoit.blog. The purpose of this exercise is to practice web scraping by gathering the blog entries from Merely Do It’s RSS feed. This iteration of the script automatically traverses the RSS feed to capture all entries from the blog site.

Starting URLs: https://merelydoit.blog/feed or https://merelydoit.blog/feed/?paged=1
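
The sketch below shows the general shape such a spider might take: it parses each <item> element of a feed page and then requests the next ?paged= URL until a page comes back empty. The field names and the stop condition are assumptions about the WordPress feed layout, not a copy of the project’s spider.

import scrapy

class MerelyDoItFeedSpider(scrapy.Spider):
    name = 'merelydoit_feed'
    start_urls = ['https://merelydoit.blog/feed/?paged=1']

    def parse(self, response):
        # Each blog entry appears as an <item> element in the RSS XML.
        items = response.xpath('//item')
        for item in items:
            yield {
                'title': item.xpath('title/text()').get(),
                'link': item.xpath('link/text()').get(),
                'pub_date': item.xpath('pubDate/text()').get(),
            }
        # Follow the next feed page until a request comes back with no items.
        if items:
            current_page = int(response.url.split('=')[-1])
            next_url = 'https://merelydoit.blog/feed/?paged=%d' % (current_page + 1)
            yield scrapy.Request(next_url, callback=self.parse)

A spider like this can be run with a command such as scrapy runspider spider.py -o entries.json to collect the results into a JSON file.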

The source code and JSON output can be found here on GitHub.