Month: February 2019

Drucker on Functioning Communications, Part 1

In his book, The Essential Drucker: The Best of Sixty Years of Peter Drucker’s Essential Writings on Management, Peter Drucker analyzed the ways that management practices and principles affect the performance of organizations, individuals, and society. The book covers the basic principles of management and gives professionals the tools to perform the tasks that the environment of tomorrow will require of them.

These are my takeaways from reading the book.

Drucker believed that there are four fundamentals of communications. Three of them are:

  1. Communication is perception.
  2. Communication is expectation.
  3. Communication makes demands.

True communication can only begin when the receiver perceives something from the sender and responds. This receiver-first concept means that it is the recipient who initiates communication, not the sender. Until the receiver responds to the sender, there is no communication, only noise.

Communication can happen only when the sender attempts to communicate in the recipient’s language. Therefore, Drucker suggested that the first question in communicating must be, “Is this communication within the recipient’s range of perception? Can he receive it?”

Effective communication also must account for expectation. As a rule, we see largely what we expect to see and hear largely what we expect to hear. We also tend to tune out sights and sounds that we did not expect to take in and process. For us, those extraneous sights and sounds are merely noise.

That is because the human mind attempts to fit impressions and stimuli into a frame of expectations. We resist vigorously any attempts to make us “change our minds” or perceive anything that is contrary to our expectations or breaks our psychological continuity.

It is imperative that we know what the recipient expects to see and hear before we attempt to communicate. Only then can we know whether the communication can use the other person’s expectations to get the message received, or how to make that person more receptive to a message that might be considered contrary or disruptive to those expectations.

Communication always makes some demands. It always demands that the recipient become somebody, do something, or believe something. If communication fits in with the aspirations, the values, and the purposes of the recipient, it is powerful because it appeals to motivation. If communication goes against those aspirations, values, or motivations, it is likely either not to be received or to be resisted by the recipient.

Binary Classification Model for Customer Transaction Prediction Using Python (Decision Trees with Full Features)

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Santander Bank Customer Transaction Prediction competition is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: Santander Bank’s data science team wants to identify which customers will make a specific transaction in the future, irrespective of the amount of money transacted. The bank is continually challenging its machine learning algorithms to make sure they can more accurately identify new ways to solve its most common challenges such as: Will a customer buy this product? Can a customer pay this loan?

For this iteration, we will examine the effectiveness of the Decision Trees algorithm with the full set of features for this problem. Submissions are evaluated on the area under the ROC curve between the predicted probability and the observed target.
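To make the evaluation metric concrete, here is a minimal sketch of how the area under the ROC curve can be computed from predicted probabilities and observed targets. It uses the rank-sum identity (the AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one). The function name `roc_auc` and the sample values are my own illustrations, not part of the competition or the project scripts.

```python
def roc_auc(targets, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    targets: iterable of 0/1 observed labels
    scores:  iterable of predicted probabilities (higher means more likely 1)
    """
    pairs = sorted(zip(scores, targets))
    n = len(pairs)
    n_pos = sum(1 for _, t in pairs if t == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both classes present")

    # Sum the ranks of the positive cases, giving tied scores their average rank.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    # Convert the rank sum into the fraction of correctly ordered (pos, neg) pairs.
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Illustrative values only: three of the four (positive, negative) pairs
# are ordered correctly, so the score is 0.75.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A score of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is a useful yardstick when reading the results below.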

ANALYSIS: The baseline performance achieved an average ROC-AUC score of 0.5525. After a series of tuning trials, the top result from the training data was a ROC-AUC score of 0.5646. By using the optimized parameters, the algorithm processed the test dataset with a ROC-AUC score of 0.5604.

CONCLUSION: To be determined after comparing the results from other machine learning algorithms.

Dataset Used: Santander Customer Transaction Prediction

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/santander-customer-transaction-prediction/data

One potential source of performance benchmark: https://www.kaggle.com/c/santander-customer-transaction-prediction/overview

The HTML formatted report can be found here on GitHub.

Psephology

In his podcast, Akimbo, Seth Godin teaches us how to adopt a posture of possibility, change the culture, and choose to make a difference. Here are my takeaways from the episode.

Our current system of voting for decision-making has many flaws. Here are some thoughts on what those flaws are and what we can do about them.

Voting was created as a mechanism through which groups can speak up. When people vote, most of us do not necessarily want to own the outcome. What most people want is stability and dignity. We want dignity so we know that we can speak up about things that we care about.

We also want the stability that comes from living in a world that is predictable, where we can plan our lives. Many have proposed changes to the voting system, and those changes can be disruptive. That is one of the many reasons why the way we vote has not changed for a long time.

The current system is also broken in a couple of ways. First, candidates have figured out that the more extreme they are, the better their odds of winning the election. Taking an extreme position makes it easier to get voters’ attention and to discourage the people on the other side from voting.

Another reason why we have a broken system is the media. The special interests in the media have an incentive to make a loud noise because it sells papers and advances their interests. As a result, the amplification of the noise and the hatred for the other side over and over cannot help but divide us.

So can we improve the system to make the voting model more effective? Should we consider voting for a pool of candidates on a sliding scale? We might give a 9 to candidate #1, a 5 to candidate #2, and so on. The candidate who accumulates the most “likes” wins the election.
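As a rough illustration of the sliding-scale idea, the tally could be sketched as below. The function name `tally_score_votes`, the ballot format, and the sample scores are all hypothetical; the episode does not specify any of these details, and a real system would also need rules for ties and for unscored candidates.

```python
from collections import defaultdict


def tally_score_votes(ballots):
    """Add up each candidate's scores and return (winner, totals).

    ballots: list of dicts mapping a candidate name to that voter's score.
    Assumption: a candidate missing from a ballot simply receives no points.
    """
    if not ballots:
        raise ValueError("no ballots to count")

    totals = defaultdict(int)
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] += score

    # On a tie, max() keeps the candidate who was encountered first.
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Hypothetical ballots from three voters.
ballots = [
    {"Candidate 1": 9, "Candidate 2": 5},
    {"Candidate 1": 2, "Candidate 2": 8},
    {"Candidate 1": 6, "Candidate 2": 7},
]
winner, totals = tally_score_votes(ballots)
print(winner, totals)
```

In this made-up example, the first voter strongly prefers candidate #1, yet candidate #2 collects more points overall (20 versus 17) and wins.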

Should we consider giving different groups of people different weight when voting on an issue? Do all voters have an identical feeling or stake towards a certain issue? The following is my example (not Seth’s). If we were to vote on the issue of women’s reproductive rights, should we give more weight to women’s votes, or should we even consider letting only women vote on that issue?

Should we consider revising our voting approach based on how we make decisions in a business setting? In a business setting, different issues get decided or voted on by different people, different groups of interest or authority. We also do not practice anonymous voting in a business setting, and we seem to be OK with all those arrangements.

The Internet is the biggest voting machine we have ever built. We vote on many things with our likes and opinions all the time. The technologies we use for social media can help facilitate different voting approaches if we can apply the technologies correctly. When we are trying to spread an idea, we should seriously consider how best to leverage the technologies to help us make more impact on the people that we seek to serve.

Binary Classification Model for Customer Transaction Prediction Using Python (Logistic Regression with Full Features)

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.

SUMMARY: The purpose of this project is to construct a prediction model using various machine learning algorithms and to document the end-to-end steps using a template. The Santander Bank Customer Transaction Prediction competition is a binary classification situation where we are trying to predict one of the two possible outcomes.

INTRODUCTION: Santander Bank’s data science team wants to identify which customers will make a specific transaction in the future, irrespective of the amount of money transacted. The bank is continually challenging its machine learning algorithms to make sure they can more accurately identify new ways to solve its most common challenges such as: Will a customer buy this product? Can a customer pay this loan?

For this iteration, we will examine the effectiveness of the Logistic Regression algorithm with the full set of features for this problem. Submissions are evaluated on the area under the ROC curve between the predicted probability and the observed target.

ANALYSIS: The baseline performance achieved an average ROC-AUC score of 0.8523. After a series of tuning trials, the top result from the training data was a ROC-AUC score of 0.8593. By using the optimized parameters, the algorithm processed the test dataset with a ROC-AUC score of 0.6276.

CONCLUSION: To be determined after comparing the results from other machine learning algorithms.

Dataset Used: Santander Customer Transaction Prediction

Dataset ML Model: Binary classification with numerical attributes

Dataset Reference: https://www.kaggle.com/c/santander-customer-transaction-prediction/data

One potential source of performance benchmark: https://www.kaggle.com/c/santander-customer-transaction-prediction/overview

The HTML formatted report can be found here on GitHub.

Kaggle Competition: Banco Santander Customer Transaction Prediction

If you are new to Python machine learning like me, you might find the current Kaggle competition “Santander Customer Transaction Prediction” interesting.

The competition is essentially a binary classification problem with a decently large dataset (200 attributes and 200,000 rows of training data). I have not participated in a Kaggle competition before and will use this one to get some learning under my belt.

I plan to run the training data through a list of machine learning algorithms (see below) and iterate them through three stages. This blog post will serve as the meta post that summarizes the progress.

The current plan, with its milestones, is as follows:

Stage 1: Gather the Baseline Performance.

  • LogisticRegression: targeted Monday 25 February 2019
  • DecisionTreeClassifier: targeted Wednesday 27 February 2019
  • KNeighborsClassifier: targeted Friday 1 March 2019
  • BaggingClassifier: targeted Monday 4 March 2019
  • RandomForestClassifier: targeted Wednesday 6 March 2019
  • ExtraTreesClassifier: targeted Friday 8 March 2019
  • GradientBoostingClassifier: TBD

Stage 2: Feature Selection using the Attribute Importance Ranking technique

  • LogisticRegression: TBD
  • DecisionTreeClassifier: TBD
  • KNeighborsClassifier: TBD
  • BaggingClassifier: TBD
  • RandomForestClassifier: TBD
  • ExtraTreesClassifier: TBD
  • GradientBoostingClassifier: TBD

Stage 3: Feature Selection using the Recursive Feature Elimination technique

  • LogisticRegression: TBD
  • DecisionTreeClassifier: TBD
  • KNeighborsClassifier: TBD
  • BaggingClassifier: TBD
  • RandomForestClassifier: TBD
  • ExtraTreesClassifier: TBD
  • GradientBoostingClassifier: TBD
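For anyone who wants to try the Stage 1 idea before the scripts are posted, a baseline loop might look something like the sketch below. It is only an approximation of the plan: it uses a small synthetic dataset from scikit-learn instead of the Santander files, it covers just three of the listed algorithms, and the helper name `baseline_scores`, the fold count, and the model settings are my own assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def baseline_scores(X, y, models, n_splits=3, seed=7):
    """Return the mean cross-validated ROC-AUC for each model."""
    # Stratified folds keep the class ratio similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data (assumption): the real training set has
# 200 attributes and 200,000 rows.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=7
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=7),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=20, random_state=7),
}

scores = baseline_scores(X, y, models)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

Keeping the same folds for every algorithm makes the baseline numbers easier to compare from one model to the next.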

I will post all Python scripts in a folder on GitHub. The final submission deadline is 10 April 2019.

Feel free to take a look at the scripts and experiment. Who knows, you might have something you can turn in by the time April comes around. Happy learning and good luck!