Machine learning, a type of artificial intelligence that "learns" as it identifies new patterns in data, enables data scientists to effectively pinpoint revenue opportunities and create strategies to improve customer experiences using information hidden in huge data sets.
Selecting the right algorithm is a key part of any machine learning project, and because there are dozens to choose from, understanding their strengths and weaknesses in various business applications is essential. Below are five of the most common machine learning algorithms and some of their potential use cases.
Decision trees use directed graphs to model decision making: each node in the graph represents a question about the data (“Is income greater than $70,000?”), and the branches stemming from each node represent the possible answers to that question. Combining hundreds or even thousands of these decision trees, each trained on a random sample of the data and features, and letting them vote on (or average) the final prediction is an “ensemble” method called a random forest.
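To make the ensemble idea concrete, here is a minimal Python sketch of a simplified random forest that uses only the standard library. To keep it short, each tree is a single-question “stump” (depth one) trained on a bootstrap sample of the rows and one randomly chosen feature. Real random forests grow much deeper trees and draw a random subset of features at every split. The function names and the loan data are hypothetical, for illustration only.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Fit a one-question tree: "Is row[feature] greater than threshold?"

    Candidate thresholds are midpoints between neighboring observed values;
    the one with the fewest training errors wins, and each branch predicts
    the majority label of the rows that fall into it.
    """
    fallback = majority(labels)
    # Start from a constant "tree" that always predicts the majority label.
    best_errors = sum(y != fallback for y in labels)
    best = (feature, float("inf"), fallback, fallback)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        yes = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        no = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        yes_label, no_label = majority(yes), majority(no)
        errors = sum(y != yes_label for y in yes) + sum(y != no_label for y in no)
        if errors < best_errors:
            best_errors, best = errors, (feature, threshold, yes_label, no_label)
    return best


def predict_stump(stump, row):
    feature, threshold, yes_label, no_label = stump
    return yes_label if row[feature] > threshold else no_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of rows and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))  # random feature for this tree
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Each tree votes; the majority label wins."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical loan data: (income in $1,000s, years employed)
rows = [(40, 1), (42, 2), (44, 1), (46, 2), (90, 8), (92, 9), (94, 8), (96, 9)]
labels = ["deny"] * 4 + ["approve"] * 4
forest = fit_forest(rows, labels)
print(predict_forest(forest, (100, 7)))  # approve
```

Each stump on its own is an easily read rule; the difficulty comes from having to reason about hundreds of them voting at once.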
Though highly accurate, random forests are often dubbed black box models because they can be complex to the point of being difficult to interpret. For example, understanding how a random forest model approves or denies a loan could involve sifting through thousands of finely tuned decisions. Nevertheless, random forests are popular because they combine high accuracy with relatively low computational expense, and they are used for a wide variety of applications, including churn modeling and customer segmentation.
The goal of artificial neural networks is to mimic the way the human brain organizes and understands information in order to arrive at predictions. In an artificial neural network, information passes through an input layer, one or more hidden layers, and an output layer. The input and output layers consist of the raw features and the predictions, respectively. The hidden layers in between consist of many highly interconnected neurons that learn their own intermediate features from the raw inputs, a form of automatic meta-feature engineering. As the network “learns” the data, the weights on the connections between these neurons are fine-tuned until the network yields highly accurate predictions.
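The layered structure can be sketched in plain Python. This is a minimal, hypothetical example: the `TinyNet` class and its parameters are illustrative, not a library API. It assumes sigmoid neurons, a squared-error loss, and the classic XOR toy problem, which no single straight-line rule can solve but one hidden layer can. Training adjusts the connection weights with backpropagation, the standard gradient-based method for fine-tuning them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Input layer -> one hidden layer of sigmoid neurons -> one sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Connection weights start small and random; training fine-tunes them.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        """Return (hidden activations, output probability) for one input."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, output

    def train_step(self, x, y, lr):
        """One backpropagation step on the loss 0.5 * (output - y) ** 2."""
        hidden, out = self.forward(x)
        # Error signal at the output neuron (chain rule through the sigmoid)
        delta_out = (out - y) * out * (1 - out)
        # Error signal for each hidden neuron, computed with the pre-update output weights
        delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(self.w_out, hidden)]
        # Gradient-descent updates, output layer first
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.b_out -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w_hidden[j][i] -= lr * d * xi
            self.b_hidden[j] -= lr * d

    def loss(self, data):
        return sum(0.5 * (self.forward(x)[1] - y) ** 2 for x, y in data) / len(data)


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = TinyNet(n_inputs=2, n_hidden=4)
before = net.loss(xor)
for epoch in range(5000):
    for x, y in xor:
        net.train_step(x, y, lr=0.5)
after = net.loss(xor)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Production networks are built with frameworks that compute these gradients automatically; writing the updates out by hand simply shows what fine-tuning the connections means.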
This brain-inspired approach to computation allows neural networks to excel at some of the most challenging, high-dimensional problems in artificial intelligence, such as speech and object recognition, image segmentation, and natural language processing. Like random forests, neural networks are difficult, if not impossible, to interpret without tools such as Skater, an open source model interpretation package. As a result, data scientists often defer to simpler machine learning algorithms unless their analysis demands superior accuracy.
Logistic regression, which is borrowed from the field of classical statistics, is one of the simpler machine learning algorithms. This machine learning technique is commonly used for binary classification problems, meaning those in which there are two possible outcomes that are influenced by one or more explanatory variables. The algorithm estimates the probability of an outcome given a set of observed variables.
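In its standard form, the model passes a weighted sum of the explanatory variables through the logistic (sigmoid) function, which maps any real number to a probability between 0 and 1. Here x_1 through x_p are the explanatory variables and β_0 through β_p are the coefficients learned from the data (notation introduced for this example):

```latex
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}}

\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
```

The second line is the same model rearranged into log-odds, which is linear in the coefficients.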
Where logistic regression differs from the other methods here is in its interpretability. Because the algorithm is derived from the highly interpretable linear regression algorithm, the influence of each feature can be read directly from its coefficient: each coefficient gives the change in the log-odds of the outcome for a one-unit increase in that feature, holding the others fixed. As a result, logistic regression is often favored when interpretability and inference are paramount. This versatile algorithm is used to predict the outcome of binary events such as customer churn, marketing click-throughs, or fraudulent transactions.
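A minimal sketch in plain Python, assuming a single explanatory variable and hypothetical churn data, shows where that interpretability comes from. It fits the model by gradient descent on the log loss; statistical packages often use Newton-type optimizers instead, but they target the same maximum-likelihood estimates. The names `fit_logistic`, `tickets`, and `churned` are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the mean log loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: support tickets filed vs. whether the customer churned (1) or not (0)
tickets = [0, 1, 1, 2, 3, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(tickets, churned)

# b1 is the change in log-odds of churn per extra ticket,
# so exp(b1) is the factor by which the odds of churn are multiplied.
print(f"coefficient: {b1:.2f}, odds ratio per ticket: {math.exp(b1):.2f}")
print(f"P(churn | 5 tickets) = {sigmoid(b0 + b1 * 5):.2f}")
```

Because exp(b1) is greater than 1 in this toy data set, each additional ticket raises the odds of churn; a negative coefficient would lower them.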
Kernel methods are a group of machine learning algorithms used for pattern analysis, which involves organizing raw data into rankings, clusters, or classifications. These methods allow data scientists to apply their domain knowledge of a given problem by building custom kernels that incorporate the data transformations most likely to improve the accuracy of the overall model. A kernel measures the similarity between two examples as if they had first been mapped into a higher-dimensional feature space, without computing that mapping explicitly.
The most popular application of kernels is the support vector machine (SVM), which builds a model that classifies new data as belonging to one category or another based on a set of training examples. An SVM makes these determinations by representing each example as a point in a multi-dimensional feature space and finding a hyperplane, a flat decision boundary, that separates the categories. Among all such hyperplanes, it chooses the one that maximizes the distance (called the “margin”) between the boundary and the nearest points of each group.
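The two ideas in this paragraph can be sketched separately in plain Python. The first part defines a hypothetical custom kernel, a degree-2 polynomial kernel, and checks that it equals an ordinary dot product after an explicit transformation of the data; this is what lets a kernel encode a transformation without computing it. The second part trains a linear SVM (no kernel) by subgradient descent on the hinge loss, a simple stand-in for the specialized solvers that SVM libraries use. A kernelized SVM solves the same margin problem with every dot product replaced by a kernel call, which this sketch does not implement. All names and data are illustrative.

```python
import math


def poly2_kernel(x, z):
    """Hypothetical custom kernel: degree-2 polynomial, k(x, z) = (x . z) ** 2."""
    return sum(u * v for u, v in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit 2-D -> 3-D transformation whose dot product poly2_kernel computes."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM via per-example subgradient descent.

    Labels must be +1 or -1. Each step follows the subgradient of
    lam / 2 * ||w||^2 + max(0, 1 - y * (w . x + b)) for one example.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularizer: shrinking ||w|| widens the margin (2 / ||w||)
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Inside the margin or on the wrong side: move the boundary away from this point
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# The kernel value matches the dot product in the transformed space (both about 16)
x, z = (1.0, 2.0), (3.0, 0.5)
print(poly2_kernel(x, z), sum(u * v for u, v in zip(poly2_features(x), poly2_features(z))))

points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```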
Kernel methods are most useful when you have domain knowledge about the shape of the decision boundaries beforehand, which is rarely the case except for the most common, well-studied problems. As a result, practitioners usually opt for a more “out-of-the-box” machine learning algorithm.
Clustering is a type of unsupervised learning, which is used when working with data that does not have defined categories or groups (unlabeled data). K-means, one of the most widely used clustering algorithms, aims to find distinct groups in the data based on inherent similarities between examples rather than predetermined labels. K is the number of groups the algorithm will create, chosen by the data scientist in advance. Each example is assigned to the group whose center (centroid) is most similar to it across a set of characteristics called features, and each centroid is then moved to the mean of its assigned examples, repeating until the groups stop changing. K-means clustering is useful for business applications like customer segmentation, inventory categorization, and anomaly detection.
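K-means is usually implemented as Lloyd’s algorithm, which alternates between assigning each example to its nearest centroid and moving each centroid to the mean of its members. A minimal plain-Python sketch follows. It is an illustration rather than a production implementation: it uses a simple deterministic farthest-point initialization in place of the random or k-means++ seeding that libraries commonly use, and the function name and customer data are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Farthest-point initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the group with the nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a group is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical customers: (annual spend in $1,000s, store visits per month)
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 8.7), (7.8, 9.3)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```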
Ultimately, the best machine learning algorithm to use for any given project depends on the data available, how the results will be used, and the data scientist's domain expertise on the subject. Understanding how they differ is a key step to ensuring that every predictive model your data scientists build and deploy delivers valuable results.