代码人生

Top Machine Learning Algorithms You Should Know to Become a Data Scientist

代码人生 http://www.she9.com 2018-10-08 09:05 Source: Internet Editor: @技术狂热粉

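Introduction to Machine Learning Algorithms

There are two ways to categorize the machine learning algorithms you may come across in the field.

The first is to group algorithms by learning style.

The second is to group algorithms by similarity in form or function.

Both approaches are useful. Here, however, we focus on grouping algorithms by similarity and tour a variety of algorithm types.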

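Machine Learning Algorithms Grouped by Learning Style

There are different ways an algorithm can model a problem, depending on how it interacts with experience, the environment, or whatever we choose to call the input data. Machine learning and artificial intelligence textbooks commonly begin by considering the learning styles an algorithm can adopt. An algorithm can have only a few main learning styles, and we go through them here with a few examples of the algorithms and problem types they suit. This way of organizing machine learning algorithms is useful because it forces you to think about the roles of the input data and the model preparation process, and to select the style most appropriate for your problem in order to get the best result. Let's look at three different learning styles in machine learning algorithms: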

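Supervised Learning

In supervised learning, the input data is called training data and has a known label or result, such as spam/not-spam or a stock price at a point in time. A model is prepared through a training process in which it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model reaches the desired level of accuracy.

  • Example problems are classification and regression.

  • Example algorithms include logistic regression and the back-propagation neural network.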
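
The predict-and-correct loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustrative sketch: the toy AND data set and the names `predict` and `train_perceptron` are assumptions made for this example, not part of the original article.

```python
# Minimal sketch of supervised training: predict, compare with the known
# label, and correct the model only when the prediction is wrong.
# The toy data and function names are illustrative assumptions.

def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, otherwise 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, max_epochs=100):
    """Repeat passes over the training data until no prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:  # wrong prediction: nudge the model toward the label
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # desired level reached on the training data
            break
    return weights, bias


# Toy labelled training data: the label is 1 only when both inputs are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)
```

The stopping rule here is "no mistakes on the training data"; a real project would more likely stop at a chosen accuracy on held-out data.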

Unsupervised Learning

In unsupervised learning, the input data is not labeled and does not have a known result. A model is prepared by deducing structures present in the input data. This may be to extract general rules, or it may be to reduce redundancy through a mathematical process.

  • Example problems are clustering, dimensionality reduction, and association rule learning.

  • Example algorithms include the Apriori algorithm and k-Means.

Semi-Supervised Learning

The input data is a mixture of labeled and unlabeled examples. There is a desired prediction problem, but the model must learn the structures to organize the data as well as make predictions.

  • Example problems are classification and regression.

  • Example algorithms are extensions to other flexible methods that make assumptions about how to model the unlabeled data.
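
One common way to extend a supervised method to unlabeled data is self-training: the model labels the unlabeled example it is most sure about, adds it to the training set, and repeats. The sketch below assumes a one-dimensional nearest-centroid classifier and made-up data; the names `class_centroids` and `self_train` are hypothetical.

```python
# Self-training sketch: a nearest-centroid classifier repeatedly labels the
# unlabeled point that lies closest to a class centroid ("most confident").
# The data and helper names are illustrative assumptions.

def class_centroids(labeled):
    """Average the values of each class in a list of (value, label) pairs."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def self_train(labeled, unlabeled):
    """Return the labeled set extended with pseudo-labeled points."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        centroids = class_centroids(labeled)
        # Pick the (distance, value, label) triple with the smallest distance.
        _, value, label = min(
            (abs(value - center), value, label)
            for value in remaining
            for label, center in centroids.items()
        )
        labeled.append((value, label))
        remaining.remove(value)
    return labeled


labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 3.0, 8.0, 7.0, 4.0]
result = dict(self_train(labeled, unlabeled))
print(result)
```

The modeling assumption baked into this sketch is that points near a class centroid belong to that class, which is exactly the kind of assumption about unlabeled data the text refers to.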

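Algorithms Grouped by Similarity

ML algorithms are often grouped by similarity in how they function, for example tree-based methods and neural-network-inspired methods. I think this is the most useful way to group machine learning algorithms, and it is the approach we use here. It is a useful grouping method, but it is not perfect. Some algorithms could just as easily fit into multiple categories, such as Learning Vector Quantization, which is both a neural-network-inspired method and an instance-based method. There are also categories whose name describes both the problem and the class of algorithm, such as regression and clustering. We could handle these cases by listing algorithms twice, or by subjectively selecting the group that fits best. I prefer the latter approach of not duplicating algorithms, to keep things simple.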

Regression Algorithms

  • Ordinary Least Squares Regression (OLSR)

  • Linear Regression

  • Logistic Regression

  • Stepwise Regression

  • Multivariate Adaptive Regression Splines (MARS)

  • Locally Estimated Scatterplot Smoothing (LOESS)

Instance-based Algorithms

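An instance-based learning model is a decision problem with instances of training data that are deemed important or required by the model. Such methods build up a database of example data and compare new data to the database, using a similarity measure to find the best match and make a prediction. For this reason, instance-based methods are also called winner-take-all methods and memory-based learning. The focus is on the representation of the stored instances and on the similarity measures used between instances. The most popular instance-based algorithms are: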

  • k-Nearest Neighbor (kNN)

  • Learning Vector Quantization (LVQ)

  • Self-Organizing Map (SOM)

  • Locally Weighted Learning (LWL)
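
As a rough illustration of "compare new data to the stored examples and let the best matches decide", here is a small k-Nearest Neighbor sketch. Euclidean distance as the similarity measure, the toy points, and the name `knn_predict` are assumptions chosen for this example.

```python
# k-Nearest Neighbor sketch: the "model" is just the stored training data.
# Euclidean distance and the toy data are illustrative assumptions.
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """training is a list of (features, label); return the majority label
    among the k stored instances closest to the query."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


training = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(training, (2, 2)))
print(knn_predict(training, (7, 9)))
```

Changing the distance function changes what "similar" means, which is why the choice of similarity measure matters so much for this family.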

Regularization Algorithms

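Regularization is an extension made to another method that penalizes models according to their complexity, favoring simpler models that are also better at generalizing. I list regularization algorithms separately here because they are popular, powerful, and generally simple modifications made to other methods. The most popular regularization algorithms are: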

  • Ridge Regression

  • Least Absolute Shrinkage and Selection Operator (LASSO)

  • Elastic Net

  • Least-Angle Regression (LARS)
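
To make the idea of a complexity penalty concrete, the sketch below fits a one-feature ridge regression without an intercept. For that simplified case the penalized least-squares slope has the closed form sum(x*y) / (sum(x*x) + penalty). The simplification, the data, and the name `ridge_slope` are assumptions for illustration.

```python
# Ridge regression sketch for one feature and no intercept.
# Minimizing sum((y - w*x)**2) + penalty * w**2 gives
# w = sum(x*y) / (sum(x*x) + penalty). Data and names are illustrative.

def ridge_slope(xs, ys, penalty):
    """Return the slope that minimizes squared error plus penalty * w**2."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + penalty
    return numerator / denominator


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unpenalized = ridge_slope(xs, ys, penalty=0.0)   # plain least squares
penalized = ridge_slope(xs, ys, penalty=14.0)    # shrunk toward zero
print(unpenalized, penalized)
```

A larger penalty shrinks the coefficient toward zero, trading some fit on the training data for a simpler model.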

Decision Tree Algorithms

Decision tree methods construct a model of decisions made based on the actual values of attributes in the data. Decisions fork in tree structures until a prediction decision is made for a given record. Decision trees are trained on data for classification and regression problems. They are often fast and accurate, and a big favorite in machine learning. The most popular decision tree algorithms are:

  • Classification and Regression Tree (CART)

  • Iterative Dichotomiser 3 (ID3)

  • C4.5 and C5.0 (different versions of a powerful approach)

  • Chi-squared Automatic Interaction Detection (CHAID)

  • Decision Stump

  • M5

  • Conditional Decision Trees
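
Tree learners have to decide which attribute to fork on. The sketch below scores candidate attributes with Gini impurity, a criterion commonly associated with CART; other algorithms in the list use different criteria. The records and the helper names are made up for illustration.

```python
# Sketch of choosing the attribute to split on using Gini impurity.
# The records and function names are illustrative assumptions.
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(records, attribute, target):
    """Weighted Gini impurity of the groups produced by one attribute."""
    groups = {}
    for record in records:
        groups.setdefault(record[attribute], []).append(record[target])
    return sum(len(g) / len(records) * gini(g) for g in groups.values())


def best_attribute(records, attributes, target):
    """Pick the attribute whose split leaves the purest groups."""
    return min(attributes, key=lambda a: split_impurity(records, a, target))


records = [
    {"outlook": "sunny", "windy": False, "play": "yes"},
    {"outlook": "sunny", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "no"},
]
print(best_attribute(records, ["outlook", "windy"], "play"))
```

A full tree builder would apply this choice recursively to each resulting group until the groups are pure or too small to split.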

Bayesian Algorithms

Bayesian methods are those that apply Bayes’ Theorem to problems such as classification and regression. The most popular Bayesian algorithms are:

  • Naive Bayes

  • Gaussian Naive Bayes

  • Multinomial Naive Bayes

  • Averaged One-Dependence Estimators (AODE)

  • Bayesian Belief Network (BBN)

  • Bayesian Network (BN)
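
Bayes’ Theorem itself fits in a few lines. The sketch below computes P(H | E) from a prior and two likelihoods; the probabilities are invented numbers for a spam-style example and the name `posterior` is hypothetical.

```python
# Bayes' Theorem sketch: P(H | E) = P(E | H) * P(H) / P(E).
# All probabilities below are made-up values for illustration.

def posterior(prior, likelihood, likelihood_if_not):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Assume 20% of mail is spam, and a given word appears in 50% of spam
# but only 5% of legitimate mail.
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_if_not=0.05)
print(round(p_spam_given_word, 3))
```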

Clustering Algorithms

Clustering, like regression, describes both the class of problem and the class of methods. Clustering methods are organized by modeling approach, such as centroid-based and hierarchical. All methods are concerned with using the inherent structures in the data to best organize the data into groups of maximum commonality. The most popular clustering algorithms are:

  • k-Means

  • k-Medians

  • Expectation Maximisation (EM)

  • Hierarchical Clustering

Association Rule Learning Algorithms

Association rule learning methods extract rules that best explain observed relationships between variables in data. These rules can discover important and useful associations in large multidimensional datasets that can be exploited by an organization. The most popular association rule learning algorithms are:

  • Apriori algorithm

  • Eclat algorithm

Artificial Neural Network Algorithms

Artificial neural networks are models inspired by the structure of biological neural networks. They are a class of pattern-matching methods used for regression and classification problems, but they really form an enormous subfield that combines hundreds of algorithms and variations. The most popular artificial neural network algorithms are:

  • Perceptron

  • Back-Propagation

  • Hopfield Network

  • Radial Basis Function Network (RBFN)

Deep Learning Algorithms

Deep learning methods are a modern update to artificial neural networks that exploit abundant cheap computation. They are concerned with building much larger and more complex neural networks. The most popular deep learning algorithms are:

  • Deep Boltzmann Machine (DBM)

  • Deep Belief Networks (DBN)

  • Convolutional Neural Network (CNN)

  • Stacked Auto-Encoders

Dimensionality Reduction Algorithms

Like clustering methods, dimensionality reduction seeks an inherent structure in the data, but in this case in order to summarize the data.

This can be useful for visualizing high-dimensional data, and the reduced data can also be used in a supervised learning method. Many of these methods can be adapted for use in classification and regression.

  • Principal Component Analysis (PCA)

  • Principal Component Regression (PCR)

  • Partial Least Squares Regression (PLSR)

  • Sammon Mapping

  • Multidimensional Scaling (MDS)

  • Projection Pursuit

  • Linear Discriminant Analysis (LDA)

  • Mixture Discriminant Analysis (MDA)

  • Quadratic Discriminant Analysis (QDA)

  • Flexible Discriminant Analysis (FDA)

Ensemble Algorithms

Ensemble methods are models composed of multiple weaker models whose predictions are combined in some way to make the overall prediction. Much effort is put into deciding which types of weak learners to combine and how to combine them. This is a very powerful class of techniques and, as such, is very popular.

  • Boosting

  • Bootstrapped Aggregation (Bagging)

  • AdaBoost

  • Stacked Generalization (blending)

  • Gradient Boosting Machines (GBM)

  • Gradient Boosted Regression Trees (GBRT)

  • Random Forest

List of Common Machine Learning Algorithms

Naïve Bayes Classifier Machine Learning Algorithm

Classifying a web page, a document, an email, or any other lengthy text by hand is difficult and quickly becomes impractical. This is where the Naïve Bayes classifier comes to the rescue. A classifier is a function that assigns a class label to each element of a population. Spam filtering is a popular application of the Naïve Bayes algorithm: the spam filter is a classifier that assigns the label “Spam” or “Not Spam” to every email. Naïve Bayes is among the most popular learning methods grouped by similarity, and it works on Bayes’ Theorem of probability. For text, it amounts to a simple classification based on the words a document contains, and it is also used for subjective analysis of content.
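
A word-count version of this spam filter can be sketched as follows. It is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing; the smoothing choice, the four training messages, and the function names are assumptions made for this example.

```python
# Multinomial Naive Bayes sketch for "Spam" / "Not Spam" labels.
# Add-one smoothing and the tiny training set are illustrative assumptions.
import math
from collections import Counter


def train_naive_bayes(documents):
    """documents is a list of (text, label). Count documents per label and
    words per label, and collect the vocabulary."""
    label_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            # Smoothed log likelihood of the word given the label.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


documents = [
    ("win cash prize now", "Spam"),
    ("cheap cash offer", "Spam"),
    ("meeting agenda for monday", "Not Spam"),
    ("lunch on monday", "Not Spam"),
]
model = train_naive_bayes(documents)
print(classify("cash prize", model))
print(classify("monday meeting", model))
```

The "naïve" part is the assumption that words are independent given the label, which is what lets the per-word log likelihoods simply be added.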

K Means Clustering Machine Learning Algorithm

K-Means is a widely used unsupervised machine learning algorithm for cluster analysis. It is a non-deterministic, iterative method. The algorithm operates on a given data set with a pre-defined number of clusters, k. The output of K-Means is k clusters, with the input data partitioned among them.
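
The iteration can be sketched in a few lines: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. To keep this sketch reproducible it starts from the first k points rather than a random choice (random starts are one reason the method is usually non-deterministic); the data and the name `k_means` are assumptions.

```python
# K-Means sketch: alternate between assigning points to the nearest
# centroid and recomputing each centroid as the mean of its cluster.
# Using the first k points as the start is a simplification for this demo.
import math


def k_means(points, k, iterations=10):
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```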

Support Vector Machine Learning Algorithm

The Support Vector Machine (SVM) is a supervised machine learning algorithm for classification or regression problems. The training dataset teaches the SVM about the classes so that it can classify new data. It works by finding a line (more generally, a hyperplane) that separates the training dataset into classes. Because there are usually many such separating hyperplanes, the SVM looks for the one that maximizes the distance to the nearest training points of each class, which is referred to as margin maximization. If the line that maximizes the distance between the classes is found, the model is more likely to generalize well to unseen data. SVMs are classified into two categories:

  • Linear SVMs: the training data can be separated by a hyperplane.

  • Non-linear SVMs: the training data cannot be separated by a hyperplane.
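
The notion of margin can be shown without a full SVM solver. The sketch below does not train an SVM; it only measures the margin of two hand-picked separating lines on made-up data and keeps the wider one, which is the quantity an SVM maximizes over all possible hyperplanes. The candidate lines and the name `margin` are assumptions.

```python
# Margin sketch: for a hyperplane w.x + b = 0 and labels +1 / -1, the margin
# is the smallest signed distance from any training point to the hyperplane.
# This compares two hand-picked candidates; it is not an SVM trainer.
import math


def margin(weights, bias, points, labels):
    """Return the smallest signed distance; negative means a point is on
    the wrong side of the hyperplane."""
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, point)) + bias) / norm
        for point, label in zip(points, labels)
    )


points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two hypothetical separating lines, each given as (weights, bias).
candidates = {
    "vertical": ((1.0, 0.0), -3.0),   # the line x = 3
    "diagonal": ((1.0, 1.0), -5.5),   # the line x + y = 5.5
}
best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
print(best, margin(*candidates[best], points, labels))
```

Both lines classify the training points correctly, but the diagonal one leaves more room on either side, which is why a maximum-margin method would prefer it.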

Apriori Machine Learning Algorithm

Apriori is an unsupervised machine learning algorithm used to generate association rules from a given data set. An association rule implies that if an item A occurs, then item B also occurs with a certain probability. Most of the association rules generated are in IF-THEN format. For example: IF people buy an iPad THEN they also buy an iPad case to protect it. The Apriori algorithm rests on a basic principle: if an item set occurs frequently, then all subsets of that item set also occur frequently; if an item set occurs infrequently, then all supersets of that item set occur infrequently.
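
The principle above is what lets Apriori prune its search: a larger item set is only counted if all of its smaller subsets were already frequent. The sketch below finds frequent item sets level by level on four made-up transactions; the data, the support threshold, and the function name `apriori` are assumptions.

```python
# Apriori sketch: grow frequent item sets one item at a time, discarding any
# candidate that has an infrequent subset. Transactions are made up.
from itertools import combinations


def apriori(transactions, min_support):
    """Return {item set: number of transactions containing it} for every
    item set that appears in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
        # Join step: combine frequent sets into candidates one item larger.
        joined = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        ]
    return frequent


transactions = [
    {"iPad", "iPad Case"},
    {"iPad", "iPad Case", "Charger"},
    {"iPad", "Charger"},
    {"Headphones"},
]
frequent = apriori(transactions, min_support=2)

# Confidence of the rule IF iPad THEN iPad Case.
both = frequent[frozenset({"iPad", "iPad Case"})]
confidence = both / frequent[frozenset({"iPad"})]
print(len(frequent), round(confidence, 2))
```

Here the three-item set is never counted, because one of its pairs (iPad Case with Charger) was already infrequent.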

Linear Regression Machine Learning Algorithm

Linear regression shows the relationship between two variables and how a change in one affects the other. The algorithm shows the impact on the dependent variable of changing the independent variable. The independent variables are also called explanatory variables, because they explain the factors that affect the dependent variable. The dependent variable is often referred to as the factor of interest, that is, the quantity being predicted.
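
For one independent variable, the least-squares line has a simple closed form, sketched below. The data points and the name `fit_line` are made up for illustration.

```python
# Simple linear regression sketch using the ordinary least squares formulas
# slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x).
# The data points are illustrative.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]   # independent (explanatory) variable
ys = [3.0, 5.0, 7.0, 9.0]   # dependent variable
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable.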

Decision Tree Machine Learning Algorithm

A decision tree is a graphical representation that uses a branching method to show all possible outcomes of a decision. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label, i.e. the decision made after evaluating the attributes along the way. A classification rule is represented by the path from the root to a leaf node.
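
That structure maps naturally onto nested dictionaries. In the sketch below, a dictionary is an internal node that tests one attribute, its `branches` are the possible outcomes, and a plain string is a leaf label. The tree itself is a hand-written, hypothetical example, not one learned from data.

```python
# Decision tree traversal sketch. Internal nodes are dicts that name the
# attribute to test; leaves are plain class labels. The tree is hand-made.

tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {True: "stay in", False: "play"},
        },
        "rainy": "stay in",
        "overcast": "play",
    },
}


def classify(node, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node


print(classify(tree, {"outlook": "sunny", "windy": False}))
print(classify(tree, {"outlook": "rainy", "windy": True}))
```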

Random Forest Machine Learning Algorithm

Random forest is a go-to machine learning algorithm that uses a bagging approach to create a bunch of decision trees, each from a random subset of the data. A model is trained several times on random samples of the dataset to achieve good prediction performance. In this ensemble learning method, the outputs of all the decision trees are combined to make the final prediction, which is derived by polling the results of each decision tree.
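
The bagging-and-polling idea can be sketched with very small trees. Each "tree" below is a one-split stump trained on a bootstrap sample (drawn with replacement), and the forest predicts by majority vote. This is a simplification: a full random forest also samples features at each split, which is omitted here because the toy data has a single feature. The data and all names are assumptions.

```python
# Bagging sketch: train one-split trees on bootstrap samples and poll them.
# Feature subsampling is omitted; data and names are illustrative.
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-split tree on (value, label) pairs; return a predictor."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    values = sorted({value for value, _ in sample})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = Counter(l for v, l in sample if v <= threshold).most_common(1)[0][0]
        right = Counter(l for v, l in sample if v > threshold).most_common(1)[0][0]
        errors = sum(1 for v, l in sample
                     if (left if v <= threshold else right) != l)
        if best is None or errors < best[0]:
            best = (errors, threshold, left, right)
    if best is None:  # only one distinct value: always predict the majority
        return lambda value: majority
    _, threshold, left, right = best
    return lambda value: left if value <= threshold else right


def fit_forest(data, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def forest_predict(forest, value):
    """Poll every tree and return the most common answer."""
    return Counter(tree(value) for tree in forest).most_common(1)[0][0]


data = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
        (6.0, "high"), (7.0, "high"), (8.0, "high")]
forest = fit_forest(data)
print(forest_predict(forest, 1.5), forest_predict(forest, 7.5))
```

Individual stumps can be poor when their sample happens to be lopsided, but the vote across many of them is far more stable than any single one.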

Logistic Regression Machine Learning Algorithm

The name of this algorithm can be a little confusing, because logistic regression is used for classification tasks rather than regression problems. The word "Regression" here implies that a linear model is fit to the feature space. The algorithm applies a logistic function to a linear combination of features in order to predict the outcome of a categorical dependent variable based on predictor variables. The probabilities that describe the possible outcomes of a single trial are modeled as a function of the explanatory variables.
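
Here is a minimal sketch of that idea: a logistic (sigmoid) function wraps a linear combination of features, and the weights are fitted with plain stochastic gradient descent. The training data ("hours studied" against pass or fail), the learning rate, and the epoch count are assumptions picked for this example.

```python
# Logistic regression sketch: probability = sigmoid(bias + w . x), with the
# weights fitted by stochastic gradient descent on the log loss.
# Data, learning rate, and epoch count are illustrative assumptions.
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Apply the logistic function to a linear combination of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, labels, learning_rate=0.05, epochs=2000):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict_probability(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical data: hours studied, and whether the exam was passed (1) or not (0).
samples = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train(samples, labels)
p_low = predict_probability(weights, bias, (1.0,))
p_high = predict_probability(weights, bias, (6.0,))
print(round(p_low, 3), round(p_high, 3))
```

Thresholding the probability at 0.5 turns the model into a classifier, which is why the method is used for classification despite its name.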

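Conclusion

We covered machine learning algorithms and how they are classified: regression algorithms, instance-based algorithms, regularization algorithms, decision tree algorithms, Bayesian algorithms, clustering algorithms, association rule learning algorithms, artificial neural network algorithms, deep learning algorithms, dimensionality reduction algorithms, ensemble algorithms, supervised learning, unsupervised learning, semi-supervised learning, the Naïve Bayes classifier, K-Means clustering, support vector machines, Apriori, linear regression, and logistic regression. We also used images to make the algorithms easier to understand. If you have any questions, please leave them in the comments section.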

