Supervised learning solves the classification problem on the premise that a labeled sample set is available, but obtaining data labels is often expensive. Moreover, these labels are usually assigned manually, so labeling errors occur from time to time. This has driven the development of unsupervised learning strategies, which can be summarized in one sentence:
Machine learning methods that draw inferences from unlabeled data.
1. Scenarios

Because unsupervised learning does not require prior human judgment, it is generally used as a preliminary step of a learning task to regularize the data; after unsupervised learning, human knowledge must still be added to make the results useful. Figure 1-10 compares the two learning strategies in terms of when human knowledge is introduced.

Figure 1-10 Supervised learning and unsupervised learning
Generally speaking, it is easier for people to interpret data that has been organized by unsupervised learning than to curate labels for the sample data, so unsupervised learning usually requires less human participation.
Unsupervised learning algorithms are abundant; in terms of how they organize data, there are two main branches:
Clustering: the most important family of unsupervised learning methods. It partitions the existing sample data into several subsets, and the resulting model can also be used to assign new samples to those subsets.
Dimensionality reduction (Dimensionality Reduction): converting high-dimensional data into low-dimensional data while preserving the existing distance relationships between samples.
In addition, there are some smaller families of algorithms, such as covariance estimation (Covariance Estimation) and outlier detection (Outlier Detection); a brief sketch of the latter follows this list.
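As a minimal illustration of how covariance estimation and outlier detection connect, the sketch below uses scikit-learn's EllipticEnvelope, which fits a robust covariance estimate and flags points far outside the fitted ellipse; the data points are invented for illustration.

```python
# A minimal sketch of covariance-based outlier detection, assuming
# scikit-learn's EllipticEnvelope; the data points are made up.
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Most points form one compact cloud; the last point is an obvious outlier.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.05, 1.0],
              [0.95, 1.05], [1.0, 0.9], [0.9, 1.1], [1.1, 1.05],
              [20.0, 20.0]])

# fit_predict returns +1 for inliers and -1 for outliers.
detector = EllipticEnvelope(contamination=0.15, random_state=0)
print(detector.fit_predict(X))   # e.g. [ 1  1  1  1  1  1  1  1 -1]
```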

Figure 1-11 gives an example application scenario of clustering, the most important unsupervised learning method. It shows a clustering of bank customers that divides the existing customers into two subsets. Once the clustering model is trained, new customers can also be assigned to the corresponding subset by the existing model.

Figure 1-11 Example of a clustering scenario

Clustering only provides a scheme for partitioning data into subsets; the logical meaning of the partition must be interpreted by humans. In Figure 1-11, the algorithm divides all customers into two groups according to their deposit and loan amounts. For most banks, subset 1 probably corresponds to ordinary customers and subset 2 to important customers.
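To make the scenario of Figure 1-11 concrete, here is a minimal sketch of such a customer clustering, assuming scikit-learn's KMeans; the deposit/loan values and the two-subset interpretation are made up for illustration and are not taken from the figure's data.

```python
# A minimal sketch of the bank-customer clustering scenario, assuming
# scikit-learn; the feature values are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one customer: [deposit amount, loan amount] (in thousands).
customers = np.array([
    [5, 2], [8, 3], [6, 1],           # small deposits and loans
    [120, 80], [150, 60], [110, 90],  # large deposits and loans
], dtype=float)

# Ask for two subsets, matching the two groups in the figure.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(customers)
print(labels)                       # e.g. [0 0 0 1 1 1]

# A new customer can be assigned to an existing subset by the trained model.
print(model.predict([[130, 70]]))   # e.g. [1] -> the "important customer" group
```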
2. Clustering algorithms
Clustering is still a developing field, and its methods are varied. This book focuses on several clustering strategies that are currently mature:
Partition methods (Partition Methods): the most basic family of algorithms, which cluster samples according to the distances between their features. The representative algorithms are K-means and its derivatives.
Density methods (Density Methods): subsets are formed by specifying the minimum number of members in each subset and the maximum distance between members. The most typical algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
Model methods (Model Methods): represented mainly by probability models, typified by the Gaussian mixture model (GMM, Gaussian Mixture Model), and neural network models, typified by self-organizing maps (SOM, Self-Organizing Maps). Their characteristic is that a sample is not definitively assigned to a single subset; instead, the model gives the probability that the sample belongs to each subset (see the sketch after this list).
Hierarchical methods (Hierarchical Methods): unlike the other families, which divide the population into subsets of equal status, hierarchical methods divide the data set into a tree structure with parent-child relationships. This makes it possible to study the relationships between subclasses while clustering. The typical model is BIRCH.
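To illustrate the hard-versus-soft assignment distinction mentioned for model methods, the sketch below contrasts KMeans (a partition method) with GaussianMixture (a model method), assuming scikit-learn; the data points are invented for illustration.

```python
# A minimal sketch contrasting hard assignment (partition method) with
# soft assignment (model method), assuming scikit-learn; data are made up.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])

# K-means assigns each sample to exactly one of the two subsets.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# A Gaussian mixture model instead reports, for every sample, the
# probability of belonging to each subset (each row sums to 1).
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X).round(3))
```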
3. Dimensionality reduction algorithms
As mentioned above, dimensionality reduction is usually used to compress the number of features before subsequent processing; compared with clustering, it is somewhat more abstract. This book introduces two types of dimensionality reduction strategies:
Linear dimensionality reduction: as the name suggests, it handles linear problems, and the models are simple. Common examples include principal component analysis (PCA, Principal Component Analysis) and linear discriminant analysis (LDA, Linear Discriminant Analysis).
Manifold learning (Manifold Learning): a hot topic in academic circles in recent years, it can handle nonlinear dimensionality reduction. Relatively mature algorithms include Isomap and locally linear embedding (LLE, Locally Linear Embedding); see the short sketch after this list.
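As a brief illustration of the two strategies, the sketch below reduces a 3-D data set to 2-D once with PCA and once with Isomap, assuming scikit-learn; the S-curve data set is only an illustrative stand-in, not data from the book.

```python
# A minimal sketch of linear vs. manifold dimensionality reduction,
# assuming scikit-learn; the S-curve data set is only for illustration.
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=300, random_state=0)   # 3-D nonlinear manifold

# PCA: a linear projection onto the 2 directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Isomap: a manifold-learning method that tries to preserve distances
# measured along the curved surface while reducing to 2 dimensions.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X.shape, X_pca.shape, X_iso.shape)   # (300, 3) (300, 2) (300, 2)
```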
Chapters 4 and 5 of this book discuss the main clustering and dimensionality reduction algorithms in detail.
