One, Overview
A Bayesian network is a graphical model that represents the probabilistic relationships among variables. It provides a natural way to express causal information and can be used to discover latent relationships in data. In the network, nodes represent variables and directed edges represent dependencies between variables.
Bayesian methods have become one of the most attractive approaches in data mining thanks to their distinctive way of representing uncertain knowledge, their rich probabilistic expressiveness, and their ability to learn incrementally by incorporating prior knowledge.
1.1 The history of Bayesian Networks
1.2 Basic viewpoints of the Bayesian method
The defining feature of the Bayesian method is that it uses probability to express every form of uncertainty; learning and other forms of reasoning are carried out with the rules of probability.
The result of Bayesian learning is a probability distribution over random variables, which can be interpreted as our degree of belief in the different possibilities.
The Bayesian school starts from two contributions of Bayes: Bayes' theorem and the Bayesian hypothesis.
Bayes' theorem relates the prior probability of an event to its posterior probability.
Background:
(1) Prior probability: The prior probability is the probability of an event determined from historical data or subjective judgment. Because it has not been confirmed by experiment and is assigned before any test, it is called the prior (a priori) probability. Prior probabilities generally fall into two categories: objective prior probabilities, which are computed from historical data, and subjective prior probabilities, which are judged from personal experience when historical data are missing or incomplete.
(2) Posterior probability: The posterior probability is the more realistic probability obtained by using the Bayes formula to revise the prior probability with additional information gathered through investigation or other means.
(3) Joint probability: The joint probability, computed with the multiplication formula, is the probability that two events both occur, that is, the probability of their intersection.
Suppose the random vectors $x$ and $\theta$ have joint density $p(x,\theta)$, with marginal densities $p(x)$ and $p(\theta)$. In general, $x$ is the observation vector and $\theta$ is an unknown parameter vector, which is estimated from the observations. Bayes' theorem then reads:
$$p(\theta \mid x) = \frac{\pi(\theta)\,p(x \mid \theta)}{p(x)} = \frac{\pi(\theta)\,p(x \mid \theta)}{\int \pi(\theta)\,p(x \mid \theta)\,d\theta},$$
where $\pi(\theta)$ is the prior distribution of $\theta$.
The general Bayesian procedure for estimating an unknown parameter vector is as follows:
(1) Treat the unknown parameters as a random vector. This is the biggest difference between the Bayesian method and traditional parameter estimation.
(2) Determine the prior distribution $\pi(\theta)$ from existing knowledge about the parameter $\theta$. This is the controversial step of the Bayesian method, and the one criticized by classical statisticians.
(3) Compute the posterior density and use it to make inferences about the unknown parameters.
In step (2), if no prior knowledge is available to help determine $\pi(\theta)$, Bayes proposed using a uniform distribution as the prior; that is, the parameter is equally likely to take any value within its range. This assumption is called the Bayesian hypothesis.
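The three steps can be sketched on a small example. The snippet below is only an illustration under stated assumptions: the parameter is taken to be the bias θ of a coin, the observations are counts of heads and tails, and the posterior is approximated on a grid. The function name `posterior_on_grid` and the data are hypothetical, not part of the original text.

```python
def posterior_on_grid(heads, tails, grid_size=101):
    """Approximate p(theta | x) on an evenly spaced grid over [0, 1]."""
    # Step 1: theta is treated as random, here restricted to grid values.
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Step 2: Bayesian hypothesis, every value of theta gets the same prior weight.
    prior = [1.0 / grid_size] * grid_size
    # Likelihood p(x | theta) of the observed flips. The binomial coefficient
    # is omitted because it cancels in the normalisation below.
    likelihood = [t ** heads * (1 - t) ** tails for t in thetas]
    # Step 3: posterior = prior * likelihood / p(x).
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalised)  # grid approximation of p(x)
    posterior = [u / evidence for u in unnormalised]
    return thetas, posterior


# Hypothetical data: 7 heads and 3 tails.
thetas, posterior = posterior_on_grid(heads=7, tails=3)
# Grid value with the highest posterior probability.
theta_map = thetas[max(range(len(posterior)), key=posterior.__getitem__)]
# Posterior mean of theta.
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(theta_map, posterior_mean)
```

With a uniform prior the posterior is proportional to the likelihood, so the most probable grid value coincides with the observed frequency of heads, while the posterior mean is pulled slightly toward 0.5.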
1.3 Application fields of Bayesian Networks
Intelligent decision support:
Data fusion:
Pattern recognition:
Medical diagnosis:
Text understanding:
Data mining: (1) classification and regression analysis with Bayesian methods; (2) causal reasoning and representation of uncertain knowledge; (3) discovery of clustering patterns.
Two, Basis of Bayesian probability theory
2.1 Fundamentals of probability theory
2.2 Bayesian probability
(1) Prior probability:
(2) Posterior probability:
(3) Joint probability:
(4) Total probability formula: Let $B_1, B_2, \dots, B_n$ be pairwise mutually exclusive events with $P(B_i) > 0$ for $i = 1, 2, \dots, n$ and $B_1 + B_2 + \dots + B_n = \Omega$. Then $A = AB_1 + AB_2 + \dots + AB_n$, and therefore
$$P(A) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i).$$
The total probability formula can thus be read as "inferring the result from its causes": each cause $B_i$ contributes to the occurrence of the result $A$, and the probability of the result depends on the size of these contributions. The formula expresses the relationship between them.
(5) Bayes formula: The Bayes formula, also called the posterior probability formula or the inverse probability formula, is widely used. Let the prior probability be $P(B_i)$, and let the additional information obtained from investigation be $P(A_j \mid B_i)$ ($i = 1, 2, \dots, n$; $j = 1, 2, \dots, m$). Then the posterior probability given by the Bayes formula is:
$$P(B_i \mid A_j) = \frac{P(B_i)\,P(A_j \mid B_i)}{\sum_{k=1}^{n} P(B_k)\,P(A_j \mid B_k)}$$
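A small numeric sketch may help connect the total probability formula and the Bayes formula. The scenario and all numbers are hypothetical: three machines B_1, B_2, B_3 produce a part, and A is the event that a part is defective. The function names are illustrative only.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum_i P(B_i) * P(A | B_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posterior(priors, likelihoods):
    """P(B_i | A) = P(B_i) P(A | B_i) / sum_k P(B_k) P(A | B_k)."""
    evidence = total_probability(priors, likelihoods)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


# Hypothetical numbers: the machines produce 50%, 30% and 20% of all parts,
# with defect rates P(A | B_i) of 1%, 2% and 3%.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

p_defect = total_probability(priors, likelihoods)   # result inferred from causes
posteriors = bayes_posterior(priors, likelihoods)   # causes inferred from result
print(p_defect, posteriors)
```

The denominator of the Bayes formula is exactly the total probability of A, which is why `bayes_posterior` reuses `total_probability`.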
* Any complete probabilistic model must be able to represent, directly or indirectly, the joint distribution of the variables in its domain. Complete enumeration requires a table whose size is exponential in the number of domain variables.
* Bayesian networks provide a compact representation of this joint distribution by decomposing it into a product of local distributions: $P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$, where $\pi_i$ denotes the parents of $x_i$ in the network.
* As the formula shows, the number of parameters required grows linearly with the number of nodes in the network (when each node has a bounded number of parents), whereas explicitly tabulating the joint distribution grows exponentially.
* Specifying the independence relationships among the variables in the network is the key to achieving a compact representation. These independence relationships are especially useful when Bayesian networks are constructed with the help of human experts.
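The factorization can be illustrated with a minimal sketch. The four-node network below (Cloudy, Sprinkler, Rain, WetGrass), its structure, and all probabilities are assumptions chosen for illustration; they do not come from the original text. All variables are binary, and each local table stores P(node = True | parent values).

```python
from itertools import product

# Hypothetical structure: Cloudy -> Sprinkler, Cloudy -> Rain,
# (Sprinkler, Rain) -> WetGrass.
parents = {
    "Cloudy": (),
    "Sprinkler": ("Cloudy",),
    "Rain": ("Cloudy",),
    "WetGrass": ("Sprinkler", "Rain"),
}

# Local distributions, keyed by the tuple of parent values (illustrative numbers).
cpt = {
    "Cloudy": {(): 0.5},
    "Sprinkler": {(True,): 0.1, (False,): 0.5},
    "Rain": {(True,): 0.8, (False,): 0.2},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.9, (False, False): 0.0},
}


def joint(assignment):
    """P(x_1, ..., x_n) as the product of P(x_i | parents of x_i)."""
    prob = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


nodes = list(parents)
all_assignments = [dict(zip(nodes, values))
                   for values in product([True, False], repeat=len(nodes))]
total = sum(joint(a) for a in all_assignments)  # should sum to 1

# Free parameters: entries in the local tables vs. a full joint table.
n_local = sum(len(table) for table in cpt.values())
n_full = 2 ** len(nodes) - 1
print(total, n_local, n_full)
```

Here the local tables need 9 numbers while a full joint table over four binary variables needs 15; the gap widens quickly as nodes are added and the number of parents per node stays small.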
Three, Simple Bayesian learning model
The simple (naive) Bayesian learning model decomposes each training example I into a feature vector X and a decision class variable C. The model assumes that the components of the feature vector are conditionally independent given the class variable; in other words, each component acts on the class variable independently. Although this assumption limits the applicability of the simple Bayesian model to some extent, in practice it reduces the complexity of constructing the Bayesian network exponentially. Moreover, in many domains the simple Bayesian model remains remarkably robust and efficient even when the assumption is violated, and it has been applied successfully to classification, clustering, model selection, and other important data mining tasks.
- The structure is simple: there are only two layers.
- The inference complexity is linear in the number of network nodes.
Suppose a sample $A$ is represented as an attribute vector $(a_1, a_2, \dots, a_m)$. If the attributes are independent given the class, then $P(A \mid C_i)$ can be decomposed into a product of per-attribute terms:
$P(A \mid C_i) = P(a_1 \mid C_i)\,P(a_2 \mid C_i) \cdots P(a_m \mid C_i)$, where $a_j$ is the $j$-th attribute of sample $A$. It follows that $P(C_i \mid A) = \frac{P(C_i)}{P(A)} \prod_{j=1}^{m} P(a_j \mid C_i)$.
This procedure is called simple Bayesian classification (SBC: Simple Bayesian Classifier). It is commonly held that SBC achieves the best classification accuracy only when the independence assumption holds, and that it gives a near-optimal classification when the correlation between attributes is small.
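The classification rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes categorical attributes, estimates P(C_i) and P(a_j | C_i) by plain relative frequencies without smoothing, and uses hypothetical function names and toy data.

```python
from collections import Counter, defaultdict


def train_sbc(samples, labels):
    """Estimate the counts behind P(C_i) and P(a_j | C_i)."""
    class_counts = Counter(labels)
    # value_counts[(j, c)][v] = number of class-c samples whose j-th attribute is v
    value_counts = defaultdict(Counter)
    for attributes, c in zip(samples, labels):
        for j, v in enumerate(attributes):
            value_counts[(j, c)][v] += 1
    return class_counts, value_counts


def classify_sbc(model, attributes):
    """Return P(C_i | A), proportional to P(C_i) * prod_j P(a_j | C_i)."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n  # P(C_i)
        for j, v in enumerate(attributes):
            score *= value_counts[(j, c)][v] / n_c  # P(a_j | C_i)
        scores[c] = score
    evidence = sum(scores.values())  # plays the role of P(A)
    if evidence == 0:
        # No smoothing is applied in this sketch, so unseen values can zero out
        # every class score.
        raise ValueError("attribute values unseen for every class")
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy data: (outlook, temperature) -> play?
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

model = train_sbc(samples, labels)
posterior = classify_sbc(model, ("rainy", "mild"))
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

Training is a single pass over the data and classification touches each attribute once per class, which reflects the linear inference complexity noted above.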