One. Overview

A Bayesian network is a graphical model that represents probabilistic dependencies among variables. It provides a natural way to express causal information and is used to uncover latent relationships in data. In the network, nodes represent variables and directed edges represent dependencies between variables.

The Bayesian method's distinctive way of expressing uncertain knowledge, its rich probabilistic expressive power, and its ability to incorporate prior knowledge incrementally have made it one of the most attractive approaches among data mining methods.

1.1 The history of Bayesian Networks

1.2 Basic viewpoints of Bayesian method
The defining characteristic of the Bayesian method is that it uses probability to represent every form of uncertainty; learning and other forms of reasoning are carried out through the rules of probability.

The result of Bayesian learning is expressed as a probability distribution over random variables, which can be interpreted as our degree of belief in each of the possibilities.

The starting point of the Bayesian school is two contributions of Bayes: Bayes' theorem and the Bayesian hypothesis.

Bayes' theorem relates the prior probability of an event to its posterior probability.

Supplementary knowledge:

(1) Prior probability: The prior probability is the probability of an event determined from historical data or subjective judgment. Because it has not yet been verified by experiment and is assigned before any test is carried out, it is called a prior probability. Prior probabilities generally fall into two categories: objective prior probabilities, which are calculated from past historical data, and subjective prior probabilities, which are assessed from personal experience when historical data are missing or incomplete.
(2) Posterior probability: The posterior probability is generally obtained with Bayes' formula: new information is collected (for example, through a survey) and used to revise the prior probability, yielding a probability that better reflects reality.
(3) Joint probability: The joint probability, which corresponds to the multiplication rule, is the probability that two events occur together, i.e., the probability of their intersection.

Suppose the random vector x and the parameter vector θ have joint density p(x, θ), with marginal densities p(x) and p(θ). In general, x is the observation vector and θ is an unknown parameter vector, and the estimate of θ is obtained from the observation vector. Bayes' theorem is then written as

p(θ|x) = π(θ)p(x|θ) / p(x) = π(θ)p(x|θ) / ∫ π(θ)p(x|θ) dθ,

where π(θ) is the prior distribution of θ.

The general Bayesian procedure for estimating an unknown parameter vector is as follows:
(1) Treat the unknown parameter as a random vector. This is the biggest difference between the Bayesian method and traditional parameter estimation methods.
(2) Based on prior knowledge about the parameter θ, determine the prior distribution π(θ). This is the most controversial step of the Bayesian method and the one most attacked by the classical statistics community.
(3) Compute the posterior density and use it to draw inferences about the unknown parameter.
In step (2), if no prior knowledge is available to help determine π(θ), Bayes proposed taking the uniform distribution as the prior, i.e., the parameter takes every value in its range of variation with equal probability. This assumption is called the Bayesian hypothesis.
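
To make the three steps concrete, here is a minimal sketch (an illustration with made-up data, not part of the original text) that estimates a Bernoulli parameter θ, using a uniform prior as the Bayesian hypothesis suggests and a simple grid approximation of the posterior:

```python
import numpy as np

# Made-up data for illustration: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10

# Steps (1) and (2): treat theta as a random quantity and choose the prior pi(theta).
theta = np.linspace(0.001, 0.999, 999)      # grid over the parameter range (0, 1)
prior = np.ones_like(theta)                 # uniform prior: the Bayesian hypothesis

# Likelihood p(x | theta) of the observed data at each grid value of theta.
likelihood = theta**successes * (1.0 - theta)**(trials - successes)

# Step (3): posterior proportional to prior * likelihood, normalised over the grid
# (a discrete stand-in for dividing by the integral of pi(theta) p(x | theta)).
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

# Point estimates read off the grid posterior.
posterior_mean = float((theta * posterior).sum())
map_estimate = float(theta[np.argmax(posterior)])
print(f"posterior mean ~ {posterior_mean:.3f}, MAP ~ {map_estimate:.3f}")
```

With a uniform prior the posterior is driven entirely by the likelihood, which is exactly the situation the Bayesian hypothesis is meant to cover.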

1.3 Application fields of Bayesian Networks
Assisted intelligent decision making:
Data fusion:
Pattern recognition:
Medical diagnosis:
Text understanding:
Data mining: (1) Bayesian methods for classification and regression analysis; (2) causal reasoning and representation of uncertain knowledge; (3) clustering and pattern discovery.

Two. Foundations of Bayesian probability theory

2.1 Fundamentals of probability theory

2.2 Bayesian probability
(1) Prior probability:
(2) Posterior probability:
(3) Joint probability:
(4) Total probability formula: Let B1, B2, ..., Bn be mutually exclusive events with P(Bi) > 0 for i = 1, 2, ..., n, and B1 + B2 + ... + Bn = Ω. Then A = AB1 + AB2 + ... + ABn, and therefore P(A) = ∑ P(Bi)P(A|Bi), summing over i = 1, ..., n. The total probability formula can be read as "inferring the result from its causes": each cause Bi exerts a certain "effect" on the outcome A, and the probability of the outcome depends on the size of each of these effects. The total probability formula expresses this relationship (see the numeric sketch after this list).
(5) Bayes' formula: Bayes' formula is also called the posterior probability formula or the inverse probability formula, and it has a wide range of uses. Let the prior probabilities be P(Bi), and let the new information obtained from the survey be P(Aj|Bi), for i = 1, 2, ..., n and j = 1, 2, ..., m. Then the posterior probabilities computed by Bayes' formula are
P(Bi|Aj) = P(Bi)P(Aj|Bi) / ∑ P(Bk)P(Aj|Bk), summing over k = 1, ..., n.
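
As a minimal illustration of formulas (4) and (5) together (all numbers below are invented for the example and are not from the article), the following Python sketch computes P(A) by the total probability formula and then the posteriors P(Bi|A) by Bayes' formula:

```python
# Illustrative numbers only: three mutually exclusive "causes" B1..B3 that
# cover the sample space, and the conditional probability of outcome A
# under each cause.
p_B = [0.5, 0.3, 0.2]            # prior probabilities P(Bi), sum to 1
p_A_given_B = [0.1, 0.4, 0.8]    # new information P(A | Bi)

# Total probability formula: P(A) = sum_i P(Bi) * P(A | Bi)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(f"P(A) = {p_A:.2f}")       # 0.5*0.1 + 0.3*0.4 + 0.2*0.8 = 0.33

# Bayes' formula: P(Bi | A) = P(Bi) * P(A | Bi) / P(A)
posterior = [pb * pa / p_A for pb, pa in zip(p_B, p_A_given_B)]
print([round(p, 3) for p in posterior])   # ~ [0.152, 0.364, 0.485], sums to 1
```

Note how the denominator in Bayes' formula is exactly the total probability P(A), which is why the two formulas are presented together.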

* Any complete probabilistic model must be able to represent (directly or indirectly) the joint distribution of the variables in its domain. Enumerating that distribution explicitly requires space exponential in the number of domain variables.
* A Bayesian network provides a compact representation of this joint probability distribution by decomposing it into a product of local distributions: P(x1, x2, ..., xn) = ∏ P(xi | πi), where πi denotes the parents of node xi (see the sketch after this list).
* As the formula shows, the number of parameters required grows only linearly with the number of nodes in the network, whereas the full joint distribution grows exponentially.
* Specifying the independence relationships among the variables in the network is the key to achieving this compact representation. These independence relationships are particularly useful when Bayesian networks are built with the help of human experts.
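
For concreteness, here is a minimal sketch of such a factorization for a three-node chain Cloudy → Rain → WetGrass; the structure and all probability values are assumptions invented for this illustration:

```python
# A three-node Bayesian network arranged as a chain: Cloudy -> Rain -> WetGrass.
# Each node stores only its local distribution given its parent, so the joint
# factorizes as P(C, R, W) = P(C) * P(R | C) * P(W | R). All numbers are illustrative.
p_cloudy = {True: 0.5, False: 0.5}                     # P(C)
p_rain_given_cloudy = {True: {True: 0.8, False: 0.2},  # P(R | C)
                       False: {True: 0.1, False: 0.9}}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},     # P(W | R)
                    False: {True: 0.05, False: 0.95}}

def joint(c: bool, r: bool, w: bool) -> float:
    """Joint probability as the product of the local distributions."""
    return p_cloudy[c] * p_rain_given_cloudy[c][r] * p_wet_given_rain[r][w]

# Example: probability that it is cloudy, raining, and the grass is wet.
print(joint(True, True, True))   # 0.5 * 0.8 * 0.9 = 0.36
```

Storing three small local tables instead of the full 2^3-entry joint table is the compactness the bullets above describe; the saving grows rapidly as more variables are added.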
Three. Simple Bayesian learning model

The simple Bayesian (naive Bayes) learning model decomposes a training example I into a feature vector X and a decision (class) variable C. The simple Bayesian model assumes that the components of the feature vector are conditionally independent given the decision variable; in other words, each component acts on the decision variable independently. Although this assumption limits the applicability of the simple Bayesian model to some extent, in practice it reduces the complexity of constructing the Bayesian network by an exponential factor, and in many domains where the assumption is violated, simple Bayes still shows considerable robustness and efficiency. It has been successfully applied to classification, clustering, model selection, and other important data mining tasks.

- The structure is simple: there are only two layers.
- The inference complexity is linear in the number of network nodes.

Suppose a sample A is represented as an attribute vector. If the attributes are independent given the class, then P(A|Ci) can be decomposed into a product of components:
P(A|Ci) = P(a1|Ci) * P(a2|Ci) * ... * P(am|Ci), where ai is the i-th attribute of sample A. Then
P(Ci|A) = (P(Ci) / P(A)) * ∏ P(aj|Ci), with the product taken over j = 1, ..., m.
This procedure is called simple Bayesian classification (SBC: Simple Bayesian Classifier). It is commonly said that SBC achieves the optimal classification accuracy only when the independence assumption holds, and an approximately optimal classification result when the correlations among attributes are small.
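
The following is a minimal sketch of the SBC rule on a toy categorical data set; the data and the add-one smoothing are assumptions added for illustration and are not part of the article. It estimates P(Ci) and P(aj|Ci) from counts and selects the class that maximizes P(Ci) * ∏ P(aj|Ci):

```python
from collections import Counter, defaultdict

# Toy training data: each sample is (attribute vector, class). Purely illustrative.
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]

# Estimate P(Ci) and P(aj | Ci) from counts (with add-one smoothing, an assumption).
class_counts = Counter(c for _, c in train)
attr_counts = defaultdict(Counter)          # (class, position) -> counts of values
attr_values = defaultdict(set)              # position -> set of observed values
for attrs, c in train:
    for j, a in enumerate(attrs):
        attr_counts[(c, j)][a] += 1
        attr_values[j].add(a)

def predict(attrs):
    """Pick the class maximizing P(Ci) * prod_j P(aj | Ci)."""
    best_class, best_score = None, -1.0
    total = sum(class_counts.values())
    for c, n_c in class_counts.items():
        score = n_c / total                  # P(Ci)
        for j, a in enumerate(attrs):
            counts = attr_counts[(c, j)]
            score *= (counts[a] + 1) / (n_c + len(attr_values[j]))  # P(aj | Ci)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(("rainy", "hot")))  # classifies the new sample by the SBC rule
```

Because only per-attribute counts are stored, the number of parameters grows linearly with the number of attributes, which is the efficiency advantage described above.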
