I'm sure you've all seen CART by now. Feeling dizzy? No problem, let me walk through an example, and you'll see that generating a CART regression tree is actually quite easy...
First, set up a dataset. For convenience we use only a small amount of data; see the table below. (The data is made up purely for illustration.)

| Arm length (m) | Age (year) | Weight (kg) | Height (m) (label value) |
|---|---|---|---|
| 0.5 | 5 | 20 | 1.1 |
| 0.7 | 7 | 30 | 1.3 |
| 0.9 | 21 | 70 | 1.7 |
In the training data, arm length, age, and weight are the feature variables X, and height is the label value Y. Now let's start growing the tree.
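For concreteness, the toy dataset can be written in Python as follows (the names X and y are my own; the later sketches reuse them):

```python
# Toy training set from the table above:
# each row of X is (arm length, age, weight); y holds the heights (labels).
X = [
    (0.5, 5, 20),
    (0.7, 7, 30),
    (0.9, 21, 70),
]
y = [1.1, 1.3, 1.7]
```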
1. First, take the first value of the first feature (arm length = 0.5) as the cut point. The two resulting regions are denoted R1 and R2:
$R_1 = \{(0.5, 5, 20)\}$, $R_2 = \{(0.7, 7, 30), (0.9, 21, 70)\}$

$c_1 = 1.1$, $c_2 = \frac{1}{2}(1.3 + 1.7) = 1.5$
Then compute the squared error, i.e. the sum of (true value − estimate)² over the samples:

$m(0.5) = (1.1 - 1.1)^2 + (1.5 - 1.3)^2 + (1.5 - 1.7)^2 = 0.08$
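As a minimal sketch, this per-split computation can be expressed in Python (split_error is a hypothetical helper, and I assume the convention that samples with feature value ≤ s fall into R1; it reuses X and y from above):

```python
def split_error(X, y, j, s):
    """Squared error of splitting feature j at threshold s.

    Samples with x[j] <= s go to R1, the rest to R2; each region
    predicts the mean of its labels (c1, c2).
    """
    total = 0.0
    for region in (
        [yi for xi, yi in zip(X, y) if xi[j] <= s],   # labels in R1
        [yi for xi, yi in zip(X, y) if xi[j] > s],    # labels in R2
    ):
        if region:
            c = sum(region) / len(region)             # region estimate c1 / c2
            total += sum((yi - c) ** 2 for yi in region)
    return total

print(split_error(X, y, j=0, s=0.5))  # ~0.08 (up to float rounding), matching m(0.5)
```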
2. Next, take the second value of the first feature (arm length = 0.7) as the cut point. By analogy with the first step, the two resulting regions are denoted R1 and R2:
$R_1 = \{(0.5, 5, 20), (0.7, 7, 30)\}$, $R_2 = \{(0.9, 21, 70)\}$

$c_1 = \frac{1}{2}(1.1 + 1.3) = 1.2$, $c_2 = 1.7$
Then the squared error is

$m(0.7) = (1.2 - 1.1)^2 + (1.2 - 1.3)^2 + (1.7 - 1.7)^2 = 0.01 + 0.01 + 0 = 0.02$
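Reusing the hypothetical split_error sketch from above, step 2 is the same call with s = 0.7:

```python
print(split_error(X, y, j=0, s=0.7))  # ~0.02, matching m(0.7)
```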

So for this fixed feature, comparing the squared errors above (0.08 vs. 0.02) shows that "arm length = 0.7" is the best cut point. The same goes for the other features: for age (and likewise weight) we can find the best cut point in the same way. We thus traverse all the features and find the pair (j, s) with the least squared error, where j denotes the j-th feature and s denotes the s-th value of the j-th feature. The best cut point in this example is arm length = 0.7, so the feature space is divided into two regions (R1, R2).
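A sketch of this exhaustive (j, s) search, assuming the helpers above (best_split is a hypothetical name; it reuses split_error):

```python
def best_split(X, y):
    """Exhaustive search over all (j, s) pairs for the least squared error."""
    best = None                                # (error, j, s)
    for j in range(len(X[0])):                 # every feature j ...
        for s in sorted({xi[j] for xi in X}):  # ... and every observed value s
            err = split_error(X, y, j, s)
            if best is None or err < best[0]:
                best = (err, j, s)
    return best

print(best_split(X, y))  # (~0.02, 0, 0.7): arm length = 0.7, as in the text
# (age = 7 and weight = 30 produce the same partition and tie;
#  the strict "<" keeps the first pair found)
```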
3. For the two regions R1 and R2 obtained in step 2, find the best cut point again and split recursively; the procedure is the same as steps 1-2.
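Putting the pieces together, here is a minimal recursive sketch under the assumptions above (build_tree, the min_samples stopping rule, and the dict node layout are my own choices, not from the text; it reuses best_split and the data from the earlier sketches):

```python
def build_tree(X, y, min_samples=2):
    """Grow the regression tree by recursively applying steps 1-2."""
    if len(y) < min_samples:                   # stopping rule (assumed here)
        return sum(y) / len(y)                 # leaf: predict the region mean
    err, j, s = best_split(X, y)
    left = [i for i, xi in enumerate(X) if xi[j] <= s]
    right = [i for i, xi in enumerate(X) if xi[j] > s]
    if not left or not right:                  # degenerate split: make a leaf
        return sum(y) / len(y)
    return {
        "feature": j,
        "threshold": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], min_samples),
    }

print(build_tree(X, y))  # nested dict: root splits on arm length at 0.7
```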
