Python decision tree prediction model: basic principles of decision tree classification

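It looks like you haven't split your dataset into separate training and test sets. As a result, the classifier is probably overfitting the dataset and may not handle samples from outside it well.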
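To make the overfitting concern concrete, here is a small sketch that is not taken from the original code. It uses synthetic data with no real signal and a deliberately naive "memorizer" model (`memorizer_fit`, `memorizer_predict`, and `accuracy` are hypothetical names invented for this illustration). The model scores perfectly on the rows it was fit on and far worse on held-out rows, which is why accuracy measured on the training data alone says little.

```python
import random

def memorizer_fit(rows):
    """Memorize the label of every feature tuple (a stand-in for an overfit model)."""
    return {tuple(row[:-1]): row[-1] for row in rows}

def memorizer_predict(table, row, default=0):
    """Return the memorized label, or a default for rows never seen before."""
    return table.get(tuple(row[:-1]), default)

def accuracy(table, rows):
    """Percentage of rows whose predicted label matches the last column."""
    hits = sum(memorizer_predict(table, r) == r[-1] for r in rows)
    return 100.0 * hits / len(rows)

rng = random.Random(0)
# Synthetic rows: three random features plus a random 0/1 label (assumption: no real signal).
data = [[rng.random(), rng.random(), rng.random(), rng.randint(0, 1)]
        for _ in range(200)]
train, test = data[:150], data[150:]

model = memorizer_fit(train)
train_acc = accuracy(model, train)  # evaluated on rows the model has memorized
test_acc = accuracy(model, test)    # evaluated on rows the model has never seen
print(train_acc, test_acc)
```

A decision tree grown without pruning can behave much like this memorizer on its own training rows, so only the held-out score is a meaningful estimate.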

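Try randomly selecting (say) 75% of your data for training, then test accuracy on the remaining 25%. For example, replace the last part of your code with:

import random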

dataset, labels = load_csv('data/basketball.train.csv')
random.shuffle(dataset)
split_index = int(len(dataset) * 0.75)
train_dataset = dataset[:split_index]
test_dataset = dataset[split_index:]
myTree = createTree(train_dataset, labels)

predictions = []
for row in test_dataset:
    prediction = classify(myTree, ['location', 'w', 'final_margin', 'shot_number', 'period', 'game_cl…
    'shot_dist', 'pts_type', 'close_def_dist'], [row[0], row[1], row[2], row[3], row[4…
    row[9], row[10], row[11]])
    # print('expected=%s, got=%s' % (row[-1], prediction))
    predictions.append(prediction)
actual = [row[-1] for row in test_dataset]
accuracy = accuracy_metric(actual, predictions)
print(accuracy)

(Note: untested.)
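The snippet above calls `accuracy_metric` without showing it, so here is a minimal, self-contained sketch of the two pieces it relies on. Both are assumptions rather than the asker's actual code: `split_dataset` is a hypothetical helper that wraps the shuffle-and-slice step, and `accuracy_metric` is assumed to take two equal-length lists and return the percentage of matching labels.

```python
import random

def split_dataset(dataset, train_fraction=0.75, seed=None):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rows = list(dataset)               # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(rows)  # a seeded generator makes the split repeatable
    split_index = int(len(rows) * train_fraction)
    return rows[:split_index], rows[split_index:]

def accuracy_metric(actual, predicted):
    """Return the percentage of predictions that equal the actual labels."""
    if not actual:
        raise ValueError("no samples to score")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

# Toy rows: one feature plus a label in the last column.
rows = [[i, i % 2] for i in range(20)]
train_rows, test_rows = split_dataset(rows, seed=42)
print(len(train_rows), len(test_rows))               # 15 5
print(accuracy_metric([1, 0, 1, 1], [1, 0, 0, 1]))   # 75.0
```

Passing a fixed `seed` keeps the split identical between runs, which makes it easier to compare accuracy before and after changing the tree-building code.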