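Building a Predictive Model with Python, Python Neural Network Prediction Model

A simple predictive model in Python

Step 1: Import the required libraries, then read the test and training datasets.

Import pandas and numpy, then LabelEncoder, random, RandomForestClassifier, and GradientBoostingClassifier.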

import pandas as pd

import numpy as np

from sklearn.preprocessing import LabelEncoder

import random

from sklearn.ensemble import RandomForestClassifier

from sklearn.ensemble import GradientBoostingClassifier

from sklearn.metrics import roc_curve, auc  # used in Step 9

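# Read the training and test datasets

train = pd.read_csv('c:/users/analyticsvidhya/desktop/challenge/train.csv')

test = pd.read_csv('c:/users/analyticsvidhya/desktop/challenge/test.csv')

# Create a flag column marking the training and test rows
# (the column name 'Type' is assumed)

train['Type'] = 'Train'

test['Type'] = 'Test'

fullData = pd.concat([train, test], axis=0)  # combine the training and test datasets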
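The flag-and-concatenate pattern from Step 1 can be tried on two tiny in-memory frames before pointing it at real files. This is only a sketch: the columns `id`, `x`, `y` and the flag name `Type` are made up for illustration and are not taken from the challenge data.

```python
import pandas as pd

# Toy stand-ins for the training and test files (hypothetical columns).
train = pd.DataFrame({"id": [1, 2, 3], "x": [0.5, 1.5, 2.5], "y": [0, 1, 0]})
test = pd.DataFrame({"id": [4, 5], "x": [3.5, 4.5]})

# Tag every row with its origin so the two sets can be separated again later.
train["Type"] = "Train"
test["Type"] = "Test"

# Stack the rows; a column missing from one frame (here y) is filled with NaN.
full_data = pd.concat([train, test], axis=0, ignore_index=True)

print(full_data)
```

Passing `ignore_index=True` is a choice made here so that the combined frame gets a fresh, unique row index; without it the original row labels of both frames are kept.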

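Step 2: The second step of the framework does not need Python, so move on to the next step.

Step 3: View the column names or a summary of the dataset

fullData.columns  # shows the names of all columns

fullData.head(10)  # shows the first 10 records of the data frame

fullData.describe()  # describe() gives a summary of the numerical fields

Step 4: Identify the a) ID variables, b) target variables, c) categorical variables, d) numerical variables, and e) other variables.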

ID_col=

target_col=

cat_cols=

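other_col = ['Type']  # identifier column for the training and test datasets ('Type' is an assumed name)

# Numerical variables are whatever remains after the other groups are removed
# (the last term is cut off in the source; other_col is assumed here)
num_cols = list(set(list(fullData.columns)) - set(cat_cols) - set(ID_col) - set(target_col) - set(other_col))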
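When the column groups are not known in advance, the dtypes can give a first guess that is then corrected by hand. The sketch below is one possible approach, not the article's method; every column name in it is hypothetical.

```python
import pandas as pd

# Toy frame with hypothetical columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "income": [10.0, 20.5, 30.0],
    "target": ["yes", "no", "yes"],
    "Type": ["Train", "Train", "Test"],
})

# Columns with a fixed role are listed by hand.
id_col = ["id"]
target_col = ["target"]
other_col = ["Type"]

# Everything else is split by dtype: numeric vs. non-numeric.
reserved = set(id_col) | set(target_col) | set(other_col)
candidates = df.drop(columns=list(reserved))
num_cols = sorted(candidates.select_dtypes(include="number").columns)
cat_cols = sorted(candidates.select_dtypes(exclude="number").columns)

print(num_cols, cat_cols)
```

A dtype split is only a starting point: integer-coded categories (postal areas, band codes) show up as numeric and need to be moved to `cat_cols` manually.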

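Step 5: Identify the variables with missing values and create a flag for each

fullData.isnull().any()  # returns True or False for each column; True means the column has missing values

num_cat_cols = num_cols + cat_cols  # combine the numerical and categorical variables

# Create a new flag variable for each variable that has missing values
# (the '_NA' suffix is an assumed naming scheme)

# The flag is 1 where the value is missing and 0 otherwise

for var in num_cat_cols:
    if fullData[var].isnull().any():
        fullData[var + '_NA'] = fullData[var].isnull() * 1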
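The missing-value flags from Step 5 are easy to check on a small frame. In this sketch the column names and the `_NA` suffix are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame: two columns contain gaps, one is complete.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0],
    "city": ["A", None, "B"],
    "score": [1.0, 2.0, 3.0],
})

# Add a 0/1 indicator only for the columns that actually have missing values.
for var in ["age", "city", "score"]:
    if df[var].isnull().any():
        df[var + "_NA"] = df[var].isnull().astype(int)

print(sorted(df.columns))
```

Creating the indicators before imputation matters: once the gaps are filled, the information about which rows were originally missing is gone.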

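Step 6: Impute the missing values

# Fill missing numerical values with the column mean
# (fillna returns the filled frame, which is assigned back)

fullData[num_cols] = fullData[num_cols].fillna(fullData[num_cols].mean())

# Fill missing categorical values with -9999

fullData[cat_cols] = fullData[cat_cols].fillna(value=-9999)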
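A small worked example of the two imputation rules from Step 6, mean for numerical columns and the sentinel -9999 for categorical ones. The column names are hypothetical, and the cast to `object` is an extra precaution assumed here so that an integer sentinel can sit next to strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [10.0, np.nan, 30.0],
    "region": ["north", None, "south"],
})
num_cols = ["income"]
cat_cols = ["region"]

# Numerical gaps: replace with the mean of the observed values (here 20.0).
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Categorical gaps: replace with a sentinel that cannot clash with a real category.
for col in cat_cols:
    df[col] = df[col].astype(object).fillna(-9999)

print(df)
```

The sentinel keeps "missing" as its own category, which the label encoder in the next step then treats like any other value.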

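Step 7: Create label encoders for the categorical variables, split the dataset into training and test sets, then split the training set further into training and validation sets.

# Create label encoders for the categorical features

for var in cat_cols:
    number = LabelEncoder()
    fullData[var] = number.fit_transform(fullData[var].astype('str'))

# The target variable is also categorical, so convert it with a label encoder too
# (assumes target_col holds a single column name)

fullData[target_col[0]] = number.fit_transform(fullData[target_col[0]].astype('str'))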
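To see what the label encoder does to a column, here is a minimal sketch on made-up data. Unlike the loop above, it keeps one fitted encoder per column in a dictionary (an addition assumed here) so the integer codes can be mapped back to the original labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical columns.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "L", "S", "M"],
})

encoders = {}
for var in ["color", "size"]:
    enc = LabelEncoder()
    # Classes are sorted, then replaced by their position: blue=0, green=1, red=2.
    df[var] = enc.fit_transform(df[var].astype("str"))
    encoders[var] = enc

print(df["color"].tolist())
print(list(encoders["color"].inverse_transform(df["color"])))
```

The codes are arbitrary integers with no meaningful order. Tree-based models such as random forests usually cope with that, while linear models generally need one-hot encoding instead.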

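# Separate the rows again using the train/test flag column (assumed name 'Type')

train = fullData[fullData['Type'] == 'Train'].copy()

test = fullData[fullData['Type'] == 'Test'].copy()

# Randomly mark about 75% of the training rows for fitting (the column name 'is_train' is assumed)

train['is_train'] = np.random.uniform(0, 1, len(train)) <= .75

Train, Validate = train[train['is_train'] == True], train[train['is_train'] == False]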
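The random 75/25 split can be sketched in isolation as follows. The seeded generator is an assumption added here for repeatability, and the frame is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic training frame with 1000 rows.
train = pd.DataFrame({"x": range(1000)})

# Draw one uniform number per row; rows at or below 0.75 go to the fitting set.
rng = np.random.default_rng(100)
is_train = rng.uniform(0, 1, len(train)) <= 0.75

Train, Validate = train[is_train], train[~is_train]
print(len(Train), len(Validate))
```

Because each row is assigned independently, the split is only approximately 75/25. If an exact proportion or stratification by the target is needed, `sklearn.model_selection.train_test_split` is a common alternative.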

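Step 8: Pass the imputed and dummy (missing-value flag) variables to the model, and use a random forest to predict the class.

features = list(set(list(fullData.columns)) - set(ID_col) - set(target_col) - set(other_col))

x_train = Train[features].values

y_train = Train[target_col[0]].values  # assumes target_col holds a single column name

x_validate = Validate[features].values

y_validate = Validate[target_col[0]].values

x_test = test[features].values

random.seed(100)

rf = RandomForestClassifier(n_estimators=1000)

rf.fit(x_train, y_train)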
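One detail on reproducibility: scikit-learn estimators take their randomness from the `random_state` parameter (or NumPy's global generator when it is left unset), not from Python's `random` module, so seeding `random` alone does not fix the forest. The sketch below, on synthetic data and with a smaller forest to keep it quick, shows two fits with the same `random_state` giving identical probabilities.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (not the challenge data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Two forests with the same random_state are built identically.
rf_a = RandomForestClassifier(n_estimators=50, random_state=100).fit(X, y)
rf_b = RandomForestClassifier(n_estimators=50, random_state=100).fit(X, y)

proba_a = rf_a.predict_proba(X)
proba_b = rf_b.predict_proba(X)
print(np.allclose(proba_a, proba_b))
```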

Step 9: Check performance and make predictions

status = rf.predict_proba(x_validate)

fpr, tpr, _ = roc_curve(y_validate, status[:, 1])  # column 1 holds the positive-class probability

roc_auc = auc(fpr, tpr)

print(roc_auc)
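The AUC computation can be checked by hand on four observations. `roc_curve` expects a one-dimensional score for the positive class, which is why only column 1 of the `predict_proba` output is passed. The labels and scores below are made up.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Mimic the two-column layout of predict_proba: [P(class 0), P(class 1)].
proba = np.column_stack([1 - scores, scores])

fpr, tpr, _ = roc_curve(y_true, proba[:, 1])
roc_auc = auc(fpr, tpr)

# Of the 4 positive/negative pairs, 3 are ranked correctly, so the AUC is 0.75.
print(roc_auc)
print(roc_auc_score(y_true, proba[:, 1]))
```

`roc_auc_score` gives the same number in one call; the two-step form is useful when the curve itself is to be plotted.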

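final_status = rf.predict_proba(x_test)

test[target_col[0]] = final_status[:, 1]  # assumes target_col holds a single column name

# The column list is missing in the source; the ID and target columns are assumed here
test.to_csv('c:/users/analyticsvidhya/desktop/model_output.csv', columns=ID_col + target_col)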
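The `columns` argument of `to_csv` restricts the output to the listed columns, which is handy for a submission file that should contain only an identifier and the prediction. In this sketch the frame and its column names are hypothetical, and the CSV is returned as a string instead of being written to disk.

```python
import pandas as pd

# Hypothetical scored test set.
test = pd.DataFrame({"id": [4, 5], "x": [3.5, 4.5], "pred": [0.2, 0.9]})

# With no path given, to_csv returns the CSV text; index=False drops the row labels.
csv_text = test.to_csv(columns=["id", "pred"], index=False)
print(csv_text)
```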
