Main research content of the thesis
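In the thesis "A Comparative Analysis of Variable Selection Methods in Classical Statistics and Machine Learning" (《经典统计学与机器学习中变量选择方法的比较分析》, 2019), Tian Bing writes: We live in an era of big data. Disciplines ranging from biometrics and genomics to financial engineering and risk management all face the problem of high dimensionality, and for high-dimensional data, variable selection is the key to knowledge discovery. Classical statistics has a long history of studying high-dimensional problems, while newer machine learning methods now challenge it in the handling of high-dimensional data. The aim of this thesis is to compare how variable selection methods from classical statistics and from machine learning perform on the variable selection problem. From classical statistics we chose four methods based on coefficient shrinkage: Lasso, Adaptive lasso, Elastic net, and SCAD. From machine learning we mainly studied decision trees.

The first part gives a fairly comprehensive overview of variable selection methods in classical statistics and in machine learning.

The second part explains in detail why Lasso, Adaptive lasso, Elastic net, and SCAD are able to perform variable selection, and covers their tuning-parameter selection criteria, solution algorithms, and statistical properties. For the first three methods we present the classical least angle regression algorithm and also apply proximal gradient descent; SCAD is solved by local quadratic approximation. We also analyze in detail the differences and connections among the four shrinkage-based methods.

The third part introduces decision trees. The splitting criteria covered are information gain, information gain ratio, the Gini index, the DKM criterion, and a distance-based measure, and their performance is compared. For the first three criteria we describe the corresponding tree-building algorithms: ID3, C4.5, and CART. We also apply the shrinkage idea from the second part to the pruning of decision trees. Finally, we analyze the strengths and weaknesses of decision trees and present performance-enhancing algorithms for classification trees and for regression trees.

The fourth part is a numerical simulation. Data are generated from four models, and we chose a comprehensive and reasonable set of model evaluation metrics. Among the four shrinkage-based methods, Lasso and Adaptive lasso select roughly the same variables, but Adaptive lasso has a smaller standard deviation and mean squared error than Lasso. Elastic net tends to select more variables. SCAD is better than the other three methods at eliminating irrelevant variables and also has a smaller standard deviation and mean squared error; moreover, the larger the sample size, the closer the variables selected by SCAD are to the true model, which is consistent with its Oracle property. Although decision trees are not well suited to regression problems, they also identify the true variables quite accurately, and in the variable-importance ranking produced by the performance-enhancing algorithm, the true variables score far higher than the irrelevant ones.

The fifth part is an empirical analysis. The simulation part uses regression models, whereas the empirical part uses classification models. This part first explains how to use Lasso, Adaptive lasso, Elastic net, and SCAD for classification, namely by applying the four methods to the logistic model.

For the first empirical study, in order to analyze the order in which variables enter the model, we chose a breast cancer classification data set with a small number of variables. We fit the models on the test set and measure classification accuracy on the validation set. For the classical statistical methods, we first give the coefficient path plot and the corresponding CV error plot from a single run. After 100 repetitions, the classification accuracies of Lasso, Adaptive lasso, Elastic net, and SCAD on the test set are 96.5366%, 96.5877%, 96.4781%, and 96.7756% respectively; for all four methods the first three variables to enter the model are variables 2, 3, and 6, and the last two to enter are variables 5 and 9. For the decision tree method, we first grow a single tree on the test set and then evaluate it on the validation set, obtaining a classification accuracy of 94.7619%; pruning the tree gives the same result. We then grow 100 trees on the training set, and the performance-enhancing algorithm raises the classification accuracy on the test set to 96.1905%. The three most important variables it identifies are variables 2, 3, and 6, the same as for the classical statistical methods, but it ranks variables 4 and 9 as the two least important, unlike variables 5 and 9 for the classical methods.

The second empirical study follows essentially the same procedure. Over 100 repetitions, the classification accuracies of Lasso, Adaptive lasso, Elastic net, and SCAD on the test set are 90.5807%, 91.7963%, 90.9354%, and 99.8387% respectively, and the performance-enhancing decision tree algorithm reaches 93.5484% on the test set. We also analyze in detail the variables selected by each method.

The sixth part is the summary and outlook. It compares and summarizes the classical statistical methods and the machine learning method, and proposes ways to address the shortcomings of this thesis.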
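To make the proximal gradient descent approach mentioned in the second part concrete, the following is a minimal sketch for the Lasso only. It is not the thesis's own implementation: the objective scaling (1/(2n)), the function names `soft_threshold` and `lasso_proximal_gradient`, the fixed step size, and the toy data are all assumptions chosen for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_proximal_gradient(X, y, lam, step, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (assumed scaling).

    X is a list of n rows with p entries each, y a list of n responses.
    `step` should not exceed 1/L, where L is the largest eigenvalue of
    X^T X / n; the caller is responsible for choosing it.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals of the current fit: r = X beta - y
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i]
                 for i in range(n)]
        # Gradient of the smooth least-squares term: X^T r / n
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n
                for j in range(p)]
        # Gradient step followed by the L1 proximal (soft-threshold) step
        beta = [soft_threshold(beta[j] - step * grad[j], step * lam)
                for j in range(p)]
    return beta


# Toy design with orthogonal columns, so X^T X / n is the identity
# and a step size of 1 is valid (hypothetical data, not from the thesis).
X_demo = [[1, 1, 1],
          [1, -1, 1],
          [1, 1, -1],
          [1, -1, -1]]
# Response built as 3 * column 1 + 0.1 * column 3
y_demo = [3.1, 3.1, 2.9, 2.9]

beta_demo = lasso_proximal_gradient(X_demo, y_demo, lam=0.5, step=1.0)
# The strong first coefficient is shrunk by lam; the weak and the
# irrelevant coefficients are set exactly to zero.
print(beta_demo)
```

The soft-threshold step is what produces exact zeros, which is why a shrinkage method of this kind can act as a variable selector. Replacing the L1 proximal step with one suited to another penalty would give a different method; that substitution is not shown here.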
Thesis details
The author is Tian Bing of Shandong University. The thesis was published by Shandong University on 2019-07-16. Its subject keywords are variable selection, decision trees, algorithms, boosting methods, and regression.