Cuvillier Verlag

Publications, doctoral theses, habilitation theses & brochures.
Your international publisher specializing in science and economics

Statistical Issues in Machine Learning - Towards Reliable Split Selection and Variable Importance Measures

Print edition
EUR 28,00 EUR 26,60

E-Book
EUR 19,60

Carolin Strobl (Author)

Preview

Table of contents, file (56 KB)
Reading sample, file (84 KB)

ISBN-10 (Print) 3867276617
ISBN-13 (Print) 9783867276610
ISBN-13 (E-Book) 9783736926615
Language English
Number of pages 204
Cover lamination Glossy
Edition 1st edition
Volume 0
Place of publication Göttingen
Place of dissertation München
Publication date 30.07.2008
Classification Doctoral thesis
Subject area Mathematics
Computer science
Biochemistry, molecular biology, genetic engineering
Keywords CART, Bagging, Random Forest, Gini Index, Variable Importance
Description

Recursive partitioning methods from machine learning are widely applied in many scientific fields, such as genetics and bioinformatics. The present work addresses, from a statistical point of view, the two main problems that arise in recursive partitioning: instability and biased variable selection. With respect to the first issue, instability, this work covers the entire range of methods, from standard classification trees through robustified classification trees to ensemble methods such as TWIX, bagging and random forests.
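The instability the book starts from can be made concrete with a small experiment. The following is a minimal sketch (not taken from the book) in Python with scikit-learn; the data set and all parameter choices are illustrative assumptions. It refits a single tree and a bagged ensemble on bootstrap resamples and counts how often the predicted class of a test point changes between refits.

```python
# Minimal illustration of tree instability vs. bagging (illustrative data/parameters).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=600, n_features=10, n_informative=3,
                           random_state=0)
X_train, y_train, X_test = X[:300], y[:300], X[300:]

def prediction_flips(model, n_repeats=20):
    """Refit `model` on bootstrap resamples of the training data and return the
    fraction of test points whose predicted class changes across refits."""
    preds = []
    for _ in range(n_repeats):
        idx = rng.randint(0, len(y_train), len(y_train))  # bootstrap resample
        preds.append(model.fit(X_train[idx], y_train[idx]).predict(X_test))
    preds = np.array(preds)
    return np.mean(preds.min(axis=0) != preds.max(axis=0))

print("single tree:", prediction_flips(DecisionTreeClassifier(random_state=0)))
print("bagged trees:", prediction_flips(
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)))
```

On data of this kind the single tree typically flips its prediction for a noticeably larger share of test points than the bagged ensemble, which is the stability gain referred to in the next paragraph.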
While ensemble methods prove to be much more stable than single trees, they also lose most of their interpretability. Therefore, an adaptive cutpoint selection scheme is suggested with which a TWIX ensemble reduces to a single tree if the partition is sufficiently stable.

With respect to the second issue, variable selection bias, the statistical sources of this artifact in single trees and a new form of bias inherent in ensemble methods based on bootstrap samples are investigated. For single trees, one unbiased split selection criterion is evaluated and another one is newly introduced here. Based on the results for single trees and further findings on the effects of bootstrap sampling on association measures, it is shown that, in addition to using an unbiased split selection criterion, subsampling instead of bootstrap sampling should be employed in ensemble methods so that the variable importance scores of predictor variables of different types can be compared reliably. The statistical properties and the null hypothesis of a test for the random forest variable importance are critically investigated. Finally, a new, conditional importance measure is suggested that allows for a fair comparison in the case of correlated predictor variables and better reflects the null hypothesis of interest.
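The practical recommendation in the last paragraph, growing the ensemble on subsamples drawn without replacement and judging predictors by a permutation-based importance, can be sketched roughly as follows. This is a minimal scikit-learn illustration, not the author's implementation; all parameter values are assumptions, and the importance is computed on the full sample rather than on out-of-bag observations as in the book.

```python
# Ensemble on subsamples drawn WITHOUT replacement + permutation importance
# (illustrative sketch; data set and parameters are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)

# bootstrap=False with max_samples < 1.0 draws subsamples without replacement,
# avoiding the bootstrap-induced bias discussed above.
forest = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", random_state=0),
    n_estimators=200,
    max_samples=0.632,     # subsample fraction (illustrative choice)
    bootstrap=False,
    random_state=0,
).fit(X, y)

# Permutation importance: the drop in accuracy when one predictor is permuted.
imp = permutation_importance(forest, X, y, n_repeats=20, random_state=0)
for j in np.argsort(imp.importances_mean)[::-1]:
    print(f"X{j}: {imp.importances_mean[j]:.3f} +/- {imp.importances_std[j]:.3f}")
```

Note that scikit-learn's permutation importance is unconditional; the conditional importance measure proposed in the book is available, for example, in the R package party (varimp with conditional = TRUE).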