Boruta (algorithm)


Boruta is a feature-selection algorithm in the field of machine learning. As presented in the original paper[1], its aim is to find all relevant features, in contrast to a minimal-optimal feature set. Boruta is not a stand-alone algorithm; it is implemented as a wrapper around the random forest classification algorithm. It works iteratively: in each iteration it removes the features that, according to a statistical test, are less relevant than what the authors call random probes.

A fundamental component of Boruta is the use of shadow attributes. Shadow attributes are pseudo-features added to the information system. Each one is produced by copying an existing feature from the original data set and shuffling its values among the samples (data points), which breaks any association between the copy and the response. After generating the shadow attributes, the procedure builds a random forest and compares the importance Z-scores obtained by the original features with the Z-scores obtained by the shadow attributes. This comparison is the basis on which Boruta decides whether a feature is important.
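
The construction of shadow attributes can be illustrated with a minimal sketch in plain Python. It is not the code of the Boruta package; the function name `add_shadow_attributes` and the row-of-lists data layout are assumptions made for this example.

```python
import random

def add_shadow_attributes(rows, rng):
    """Return rows extended with one shuffled copy of every column.

    rows: list of equal-length lists, one list per sample (assumed layout).
    rng:  a random.Random instance, passed in so that runs are reproducible.
    """
    n_features = len(rows[0])
    shadow_columns = []
    for j in range(n_features):
        column = [row[j] for row in rows]  # copy of the original feature
        rng.shuffle(column)                # permute its values across samples
        shadow_columns.append(column)
    # Append the shadow values to the right of the untouched original values.
    return [
        list(row) + [shadow_columns[j][i] for j in range(n_features)]
        for i, row in enumerate(rows)
    ]

rows = [[1, 10], [2, 20], [3, 30], [4, 40]]
extended = add_shadow_attributes(rows, random.Random(0))
```

Each shadow column keeps the same set of values as its source feature, so only the pairing between values and samples is changed.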

High-level pseudo-code:

1.  Extend the information system with a copy of every variable (feature); the copies are the shadow attributes
2.  Shuffle the values of each shadow attribute across the samples
3.  Run random forest on the extended system (original features plus shadow attributes) and gather the Z-scores
4.  Find the MZSA (maximum Z-score among shadow attributes)
5.  Assign a hit to each original feature whose Z-score > MZSA
6.  For each feature of undetermined importance, perform a two-sided test of equality against the MZSA
7.  If the feature scores significantly lower than the MZSA, drop it as unimportant
8.  If the feature scores significantly higher than the MZSA, keep it as important
9.  Remove all shadow attributes
10. Repeat from step 1 until importance is determined for all features or the maximum number of random-forest runs has been reached
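
The hit-counting and testing steps of the pseudo-code above can be sketched as follows. This is one possible reading, not the reference implementation: it assumes the two-sided test is a binomial test in which a feature beats the best shadow attribute with probability 0.5 under the null hypothesis, and the significance level `alpha=0.01` is likewise an assumption. The helper names `update_hits`, `binomial_tail` and `decide` are hypothetical, and the Z-scores are supplied by hand instead of coming from a random forest.

```python
from math import comb

def update_hits(z_scores, shadow_z_scores, hits):
    """Give a hit to every original feature that beats the best shadow attribute."""
    max_shadow_z = max(shadow_z_scores)  # maximum Z-score among shadow attributes
    for name, z in z_scores.items():
        if z > max_shadow_z:
            hits[name] = hits.get(name, 0) + 1
    return max_shadow_z

def binomial_tail(hits, runs, upper):
    """P(X >= hits) if upper, else P(X <= hits), for X ~ Binomial(runs, 0.5)."""
    ks = range(hits, runs + 1) if upper else range(hits + 1)
    return sum(comb(runs, k) for k in ks) / 2 ** runs

def decide(hits, runs, alpha=0.01):
    """Classify one feature from its hit count (assumed test and assumed alpha)."""
    if binomial_tail(hits, runs, upper=True) < alpha / 2:
        return "important"      # beat the best shadow far more often than chance
    if binomial_tail(hits, runs, upper=False) < alpha / 2:
        return "unimportant"    # beat the best shadow far less often than chance
    return "undetermined"       # needs more random-forest runs

# Illustrative numbers only, not taken from any real data set.
hits = {}
update_hits({"a": 5.1, "b": 0.3}, [1.2, 0.8], hits)  # "a" gets a hit, "b" does not
labels = {h: decide(h, runs=20) for h in (20, 10, 0)}
```

Under these assumptions, a feature with 20 hits in 20 runs is labelled important, one with 0 hits is labelled unimportant, and one with 10 hits stays undetermined and would be carried into further runs.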


References

  1. Miron B. Kursa, Witold R. Rudnicki (2010). "Feature Selection with the Boruta Package". Journal of Statistical Software. 36 (11).

