Deep feature synthesis
Deep Feature Synthesis is an algorithm developed by James Max Kanter and Kalyan Veeramachaneni and described in their paper "Deep Feature Synthesis: Towards Automating Data Science Endeavors".[1] It has been described as "the first system that automates feature engineering from a database of multiple tables."[2]
Definition
Quoting the above paper: "Deep Feature Synthesis is an algorithm that automatically generates features for relational datasets. In essence, the algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature."
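The idea of following relationships down to a base field and then stacking functions along the path can be illustrated with a minimal sketch. This is not the authors' implementation: the toy schema (customers, orders, items), the `aggregate` helper, and the choice of SUM followed by MEAN are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy schema: one customer has many orders, one order has many items.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "a"},
    {"order_id": 3, "customer_id": "b"},
]
items = [
    {"order_id": 1, "price": 10.0},
    {"order_id": 1, "price": 5.0},
    {"order_id": 2, "price": 20.0},
    {"order_id": 3, "price": 7.0},
]


def aggregate(child_rows, key, field, func):
    """Group child rows by a foreign key and apply func to one field."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[field])
    return {k: func(values) for k, values in groups.items()}


# Depth 1: follow items -> orders and compute SUM(items.price) per order.
order_total = aggregate(items, "order_id", "price", sum)

# Attach the depth-1 feature to each order so it can be aggregated again.
orders_with_total = [
    dict(order, total=order_total.get(order["order_id"], 0.0)) for order in orders
]

# Depth 2: follow orders -> customers and compute MEAN(orders.SUM(items.price)).
customer_feature = aggregate(orders_with_total, "customer_id", "total", mean)
```

Here `customer_feature` maps each customer to the mean total of that customer's orders, a feature built by applying two functions in sequence along the path items, orders, customers.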
Practical results
Kanter and Veeramachaneni implemented the Deep Feature Synthesis algorithm in their Data Science Machine and entered its automatically generated results in three competitions:
Their results competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615. In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.[3]
Characteristics
Little to no human intervention.
Produces results in hours rather than weeks.
Relies on an SQL schema and normalized table relationships.
Applications
Quickly create feature sets of predictive value.
Critique
The process of synthesizing features from relational data is known as propositionalization and has been known since at least 1991.[4] The algorithm used in Deep Feature Synthesis was first described by Knobbe in 2001 [5] and is known as RollUp. RollUp was later enhanced in PRORED.[6] A commercial version of RollUp is sold under the name Safarii.
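Propositionalization by aggregation can be sketched as follows: a one-to-many relationship is flattened into one row per target entity, with one column per aggregate function. This is only an illustration of the general idea, not the RollUp or PRORED algorithm itself; the data, the `AGGREGATES` table, and the `propositionalize` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical one-to-many data: each customer has several transaction amounts.
transactions = [
    ("alice", 30.0),
    ("alice", 10.0),
    ("bob", 5.0),
]

# Aggregate functions used to summarize each group into scalar columns.
AGGREGATES = {"count": len, "min": min, "max": max, "sum": sum}


def propositionalize(rows):
    """Return one flat record of aggregate values per key."""
    grouped = defaultdict(list)
    for key, amount in rows:
        grouped[key].append(amount)
    return {
        key: {name: func(values) for name, func in AGGREGATES.items()}
        for key, values in grouped.items()
    }


flat_table = propositionalize(transactions)
```

Each entry of `flat_table` is a fixed-width record, so the result can be passed to a learner that expects a single table.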
Related work
IBM created OneBM to "extend the Data Science Machine".[2]
References
- [1] Kanter, Max; Veeramachaneni, Kalyan. "Deep Feature Synthesis: Towards Automating Data Science Endeavors" (PDF).
- [2] Lam, Hoang T. (2017). "One button machine for automating feature engineering in relational databases". arXiv:1706.00327.
- [3] Hardesty, Larry. "System that replaces human intuition with algorithms outperforms human teams".
- [4] Kodratoff, Y., ed. (1991). Machine Learning, EWSL-91: Proceedings of the European Working Session on Learning, Porto, Portugal, March 6–8, 1991. Berlin: Springer-Verlag. ISBN 0-387-53816-X.
- [5] Knobbe, Arno (2001). "Propositionalisation and Aggregates". Principles of Data Mining and Knowledge Discovery: 277–288. doi:10.1007/3-540-44794-6_23.
- [6] Gjorgjioski, Valentin. "Stochastic propositionalization of relational data using aggregates" (PDF).
External links
- Feature Labs, the authors' spin-off for applications of the algorithm