Tuesday 28 July 2015

Seven Techniques for Data Dimensionality Reduction via @knime and @DMR_Rosaria

The recent explosion of data set size, in number of records and attributes, has triggered the development of a number of big data platforms as well as parallel data analytics algorithms. At the same time though, it has pushed for usage of data dimensionality reduction procedures.

Great blog post from KNIME. I recommend reading the PDF linked from the post.

Here is an example of R code you can use to remove every column that contains NAs. To tolerate some missing values, change the == 0 to a different threshold:

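# Keep only the columns with no NA values (drop = FALSE keeps a data frame even if one column survives)
trainData <- trainData[, colSums(is.na(trainData)) == 0, drop = FALSE]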
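If you'd rather tolerate a few missing values than drop every column with an NA, one option is to filter on the fraction of missing values per column instead. This is a minimal sketch; the helper name dropSparseColumns, its 0.4 default and the toy data frame are hypothetical choices for illustration, not taken from the KNIME post.

```r
# Hypothetical helper: drop columns whose fraction of NA values exceeds maxNaRatio
dropSparseColumns <- function(df, maxNaRatio = 0.4) {
  # is.na() on a data frame gives a logical matrix; colMeans() turns it into
  # the share of NAs in each column
  naRatio <- colMeans(is.na(df))
  df[, naRatio <= maxNaRatio, drop = FALSE]
}

# Small made-up data frame to show the effect
trainData <- data.frame(
  a = c(1, 2, 3, 4, 5),     # no missing values
  b = c(NA, 2, NA, NA, 5),  # 60% missing
  c = c(1, NA, 3, 4, 5)     # 20% missing
)

filtered <- dropSparseColumns(trainData, maxNaRatio = 0.4)
print(names(filtered))
```

Setting maxNaRatio = 0 gives the same result as the one-liner above, since only columns with no NAs pass the filter.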

There's also a great guide to PCA (principal component analysis) on R-Bloggers.
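As a rough sketch of PCA in base R, prcomp() from the stats package projects numeric data onto its principal components. Here we keep just enough components to explain 95% of the variance. The 95% cut-off and the use of the built-in iris measurements are illustrative assumptions, not something prescribed by the KNIME post or the R-Bloggers guide.

```r
# PCA sketch using base R's prcomp() on the four numeric iris columns
data(iris)
x <- iris[, 1:4]

# Centre and scale each column so no single unit of measurement dominates
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the first k components
varExplained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching the assumed 95% threshold
k <- which(varExplained >= 0.95)[1]

# Reduced data set: one row per original record, k columns (PC1..PCk)
reduced <- pca$x[, 1:k, drop = FALSE]
print(k)
print(dim(reduced))
```

To transform new records the same way, predict(pca, newdata) applies the stored centring, scaling and rotation.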

Good luck and have fun :-)
