Machine & Deep Learning Compendium


Features

- 1. **Correlation is between -1 and 1; covariance ranges from -inf to inf. Units affect the scale of covariance, so correlation is preferred: it is normalized.** Correlation is a measure of association, used for bivariate analysis; it measures how well two variables are related. Covariance is also a measure of association: a measure of the relationship between two random variables.
- 2. **Association vs correlation** - correlation is a measure of association and a yes/no question that does not assume linearity.
- 3. **A great article on Medium**, covering just about everything in great detail and explaining all the methods, plus references.
- 4. **Heat maps for categorical vs target** - group-by count per class, normalized by the total count, to see whether a certain cat/target combination groups more strongly than others.
- 5. **ANOVA** / **log regression** (**2**, **git**, **3**), for numeric/**continuous vs categorical** - a high F score from ANOVA hints at an association between a feature and a target, i.e., the importance of the feature for separating the target.
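A minimal numpy/sklearn sketch of the two points above - covariance is scale-dependent while correlation is normalized, and a high ANOVA F score hints that a feature separates the target. The data and variable names are illustrative only.

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)

cov_xy = np.cov(x, y)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]

# Rescaling x (e.g., meters -> centimeters) scales the covariance by 100x
# but leaves the correlation unchanged -- this is why correlation is preferred.
cov_scaled = np.cov(100 * x, y)[0, 1]
corr_scaled = np.corrcoef(100 * x, y)[0, 1]

# ANOVA F score: an informative feature gets a much higher F than pure noise.
target = (x + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X = np.column_stack([x, rng.normal(size=1000)])  # [informative, noise]
F, pval = f_classif(X, target)
```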


**Paper** - we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R2) of the data relative to the regression function.

**Normalized MI score** - Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score that scales the result between 0 (no mutual information) and 1 (perfect correlation). In this function, mutual information is normalized by some generalized mean of H(labels_true) and H(labels_pred), defined by the average_method.
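A small sketch of the NMI score described above, using sklearn's `normalized_mutual_info_score`; the toy labelings are illustrative.

```python
from sklearn.metrics import normalized_mutual_info_score

labels_true = [0, 0, 1, 1, 2, 2]
perfect = [2, 2, 0, 0, 1, 1]    # same partition, different label names
unrelated = [0, 1, 0, 1, 0, 1]  # each true class splits evenly -> no MI

# Identical partitions score 1 regardless of label names;
# an unrelated partition scores 0.
nmi_perfect = normalized_mutual_info_score(labels_true, perfect)
nmi_unrelated = normalized_mutual_info_score(labels_true, unrelated)
```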


- 1. **How to parallelize feature selection on several CPUs** - do it per label on each CPU and average the results.
- 2. **Text analysis for sentiment, doing feature selection** - a tutorial with chi2 (IG?), **part 2 with bi-gram collocation in NLTK**.
- 3. **What is collocation?** - "the habitual juxtaposition of a particular word with another word or words with a frequency greater than chance."
- 4. **Stability selection and recursive feature elimination (RFE)** are wrapper methods in sklearn for the purpose of feature selection: **RFE in sklearn**.
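A hedged sketch of RFE in sklearn, as mentioned above: it recursively drops the weakest feature (by coefficient magnitude) until the requested number remain. The dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

keep = selector.support_   # boolean mask of the selected features
ranks = selector.ranking_  # 1 = selected, higher = eliminated earlier
```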
- 1. **Features with a high percentage of missing values**
- 2. **Collinear (highly correlated) features**
- 3. **Features with zero importance in a tree-based model**
- 4. **Features with low importance**
- 5. **Features with a single unique value**
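Three of the checks above can be sketched in pandas: missing-value percentage, single-unique-value columns, and collinear pairs. The thresholds (0.6 missing, 0.95 correlation) and the toy frame are assumptions, not from the source.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan, np.nan],  # 60% missing
    "b": [1, 2, 3, 4, 5],
    "c": [2, 4, 6, 8, 10],                    # collinear with b
    "d": [7, 7, 7, 7, 7],                     # single unique value
})

high_missing = df.columns[df.isna().mean() >= 0.6].tolist()
single_unique = df.columns[df.nunique() == 1].tolist()

# Pairwise |correlation|; min_periods avoids degenerate two-point correlations.
corr = df.corr(min_periods=3).abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
collinear = [col for col in upper.columns if (upper[col] >= 0.95).any()]
```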

- 14.
- 1. **Univariate Selection**
- 2. **Recursive Feature Elimination**
- 3. **Principal Component Analysis**
- 4. **Feature Importance**

- 15.
- 1. **Low variance**
- 2. **Univariate KBest**
- 3. **RFE**
- 4. **SelectFromModel using coef_ / feature_importances_**
- 5. **Linear models with L1 (for SVM, L2 is recommended)**
- 6. **Tree-based importance**
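Three of the sklearn selectors listed above, sketched on synthetic data: low variance, univariate KBest, and SelectFromModel over an L1 linear model. Parameter choices here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectFromModel, SelectKBest,
                                       VarianceThreshold, f_classif)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=0, random_state=1)
X = np.hstack([X, np.ones((200, 1))])  # append a zero-variance column

# 1. Low variance: drops the constant column.
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)

# 2. Univariate KBest with the ANOVA F score.
X_best = SelectKBest(f_classif, k=4).fit_transform(X_var, y)

# 3. SelectFromModel keeps features whose L1 coef_ exceeds the threshold.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
X_l1 = SelectFromModel(l1).fit_transform(X_var, y)
```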

- 16.
- 1. **(Reduction) LDA: Linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes (or levels) of a categorical variable.**
- 2. **(Selection) ANOVA: ANOVA stands for Analysis of Variance. It is similar to LDA except that it operates on one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal.**
- 3. **(Selection) Chi-Square: a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distributions.**
- 4. **Wrapper methods:**
  - 1. **Forward Selection: an iterative method in which we start with no features in the model. In each iteration we add the feature that best improves the model, until adding a new variable no longer improves performance.**
  - 2. **Backward Elimination: we start with all the features and remove the least significant feature at each iteration, keeping the removal that improves the model's performance. We repeat this until no improvement is observed on removal of a feature.**
  - 3. **Recursive Feature Elimination: a greedy optimization algorithm that aims to find the best-performing feature subset. It repeatedly creates models, setting aside the best- or worst-performing feature at each iteration, and constructs the next model with the remaining features until the features are exhausted. It then ranks the features based on the order of their elimination.**
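The forward/backward wrapper methods above can be sketched with sklearn's `SequentialFeatureSelector` (available since sklearn 0.24); the dataset and stopping point (3 features) are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
est = LogisticRegression(max_iter=1000)

# Forward selection: start empty, greedily add the best feature each round.
forward = SequentialFeatureSelector(
    est, n_features_to_select=3, direction="forward").fit(X, y)

# Backward elimination: start full, greedily drop the weakest feature.
backward = SequentialFeatureSelector(
    est, n_features_to_select=3, direction="backward").fit(X, y)
```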

- 5.

- 1.
- 2.

- 2.
- 3.
- 4.
- 5. **Text data** - unigrams, bag of words, N-grams (2, 3, ...), a tf-idf matrix, cosine_similarity(tfidf) on top of a tf-idf matrix, unsupervised hierarchical clustering with similarity measures on top of cosine_similarity, LDA for topic modelling in sklearn (pretty awesome), KMeans(lda), ...
- 9.
- 10.
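A compact sketch of the text pipeline listed above: tf-idf, cosine similarity on top of the tf-idf matrix, LDA topics, and KMeans over the topic vectors. The tiny corpus is illustrative, and note LDA is usually fit on raw counts; tf-idf is used here only to mirror the list above.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stocks fell as markets slid",
        "markets rallied and stocks rose"]

tfidf = TfidfVectorizer().fit_transform(docs)
sim = cosine_similarity(tfidf)  # doc-by-doc similarity matrix

# Topic model, then cluster the documents in topic space: KMeans(lda).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(tfidf)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(doc_topics)
```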

- 1.

- 1. **Max_features in tf-idf** - sometimes it is not effective to transform the whole vocabulary, as the data may contain some exceptionally rare words which, if passed to TfidfVectorizer().fit(), will add unwanted dimensions to inputs in the future. One appropriate technique in this case is to print out the word frequencies across documents and then set a threshold on them. Imagine you set a threshold of 50 and your data corpus consists of 100 words; after looking at the word frequencies, 20 words occur fewer than 50 times. Thus you set max_features=80 and you are good to go. If max_features is set to None, the whole corpus is considered during the TF-IDF transformation. Otherwise, if you pass, say, 5 to max_features, that means creating a feature matrix out of the 5 most frequent words across the text documents.
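A tiny demonstration of the max_features behaviour described above, on a made-up three-document corpus: None keeps the whole vocabulary, while a small value keeps only the most frequent terms.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["common common common rare1",
        "common common rare2",
        "common rare3"]

full = TfidfVectorizer(max_features=None).fit(docs)
top1 = TfidfVectorizer(max_features=1).fit(docs)

full_vocab = set(full.vocabulary_)  # every term in the corpus
top_vocab = set(top1.vocabulary_)   # only the single most frequent term
```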

- 1.
- 1.
- 2.
- 3.

- 2. **Edit distance similarity**
- 3.
- 4.
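As a concrete form of the "edit distance similarity" above, a plain dynamic-programming Levenshtein distance (the helper name is ours, not from the source):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.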


Distance

- 1.
- 1. Role of Distance Measures
- 2. Hamming Distance
- 3. Euclidean Distance
- 4. Manhattan Distance (Taxicab or City Block)
- 5. Minkowski Distance

- 2. Cosine distance = 1 - cosine similarity
- 3.
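The distance measures listed above, computed with scipy.spatial.distance on two illustrative vectors; note scipy's `cosine()` already returns the distance, i.e. 1 - cosine similarity.

```python
from scipy.spatial.distance import (cityblock, cosine, euclidean, hamming,
                                    minkowski)

u, v = [1, 0, 1, 1], [1, 1, 0, 1]

d_hamming = hamming(u, v)           # fraction of positions that differ
d_euclid = euclidean(u, v)
d_manhattan = cityblock(u, v)       # taxicab / city block
d_minkowski = minkowski(u, v, p=2)  # p=2 reduces to Euclidean
d_cosine = cosine(u, v)             # 1 - cosine similarity
```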

Distance Tools

- 1.

- 2. **Non-parametric feature impact and importance** - while there are nonparametric feature selection algorithms, they typically provide feature rankings rather than measures of impact or importance. In this paper, we give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data.
- 3. **Paper** (**pdf**, **blog post**) (**GITHUB**) - how to "explain the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction." The authors want to understand the reasons behind predictions. It's a new line of work arguing that many 'feature importance' measures shouldn't be used: e.g., in a linear regression model a feature can have an importance rank of 50 (for example), while in a comparative model where you duplicate that feature 50 times, each copy will have 1/50 of the importance and won't be selected for the top K, even though it is still one of the most important features. So new methods need to be developed to understand feature importance. This one has git code as well.

- 2.


- 1.
- 2.
- 3.
- 4.

- 1.
- 2.
- 3.
