Classic Machine Learning

ASSOCIATION RULES

- **The same example, but with a graph showing that lower support thresholds cost FP-Growth less in terms of calculation time.**
- **Another video clip.**

- **High support:** the rule should apply to a large number of cases.
- **High confidence:** the rule should be correct often.
- **High lift:** indicates the association is not just a coincidence.
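A minimal sketch of these three metrics, computed over an invented toy transaction set:

```python
# Toy transaction database (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / n

def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence normalized by the consequent's base rate; > 1 means
    the rule is more than coincidence, < 1 means a negative association."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"milk"})
print(support(rule[0] | rule[1]))  # 0.6  (3 of 5 transactions)
print(confidence(*rule))           # 0.75
print(lift(*rule))                 # 0.9375 (slightly below 1: no real lift)
```

Here `{bread} -> {milk}` has decent support and confidence, but lift below 1 shows the co-occurrence is no better than chance.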

PROBABILISTIC ALGORITHMS

**A variable is 'random'.**
**A process is 'stochastic'.**

**A random vector is a generalization of a single random variable to many.**
**A stochastic process is a sequence of random variables, or a sequence of random vectors (and then you have a vector-stochastic process).**

**It provides a way to model the dependency of current information (e.g. weather) on previous information.**
**It is composed of states, a transition scheme between states, and emission of outputs (discrete or continuous).**
**Several goals can be accomplished by using Markov models:**

- **Learn statistics of sequential data.**
- **Do prediction or estimation.**
- **Recognize patterns.**

**To be effective, the current state has to depend on the previous state in some way.**
**If it looks cloudy outside, the next state we expect is rain.**
**If the rain starts to subside into cloudiness, the next state will most likely be sunny.**
**Not every process has the Markov property. The lottery, for example, does not: this week's winning numbers have no dependence on the previous week's winning numbers.**

- **They show how to build an order-1 Markov table of probabilities, predicting the next state given the current one.**
- **Then the state diagram built from this table.**
- **Then how to build a transition matrix for the 3 states, i.e., from the probabilities in the table.**
- **Then how to calculate the next state using the current state vector, via vector-matrix multiplication.**
- **Then it discusses the model settling into always predicting rain, and the solution: using the last two states in a bigger table of order 2. It does not really explain why the probabilities don't change when more states are added; they stay the same as in order 1, just repeated.**
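The vector-matrix step above can be sketched as follows; the weather states and all probabilities are invented for illustration:

```python
# Order-1 Markov chain over three weather states (probabilities invented).
states = ["sunny", "cloudy", "rainy"]

# transition[i][j] = P(next = states[j] | current = states[i]); rows sum to 1.
transition = [
    [0.7, 0.2, 0.1],  # from sunny
    [0.3, 0.4, 0.3],  # from cloudy
    [0.2, 0.4, 0.4],  # from rainy
]

def step(current):
    """One prediction step: row-vector times matrix multiplication."""
    return [sum(current[i] * transition[i][j] for i in range(3))
            for j in range(3)]

# Start from "cloudy" with certainty and predict one step ahead:
dist = step([0.0, 1.0, 0.0])
print(dict(zip(states, dist)))  # {'sunny': 0.3, 'cloudy': 0.4, 'rainy': 0.3}
```

Starting from a pure state just reads off that state's row; starting from a mixed distribution blends the rows, which is exactly what the vec*matrix product computes.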

- **Medium**

- **General mixture models**
- **HMM**
- **Bayes classifiers and naive Bayes**
- **Markov chains**
- **Bayesian networks**
- **Markov networks**
- **Factor graphs**


- **Transition probability formula: the probability of going from Zk to Zk+1.**
- **Emission probability formula: the probability of emitting Xk from state Zk.**
- **Initial distribution (Pi): the probability that Z1 = i, for i = 1..m.**

- **This video explains the building blocks of the needed knowledge in HMMs: starting probabilities P0, transitions, and emissions (state probabilities).**

- **Where P(X0) is the initial state distribution for happy or sad.**
- **Where P(Xt | Xt-1) is the transition model from time t-1 to time t.**
- **Where P(Yt | Xt) is the observation model for happy and sad (X) in 4 situations (w, sad, crying, facebook).**
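A minimal sketch of how these three ingredients combine in the forward algorithm to score an observation sequence; all probabilities below are invented, and the observation labels only loosely follow the example:

```python
# Tiny HMM over hidden moods (all probabilities invented for illustration).
states = ["happy", "sad"]

p0 = {"happy": 0.6, "sad": 0.4}        # P(X0): initial distribution
trans = {                              # P(Xt | Xt-1): transition model
    "happy": {"happy": 0.8, "sad": 0.2},
    "sad":   {"happy": 0.4, "sad": 0.6},
}
emit = {                               # P(Yt | Xt): observation model
    "happy": {"w": 0.5, "sad_face": 0.1, "crying": 0.1, "facebook": 0.3},
    "sad":   {"w": 0.1, "sad_face": 0.3, "crying": 0.4, "facebook": 0.2},
}

def forward(obs_seq):
    """Forward algorithm: P(observations) summed over all hidden paths."""
    alpha = {s: p0[s] * emit[s][obs_seq[0]] for s in states}
    for y in obs_seq[1:]:
        alpha = {s: emit[s][y] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

print(forward(["w", "crying"]))  # 0.0592 with these invented numbers
```

The recursion multiplies the three building blocks in order: start with P(X0) times an emission, then alternate transition and emission at every time step.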

- **Incomplete Python code for unsupervised / semi-supervised / supervised IOHMM; training is there, prediction is missing.**


REGRESSION ALGORITHMS

- **Sk-lego monotonic**

- **Lightning** - a library for large-scale linear classification, regression and ranking in Python.
- **Linear regression** - TBC.
- **CART** - classification and regression trees; basically the difference between classification and regression trees is that instead of information gain we use the sum of squared errors.
- **SVR** - regression-based SVM, with kernel only.
- **LOGREG** - logistic regression, used as a classification algorithm to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. The output is binary, e.g., if the likelihood of killing the bug is > 0.5 it is assumed dead; if it is < 0.5, it is assumed alive.

- **Assumes a binary outcome.**
- **Assumes no outliers.**
- **Assumes no intercorrelations (multicollinearity) among the predictors (inputs).**
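The regression-tree criterion mentioned above (SSE instead of information gain) can be sketched as a single best-split search; the 1-D data below is invented:

```python
# Find the best split point of one feature by minimizing the sum of
# squared errors (SSE) of the two resulting leaves - the regression-tree
# criterion, used where a classification tree would use information gain.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # invented toy feature values
ys = [1.1, 0.9, 1.0, 5.0, 5.2, 4.8]      # invented toy targets

def sse(values):
    """Sum of squared errors around the leaf mean (leaf prediction)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

# Try every threshold; score = SSE(left leaf) + SSE(right leaf).
best = min(
    ((t, sse([y for x, y in zip(xs, ys) if x <= t]) +
         sse([y for x, y in zip(xs, ys) if x > t]))
     for t in xs[:-1]),
    key=lambda p: p[1],
)
print(best)  # threshold 3.0, cleanly separating the two target clusters
```

A full CART builder just applies this search recursively to each leaf until a stopping rule (depth, minimum samples) is hit.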

- **Too many variables**
- **Overfitting**
- **Time series: seasonality trends can cause this**
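The LOGREG binary threshold described above can be sketched by fitting a 1-D logistic model with gradient descent; the bug-spray data, learning rate, and iteration count are all invented:

```python
import math

# Toy 1-D data (invented): dose of bug spray vs. whether the bug died.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]  # 1 = dead, 0 = alive

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit weight w and bias b by gradient descent on the log loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y   # prediction error drives the gradient
        gw += err * x
        gb += err
    w -= lr * gw
    b -= lr * gb

def predict(x):
    """Binary decision: dead if P(dead) > 0.5, alive otherwise."""
    return "dead" if sigmoid(w * x + b) > 0.5 else "alive"

print(predict(4.5), predict(0.2))  # dead alive
```

The model's raw output is a probability; the 0.5 cutoff is what turns it into the binary classifier described above.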

- **Variance, sigma^2: informally, this parameter controls the smoothness of your approximated function.**
- **Smaller values of sigma will cause the function to overfit the data points, while larger values will cause it to underfit.**
- **There is a proposed method to find sigma in the post!**
- **Gaussian kernel regression is equivalent to creating an RBF network with the properties described in the post.**
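Sigma's effect on smoothness can be sketched with a minimal Nadaraya-Watson estimator (one common form of Gaussian kernel regression); the data points are invented:

```python
import math

# Invented 1-D data for Gaussian kernel regression (Nadaraya-Watson form).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.7]

def predict(x, sigma):
    """Weighted average of ys; weights decay with distance at scale sigma."""
    weights = [math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Small sigma hugs each data point (overfit); large sigma flattens the
# estimate toward the global mean of ys (underfit).
print(predict(1.0, sigma=0.1))   # ~0.8, essentially the nearest point
print(predict(1.0, sigma=10.0))  # ~0.22, near the mean of all ys
```

At tiny sigma every prediction snaps to the closest training point; at huge sigma all weights become equal and the function degenerates to a constant.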

**Principal component regression (PCR) and partial least squares (PLS)** - basically PCA plus linear regression; however, PLS makes use of the response variable in order to identify the new features.
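A minimal PCR sketch with NumPy on invented data: PCA on the centered features, then ordinary least squares on the top component. (PLS would additionally use y when building the components; that step is not shown here.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 3 correlated features driven by one shared factor,
# and a response y that depends on that factor.
n = 100
factor = rng.normal(size=n)
X = np.column_stack([factor + 0.1 * rng.normal(size=n) for _ in range(3)])
y = 2.0 * factor + 0.1 * rng.normal(size=n)

# PCR step 1: PCA via SVD of the centered design matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 1                      # keep only the top principal component
scores = Xc @ Vt[:k].T     # projection onto the new feature(s)

# PCR step 2: ordinary least squares on the component scores.
coef, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(n), scores]), y, rcond=None)

pred = coef[0] + scores[:, 0] * coef[1]
print(np.corrcoef(pred, y)[0, 1])  # close to 1 on this toy data
```

Because the three features share one underlying factor, a single component captures nearly all the signal, which is exactly the situation (multicollinearity) PCR is meant to handle.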

