Examples of supervised machine learning algorithms:

\begin{itemize}

\item Linear regression (regression problems).

\item Random forest (classification problems).

\item Support vector machines (classification problems).

\end{itemize}

\textbf{Supervised learning problems fall into:}

\begin{itemize}

\item Classification problem: The output variable (Y) is a category, such as compliant taxpayer or non-compliant taxpayer.

\item Regression problem: The output variable (Y) is a real value, such as the amount of tax yield from a tax audit.

\end{itemize}
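The difference between the two problem types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the income figures, audit yields and labels are made-up toy data, and a simple nearest-centroid rule stands in for the classifiers named above (it is not a random forest or a support vector machine). The regression model returns a real value, while the classifier returns a category.

```python
# Hypothetical toy data: declared income (in thousands) for six taxpayers.
incomes = [20.0, 35.0, 50.0, 65.0, 80.0, 95.0]
# Regression target: audit yield, a real value (made-up numbers).
audit_yield = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5]
# Classification target: a category (made-up labels).
labels = ["compliant", "compliant", "compliant",
          "non-compliant", "non-compliant", "non-compliant"]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_centroids(xs, ys):
    """Nearest-centroid classifier: store the mean input of each class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def classify(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


slope, intercept = fit_line(incomes, audit_yield)
centroids = fit_centroids(incomes, labels)

predicted_yield = slope * 70.0 + intercept   # regression output: a real value
predicted_class = classify(centroids, 70.0)  # classification output: a category
```

Both models are trained on the same inputs (X); only the type of the output variable (Y) differs.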

\subsubsection{Unsupervised Machine Learning}

Unsupervised learning means that the algorithm is provided only with input variables (X), without the corresponding output variables (Y).

The goal of the algorithm is to learn the underlying structure or distribution of the data. There are no correct answers and no teacher; it is a self-learning process.

\textbf{\textit{Unsupervised learning problems fall into:}}

\begin{itemize}

\item Clustering problems: Find the groups (clusters) in the data, such as grouping taxpayers by compliance behavior.

\item Association problems: Find the rules that describe large parts of the data, such as whether taxpayers who do not file tax returns on time also tend to stop filing returns.

\end{itemize}
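For association problems, the strength of a candidate rule is commonly measured by its support and confidence. The sketch below computes both for the late-filing example; the records and behavior names are hypothetical and only serve as an illustration.

```python
# Hypothetical records: the set of behaviors observed for each taxpayer.
records = [
    {"late_filing", "stops_filing"},
    {"late_filing", "stops_filing"},
    {"late_filing"},
    {"on_time_filing"},
    {"on_time_filing"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)


def confidence(antecedent, consequent, records):
    """Estimate of P(consequent | antecedent) from the records."""
    both = support(antecedent | consequent, records)
    return both / support(antecedent, records)


# Rule under test: late filing -> stops filing.
rule_support = support({"late_filing", "stops_filing"}, records)
rule_confidence = confidence({"late_filing"}, {"stops_filing"}, records)
```

In this toy data the rule covers two of the five records, and two of the three late filers also stop filing.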

Examples of unsupervised learning algorithms are:

\begin{itemize}

\item k-means for clustering problems.

\item Apriori algorithm for association rule learning problems.

\end{itemize}
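As a minimal sketch of k-means, the following Python code runs Lloyd's algorithm on a single feature. The data (days of filing delay per taxpayer), the initial centers and the fixed number of iterations are assumptions chosen for illustration; a practical implementation would handle several features and check for convergence.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm for k-means on one-dimensional data."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical feature: days of filing delay for six taxpayers.
delays = [1.0, 2.0, 3.0, 30.0, 32.0, 34.0]
centers, clusters = kmeans_1d(delays, centers=[0.0, 10.0])
```

No labels are used at any point: the two groups emerge from the input values alone.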

\subsubsection{Semi-Supervised Machine Learning}

A problem in which a large amount of input data (X) is available but only a small part of it has labeled outputs (Y) is called semi-supervised. A good example is taxpayer audit yield, where only a few taxpayers have been audited (Y is known) and the majority are unaudited (only X is available).

Semi-supervised problems are very common because unlabeled input variables (X) are readily available, whereas labeling the output variables (Y) is time-consuming and costly.

An unsupervised learning algorithm discovers the structure of the input variables (X), and its predictions about the unlabeled data are used as input for a supervised learning algorithm. The resulting semi-supervised model then makes predictions on new, unseen data.
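One possible reading of this two-stage idea is a cluster-then-label scheme, sketched below. It is an illustrative assumption, not the only way to build a semi-supervised model: k-means groups all inputs, each cluster takes the majority label of the few audited taxpayers that fall into it, and a new taxpayer receives the label of the nearest cluster. All data values and label names are hypothetical.

```python
from collections import Counter


def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))


def kmeans_1d(points, centers, iterations=10):
    """Unsupervised step: Lloyd's algorithm on one-dimensional data."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


def label_clusters(centers, labeled):
    """Give each cluster the majority label of the labeled points in it."""
    votes = {}
    for x, y in labeled:
        votes.setdefault(nearest(centers, x), Counter())[y] += 1
    return {i: counts.most_common(1)[0][0] for i, counts in votes.items()}


# Hypothetical data: one feature per taxpayer; only two have been audited.
unlabeled = [1.0, 2.0, 3.0, 30.0, 32.0, 34.0]
labeled = [(2.5, "low yield"), (31.0, "high yield")]

# Structure is learned from all inputs, labeled or not.
all_inputs = unlabeled + [x for x, _ in labeled]
centers = kmeans_1d(all_inputs, centers=[0.0, 10.0])
cluster_label = label_clusters(centers, labeled)


def predict(x):
    """Supervised step: a new point inherits the label of its cluster."""
    return cluster_label[nearest(centers, x)]


prediction_small = predict(4.0)
prediction_large = predict(28.0)
```

The sketch assumes that every cluster contains at least one labeled point; a cluster without any audited taxpayer would have no label to pass on.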

\chapter{Approach}

A semi-supervised model uses the unlabeled data to extract latent features and pairs these with the available labels to learn an associated classifier.