• The opportune moment of fusion: Existing MvC methods adopt three fusion strategies for multi-view data in the clustering process, i.e., fusion in the data, fusion in the projected features, and fusion in the results.

Most current research on MvC focuses on the second fusion strategy. However, there is no theoretical foundation for deciding which strategy is best. Theoretical and methodological research needs to be conducted to uncover the essence of each.
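As a hedged illustration of two of these fusion points, the sketch below contrasts fusion in the data (concatenate views, cluster once) with fusion in the results (cluster each view, merge the partitions). The toy data, the hand-rolled k-means, and the co-association consensus step are all invented for this sketch; they are not a published MvC algorithm. Fusion in the projected features would insert a shared-subspace learning step between the two.

```python
import numpy as np

# Toy two-view data: 10 objects, two well-separated groups of 5 (invented).
rng = np.random.default_rng(0)
view1 = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
view2 = np.vstack([rng.normal(0, 0.1, (5, 3)), rng.normal(8, 0.1, (5, 3))])

def kmeans(X, k=2, iters=20):
    # Lloyd's algorithm, deterministically seeded with the first/last points
    centers = X[[0, -1]].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Strategy 1 -- fusion in the data: concatenate the views, cluster once.
early = kmeans(np.hstack([view1, view2]))

# Strategy 3 -- fusion in the results: cluster each view separately, then
# merge the partitions through a co-association matrix (entry (i, j) is the
# fraction of views placing objects i and j in the same cluster).
l1, l2 = kmeans(view1), kmeans(view2)
coassoc = ((l1[:, None] == l1[None, :]).astype(float) +
           (l2[:, None] == l2[None, :])) / 2
late = kmeans(coassoc)  # cluster objects by their agreement profiles
```

On this toy data both strategies recover the same two groups; real multi-view data is precisely where the choice among the three fusion points starts to matter.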

• Incomplete MvC: Although some attempts have been made for incomplete multi-view data, as mentioned in each section of this category, incomplete MvC is still a challenging problem. In real life, data loss occurs frequently, yet research on incomplete MvC has not been extensive. More effort is expected in the research of incomplete MvC.

• Multi-task multi-view clustering: This direction is a new trend in MvC research. A few challenges accompany this trend, e.g.

, how to explore the relationships between different tasks and different views, and how to transfer knowledge between the views.

The first challenge of multi-view clustering is how to discriminate between different views in the clustering algorithm [6]: how to maximize the clustering quality within each view while, at the same time, taking the clustering consistency across different views into consideration. Besides, incomplete multi-view data, where some data objects may be missing their observation on one view (i.e., missing objects) or may be available only with partial features on that view (i.e., missing features), also brings challenges to MvC.

1.1. Multi-view Clustering Algorithms

Co-EM [3], co-testing [4], and robust co-training [5] belong to the co-training style of algorithms. Sparse multi-view SVMs [6], multi-view TSVMs [7], multi-view Laplacian SVMs [8] and multi-view Laplacian TSVMs [9] are representative co-regularization style algorithms. Margin-consistency style algorithms have recently been proposed to make use of the latent consistency of classification results from multiple views [10–13].

Besides the latest proposed multi-view learning strategies, some detailed multi-view learning algorithms have been successively put forward for specific machine learning tasks. These algorithms can be summarized as multi-view transfer learning [15–17], multi-view dimensionality reduction [18–20], multi-view clustering [21–28], multi-view discriminant analysis [29,30], multi-view semi-supervised learning [8,9] and multi-task multi-view learning [31–35].

1.1. Rough Set Theory

When asking a computer scientist about rough sets, the first two common words they use are the lower and the upper approximation.

In fact, beyond these common words, rough set theory deals with uncertainty, vagueness and discernibility. Pawlak introduced rough set theory with its fundamental concept of finding the lower and upper approximation; over time, the concept evolved. A different set-theoretic approach which also uses the concept of membership functions, namely rough sets (introduced by Pawlak in 1982 [668]), is sometimes confused with fuzzy sets. While both fuzzy sets and rough sets make use of membership functions, rough sets differ in the sense that a lower and upper approximation to the rough set is determined. The lower approximation consists of all elements that belong with full certainty to the corresponding set, while the upper approximation consists of elements that may possibly belong to the set. Rough sets are frequently used in machine learning as classifiers, where they are used to find the smallest number of features to discern between classes. Rough sets are also used for extracting knowledge from incomplete data (Computational Intelligence, 2nd ed., p. 452).

For a good understanding of RST, let us first define an information system and then use it to give more detail.

Information system: Assume an ordered pair 𝒜 = (U, A), where U is the universe of discourse and A is a non-empty set of attributes. The universe of discourse is a set of objects (or patterns, examples), while the attributes define the characteristics of a single object. Each attribute a ∈ A is a function a: U → V_a, where V_a is the range of values for attribute a.

We call the lower approximation the region with the highest probability of finding the object, and the upper approximation its opposite. In some cases there may not be enough information to decide whether an object belongs to the lower region; such objects are grouped in the boundary region, which is full of uncertainty. In other cases it may appear that two objects have the same values for these attributes.

If so, they are indiscernible. The indiscernibility relation is defined as

IND(B) = {(x, y) ∈ U × U : ∀a ∈ B, a(x) = a(y)}    (2)

where B ⊆ A. With U/IND(B) is denoted the set of equivalence classes in the relation IND(B). That is, U/IND(B) contains one class for each set of objects that satisfy IND(B) over all attributes in B.
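The definitions so far (information system, indiscernibility, partition, and the lower/upper approximations) can be sketched as follows. The information system below, with its objects, attribute names and values, is invented purely for illustration.

```python
# Hedged sketch: a toy information system (U, A); objects and attributes
# are hypothetical, not taken from any real data set.
table = {
    "x1": {"color": "red",  "size": "big"},
    "x2": {"color": "red",  "size": "big"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "blue", "size": "small"},
}

def ind(x, y, B):
    """IND(B) holds when x and y agree on every attribute in B."""
    return all(table[x][a] == table[y][a] for a in B)

def partition(B):
    """U/IND(B): one equivalence class per distinct value tuple on B."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), set()).add(x)
    return sorted(classes.values(), key=sorted)

def approximations(target, B):
    """Lower/upper approximation of a target set of objects w.r.t. B."""
    lower, upper = set(), set()
    for E in partition(B):
        if E <= target:   # whole class certainly inside the set
            lower |= E
        if E & target:    # class possibly overlapping the set
            upper |= E
    return lower, upper

# x1 and x2 are indiscernible on {color, size}.  Observing only "color",
# the set {x1, x2, x3} has lower approximation {x1, x2} and upper
# approximation all of U, so {x3, x4} forms the boundary region.
low, up = approximations({"x1", "x2", "x3"}, ["color"])
```

Note how the boundary region (upper minus lower) captures exactly the objects whose membership cannot be decided from the observed attributes.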

Objects are therefore grouped together, where the objects in different groups cannot be discerned between. A discernibility matrix is a two-dimensional matrix where the equivalence classes form the indices, and each element is the set of attributes that can be used to discern between the corresponding classes. Formally, for a set of attributes B ⊆ A in 𝒜 = (U, A), the discernibility matrix M_D(B) is defined as

M_D(B) = [m_D(i, j)]_{n×n}    (3)

for 1 ≤ i, j ≤ n and n = |U/IND(B)|, with

m_D(i, j) = {a ∈ B : a(E_i) ≠ a(E_j)}    (4)

for i, j = 1, ..., n; a(E_i) denotes the value of attribute a on equivalence class E_i.

Using the discernibility matrix, discernibility functions can be defined to compute the minimal number of attributes necessary to discern equivalence classes from one another. The discernibility function f(B), with B ⊆ A, is defined as

f(B) = ∧ { ∨ m̄_D(i, j) : 1 ≤ j < i ≤ n, m_D(i, j) ≠ ∅ }    (5)

where

m̄_D(i, j) = {ā : a ∈ m_D(i, j)}    (6)

and ā is the Boolean variable associated with a, ∨ m̄_D(i, j) is the disjunction over the set of Boolean variables, and ∧ denotes conjunction.
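A minimal sketch of the discernibility matrix and function, on an invented information system (the table, attribute names, and class indexing are illustrative assumptions):

```python
from itertools import combinations

# Hedged sketch: invented information system; under B = {color, size} the
# equivalence classes are E_1 = {x1, x2}, E_2 = {x3}, E_3 = {x4}.
table = {
    "x1": {"color": "red",  "size": "big"},
    "x2": {"color": "red",  "size": "big"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "blue", "size": "small"},
}

def partition(B):
    """U/IND(B): one equivalence class per distinct value tuple on B."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), set()).add(x)
    return sorted(classes.values(), key=sorted)

def disc_matrix(B):
    """M_D(B): entry (i, j) holds the attributes discerning E_i from E_j."""
    reps = [sorted(E)[0] for E in partition(B)]  # one representative per class
    return [[{a for a in B if table[ri][a] != table[rj][a]} for rj in reps]
            for ri in reps]

def disc_function(B):
    """f(B) as a CNF: one clause per non-empty off-diagonal entry."""
    M = disc_matrix(B)
    return [M[i][j] for i, j in combinations(range(len(M)), 2) if M[i][j]]

# f({color, size}) = color AND (color OR size) AND size: both attributes
# are needed to discern all three classes from one another.
cnf = disc_function(["color", "size"])
```

Each returned set is one disjunctive clause of f(B); simplifying the conjunction of clauses yields the minimal attribute sets (reducts).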

The discernibility function f(B) finds the minimal set of attributes required to discern any equivalence class from all others. Alternatively, the relative discernibility function f(E_i, B) finds the minimal set of attributes required to discern a given class, E_i, from the other classes, using the set of attributes B. That is,

f(E_i, B) = ∧ { ∨ m̄_D(i, j) : 1 ≤ j ≤ n, j ≠ i, m_D(i, j) ≠ ∅ }    (8)
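The relative discernibility function restricts the conjunction to the clauses involving the chosen class. A hedged sketch on the same kind of invented table (names and values are illustrative only):

```python
# Hedged sketch: f(E_i, B) keeps only the clauses that separate one chosen
# class from the rest.  The table is invented; classes under
# B = {color, size} are E_1 = {x1, x2}, E_2 = {x3}, E_3 = {x4}.
table = {
    "x1": {"color": "red",  "size": "big"},
    "x2": {"color": "red",  "size": "big"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "blue", "size": "small"},
}

def partition(B):
    """U/IND(B): one equivalence class per distinct value tuple on B."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in B), set()).add(x)
    return sorted(classes.values(), key=sorted)

def relative_disc(i, B):
    """f(E_i, B): clauses discerning class E_i from every other class."""
    reps = [sorted(E)[0] for E in partition(B)]
    return [{a for a in B if table[reps[i]][a] != table[reps[j]][a]}
            for j in range(len(reps)) if j != i
            if any(table[reps[i]][a] != table[reps[j]][a] for a in B)]

# Discerning E_1 = {x1, x2} from the other classes needs
# color AND (color OR size), which simplifies to just color.
clauses = relative_disc(0, ["color", "size"])
```

Here a single attribute suffices to tell E_1 apart, whereas the full discernibility function f(B) required both: relative discernibility is the basis of class-specific (relative) reducts.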