Approximatingand Eliminating Search Algorithm (AESA) makes use of metricproperties of given distance. The algorithm consists of (1) an efficientelimination rules and (2) an appropriate search for available rules by which maximumefficiency is maintained. For N prototypes, the it uses precompution of triangulararray of (N2-N)/2 distances between prototypes. By which Nearest Neighbour (NN) Search is carried outthrough a very small number of distance computations. Furthermore, for large N,this number of computations, tends to be independent of N (asymptotic constanttime-complexity). However, the great storage complexity and preprocessing time(O (N2)), severely limit the practical use of the AESA for largesets of prototypes.
Reference: E. Vidal,”An algorithm for finding nearest neighbors in (approximately) constant averagetime,” Pattern Recog. Lett., vol. 4, no.
3, pp. 145–157, 1986. LAESA (Linear AESA) a new version of the AESA, which uses a linear array ofdistance, but is strictly based on metric arguments like the original AESA. Theprocedure starts by selecting “Base Prototypes” (BP) from the givenset of prototypes which is relatively a small set there by computing thedistances between these BP’s and the completing set of prototypes.Reference: L.
Mico,J.Oncina, E. Vidal An algorithm for finding nearest neighbours in constantaverage time with a linear space complexity Proceedings, 11th IAPR InternationalConference on Pattern Recognition. Vol. II. Conference B: Pattern RecognitionMethodoly and Systems Year; 1992 Tree LAESA (TLAESA) is based on the LAESA which reduces itsaverage time complexity to sublinear.
The TLAESA algorithm consists of twoparts: (1) preprocessing and (2) search. In the preprocessing algorithm, twostructures are built: a binary search tree storing all prototypes and a table ofdistances. The search algorithm is essentially a traversal of the binary treewhere the table of distances is used in order to avoid the exploration of some branchesof the tree.
Reference: L.Mico, J. Oncina, and R.
C. Carrasco, “A fast branch & bound nearestneighbour classifier in metric spaces,” Pattern Recog. Lett., vol.
17, no. 7,pp. 731–739, 1996. BAESA Approximatingk-LAESA(Ak-LAESA) is a fast classifier for general metricspaces (no vector space required) based on the LAESA algorithm The aim is toachieve classification rates similar to those of a k-NN classifier, using kneighbours that may not be the k-NNs and preserving the main properties ofLAESA (distance computations and time and space complexities). It obtains errorrates very close to those of a k-NN classifier, while calculating a much lowernumber of distances (the number of distances of the Ak-LAESA algorithm isexactly the same that the computed by the LAESA algorithm). Reference: Amodification of the LAESA algorithm for approximated k-NN classification,Pattern Recognition Letters, Vol.
24, Issue: 1-3, January 2003,Pages: 47-53 FixedQueries Trees(FQ Trees) are an evolution where the same pivotis used for all the nodes of the same level of the tree. In this case the pivotdoes not need to belong to the subtree. The main goal of this structure is tominimize the number of element comparisons, as opposed to other data structureoperations (such as tree traversal). This is especially important inapplications where computing distances between two elements is a much moreexpensive operation than, say following pointers. This tree structure differsfrom other tree structures where the keys on each level of the tree are all thesame, so we have just one key per level. In other words, which comparisons wemake does not depend on the results of previous comparisons up the tree. Thusthe name fixed-queries tree or FQ-tree. A major strength of FQ-trees is that the datastructure and the search algorithms are independent of the distance function,as long as it satisfies the triangle inequality.
Reference: R.Baeza-Yates, W. Cunto, U. Manber, and S. Wu. Proximity matching using fixed-queriestrees. In Proc.
CPM’94, LNCS 807, pages 198–212, 1994. FixedHeight fq-tree(fhq-tree) A variant called FQ Tree where all theleaves are at the same depth h, regardless of the bucket sizeReference:Ricardo Baeza-Yates Gonzalo Navarro “Fastapproximate string matching in a dictionary” Published in: StringProcessing and Information Retrieval: A South American Symposium, 1998.Proceedings IEEE FixedQueries Array (FQA), has two interesting properties. First, it is the first datastructure which is able to achieve a sub linear (in the database size) numberof side computations without using any extra space.
Second, it is able to tradenumber of pivots k for their precision, so as to optimize the usage of theavailable space. FQAare based in a common idea: k pivots are selected and each object is mapped tok coordinates which are its distances to the pivots. Later, the query q is alsomapped and if it differs from an object in more than r along some coordinatethen the element is filtered out by the triangle inequality. That is, if forsome pivot pi and some element v of the set it holds |d(q,pi) ? d(v, pi)| > r, then weknow that d(q, v) > r without need to evaluate d(v, q). The elements thatcannot be filtered out using this rule are directly compared.Reference: E. Ch´avez, J.
L. Marroquin, and G.Navarro. Fixed queries array: A fast and economical data structure forproximity searching. Multimedia Tools and Applications (MTAP), 14(2):113–135,2001. Sparse SpatialSelection (SSS),a new pivot-based technique.
The main characteristic of this method is that itguarantees a good pivot selection more efficiently. In addition, SSS adaptsitself to the dimensionality of the metric space we are working with, withoutbeing necessary to specify in advance the number of pivots to use. Furthermore,SSS is dynamic, that is, it is capable to support object insertions in thedatabase efficiently, it can work with both continuous and discrete distancefunctions, and it is suitable for secondary memory storage.
The maincontribution of the SSS is the use of a new pivot selection strategy whichgenerates a comparatively small number of good-quality pivots.The main contribution of SSS is the generating a number of pivots that dependson the intrinsic dimensionality of the space (something interesting from boththe theoretical and practical points of view.SSScan be extended to deal with more complex metric spaces where the distancesamong subsets of objects depend on specific dimensions that are not relevantfor other set of objects. That is, in some applications areas, objects can begrouped in different clusters with different associated metric spaces, all ofthem nested into the general metric space that explains the distances amongclusters. To deal with these complex spaces we propose the extension of SSSbecoming Sparse Spatial Selection for Nested Metric Spaces (SSSNMS)Reference: Spatial Selection of Sparse Pivots for SimilaritySearch in Metric Spaces, JCS&T Vol.
7 No. 1, April2007 Vantage PointTree (VP Tree) Like theKD-Tree, each VP-Tree node cuts/divides the space. A VP-Tree node employs distancefrom a selected vantage point rather than using coordinate values. Near pointsmake up the left/inside subspace while the right/outside subspace consists offar points. A binary tree is formed by recursion. Each of the tree nodesidentifies a vantage point, and for its children (left/right), the nodecontains bounds associated to subspace by the vantage point. Other forms of theVP-Tree include additional subspace bounds and may employ buckets near leaflevel.
VPS-Tree an element of S is compared with the vantage pointbelonging to each of its ancestors in the tree. This information is also notcaptured in the simple tree. It consists of a database element identifier ‘id’,and a list ‘hist’ of distances from the item to each vantage point tracing backto the root. A list of these structures is initialized with the history set tonull from the entire database. The algorithm then recursively splits the listinto two lists L and R, while adding to the history of each item. Each resultingtree node contains a list ‘bnds’ which have a range interval for itscorresponding subspace which is seen by each ancestral vantage point. VPSB-Tree Here we consider one way to form buckets of leavesin order to save space while preserving the notion of ancestral history presentin the VPS-Tree. Buckets are formed by collapsing subtrees near leaflevel into a flat structure.
Each bucket contains say no element records. Eachrecord must specify an id, and in addition holds distance entries for everyancestor. We call the resulting structure a VPSB-Tree. Reference: P. N.Yianilos, “Data structures and algorithms for nearest neighbor search ingeneral metric spaces,” in Proc. Annu. ACM-SIAM Symp.
Discrete Algorithms,1993, pp. 311–321. Multi-VantagePoint Tree(MVP Tree) A distance based index structure calledmulti-vantage point (MVP) tree for similarity queries on high-dimensional metricspaces.
The MVP Tree uses more than one vantage point to partition the spaceinto spherical cuts at each level. It also utilizes the pre-computed (atconstruction time) distances between the data points and the vantage points.Reference: T.
Bozkaya and M. Ozsoyoglu. Distance-basedindexing for high-dimensional metric spaces. In Proc. SIGMOD’97, pages 357–368, 1997.
Sigmod Record 26(2). GridFiles A file data structure designed tomanage a disk allocation storage in terms of fixed size units like disk blocks,pages or buckets depending on level of description. It is a variant of gridmethod where the requirement is cell division lines mustbe equidistant.
Hashing method for multidimensional points. It is an extensionof extensible hashing.Multipaging is similar to the grid file. Ituses a directory called axial arrays which is in the form of linearscales instead of using a grid director. There are two variants of multipaging.
In dynamic multipaging, performance is controlled by setting a bound on theprobe factor which is defined as the average number of pages accessed in aprobe. In static multipaging, performance is controlled by setting a bound inthe load factor, or the average page occupancy.Reference: A Survey onMultidimensional Access Methods Hee-Kap Ahn Nikos Mamoulis Ho Min WongUU-CS-2001-14 May, 2001 EXCELLmethod (Extendible CELL) related tothe grid file. It is a bintree with a directory in the form of an arrayproviding access by address computation. It can also be viewed as an adaptationof extendible hashing to multidimensional point data. In contrast to the gridfile, where the partitioning hyperplanes may be spaced arbitrarily, the EXCELLmethod decomposes the space regularlyad all grid cells are of equal size.
Two-Level GridFile. The basic idea of it is to use a secondgrid file to manage the grid directory. The first level is called the rootdirectory. Entries of the root directory contains pointers to the directorypages of the lower level, which in turn contain pointers to the data pages. By havinga second level, splits are often confined to the subdirectory regions withoutaffecting too much of their surroundings.
TwinGrid File It is other hashing method. It tries toincrease space utilization compared to the original grid file by introducing asecond grid file. The relationship between these two grid files is not hierarchicalbut somewhat more balanced. Both grid files span the whole space (universe).
The distribution of the data among the two files is performed dynamically.Quadtreesare one of the first data structures for higher-dimensional data. A quadtree is arooted tree in which every internal node has four children. Every node in the quadtreecorresponds to a square. If a node v has children, then theircorresponding squares are the four quadrants of the square of v ¾ hencethe name of the tree.
This implies that the squares of the leaves together forma subdivision of the square of the root. We call this subdivision the quadtreesubdivision. The children of the root are labeled NE, NW, SW, and SE toindicate to which quadrant they belong to; NE stands for the north-eastquadrant, NW for the north-west quadrant, and so on.