Team Imedia


Section: Software

PMH Library

Participants: Alexis Joly, Olivier Buisson [INA].

PMH is a general-purpose software library dedicated to locality-sensitive hashing in metric spaces for approximate similarity search. It makes it possible to index and efficiently exploit large datasets of content descriptors, usually represented by high-dimensional feature vectors. Both the index construction time and the required memory space are linear in the dataset size. The nearest-neighbour search algorithm is sublinear in the dataset size.

PMH belongs to the family of Locality Sensitive Hashing (LSH) methods, which have proved to be among the most efficient for approximate similarity search in large, high-dimensional datasets. Unlike classical LSH methods (such as those used in the MIT E2LSH package), PMH includes a multi-probe search algorithm that drastically reduces the memory space complexity, making it possible to handle datasets several orders of magnitude larger. Because our multi-probe algorithm relies on a probabilistic control of the success probability of each bucket, it also allows the quality of the approximate search to be controlled accurately. Finally, the PMH library is considerably more generic than competing libraries (such as FLANN or LSHKIT). It supports different metric types (L1, L2, Hamming, inner product, weighted distances, etc.), different data types (binary, float, sparse, non-vectorial, etc.), different query types (K nearest neighbours, range queries, probabilistic queries, empirical models, etc.), and different families of hashing functions (random projections with different distributions, kernel-based projections, optimized projections such as PCA or LDA, etc.).
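As an illustration of the general idea only, the following is a minimal sketch of a single-table LSH index for the L2 metric with a naive multi-probe query. It is not the PMH API and does not reproduce PMH's probabilistic bucket selection: the class ToyLSHIndex, its parameters, and the rule of probing every bucket that differs by one quantization step in a single projection are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


class ToyLSHIndex:
    """Hypothetical single-table LSH index for L2 distance (illustration only)."""

    def __init__(self, dim, num_projections=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Random Gaussian directions and uniform offsets define the hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_projections)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_projections)]
        self.table = defaultdict(list)  # bucket key -> list of point ids
        self.points = []

    def _key(self, x):
        # Quantized random projections: h_i(x) = floor((a_i . x + b_i) / w).
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def add(self, x):
        # One hash computation and one table entry per point: linear build.
        self.points.append(x)
        self.table[self._key(x)].append(len(self.points) - 1)

    def candidates(self, q, multi_probe=True):
        key = self._key(q)
        keys = [key]
        if multi_probe:
            # Assumed heuristic: also visit buckets that differ by +/-1
            # in exactly one projection, instead of adding more tables.
            for i in range(len(key)):
                for delta in (-1, 1):
                    keys.append(key[:i] + (key[i] + delta,) + key[i + 1:])
        return {idx for k in keys for idx in self.table.get(k, ())}

    def query(self, q, k=1, multi_probe=True):
        # Rank only the candidates by exact distance, not the whole dataset.
        found = self.candidates(q, multi_probe)
        return sorted(found, key=lambda i: math.dist(q, self.points[i]))[:k]


# Usage on synthetic data (random vectors, not real descriptors).
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
index = ToyLSHIndex(dim=8)
for vector in data:
    index.add(vector)
nearest = index.query(data[5], k=3)
```

The sketch shows why multi-probing saves memory: a single table is stored, and recall is raised by visiting more buckets at query time rather than by building additional tables. A probabilistic scheme such as the one described above would instead choose which buckets to visit according to their estimated probability of containing a relevant result.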

Notably, the PMH library is the core technology addressing the scalability issues of the VITALAS European project and is fully integrated in the resulting VITALAS multimedia search engine. It has been successfully applied to multi-user, real-time content-based retrieval over 20 million Flickr images and to real-time local search of small objects in a collection of 100K images (comprising 120 million SIFT features).

