›› 2014, Vol. 23 ›› Issue (8): 80203-080203.doi: 10.1088/1674-1056/23/8/080203

• GENERAL • 上一篇    下一篇

Improved locality-sensitive hashing method for the approximate nearest neighbor problem

陆颖华a, 马廷淮a b, 钟水明a, 曹杰c, 王新d, Abdullah Al-Dhelaane   

  1. a Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    b School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    c School of Economic & Management, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    d Huafeng Meteorological Media Group, Beijing 100086, China;
    e Computer Science Department, College of Computer and Information Science, King Saud University, Riyadh 11362, Saudi Arabia
  • 收稿日期:2013-10-16 修回日期:2014-02-10 出版日期:2014-08-15 发布日期:2014-08-15
  • 基金资助:
    Project supported by the National Natural Science Foundation of China (Grant No. 61173143) and the Special Public Sector Research Program of China (Grant No. GYHY201206030). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group No. RGP-VPP-264.

Improved locality-sensitive hashing method for the approximate nearest neighbor problem

Lu Ying-Hua (陆颖华)a, Ma Ting-Huai (马廷淮)a b, Zhong Shui-Ming (钟水明)a, Cao Jie (曹杰)c, Wang Xin (王新)d, Abdullah Al-Dhelaane   

  1. a Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    b School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    c School of Economic & Management, Nanjing University of Information Science & Technology, Nanjing 210044, China;
    d Huafeng Meteorological Media Group, Beijing 100086, China;
    e Computer Science Department, College of Computer and Information Science, King Saud University, Riyadh 11362, Saudi Arabia
  • Received:2013-10-16 Revised:2014-02-10 Online:2014-08-15 Published:2014-08-15
  • Contact: Ma Ting-Huai E-mail:thma@nuist.edu.cn
  • Supported by:
    Project supported by the National Natural Science Foundation of China (Grant No. 61173143) and the Special Public Sector Research Program of China (Grant No. GYHY201206030). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group No. RGP-VPP-264.

摘要: In recent years, the nearest neighbor search (NNS) problem has been widely used in various interesting applications. Locality-sensitive hashing (LSH), a popular algorithm for the approximate nearest neighbor problem, is proved to be an efficient method to solve the NNS problem in the high-dimensional and large-scale databases. Based on the scheme of p-stable LSH, this paper introduces a novel improvement algorithm called randomness-based locality-sensitive hashing (RLSH) based on p-stable LSH. Our proposed algorithm modifies the query strategy that it randomly selects a certain hash table to project the query point instead of mapping the query point into all hash tables in the period of the nearest neighbor query and reconstructs the candidate points for finding the nearest neighbors. This improvement strategy ensures that RLSH spends less time searching for the nearest neighbors than the p-stable LSH algorithm to keep a high recall. Besides, this strategy is proved to promote the diversity of the candidate points even with fewer hash tables. Experiments are executed on the synthetic dataset and open dataset. The results show that our method can cost less time consumption and less space requirements than the p-stable LSH while balancing the same recall.

关键词: approximate nearest neighbor problem, locality-sensitive hashing

Abstract: In recent years, the nearest neighbor search (NNS) problem has been widely used in various interesting applications. Locality-sensitive hashing (LSH), a popular algorithm for the approximate nearest neighbor problem, is proved to be an efficient method to solve the NNS problem in the high-dimensional and large-scale databases. Based on the scheme of p-stable LSH, this paper introduces a novel improvement algorithm called randomness-based locality-sensitive hashing (RLSH) based on p-stable LSH. Our proposed algorithm modifies the query strategy that it randomly selects a certain hash table to project the query point instead of mapping the query point into all hash tables in the period of the nearest neighbor query and reconstructs the candidate points for finding the nearest neighbors. This improvement strategy ensures that RLSH spends less time searching for the nearest neighbors than the p-stable LSH algorithm to keep a high recall. Besides, this strategy is proved to promote the diversity of the candidate points even with fewer hash tables. Experiments are executed on the synthetic dataset and open dataset. The results show that our method can cost less time consumption and less space requirements than the p-stable LSH while balancing the same recall.

Key words: approximate nearest neighbor problem, locality-sensitive hashing

中图分类号:  (Probability theory)

  • 02.50.Cw
02.50.Ey (Stochastic processes) 02.50.Fz (Stochastic analysis) 02.60.Gf (Algorithms for functional approximation)