Please wait a minute...
Chin. Phys. B, 2014, Vol. 23(8): 080203    DOI: 10.1088/1674-1056/23/8/080203
GENERAL Prev   Next  

Improved locality-sensitive hashing method for the approximate nearest neighbor problem

Lu Ying-Hua (陆颖华)a, Ma Ting-Huai (马廷淮)a b, Zhong Shui-Ming (钟水明)a, Cao Jie (曹杰)c, Wang Xin (王新)d, Abdullah Al-Dhelaane
a Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China;
b School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China;
c School of Economic & Management, Nanjing University of Information Science & Technology, Nanjing 210044, China;
d Huafeng Meteorological Media Group, Beijing 100086, China;
e Computer Science Department, College of Computer and Information Science, King Saud University, Riyadh 11362, Saudi Arabia
Abstract  In recent years, the nearest neighbor search (NNS) problem has been widely used in various interesting applications. Locality-sensitive hashing (LSH), a popular algorithm for the approximate nearest neighbor problem, is proved to be an efficient method to solve the NNS problem in the high-dimensional and large-scale databases. Based on the scheme of p-stable LSH, this paper introduces a novel improvement algorithm called randomness-based locality-sensitive hashing (RLSH) based on p-stable LSH. Our proposed algorithm modifies the query strategy that it randomly selects a certain hash table to project the query point instead of mapping the query point into all hash tables in the period of the nearest neighbor query and reconstructs the candidate points for finding the nearest neighbors. This improvement strategy ensures that RLSH spends less time searching for the nearest neighbors than the p-stable LSH algorithm to keep a high recall. Besides, this strategy is proved to promote the diversity of the candidate points even with fewer hash tables. Experiments are executed on the synthetic dataset and open dataset. The results show that our method can cost less time consumption and less space requirements than the p-stable LSH while balancing the same recall.
Keywords:  approximate nearest neighbor problem      locality-sensitive hashing  
Received:  16 October 2013      Revised:  10 February 2014      Accepted manuscript online: 
PACS:  02.50.Cw (Probability theory)  
  02.50.Ey (Stochastic processes)  
  02.50.Fz (Stochastic analysis)  
  02.60.Gf (Algorithms for functional approximation)  
Fund: Project supported by the National Natural Science Foundation of China (Grant No. 61173143) and the Special Public Sector Research Program of China (Grant No. GYHY201206030). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group No. RGP-VPP-264.
Corresponding Authors:  Ma Ting-Huai     E-mail:  thma@nuist.edu.cn

Cite this article: 

Lu Ying-Hua (陆颖华), Ma Ting-Huai (马廷淮), Zhong Shui-Ming (钟水明), Cao Jie (曹杰), Wang Xin (王新), Abdullah Al-Dhelaan Improved locality-sensitive hashing method for the approximate nearest neighbor problem 2014 Chin. Phys. B 23 080203

[1] Lü Q, Josephson W, Wang Z and Charikar M 2007 Proceedings of the 33rd International Conference on Very Large Data Bases, September 23-28, 2007, Vienna, Austria, p. 950
[2] Friedman J H, Bentley J L and Finkel R A 1977 ACM Transactions on Mathematical Software 3 209
[3] Guttman A 1984 Proceedings of ACM Conference on Management of Data (SIGMOD), June 18-21, 1984, Boston, Massachusetts, p. 47
[4] Tsaparas P 1999 Qualifying Depth Oral Report 31902
[5] Jégou H, Douze M and Schmid C 2011 IEEE Transaction Pattern Analysis and Machine Intelligence 33 117
[6] Lejsek H, Asmundsson F H, Jónsson B and Amsaleg L 2009 IEEE Transaction Pattern Analysis and Machine Intelligence 31 869
[7] Indyk P and Motwani R 1998 Proceedings of the 30th on Symposium on Theory of Computing (STOC), May 23-26, 1998, Dallas, Texas, USA, p. 604
[8] Gionis A, Indyk P and Motwani R 1999 Proceedings of the 25th on Very Large Data Bases (VLDB), September 7-10, 1999, Edinburgh, Scotland, UK, p. 518
[9] Datar M, Immorlica N, Indyk P and Mirrokni V S 2004 Proceedings of the 20th Annual Symposium on Computational Geometry, June 8-11, 2004, Brooklyn, New York, USA, p. 253
[10] Liu F, Leung H Y, Cheng L M and Ji X Y 2012 Chin. Phys. B 21 040204
[11] Cao Y, ZhangH and Guo J 2011 Proceedings of the 18th International Conference on Image Processing, September 11-14, 2011, Brussels, Belgium, p. 2461
[12] Liu F, Leung H Y, Cheng L M and Ji X Y 2012 Chin. Phys. B 21 040204
[13] Hou J, Yan G F and Fan Z 2011 Chin. Phys. B 20 048103
[14] Paula L B, Villaca R S and Magalhaes M F 2010 Proceedings of the 4th International Conference on Semantic Computing (ICSC), September 22-24, 2010, Pittsburgh, PA, USA, p. 408
[15] Ma J F, Hou K, Bao S L and Chen C 2011 Chin. Phys. B 20 028701
[16] Zhang Z, Cao C, Zhang R and Zou J 2010 Proceedings of the International Conference on Automation and Logistics (ICAL), August 16-20, 2010, Hong Kong and Macau, China, p. 13
[17] Long M, Peng F and Chen G R 2008 Chin. Phys. B 17 3588
[18] Chen C, Horng S and Huang C 2011 Expert Systems with Applications 38 12388
[19] Matei B, Shan Y, Sawhney H S, Tan Y, Kumar R, Huber D and Hebert M 2006 IEEE Transactions on Pattern Analysis and Machine Intelligence 28 1111
[20] Indyk P and Thaper N 2003 Proceedings of the International Workshop Statistical and Computational Theories of Vision, July 13-15, 2003, Vancouver, Canada, p. 33
[21] Gorisse D, Cord M and Precioso F 2012 IEEE Transaction on Pattern Analysis and Machine Intelligence 34 402
[22] Charikar M S 2002 Proceedings of the 34th Symposium on Theory of Computing (STOC), May 19-21, 2002, Montréal, Québec, Canada, p. 380
[23] Jain P, Kulis B and Grauman K 2008 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 23-28, 2008, Anchorage, Alaska, USA, p. 1
[24] Kulis B and Grauman K 2012 IEEE Transactions on Pattern Analysis and Machine Intelligence 34 1092
[25] Panigrahy R 2006 Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA), January 22-26, 2006, Miami, Florida, USA, p. 1186
[26] Jégou H, Amsaleg L, Schmid C and Gros P 2008 Proceedings of International Conference on Acoustics, Speech, and Signal Processing, March 31-April 4, 2008, Las Vegas, USA, p. 825
[27] Gu X, Zhang L, Zhang D, Zhang Y, Li J and Bao N 2012 Proceedings of the 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, August 8-10, 2012, Kyoto, Japan, p. 3
[28] Joly A and Buisson O 2008 Proceedings of the 16th ACM International Conference on Multimedia, October 26-31, 2008, Vancouver, Canada, p. 209
[29] Weiss Y, Torralba A and Fergus R 2008 Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, December 8-11, 2008, Vancouver, British Columbia, Canada, p. 134
[30] Wang J, Kumar S and Chang S 2010 Proceedings of 2010 Conference on Computer Vision and Pattern Recognition (CVPR), June 13-18, 2010, San Francisco, USA, p. 3424
[31] Wang J, Kumar S and Chang S 2010 Proceedings of the 27th International Conference on Machine Learning, June 21-24, 2010, Haifa, Israel, p. 1127
[32] Liu W, Wang J, Kumar S and Chang S 2011 Proceedings of the 28th International Conference on Machine Learning, June 28-July 2, 2011, Bellevue, Washington, USA, p. 567
[33] Pan J and Manocha D 2011 Proceedings of GIS, November 1-4, 2011, Chicago, USA, p. 211
[34] Koga H, Oguri M and Watanabe T 2012 Proceedings of the 26th International Conference on Advanced Information Networking and Applications, March 26-29, 2012, Fukuoka, Japan, p. 175
[35] Slaney M, Lifshits Y and He J 2012 Proc. IEEE 100 2604
[36] Bawa M, Condie T and Ganesan P 2005 Proceedings of the 14th International World Wide Web Conference (WWW), May 10-14, 2005, Chiba, Japan, p. 651
[37] Gu X, Zhang Y, Zhang L and Zhang D 2013 Signal Processing 93 2244
[38] Paulevé L, Jégoub H and Amsalegc L 2010 Pattern Recognition Letters 11 1348
[39] Stable distribution, http://en.wikipedia.org/wiki/Stable_distribution, 2013
[40] Normal distribution, http://en.wikipedia.org/wiki/Normal_distribution, 2013
[41] Cauchy distribution, http://en.wikipedia.org/wiki/Cauchy_distribution, 2013
[42] Lévy distribution, http://en.wikipedia.org/wiki/L%C3%A9vy_distribution, 2013
[43] Heavy-tailed distribution, http://en.wikipedia.org/wiki/Heavy-tailed_distribution, 2013
[44] Zolotarev V M 1986 Translation of Mathematical Monographs 65 270
[45] Slaney M and Casey M 2008 IEEE Signal Processing Magazine 25 128
[46] Datasets for approximate nearest neighbor search. http://corpus-texmex.irisa.fr/, 2013
[1] Pedestrian evacuation simulation in multi-exit case:An emotion and group dual-driven method
Yong-Xing Li(李永行), Xiao-Xia Yang(杨晓霞), Meng Meng(孟梦), Xin Gu(顾欣), Ling-Peng Kong(孔令鹏). Chin. Phys. B, 2023, 32(4): 048901.
[2] Preliminary abnormal electrocardiogram segment screening method for Holter data based on long short-term memory networks
Siying Chen(陈偲颖), Hongxing Liu(刘红星). Chin. Phys. B, 2020, 29(4): 040701.
[3] Stationary response of stochastic viscoelastic system with the right unilateral nonzero offset barrier impacts
Deli Wang(王德莉), Wei Xu(徐伟), Xudong Gu(谷旭东). Chin. Phys. B, 2019, 28(1): 010203.
[4] Pedestrian choice behavior analysis and simulation of vertical walking facilities in transfer station
Yong-Xing Li(李永行), Hong-Fei Jia(贾洪飞), Jun Li(李军), Ya-Nan Zhou(周亚楠), Zhi-Lu Yuan(原志路), Yan-Zhong Li(李延忠). Chin. Phys. B, 2016, 25(10): 108901.
[5] Ghost imaging based on Pearson correlation coefficients
Yu Wen-Kai (俞文凯), Yao Xu-Ri (姚旭日), Liu Xue-Feng (刘雪峰), Li Long-Zhen (李龙珍), Zhai Guang-Jie (翟光杰). Chin. Phys. B, 2015, 24(5): 054203.
[6] Modeling walking behavior of pedestrian groups with floor field cellular automaton approach
Lu Li-Li (陆丽丽), Ren Gang (任刚), Wang Wei (王炜), Wang Yi (王义). Chin. Phys. B, 2014, 23(8): 088901.
[7] Self-organized phenomena of pedestrian counter flow in a channel under periodic boundary conditions
Li Xiang (李翔), Duan Xiao-Yin (段晓茵), Dong Li-Yun (董力耘). Chin. Phys. B, 2012, 21(10): 108901.
[8] Continuum modeling for two-lane traffic flow with consideration of the traffic interruption probability
Tian Chuan(田川) and Sun Di-Hua(孙棣华). Chin. Phys. B, 2010, 19(12): 120501.
[9] Topological probability and connection strength induced activity in complex neural networks
Wei Du-Qu(韦笃取), Zhang Bo(张波), Qiu Dong-Yuan(丘东元), and Luo Xiao-Shu(罗晓曙). Chin. Phys. B, 2010, 19(10): 100513.
[10] Probabilistic joint remote preparation of a high-dimensional equatorial quantum state
Zhan You-Bang(詹佑邦),Zhang Qun-Yong(张群永), and Shi Jin(施锦). Chin. Phys. B, 2010, 19(8): 080310.
[11] Multi-agent coordination in directed moving neighbourhood random networks
Shang Yi-Lun (尚轶伦). Chin. Phys. B, 2010, 19(7): 070201.
[12] A novel evolving scale-free model with tunable attractiveness
Liu Xuan (刘绚), Liu Tian-Qi (刘天琪), Wang Hao (王昊), Li Xing-Yuan (李兴源). Chin. Phys. B, 2010, 19(7): 070204.
[13] A new form of Tsallis distribution based on the probabilistically independent postulate
Du Jiu-Lin(杜九林). Chin. Phys. B, 2010, 19(7): 070501.
[14] Effect of following strength on pedestrian counter flow
Kuang Hua (邝华), Li Xing-Li (李兴莉), Wei Yan-Fang (韦艳芳), Song Tao (宋涛), Dai Shi-Qiang (戴世强). Chin. Phys. B, 2010, 19(7): 070517.
[15] The Wigner function and phase properties of superposition of two coherent states with the vacuum state
Wang Yue-Yuan (王月媛), Wang Ji-Cheng (王继成), Liu Shu-Tian (刘树田). Chin. Phys. B, 2010, 19(7): 074206.
No Suggested Reading articles found!