|
|
Improved locality-sensitive hashing method for the approximate nearest neighbor problem |
Lu Ying-Hua (陆颖华)a, Ma Ting-Huai (马廷淮)a b, Zhong Shui-Ming (钟水明)a, Cao Jie (曹杰)c, Wang Xin (王新)d, Abdullah Al-Dhelaane |
a Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China; b School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing 210044, China; c School of Economic & Management, Nanjing University of Information Science & Technology, Nanjing 210044, China; d Huafeng Meteorological Media Group, Beijing 100086, China; e Computer Science Department, College of Computer and Information Science, King Saud University, Riyadh 11362, Saudi Arabia |
|
|
Abstract In recent years, the nearest neighbor search (NNS) problem has been widely used in various interesting applications. Locality-sensitive hashing (LSH), a popular algorithm for the approximate nearest neighbor problem, is proved to be an efficient method to solve the NNS problem in the high-dimensional and large-scale databases. Based on the scheme of p-stable LSH, this paper introduces a novel improvement algorithm called randomness-based locality-sensitive hashing (RLSH) based on p-stable LSH. Our proposed algorithm modifies the query strategy that it randomly selects a certain hash table to project the query point instead of mapping the query point into all hash tables in the period of the nearest neighbor query and reconstructs the candidate points for finding the nearest neighbors. This improvement strategy ensures that RLSH spends less time searching for the nearest neighbors than the p-stable LSH algorithm to keep a high recall. Besides, this strategy is proved to promote the diversity of the candidate points even with fewer hash tables. Experiments are executed on the synthetic dataset and open dataset. The results show that our method can cost less time consumption and less space requirements than the p-stable LSH while balancing the same recall.
|
Received: 16 October 2013
Revised: 10 February 2014
Accepted manuscript online:
|
PACS:
|
02.50.Cw
|
(Probability theory)
|
|
02.50.Ey
|
(Stochastic processes)
|
|
02.50.Fz
|
(Stochastic analysis)
|
|
02.60.Gf
|
(Algorithms for functional approximation)
|
|
Fund: Project supported by the National Natural Science Foundation of China (Grant No. 61173143) and the Special Public Sector Research Program of China (Grant No. GYHY201206030). The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group No. RGP-VPP-264. |
Corresponding Authors:
Ma Ting-Huai
E-mail: thma@nuist.edu.cn
|
Cite this article:
Lu Ying-Hua (陆颖华), Ma Ting-Huai (马廷淮), Zhong Shui-Ming (钟水明), Cao Jie (曹杰), Wang Xin (王新), Abdullah Al-Dhelaan Improved locality-sensitive hashing method for the approximate nearest neighbor problem 2014 Chin. Phys. B 23 080203
|
[1] |
Lü Q, Josephson W, Wang Z and Charikar M 2007 Proceedings of the 33rd International Conference on Very Large Data Bases, September 23-28, 2007, Vienna, Austria, p. 950
|
[2] |
Friedman J H, Bentley J L and Finkel R A 1977 ACM Transactions on Mathematical Software 3 209
|
[3] |
Guttman A 1984 Proceedings of ACM Conference on Management of Data (SIGMOD), June 18-21, 1984, Boston, Massachusetts, p. 47
|
[4] |
Tsaparas P 1999 Qualifying Depth Oral Report 31902
|
[5] |
Jégou H, Douze M and Schmid C 2011 IEEE Transaction Pattern Analysis and Machine Intelligence 33 117
|
[6] |
Lejsek H, Asmundsson F H, Jónsson B and Amsaleg L 2009 IEEE Transaction Pattern Analysis and Machine Intelligence 31 869
|
[7] |
Indyk P and Motwani R 1998 Proceedings of the 30th on Symposium on Theory of Computing (STOC), May 23-26, 1998, Dallas, Texas, USA, p. 604
|
[8] |
Gionis A, Indyk P and Motwani R 1999 Proceedings of the 25th on Very Large Data Bases (VLDB), September 7-10, 1999, Edinburgh, Scotland, UK, p. 518
|
[9] |
Datar M, Immorlica N, Indyk P and Mirrokni V S 2004 Proceedings of the 20th Annual Symposium on Computational Geometry, June 8-11, 2004, Brooklyn, New York, USA, p. 253
|
[10] |
Liu F, Leung H Y, Cheng L M and Ji X Y 2012 Chin. Phys. B 21 040204
|
[11] |
Cao Y, ZhangH and Guo J 2011 Proceedings of the 18th International Conference on Image Processing, September 11-14, 2011, Brussels, Belgium, p. 2461
|
[12] |
Liu F, Leung H Y, Cheng L M and Ji X Y 2012 Chin. Phys. B 21 040204
|
[13] |
Hou J, Yan G F and Fan Z 2011 Chin. Phys. B 20 048103
|
[14] |
Paula L B, Villaca R S and Magalhaes M F 2010 Proceedings of the 4th International Conference on Semantic Computing (ICSC), September 22-24, 2010, Pittsburgh, PA, USA, p. 408
|
[15] |
Ma J F, Hou K, Bao S L and Chen C 2011 Chin. Phys. B 20 028701
|
[16] |
Zhang Z, Cao C, Zhang R and Zou J 2010 Proceedings of the International Conference on Automation and Logistics (ICAL), August 16-20, 2010, Hong Kong and Macau, China, p. 13
|
[17] |
Long M, Peng F and Chen G R 2008 Chin. Phys. B 17 3588
|
[18] |
Chen C, Horng S and Huang C 2011 Expert Systems with Applications 38 12388
|
[19] |
Matei B, Shan Y, Sawhney H S, Tan Y, Kumar R, Huber D and Hebert M 2006 IEEE Transactions on Pattern Analysis and Machine Intelligence 28 1111
|
[20] |
Indyk P and Thaper N 2003 Proceedings of the International Workshop Statistical and Computational Theories of Vision, July 13-15, 2003, Vancouver, Canada, p. 33
|
[21] |
Gorisse D, Cord M and Precioso F 2012 IEEE Transaction on Pattern Analysis and Machine Intelligence 34 402
|
[22] |
Charikar M S 2002 Proceedings of the 34th Symposium on Theory of Computing (STOC), May 19-21, 2002, Montréal, Québec, Canada, p. 380
|
[23] |
Jain P, Kulis B and Grauman K 2008 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 23-28, 2008, Anchorage, Alaska, USA, p. 1
|
[24] |
Kulis B and Grauman K 2012 IEEE Transactions on Pattern Analysis and Machine Intelligence 34 1092
|
[25] |
Panigrahy R 2006 Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA), January 22-26, 2006, Miami, Florida, USA, p. 1186
|
[26] |
Jégou H, Amsaleg L, Schmid C and Gros P 2008 Proceedings of International Conference on Acoustics, Speech, and Signal Processing, March 31-April 4, 2008, Las Vegas, USA, p. 825
|
[27] |
Gu X, Zhang L, Zhang D, Zhang Y, Li J and Bao N 2012 Proceedings of the 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, August 8-10, 2012, Kyoto, Japan, p. 3
|
[28] |
Joly A and Buisson O 2008 Proceedings of the 16th ACM International Conference on Multimedia, October 26-31, 2008, Vancouver, Canada, p. 209
|
[29] |
Weiss Y, Torralba A and Fergus R 2008 Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, December 8-11, 2008, Vancouver, British Columbia, Canada, p. 134
|
[30] |
Wang J, Kumar S and Chang S 2010 Proceedings of 2010 Conference on Computer Vision and Pattern Recognition (CVPR), June 13-18, 2010, San Francisco, USA, p. 3424
|
[31] |
Wang J, Kumar S and Chang S 2010 Proceedings of the 27th International Conference on Machine Learning, June 21-24, 2010, Haifa, Israel, p. 1127
|
[32] |
Liu W, Wang J, Kumar S and Chang S 2011 Proceedings of the 28th International Conference on Machine Learning, June 28-July 2, 2011, Bellevue, Washington, USA, p. 567
|
[33] |
Pan J and Manocha D 2011 Proceedings of GIS, November 1-4, 2011, Chicago, USA, p. 211
|
[34] |
Koga H, Oguri M and Watanabe T 2012 Proceedings of the 26th International Conference on Advanced Information Networking and Applications, March 26-29, 2012, Fukuoka, Japan, p. 175
|
[35] |
Slaney M, Lifshits Y and He J 2012 Proc. IEEE 100 2604
|
[36] |
Bawa M, Condie T and Ganesan P 2005 Proceedings of the 14th International World Wide Web Conference (WWW), May 10-14, 2005, Chiba, Japan, p. 651
|
[37] |
Gu X, Zhang Y, Zhang L and Zhang D 2013 Signal Processing 93 2244
|
[38] |
Paulevé L, Jégoub H and Amsalegc L 2010 Pattern Recognition Letters 11 1348
|
[39] |
Stable distribution, http://en.wikipedia.org/wiki/Stable_distribution, 2013
|
[40] |
Normal distribution, http://en.wikipedia.org/wiki/Normal_distribution, 2013
|
[41] |
Cauchy distribution, http://en.wikipedia.org/wiki/Cauchy_distribution, 2013
|
[42] |
Lévy distribution, http://en.wikipedia.org/wiki/L%C3%A9vy_distribution, 2013
|
[43] |
Heavy-tailed distribution, http://en.wikipedia.org/wiki/Heavy-tailed_distribution, 2013
|
[44] |
Zolotarev V M 1986 Translation of Mathematical Monographs 65 270
|
[45] |
Slaney M and Casey M 2008 IEEE Signal Processing Magazine 25 128
|
[46] |
Datasets for approximate nearest neighbor search. http://corpus-texmex.irisa.fr/, 2013
|
No Suggested Reading articles found! |
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
Altmetric
|
blogs
Facebook pages
Wikipedia page
Google+ users
|
Online attention
Altmetric calculates a score based on the online attention an article receives. Each coloured thread in the circle represents a different type of online attention. The number in the centre is the Altmetric score. Social media and mainstream news media are the main sources that calculate the score. Reference managers such as Mendeley are also tracked but do not contribute to the score. Older articles often score higher because they have had more time to get noticed. To account for this, Altmetric has included the context data for other articles of a similar age.
View more on Altmetrics
|
|
|