†Corresponding author. E-mail: tianqiu.edu@gmail.com
*Project supported by the National Natural Science Foundation of China (Grant No. 11175079) and the Young Scientist Training Project of Jiangxi Province, China (Grant No. 20133BCB23017).
We propose an indirect-link-weakened mass diffusion method (IMD), by considering the indirect linkage and the source object heterogeneity effect in the mass diffusion (MD) recommendation method. Experimental results on the MovieLens, Netflix, and RYM datasets show that, the IMD method greatly improves both the recommendation accuracy and diversity, compared with a heterogeneity-weakened MD method (HMD), which only considers the source object heterogeneity. Moreover, the recommendation accuracy of the cold objects is also better elevated in the IMD than the HMD method. It suggests that eliminating the redundancy induced by the indirect linkages could have a prominent effect on the recommendation efficiency in the MD method.
Internet has brought a great convenience to people for providing sufficient information, which on the other hand makes one in the dilemma of choosing what one wants when facing the enormous information. For example, how to quickly find out a favorable movie from the huge number of online movies. As a powerful tool, recommender engine emerges to make predictions for users according to their past activities, which therefore greatly helps people filter out the redundant information.[1]
Various recommendation algorithms have been proposed. The most widely applied algorithms include the so-called collaborative filtering algorithm and its improved ones.[2– 6] Another line of algorithms is the content-based algorithm, and its extensive versions.[7– 10] To enhance the recommendation efficiency, different accessorial information is also considered in some algorithms, such as the social tags.[11– 16] Recently, the network-based recommendation algorithms have attracted great interest of physicists, [16– 18] for the development of the complex network theory.[19, 20] Different physical processes have been introduced to the information filtering, such as the heat conduction process. Inspired by a heat conduction algorithm[21] and a mass diffusion (MD) algorithm, [22] numerous network-based recommendation algorithms have been proposed. Liu et al. proposed a biased heat conduction method by considering the heterogeneity of the target objects, [23] which greatly improves the recommendation accuracy, compared with the original heat conduction method. The heterogeneity of the source objects is further considered, [24] which well alleviates the recommendation bias between the cold and hot objects, and further improves both the recommendation accuracy and diversity. A hybrid method of heat conduction and mass diffusion (HHP) optimizes the diffusion process by introducing a hybridization parameter, which well resolves the dilemma of recommendation accuracy and diversity.[25] Based on the HHP, an improved hybrid version is proposed by considering the object heterogeneity, and it greatly enhances the recommendation accuracy of the cold objects.[26] Scaling behavior is observed in the recommender systems, and the scaling-based algorithm shows better performance not only on the recommendation accuracy of the cold objects, but also on the recommendation diversity.[27] Heterogeneity effect of the initial configuration is investigated using different methods, [28, 29] and the higher-order correlation is eliminated in the mass diffusion method, [30] and in the HHP method.[31] These improved methods are found to be beneficial for the recommendation efficiency in different aspects.
In this article, we propose an indirect-link-weakened MD method (IMD), by weakening both the higher-order redundancy induced by the indirect linkages and the object heterogeneity effect in the MD method. We try to understand which factor could be important to the recommendation efficiency for a particular algorithm. By comparing the recommendation results of four algorithms, the indirect linkage effect is found to have a prominent impact on the recommendation performance in the mass diffusion method.
The network-based recommender system can be described by the bipartite graph, [32, 33] which is composed of the user set U = {u1, u2, … , ui, … , um} and the object set O = {o1, o2, … , oα , … , on}. If an object oα is collected by a user ui, then add a link between them. Therefore, the recommender system can be characterized by an adjacent matrix A = aiα , with aiα being 1 for the linked user– object pairs; otherwise, being 0.
Assume each object has an initial resource, which could spread from one object to another along the link between the user and object. Here we name the object disseminating resource as the source object, and the object receiving resource as the target object. The resource reallocation process can be formulated as
where W is the transformation matrix, characterizing the resource diffusion process; f is the initial resource of the object, which can be assigned to 1 or 0 for simplicity, depending on whether it is collected by the user or not, i.e.,
The transformation matrix W is therefore very essential to the resource reallocation. In the MD algorithm, [22] an equal probability spreading process is introduced, as illustrated in Fig. 1. At first, each object distributes the resource to its neighboring users with an equal probability. For example, the first object β who has an initial resource of 1 as the source object would disseminate the resource to its two neighboring users, i.e., the first and second users, with an equal probability. Therefore, the two users would obtain 1/2 resource from the object β , respectively. Summing up the resources from all the linked objects, the user then redistributes the total resource to his/her neighboring objects, also with the equal probability. For example, the user i who has a total resource of 1 would feedback the resource to i’ s two neighboring objects with an equal probability as well. The final resource of the object is the total resource obtained from all its neighboring users. The transformation matrix of the MD algorithm is formulated by
where kβ is the degree of the source object oβ , and ki is the degree of user ui. Compared with the user-similarity based collaborative filtering algorithm, the MD method greatly improves the recommendation accuracy.
Previous studies have shown that the object heterogeneity has a great impact on the recommendation accuracy.[23, 24] It has been reported that a more homogeneous object distribution may result in a better recommendation accuracy.[28] Based on the MD method, a heterogeneity-weakened MD algorithm (HMD) is introduced in Ref. [28], by relieving the source object heterogeneity with a tunable parameter η . The transformation matrix of the HMD can be formulated by
Furthermore, the redundancy induced by the indirect linkage is considered in the MD method. The redundant correlation effect has been investigated in different algorithms.[30, 31] Properly eliminating the redundant correlations can well elevate the algorithm performance. To depict how the indirect linkage works in the bipartite network based recommender system, we present the simplest example in Fig. 2. As shown in Fig. 2(a), for the particular objects α and β , which are both collected by the same user i, the similarity between α and β can be described as L1 along with the direct linkage α − i − β . However, the correlations between objects could be much complex in real systems. Except the direct linkage, there are numerous indirect linkages. The simplest example is shown in Fig. 2(b). Besides the direct correlation via the user i, the objects α and β could also be correlated via some other media objects, such as the media object γ in Fig. 2(b). That is, the object α is collected by user i, and the object β is collected by user j, while users i and j both collect the object γ . Therefore, there is a second-order correlation L2 between object α and β along the indirect linkage α − i − γ − j − β , which would lead to the linkage redundancy of the system. In order to eliminate the redundancy induced by the indirect linkage, we propose the indirect-link-weakened MD method (IMD), with its transformation matrix formulated by
where λ and η are two tunable parameters. Therefore, the IMD method can be taken as a combination of weakening both the indirect linkage[30] and the object heterogeneity effect.[28] When tuning λ and η to the proper values, the IMD method would achieve the optimal recommendation results.
To better evaluate the performance of the proposed methods, the traditional user-similarity based collaborative filtering algorithm (CF) is also investigated for comparison, by calculating the similarity between users i and j via a cosine similarity,
The score of the user i to object α is fα i = ∑ j≠ isijaα j. Ranking the scores of the user’ s uncollected objects as the descending order, then the objects with the top L scores would be recommended to the user i.
Three empirical datasets, i.e., the MovieLens, the Netflix, and the RYM, are used to test the performance of the recommendation algorithms. The MovieLens and the Netflix datasets are both five-level rating movie systems, and the RYM is a ten-level music rating system. The MovieLens, downloaded from the website of GroupLens Research (
Three widely adopted evaluators are applied to quantify the recommendation accuracy performance, i.e., the ranking score 〈 RS〉 , precision P, and recall R.[22, 34] The ranking score 〈 RS〉 α i is defined as RSα i = pα /(n − ki), where n is the number of all objects, ki is the degree of the user ui, and pα is the position of the recommended object oα located in all the uncollected objects of the user ui. The average ranking score 〈 RS〉 is taken an average of 〈 RS〉 α i over all the deleted links. The smaller the 〈 RS〉 , the higher rank of the deleted link, and the more accurate the algorithm. The precision P is defined as
Two evaluators are used to quantify the personalized recommendation, i.e., the novelty NL and Hamming distance H.[28, 35] The novelty NL is defined as
where ϕ i ∩ ϕ j is the number of the common objects recommended for the user ui and uj in the top L recommendation list, and therefore quantifies the difference between two users’ recommendation lists. The higher the H, the more diverse the algorithm, and vice versa.
To investigate how the heterogeneity of source objects and the indirect linkage effect on the recommendation efficiency, we compare the HMD and the IMD with the original MD method, and also the traditional user-similarity based CF method. For the HMD and IMD methods, we firstly try to find out the optimal value of the tunable parameter. Since the optimal value of parameter may differ in the evaluators, here we use the widely adopted ranking score as the evaluator to estimate the optimal value. As shown in Fig. 3, for the HMD method, the optimal value is obtained at the minimal value of the ranking score 〈 RS〉 , with η = 1.87 for the MovieLens, 1.58 for the Netflix, and 1.78 for the RYM, respectively. For the IMD method, as shown in Fig. 4, the minimal value of the ranking score 〈 RS〉 is obtained at (λ , η ) = (0.58, 0.94) for the MovieLens, (0.46, 0.92) for the Netflix, and (0.78, 1.02) for the RYM, respectively. All the following results are based on the optimal value of the parameter, and the results are averaged over six independent runs.
The results of the CF, MD, HMD, and IMD are shown in Table 1. It is observed, for the MovieLens, Netflix, and RYM, the HMD and IMD methods outperform the original MD, as well as the CF, in all the evaluators of accuracy and personalization. To quantify the improvement, we show the improvement percentage of the IMD against the CF, MD, and HMD in Table 2. Taking the MovieLens as an example, the improvement percentage of the ranking score 〈 RS〉 of the IMD against the MD is 22.9%, and further against the HMD is 18.2%. In addition, the improvement percentage of the diversity H of the IMD against the MD is 28.2%, and against the HMD is 15.0%.
From the above results, the recommendation performance can be improved by weakening both perspectives of the object heterogeneity and the indirect linkage effect. However, compared with the HMD method, the improvement of the IMD method is much greater. That is to say, the redundancy induced by the indirect linkage could have an important impact on the recommendation efficiency for the MD method. The possible reason for the relatively smaller improvement of the HMD algorithm could be that the HMD optimizes the source object resource, which may have less effect on the recommendation list obtained based on the order of the target object resource.
To investigate the robustness of the results, the novelty NL and Hamming distance H on the recommendation list length L is studied for the CF, MD, HMD and IMD methods. As shown in Fig. 5, for the MovieLens and Netflix, the novelty NL is found to be improved for the HMD method to a certain degree, and greatly improved for the IMD method for all the investigated range of the recommendation list length, compared with that of the CF and MD methods. Similarly, as shown in Fig. 6, the Hamming distance H is observed to be improved for the HMD method to a certain degree, and greatly improved for the IMD method for all the investigated range of L for the MovieLens and Netflix, compared with that of both the CF and MD methods. The results suggest that the IMD algorithm indeed greatly improves the personalized recommendation.
Another important evaluation of the recommendation algorithm is how it works on the cold objects, i.e., the so-called cold start problem.[11, 13, 36] From our datasets, the cold objects occupy a big proportion, and the object with its degree no more than 10 is 41.26%, 49.59%, and 21.73% for the MovieLens, Netflix, and RYM, respectively. For lack of the historical records, it remains a great challenge to make recommendation for them. Here we employ an object-dependent ranking score 〈 RS〉 k[28] to evaluate the algorithm performance for the cold objects. The 〈 RS〉 k is defined as the average ranking score over objects with the same value of degrees. As shown in Fig. 7, the 〈 RS〉 k of the HMD is observed to be a little more advantageous than that of the CF and MD again, while that of the IMD is found to be much more advantageous than that of the CF and MD, for all the three datasets. It implies that, for the recommendation of the cold objects, the improvement of the IMD method is also much greater than that of the HMD, which further suggests that the indirect linkage effect has an important impact on the recommendation efficiency of the MD method.
We propose an improved network based recommendation method, the IMD method, by considering the indirect linkage and the object heterogeneity effect, based on the mass diffusion (MD) method. Tested on three empirical datasets, the IMD method is found to greatly improve both the recommendation accuracy and diversity of the MD method, compared with the HMD method which merely considers the source object heterogeneity. Moreover, the IMD method also greatly improves the recommendation accuracy of the cold objects. The obtained results indicate that the indirect linkage effect could have an important impact on the recommendation efficiency in the MD method.
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
|
9 |
|
10 |
|
11 |
|
12 |
|
13 |
|
14 |
|
15 |
|
16 |
|
17 |
|
18 |
|
19 |
|
20 |
|
21 |
|
22 |
|
23 |
|
24 |
|
25 |
|
26 |
|
27 |
|
28 |
|
29 |
|
30 |
|
31 |
|
32 |
|
33 |
|
34 |
|
35 |
|
36 |
|