Summary of readings
| Paper title | Authors | Summary | Link | |
| Kernel Nearest-Neighbor Algorithm | Kai Yu, Liang Ji and Xuegong Zhang | The "kernel approach" is applied to modify the norm distance metric in Hilbert space, turning the nearest-neighbor (NN) algorithm into a kernel nearest-neighbor algorithm. Under specific conditions, such as a polynomial kernel with p=1 or a radial basis kernel, it degenerates to the conventional nearest-neighbor algorithm. With an appropriately chosen kernel function, the kernel nearest-neighbor algorithm gives better results than the conventional one. | http://www.springerlink.com/content/hqg0keryj8tuftyg/ | |
| Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights | Robert M. Bell and Yehuda Koren | Neighborhood-based collaborative filtering relies on interpolation weights, which are used to estimate unknown ratings from neighboring known ones. Nevertheless, the literature lacks a rigorous way to derive these weights. This work shows how the interpolation weights can be computed as a global solution to an optimization problem that precisely reflects their role. Comparison with past kNN methods on the Netflix data demonstrated a significant improvement in prediction accuracy without a meaningful increase in running time. A kNN method can be employed most effectively in an item-oriented manner, by analyzing relationships between items. | http://portal.acm.org/citation.cfm?id=1442050 | |
| Automated Tag Clustering: Improving search and exploration in the tag space | Grigory Begelman, Philipp Keller and Frank Smadja | The work presents convincing evidence that clustering techniques can and should be used in combination with tagging. Clustering can improve the tagging experience and the use of the tag space in general. The authors present several clustering techniques and report some results obtained on del.icio.us. | http://www.pui.ch/phred/automated_tag_clustering/ | |
| Autotagging to Improve Text Search for 3D Models | Corey Goldfeder and Peter Allen | Demonstrates an automatic tagging system that learns new tags for a 3D model by comparing it to a large set of tagged models and probabilistically propagating tags from neighbors. The focus is on autotagging to improve shape retrieval in a digital library, though there are several other domains where automatically annotating 3D models can be helpful. The authors show that the discriminative power of these tags is comparable to that of the underlying geometric similarity distance, and that searching for models based on the autotags can result in better precision and greater recall than searching on the original tags. | http://portal.acm.org/citation.cfm?id=1378889.1378950 | |
| Harvesting Social Knowledge from Folksonomies | Harris Wu, Mohammad Zubair and Kurt Maly | Collaborative tagging systems have the potential to become a technological infrastructure for harvesting social knowledge. There are many challenges; the authors designed prototypes that enhance social tagging systems to meet some of the key ones, and they developed a comprehensive evaluation methodology. | http://portal.acm.org/citation.cfm?id=1149941.1149962 | |
| Classification-Enhanced Ranking | Paul N. Bennett, Krysta Svore and Susan T. Dumais | This work demonstrates that topical class information can be used to improve the relevance of retrieval results by generalizing along the class dimension to identify other relevant results. To do this, the authors introduce a natural definition of query class that stems from the results that are relevant for that query and can be estimated using click behavior. The approach is notable for its focus on directly improving ranking relevance rather than indirect measures like query classification. | http://portal.acm.org/citation.cfm?id=1772703&dl=ACM | |
| Exploiting Query Reformulations for Web Search Result Diversification | Rodrygo L. T. Santos, Craig Macdonald and Iadh Ounis | Introduces a novel probabilistic framework for search result diversification. In particular, the xQuAD (eXplicit Query Aspect Diversification) framework explicitly models the aspects underlying an initial query, in the form of sub-queries. Instead of comparing documents to one another, which usually demands expensive computations, the approach achieves an effective diversification performance by directly estimating the relevance | http://portal.acm.org/citation.cfm?id=1772690.1772780 | |
| Personalized Query Expansion for the Web | Paul-Alexandru Chirita, Claudiu S. Firan and Wolfgang Nejdl | Proposes expanding Web search queries by exploiting the user's Personal Information Repository to automatically extract additional keywords related both to the query itself and to the user's interests, thereby personalizing the search output. | http://portal.acm.org/citation.cfm?id=1277741.1277746 | |
| Accurate Methods for the Statistics of Surprise and Coincidence | Ted Dunning | Log-likelihood ratio test | http://portal.acm.org/citation.cfm?id=972454 | |
| Finding Relevant Concepts for Unknown Terms Using a Web-based Approach | Chen-Ming Hung and Lee-Feng Chien | Presents a potential approach to finding relevant concepts for terms using the World Wide Web. The approach obtained encouraging experimental results when tested on Yahoo!'s computer science hierarchy, but the work needs more in-depth study. Choosing the word with the highest weighted log likelihood ratio as the concept of a clustered group after the Greedy EM algorithm is not representative enough. In addition, one concept usually spans many domains, e.g. “ATM” covers security, teller machine, transaction cost, etc., so assigning the extracted keywords to a particular concept still needs human intervention. Finally, to address the “too much effort” problem of the Greedy EM algorithm, the algorithm needs to be modified with another convergence criterion. | http://www.aclweb.org/anthology/O/O04/O04-1007.pdf | |
| Improving Term Extraction Using Particle Swarm Optimization Techniques | Mohammad Syafrullah and Naomie Salim | Presents a particle swarm optimization technique to improve term extraction precision. Five features are chosen to represent the term score: domain relevance, domain consensus, term cohesion, first occurrence and length of noun phrase. The experiments use a translation of the meaning of the Quran (focusing on verses of prayer) as the input document for both the training and testing phases, with the documents separated into training documents and test documents. Particle swarm optimization is trained on the training documents to determine the weight of each feature that produces the best score for each term. The test documents are then scored with the feature weights from the training stage to calculate the final score of each term to be extracted. | http://www.scipub.org/fulltext/jcs/jcs63323-329.pdf | |
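
The degeneration claim in the Kernel Nearest-Neighbor entry can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the usual feature-space distance d^2(x, y) = K(x, x) - 2K(x, y) + K(y, y), a polynomial kernel of the form (1 + x·y)^p, and a Gaussian radial basis kernel. These kernel forms, the random data, and all function names are hypothetical choices, not taken from the paper.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def euclid_sq(x, y):
    """Squared Euclidean distance used by conventional NN."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def poly_kernel(x, y, p=1):
    """Assumed polynomial kernel form: (1 + x.y)^p."""
    return (1.0 + dot(x, y)) ** p


def rbf_kernel(x, y, sigma=1.0):
    """Assumed Gaussian radial basis kernel."""
    return math.exp(-euclid_sq(x, y) / (2.0 * sigma ** 2))


def kernel_sq_dist(kernel, x, y):
    """Squared distance between x and y in the kernel's feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def nearest(query, points, sq_dist):
    """Index of the point closest to query under the given squared distance."""
    return min(range(len(points)), key=lambda i: sq_dist(query, points[i]))


# Hypothetical random data, fixed seed for repeatability.
random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
queries = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]

plain = [nearest(q, points, euclid_sq) for q in queries]
poly1 = [nearest(q, points, lambda a, b: kernel_sq_dist(poly_kernel, a, b))
         for q in queries]
rbf = [nearest(q, points, lambda a, b: kernel_sq_dist(rbf_kernel, a, b))
       for q in queries]

print("poly p=1 agrees with plain NN:", plain == poly1)
print("RBF agrees with plain NN:", plain == rbf)
```

With p=1 the kernel distance expands to exactly the squared Euclidean distance, and with the radial basis kernel it becomes 2 - 2·exp(-||x - y||^2 / (2σ^2)), which grows monotonically with the Euclidean distance, so the neighbor ordering is unchanged in both cases.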
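
The Dunning entry is summarized only by the keyword "loglikelihood". As a hedged illustration of what that refers to, the sketch below computes the log-likelihood ratio statistic (often written G^2) for a 2x2 contingency table of co-occurrence counts, a form commonly used to score word associations. The function name, argument names and example counts are hypothetical, and the formulation shown (twice the sum of observed × ln(observed / expected)) is one standard way to write the statistic rather than a quotation from the paper.

```python
import math


def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 table of counts.

    k11: both events occur, k12: only the first, k21: only the second,
    k22: neither. Expected counts assume the two events are independent.
    """
    observed = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    total = rows[0] + rows[1]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: the first table is exactly independent,
# the second is perfectly associated.
print(llr_2x2(10, 20, 30, 60))
print(llr_2x2(10, 0, 0, 10))
```

Larger values indicate that the observed co-occurrence departs more strongly from what independence would predict.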
