Summary of readings

K. Reddy
Each entry gives the paper title, the authors, and a summary.
Kernel Nearest-Neighbor Algorithm
Kai Yu, Liang Ji and Xuegong Zhang

The ‘kernel approach’ is applied to modify the norm distance metric in Hilbert space, which turns the nearest-neighbor (NN) algorithm into the kernel nearest-neighbor algorithm. Under some specific conditions, such as a polynomial kernel with p = 1 or a radial basis kernel, the kernel nearest-neighbor algorithm degenerates to the conventional nearest-neighbor algorithm. With an appropriately chosen kernel function, the kernel nearest-neighbor algorithm gives better results than the conventional one.
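The kernel version of NN rests on the identity ||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y), which lets the neighbor search run in feature space without computing phi explicitly. Below is a minimal 1-NN sketch under that distance. The function names and the exact kernel forms (polynomial (1 + x.y)^p, Gaussian RBF with width sigma) are assumptions for illustration, not taken from the paper. With p = 1 the kernel distance equals the squared Euclidean distance, and the RBF distance is a monotone function of the Euclidean distance, so both choose the same neighbor as conventional NN, which matches the degeneration noted above.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(p: int) -> Kernel:
    # Assumed form: K(x, y) = (1 + <x, y>)^p
    return lambda x, y: (1.0 + dot(x, y)) ** p


def rbf_kernel(sigma: float) -> Kernel:
    # Gaussian RBF: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    def k(x: Vector, y: Vector) -> float:
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return k


def kernel_distance_sq(k: Kernel, x: Vector, y: Vector) -> float:
    # Squared distance between phi(x) and phi(y) in the kernel's feature space.
    return k(x, x) - 2.0 * k(x, y) + k(y, y)


def kernel_nn_predict(k: Kernel, train: Sequence[Tuple[Vector, str]], query: Vector) -> str:
    # 1-NN: label of the training point closest to `query` in feature space.
    _, label = min(train, key=lambda item: kernel_distance_sq(k, item[0], query))
    return label
```

Other kernels plug in through the same `Kernel` signature; only `kernel_distance_sq` touches the kernel, so the search logic does not change.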
Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Robert M. Bell and Yehuda Koren

Neighborhood-based collaborative filtering uses interpolation weights to estimate unknown ratings from neighboring known ones. However, the literature lacked a rigorous way to derive these weights. This work shows how the interpolation weights can be computed as a global solution to an optimization problem that precisely reflects their role. Compared with earlier kNN methods on the Netflix data, the approach significantly improves prediction accuracy without a meaningful increase in running time. The authors also find that a kNN method is most effectively employed in an item-oriented manner, by analyzing relationships between items.
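As a rough illustration of deriving interpolation weights from an optimization problem, the sketch below solves a plain least-squares problem: find weights w so that the target item's ratings are best reconstructed as a weighted sum of the neighbor items' ratings, over users who rated all of them. This is a simplified assumption: it uses a dense rating block with no missing entries and an optional ridge term, and omits the paper's handling of missing ratings and its other refinements. All names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def interpolation_weights(neighbor_ratings: Sequence[Sequence[float]],
                          target_ratings: Sequence[float],
                          ridge: float = 0.0) -> List[float]:
    """Minimize sum_v (r_vi - sum_j w_j r_vj)^2 via the normal equations A w = b.

    neighbor_ratings[v][j]: rating of neighbor item j by user v
    target_ratings[v]:      rating of the target item i by the same user v
    """
    k = len(neighbor_ratings[0])
    # A[j][l] = sum_v r_vj * r_vl (+ ridge on the diagonal); b[j] = sum_v r_vj * r_vi
    a = [[sum(row[j] * row[l] for row in neighbor_ratings) + (ridge if j == l else 0.0)
          for l in range(k)] for j in range(k)]
    b = [sum(row[j] * t for row, t in zip(neighbor_ratings, target_ratings)) for j in range(k)]
    return solve_linear(a, b)


def predict(user_neighbor_ratings: Sequence[float], weights: Sequence[float]) -> float:
    # Interpolated rating: weighted sum of the user's ratings of the neighbor items.
    return sum(w * r for w, r in zip(weights, user_neighbor_ratings))
```

The weights are fitted jointly for all neighbors at once, rather than each being a separately computed similarity score, which is the distinction the summary draws against earlier kNN methods.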
Automated Tag Clustering: Improving search and exploration in the tag space
Grigory Begelman, Philipp Keller and Frank Smadja
The work presents convincing evidence that clustering techniques can and should be used in combination with tagging: clustering can improve the tagging experience and the use of the tag space in general. The authors present several clustering techniques and provide some results obtained on the .
Autotagging to Improve Text Search for 3D Models
Corey Goldfeder and Peter Allen

The paper demonstrates an automatic tagging system that learns new tags for a 3D model by comparing it to a large set of tagged models and probabilistically propagating tags from its neighbors. The authors show that the discriminative power of these tags is comparable to that of the underlying geometric similarity distance, and that searching for models based on the autotags can give better precision and greater recall than searching on the original tags. Although they apply autotagging to improve shape retrieval in a digital library, there are several other domains where automatically annotating 3D models can be helpful.
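A minimal sketch of propagating tags from neighbors, assuming the k most similar tagged models and their similarity scores are already available from a geometric matcher. Each candidate tag is scored by the fraction of total neighbor similarity held by the neighbors that carry it, and tags at or above a threshold are kept. This similarity-weighted vote is a simplification chosen for illustration, not the paper's probabilistic model; the function name and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def propagate_tags(neighbors: List[Tuple[float, Set[str]]],
                   threshold: float = 0.5) -> Dict[str, float]:
    """neighbors: (similarity to the untagged model, tags of that neighbor).

    Returns tag -> score, where score is the share of total neighbor
    similarity carried by neighbors having the tag.
    """
    total = sum(sim for sim, _ in neighbors)
    if total <= 0:
        return {}  # no similar neighbors: propose nothing
    scores: Dict[str, float] = defaultdict(float)
    for sim, tags in neighbors:
        for tag in tags:
            scores[tag] += sim / total
    return {tag: s for tag, s in scores.items() if s >= threshold}
```

Raising the threshold trades recall for precision in the proposed tags.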
Harvesting Social Knowledge from Folksonomies
Harris Wu, Mohammad Zubair and Kurt Maly

Collaborative tagging systems have the potential to become a technological infrastructure for harvesting social knowledge, but many challenges remain. The authors design prototypes that enhance social tagging systems to meet some of the key challenges, and they develop a comprehensive evaluation methodology.
Classification-Enhanced Ranking
Paul N. Bennett, Krysta Svore and Susan T. Dumais

This work demonstrates that topical class information can be used to improve the relevance of retrieval results by generalizing along the class dimension to identify other relevant results. To do this, the authors introduce a natural definition of query class that stems from the results that are relevant for that query and can be estimated using click behavior. The approach is notable for its focus on directly improving ranking relevance rather than indirect measures such as query classification.
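One way to read "a query class that stems from the results that are relevant for that query and can be estimated using click behavior" is as a click-weighted average of the class distributions of the query's results. The sketch below implements that reading, plus a simple class-match score that could serve as a ranking feature. Both the estimator and the dot-product feature are assumptions for illustration, not the paper's exact formulation; all names are hypothetical.

```python
from typing import Dict, List

# A class distribution maps a topical class name to a probability.
ClassDist = Dict[str, float]


def query_class_distribution(results: List[ClassDist], clicks: List[float]) -> ClassDist:
    """Estimate P(class | query) as the click-weighted average of the
    class distributions of the query's results (hypothetical estimator)."""
    total = sum(clicks)
    dist: ClassDist = {}
    if total == 0:
        return dist  # no click evidence: no estimate
    for doc_dist, n_clicks in zip(results, clicks):
        weight = n_clicks / total
        for cls, p in doc_dist.items():
            dist[cls] = dist.get(cls, 0.0) + weight * p
    return dist


def class_match(query_dist: ClassDist, doc_dist: ClassDist) -> float:
    """Dot product of two class distributions: a possible ranking feature
    that rewards documents in the query's likely classes."""
    return sum(p * doc_dist.get(cls, 0.0) for cls, p in query_dist.items())
```

Because the query class comes from clicked results rather than from the query string, the feature can promote an unclicked document that shares a class with the clicked ones, which is the "generalizing along the class dimension" idea in the summary.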
Exploiting Query Reformulations for Web Search Result Diversification
Rodrygo L. T. Santos, Craig Macdonald and Iadh Ounis

The authors introduce a novel probabilistic framework for search result diversification. In particular, the xQuAD (eXplicit Query Aspect Diversification) framework explicitly models the aspects underlying an initial query in the form of sub-queries. Instead of comparing documents to one another, which usually demands expensive computations, the approach achieves effective diversification by directly estimating the relevance of the retrieved documents to multiple sub-queries. Besides being efficient in practice, the principled formulation of xQuAD naturally models several dimensions of interest in a diversification task as components within the framework. These include the relevance of a document to the initial query and to its multiple aspects (identified as sub-queries), the relative importance of each sub-query, and how novel a document satisfying each sub-query is.
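In its commonly cited form, xQuAD builds the ranking greedily: at each step it picks the document d that maximizes (1 - lambda) P(d|q) + lambda * sum_i P(q_i|q) P(d|q_i) * prod_{d_j in S} (1 - P(d_j|q_i)), where S is the set already selected. The product term is what captures novelty: a sub-query already well served by S contributes little. The sketch below assumes the probabilities are already estimated and passed in as dictionaries; the function name, argument names, and the lambda = 0.5 default are illustrative.

```python
from typing import Dict, List, Optional


def xquad_rerank(candidates: List[str],
                 rel_query: Dict[str, float],
                 rel_sub: List[Dict[str, float]],
                 sub_weights: List[float],
                 lam: float = 0.5,
                 k: Optional[int] = None) -> List[str]:
    """Greedy xQuAD-style re-ranking.

    rel_query[d]   ~ P(d | q)
    rel_sub[i][d]  ~ P(d | q_i) for sub-query q_i
    sub_weights[i] ~ P(q_i | q)
    """
    k = len(candidates) if k is None else k
    remaining = list(candidates)
    selected: List[str] = []
    # not_covered[i] = product over selected d_j of (1 - P(d_j | q_i))
    not_covered = [1.0] * len(sub_weights)

    def score(d: str) -> float:
        diversity = sum(w * rel_sub[i].get(d, 0.0) * not_covered[i]
                        for i, w in enumerate(sub_weights))
        return (1.0 - lam) * rel_query.get(d, 0.0) + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        # Sub-queries that `best` already satisfies become less rewarding.
        for i in range(len(not_covered)):
            not_covered[i] *= 1.0 - rel_sub[i].get(best, 0.0)
    return selected
```

With `lam = 0` the function returns candidates in order of P(d|q); larger values increasingly favor documents that satisfy sub-queries the selected set does not yet cover. No pairwise document comparison is needed, in line with the efficiency point above.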
Personalized Query Expansion for the Web
Paul-Alexandru Chirita, Claudiu S. Firan and Wolfgang Nejdl

The authors propose expanding Web search queries by exploiting the user’s Personal Information Repository to automatically extract additional keywords related both to the query itself and to the user’s interests, thereby personalizing the search output.
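One simple way to pull expansion keywords from a personal repository is to count which terms co-occur with the query terms in the user's own documents. The sketch below does exactly that. The co-occurrence heuristic, the tokenizer, and all names are assumptions for illustration, not the specific expansion techniques evaluated in the paper.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query: str, personal_docs: Iterable[str], n_terms: int = 2,
                 stopwords: frozenset = frozenset()) -> List[str]:
    """Append the n_terms words that co-occur most often with the query
    terms in the user's own documents (hypothetical heuristic)."""
    q_terms = set(tokenize(query))
    counts: Counter = Counter()
    for doc in personal_docs:
        tokens = set(tokenize(doc))
        if q_terms & tokens:  # only documents mentioning a query term count
            counts.update(tokens - q_terms - stopwords)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return tokenize(query) + ranked[:n_terms]
```

For a user whose desktop mentions "jaguar" mostly alongside "car", the expanded query leans toward the automotive sense, which is the kind of personalization the summary describes.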
Accurate Methods for the Statistics of Surprise and Coincidence
Ted Dunning

Finding Relevant Concepts for Unknown Terms Using a Web-based Approach
Chen-Ming Hung and Lee-Feng Chien

The authors present a potential approach to finding relevant concepts for terms by utilizing the World Wide Web. The approach obtained encouraging experimental results when tested on Yahoo!’s computer science hierarchy. However, the work needs more in-depth study. As the authors note, choosing the word with the highest weighted log likelihood ratio as the concept of a clustered group after the Greedy EM algorithm is not representative enough. In addition, one concept usually spans many domains; e.g., “ATM” involves security, teller machines, and transaction costs. Thus, assigning the extracted keywords to a particular concept still needs human intervention. Finally, to solve the “too much effort” problem of the Greedy EM algorithm, the algorithm needs to be modified with a different convergence criterion.
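A standard way to compute a log likelihood ratio for word/cluster association is Dunning's test (the Ted Dunning paper listed above) on a 2x2 contingency table: G^2 = 2 * sum_ij k_ij ln(k_ij / E_ij), with expected counts E_ij = row_i * col_j / N. The sketch below computes that statistic. How Hung and Chien weight the ratio is not described in this summary, so no weighting is applied, and the cell layout (word inside vs. outside a cluster) is an assumption.

```python
import math


def llr_2x2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout for word/cluster association:
      k11 = occurrences of the word inside the cluster
      k12 = occurrences of other words inside the cluster
      k21 = occurrences of the word outside the cluster
      k22 = occurrences of other words outside the cluster
    """
    total = k11 + k12 + k21 + k22
    row = (k11 + k12, k21 + k22)
    col = (k11 + k21, k12 + k22)
    g2 = 0.0
    for count, i, j in ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1)):
        if count > 0:  # 0 * ln(0) is taken as 0
            expected = row[i] * col[j] / total
            g2 += count * math.log(count / expected)
    return 2.0 * g2
```

A score near 0 means the word is spread evenly inside and outside the cluster; large scores flag words strongly associated with it, which is why the top-scoring word is a natural (if, as noted above, not always representative) concept label.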
Improving Term Extraction Using Particle Swarm Optimization Techniques
Mohammad Syafrullah and Naomie Salim

The authors present a particle swarm optimization (PSO) technique to improve term extraction precision. They choose five features to represent the term score: domain relevance, domain consensus, term cohesion, first occurrence, and length of the noun phrase. In the experiments, a translation of the meaning of the Quran (focusing on verses about prayer) is used as the input, divided into training documents and test documents. PSO is trained on the training documents to determine the appropriate weight of each feature so as to produce the best score for each term. The test documents are then scored with the feature weights produced in the training stage to calculate the final score of each term to be extracted.
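The training stage can be pictured as a search over the five feature weights: each particle is a weight vector, a term's score is a weighted sum of its features, and fitness is extraction precision on the training documents. The sketch below uses the standard global-best PSO velocity update. The linear scoring, the precision-at-k fitness, the parameter values (inertia 0.7, c1 = c2 = 1.5), and all names are assumptions rather than details given in the paper summary.

```python
import random
from typing import Callable, List, Sequence, Tuple


def term_scores(features: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    # Assumed scoring: weighted sum of a term's feature values.
    return [sum(w * f for w, f in zip(weights, row)) for row in features]


def precision_at_k(features: Sequence[Sequence[float]], gold: Sequence[bool],
                   weights: Sequence[float], k: int) -> float:
    # Fraction of the k top-scoring terms that are true (gold) terms.
    scores = term_scores(features, weights)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(1 for i in top if gold[i]) / k


def pso_maximize(fitness: Callable[[List[float]], float], dim: int,
                 n_particles: int = 20, iters: int = 100, inertia: float = 0.7,
                 c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization; returns (best position, fitness)."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

Usage: with a list of five-feature rows `features` for the training terms and a parallel list of booleans `gold`, call `pso_maximize(lambda w: precision_at_k(features, gold, w, k), dim=5)` and reuse the returned weights with `term_scores` on the test documents. The weights are left unconstrained here; a real setup might clip or normalize them.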