Contextual weighted representations and indexing models for the retrieval of HTML documents

R. A. Marques Pereira
A. Molinari
G. Pasi
Soft Computing - A Fusion of Foundations, Methodologies and Applications

The diffusion of the World Wide Web (WWW) and the consequent increase in the production and exchange of textual information demand the development of effective information retrieval systems. The HyperText Markup Language (HTML) constitutes a common basis for generating documents on the Internet and on intranets. HTML allows the author to organize the text into subparts delimited by special tags; these subparts are then rendered by the HTML browser in distinct ways, i.e. with distinct typographical formats.

