Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking...
-
December 10, 2007 (v1)Conference paperUploaded on: April 5, 2025
-
April 2008 (v1)Conference paper
Information retrieval from web and XML document collections is ever more focused on returning entities instead of web pages or XML elements. There are many research fields involving named entities; one such field is known as entity ranking, where one goal is to rank entities in response to a query supported with a short list of entity examples....
Uploaded on: April 5, 2025 -
2007 (v1)Report
The traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named...
Uploaded on: April 5, 2025 -
December 2006 (v1)Conference paper
This article is a report concerning the two years of the XML Mining track at INEX (2005 and 2006). We focus here on the classification and clustering XML documents. We detail these two tasks and the corpus used for this challenge and then present a summary of the different methods proposed by the participants. We last compare the results...
Uploaded on: April 5, 2025 -
March 2008 (v1)Conference paper
The traditional entity extraction problem lies in the ability of extracting named entities from plain text using natural language processing techniques and intensive training from large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named...
Uploaded on: April 5, 2025 -
January 23, 2007 (v1)Conference paper
The goal of our work is to use a set of reports and extract named entities, in our case the names of Industrial or Academic partners. Starting with an initial list of entities, we use a first set of documents to identify syntactic patterns that are then validated in a supervised learning phase on a set of annotated documents. The complete...
Uploaded on: April 5, 2025 -
November 2005 (v1)Conference paper
This paper reports on the INRIA group's approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure and content. Our approach consists of representing XML documents by a set of their sub-paths, defined...
Uploaded on: April 5, 2025 -
January 2005 (v1)Conference paper
This paper presents some experiments in clustering homogeneous XMLdocuments to validate an existing classification or more generally anorganisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for...
Uploaded on: April 5, 2025 -
January 16, 2006 (v1)Conference paper
In this work, we propose a new clustering document representation for semi-structured documents collections. Our approach consists on a representation of XML documents based on their sub-paths, defined according to some criteria (length, root beginning, leaf ending) using the structure only or both the structure and the content. By considering...
Uploaded on: April 5, 2025 -
June 29, 2005 (v1)Conference paper
This paper presents some experiments in clustering homogeneous XMLdocuments to validate an existing classification or more generally anorganisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for...
Uploaded on: April 5, 2025 -
2007 (v1)Book section
XML documents are becoming ubiquitous because of their rich and flexible format that can be used for a variety of applications. Giving the increasing size of XML collections as information sources, mining techniques that traditionally exist for text collections or databases need to be adapted and new methods to be invented to exploit the...
Uploaded on: April 5, 2025