Trees and after: The concept of text topology
- Others:
- BCL, équipe Logométrie : corpus, traitements, modèles ; Bases, Corpus, Langage (UMR 7320 - UCA / CNRS) (BCL) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA)
- BCL, équipe Linguistique de l'énonciation : les connecteurs concessifs [jusqu'en 2007] ; Bases, Corpus, Langage (UMR 6039 - UNS / CNRS) (BCL ) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)
- Laboratoire d'Analyse Statistique des Langues Anciennes (LASLA) ; Université de Liège
Description
The model described here relies on the key concepts of topology, i.e. neighbourhood and equivalence of shape. A linguistic object L is studied in text T by means of one or several local questions Q. The set of successive local answers is processed so as to provide a global function characterizing the textual space under scrutiny. We begin with short sequences of tenses to illustrate the way in which to explore originally Emile Benveniste's concepts of history and discourse . We then supply life-size examples of other objects selected for their heuristic value. We go on to demonstrate the model at work on the distribution of strings of finite (F) and non-finite (n) verbal forms in the LOB Corpus of English. A topological chart is produced as the synthetic image mirroring the locations of the relevant linguistic entities throughout the text. All the individual strings concatenating any number of F and n are classified in a table. Alternatively, individual full-text strings can be extracted. We then proceed to refine the notion of lexical distribution in "rafales" in a lemmatized corpus of Latin texts, the purpose being to test the stability of the distributions in individual texts of selected verbs and assess whether a verb's behaviour is related to its semantic status. The final section is devoted to other Latin texts. The use of segments of equal length makes it possible to draw up the narrative profile of each author as revealed by his handling of tenses in main clauses.
Abstract
International audience
Additional details
- URL
- https://hal.archives-ouvertes.fr/hal-00555349
- URN
- urn:oai:HAL:hal-00555349v1
- Origin repository
- UNICA