Published November 27, 2024 | Version v1
Publication
ISSA Pipeline
Contributors
Others:
- Web-Instrumented Man-Machine Interactions, Communities and Semantics (WIMMICS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria) ; Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
- Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)
- EuroMov - Digital Health in Motion (Euromov DHM) ; IMT - MINES ALES (IMT - MINES ALES) ; Institut Mines-Télécom [Paris] (IMT)-Université de Montpellier (UM)
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)
- AAP CollEx Persée 21_22
- Inria & Université Cote d'Azur, CNRS, I3S, Sophia Antipolis, France
- Cirad
- Euromov Digital Health in Motion, IMT Mines Alès
Description
The ISSA pipeline was developed by the ISSA project (https://issa.cirad.fr/). It orchestrates the automatic indexing of a scientific archive by extracting thematic descriptors and named entities from the full text of the articles, and linking them with terminological resources expressed in Semantic Web formats.
The repository contains the tools, scripts and configuration files used in each step of the pipeline:
- retrieve the articles' metadata from the archive's API;
- download and pre-process the PDF files of the articles;
- process the output to extract thematic descriptors and named entities;
- translate the output of each processing step into a unified, consistent RDF dataset;
- retrieve additional metadata from OpenAlex: topics, Sustainable Development Goals (SDG), authorship with institutions;
- upload the resulting dataset to a triple store equipped with a SPARQL endpoint.
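To illustrate the RDF translation step described above, here is a minimal sketch in Python that serializes the descriptors and named entities linked to one article as N-Triples. It is not taken from the ISSA repository: the function name `to_ntriples`, the `http://example.org/` URIs and the choice of the `dcterms:subject` and `schema:about` predicates are assumptions made for illustration, and the actual ISSA data model may use different vocabularies.

```python
from typing import Iterable, List

# Hypothetical article namespace, for illustration only.
ARTICLE_NS = "http://example.org/article/"
# Well-known predicates, assumed here; the real pipeline may use others.
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
SCHEMA_ABOUT = "http://schema.org/about"


def to_ntriples(
    article_id: str,
    descriptors: Iterable[str],
    entities: Iterable[str],
) -> List[str]:
    """Return one N-Triples line per descriptor or entity URI linked to the article."""
    subject = f"<{ARTICLE_NS}{article_id}>"
    # Thematic descriptors are attached with dcterms:subject.
    triples = [f"{subject} <{DCT_SUBJECT}> <{uri}> ." for uri in descriptors]
    # Named entities are attached with schema:about.
    triples += [f"{subject} <{SCHEMA_ABOUT}> <{uri}> ." for uri in entities]
    return triples


# Example input with placeholder URIs (not real thesaurus or entity identifiers).
example = to_ntriples(
    "123",
    descriptors=["http://example.org/thesaurus/c_1"],
    entities=["http://example.org/entity/Q1"],
)
print("\n".join(example))
```

The resulting lines could then be loaded into any triple store that accepts N-Triples, which corresponds to the final upload step of the pipeline.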
Additional details
Identifiers
- URL
- https://hal.science/hal-04807540
- URN
- urn:oai:HAL:hal-04807540v1
Origin repository
- Origin repository
- UNICA