The importance of sharing patient-generated clinical speech and language data

Fraser, Kathleen; Linz, Nicklas; Lindsay, Hali; König, Alexandra

Published June 6, 2019 | Version v1

Conference paper Metadata-only

Contributors

Others:

National Research Council of Canada (NRC)
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH = German Research Center for Artificial Intelligence (DFKI)
Spatio-Temporal Activity Recognition Systems (STARS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria)

Increased access to large datasets has driven progress in NLP. However, most computational studies of clinically-validated, patient-generated speech and language involve very few datapoints, as such data are difficult (and expensive) to collect. In this position paper, we argue that we must find ways to promote data sharing across research groups, in order to build datasets of a more appropriate size for NLP and machine learning analysis. We review the benefits and challenges of sharing clinical language data, and suggest several concrete actions by both clinical and NLP researchers to encourage multi-site and multidisciplinary data sharing. We also propose the creation of a collaborative data sharing platform, to allow NLP researchers to take a more active responsibility for data transcription, annotation, and curation.

Abstract

International audience

Additional details

URL: https://hal.archives-ouvertes.fr/hal-02339141
URN: urn:oai:HAL:hal-02339141v1

Origin repository: UNICA
