Published November 13, 2019 | Version v1
Conference paper

Cross-Platform Evaluation for Italian Hate Speech Detection

Others:
Web-Instrumented Man-Machine Interactions, Communities and Semantics (WIMMICS) ; Inria Sophia Antipolis - Méditerranée (CRISAM) ; Institut National de Recherche en Informatique et en Automatique (Inria) ; Scalable and Pervasive softwARe and Knowledge Systems (Laboratoire I3S - SPARKS) ; Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S) ; Université Nice Sophia Antipolis (1965 - 2019) (UNS) ; COMUE Université Côte d'Azur (2015-2019) (COMUE UCA) ; Centre National de la Recherche Scientifique (CNRS) ; Université Côte d'Azur (UCA)
Fondazione Bruno Kessler [Trento, Italy] (FBK)
Centre National de la Recherche Scientifique (CNRS)
ANR-19-P3IA-0002, 3IA@cote d'azur, 3IA Côte d'Azur (2019)

Description

English. Despite the number of approaches recently proposed in NLP for detecting abusive language on social networks, developing hate speech detection systems that are robust across different platforms is still an unsolved problem. In this paper we perform a comparative evaluation on datasets for hate speech detection in Italian, extracted from four different social media platforms: Facebook, Twitter, Instagram and WhatsApp. We show that combining such platform-dependent datasets, to take advantage of training data developed for other platforms, is beneficial, although the impact varies depending on the social network under consideration.

Italiano. Nonostante si osservi un crescente interesse per approcci che identifichino il linguaggio offensivo sui social network attraverso l'NLP, la necessità di sviluppare sistemi che mantengano una buona performance anche su piattaforme diverse è ancora un tema di ricerca aperto. In questo contributo presentiamo una valutazione comparativa su dataset per l'identificazione di linguaggio d'odio provenienti da quattro diverse piattaforme: Facebook, Twitter, Instagram e WhatsApp. Lo studio dimostra che, combinando dataset diversi per aumentare i dati di training, migliorano le performance di classificazione, anche se l'impatto varia a seconda della piattaforma considerata.

Abstract

International audience

Additional details

Created:
December 4, 2022
Modified:
December 1, 2023