Published October 16, 2024 | Version v1
Publication

A supervised multiclass framework for mineral classification of Iberian beads

Description

Research on personal adornments depends on the reliable characterisation of materials to trace provenance and model complex social networks. However, many analytical techniques require the transfer of materials from the museum to the laboratory, which involves high insurance costs and limits the number of items that can be analysed, making empirical data collection complicated, expensive and time-consuming. In this study, we compiled the largest geochemical dataset of Iberian personal adornments (n = 1243 samples) by coupling X-ray fluorescence compositional data with their respective X-ray diffraction mineral labels. This allowed us to develop a machine learning-based framework for the prediction of bead-forming minerals by training and benchmarking 13 of the most widely used supervised algorithms. As a proof of concept, we developed a multiclass model and evaluated its performance on two assemblages from different Portuguese sites with existing mineralogical characterisation: Cova das Lapas (n = 15 samples) and Gruta da Marmota (n = 10 samples). Our results showed that decision tree-based classifiers outperformed other classification logics, given the discriminative importance of some chemical elements in determining the mineral phase, which fits particularly well with the decision-making process of this type of model. The comparison of results between the different validation sets and the proof of concept highlighted the risk of using synthetic data to handle class imbalance and the main limitation of the framework: its restrictive class system. We conclude that the presented approach can successfully assist in the mineral classification workflow when specific analyses are not available, saving time and allowing a transparent and straightforward assessment of model predictions.
Furthermore, we propose a workflow for the interpretation of predictions that uses the model outputs as compound responses, enabling an uncertainty reduction approach currently used by our team. The Python-based framework is packaged in a public repository and includes all the resources needed to reuse it without any installation.
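The idea of reading model outputs as compound responses can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `compound_response`, the `coverage` threshold, and the mineral labels and probabilities are all hypothetical and chosen only for illustration. The sketch assumes that a classifier returns one probability per mineral class, and it keeps the smallest set of top-ranked classes whose summed probability reaches the threshold, so that a bead is reported as a short list of candidate minerals rather than a single label.

```python
from typing import Dict, List, Tuple


def compound_response(
    probabilities: Dict[str, float], coverage: float = 0.9
) -> List[Tuple[str, float]]:
    """Return the smallest set of top-ranked classes whose summed
    (normalised) probability reaches `coverage`.

    Hypothetical helper for illustration; not part of the published framework.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")

    # Rank classes from most to least probable.
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[Tuple[str, float]] = []
    cumulative = 0.0
    for label, p in ranked:
        share = p / total  # normalise in case the inputs do not sum to 1
        selected.append((label, share))
        cumulative += share
        # Small tolerance guards against floating-point rounding.
        if cumulative >= coverage - 1e-12:
            break
    return selected


# Illustrative probabilities for one bead (assumed values, not real results).
example = {"variscite": 0.62, "muscovite": 0.30, "talc": 0.05, "turquoise": 0.03}
candidates = compound_response(example, coverage=0.9)
print([label for label, _ in candidates])
```

With a response of this kind, a confident prediction collapses to a single mineral, while an ambiguous one yields a short candidate list that can be narrowed down with further evidence.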

Funding

Fundação para a Ciência e a Tecnologia UI/BD/154365/2023, UIDB/00698/2020, UIDP/00698/2020

Ministerio de Ciencia y Tecnología PID2021-124421NB-I00

Additional details

Created:
October 17, 2024
Modified:
October 17, 2024