Published 2023 | Version v1
Publication

Supporting a .csv-based Workflow in MongoDB for Data Analysts

Description

The use of .csv files is widespread because of the simplicity of their tabular format and their support by popular editing tools. We propose a novel workflow for enhancing the integration of such files with MongoDB storage, and investigate its applicability over a representative sample from the data.world collection. Compared to mongoimport (the MongoDB command-line file import tool), our solution has much higher latency, but it automates the data type check and offers users two main degrees of flexibility, which are particularly useful in application development and deployment: the possibility of spotting and rejecting duplicate records, and the possibility of rejecting single rows, rather than whole files, in case of errors. Moreover, the reliance on the Measurify IoT application framework allows users to create application-relevant resources by simply enhancing .csv files with semantics, while still providing a transparent end-to-end .csv file storage workflow.
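The row-level behaviour described above (type checking, rejecting single rows rather than whole files, and rejecting duplicates) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Measurify API; the schema, the key fields, the sample data, and the function name load_rows are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical schema (an assumption): column name -> type converter.
SCHEMA = {"sensor": str, "timestamp": int, "value": float}
# Hypothetical fields (an assumption) that together identify a duplicate record.
KEY_FIELDS = ("sensor", "timestamp")


def load_rows(csv_text, schema=SCHEMA, key_fields=KEY_FIELDS):
    """Split CSV rows into accepted records and rejected (line, reason) pairs."""
    accepted, rejected = [], []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header; this numbering assumes no multi-line fields.
    for line_no, row in enumerate(reader, start=2):
        try:
            record = {}
            for name, cast in schema.items():
                raw = row[name]          # KeyError if the column is absent
                if raw is None:          # the row has fewer fields than the header
                    raise ValueError(f"missing value for {name!r}")
                record[name] = cast(raw)  # ValueError if the type check fails
        except (KeyError, ValueError) as exc:
            # Reject only this row and keep processing the rest of the file.
            rejected.append((line_no, f"type check failed: {exc}"))
            continue
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            rejected.append((line_no, "duplicate record"))
            continue
        seen.add(key)
        accepted.append(record)
    return accepted, rejected


# Made-up sample data: one badly typed row and one duplicate row.
SAMPLE = "sensor,timestamp,value\na,1,20.5\na,2,abc\na,1,20.5\nb,3,7\n"
accepted, rejected = load_rows(SAMPLE)
for line_no, reason in rejected:
    print(f"line {line_no}: {reason}")
```

In this sketch the two valid rows are kept while the mistyped row and the repeated row are reported individually. The accepted records could then be handed to a MongoDB driver for insertion; that step is omitted here because it depends on the deployment.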

Additional details

Created:
February 4, 2024
Modified:
February 4, 2024