Published July 28, 2023 | Version v1
Publication

Labeled HTTP requests dataset: Dataset Biblio-US17

Description

The dataset is organized as a directory tree in which each subdirectory contains a different type of file (set). As provided, five sets of files and two partitioning schemes are considered. The partition files are not provided directly but can be generated from the files using the provided script.

The following sets of files (subdirectories) are included:

- RAW files: Initial records (obtained after preprocessing and anonymization of the real captured files).
- LABEL files: Labels assigned during analysis.
- CLEAN files: Records considered clean after sanitization. This is the full dataset to be used as normal traffic.
- SID files: Information about the SIDs triggered by the SIDS tools used.
- ATTACK files: Records classified as attacks (only LVL1, i.e. indubitable, attacks).

Records in each set are organized in daily bins (files) named biblio-2017-<month>-<day>.<ext>, where <month> is the number of the month, <day> is the day, and <ext> is an extension related to the type of content:

- .raw for RAW files
- .lbl for LABEL files
- .cl for CLEAN files
- .sid for SID files
- .att for ATTACK files
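
The naming convention described above can be illustrated with a short Python sketch that maps a daily bin file name to its date and file set. This is only an illustration, not part of the dataset's tooling: the function and type names (parse_bin_name, DailyBin) are hypothetical, and the pattern assumes the month and day are written as one or two digits, which the description does not specify.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Extension-to-set mapping, as listed in the dataset description.
EXT_TO_SET = {
    "raw": "RAW",
    "lbl": "LABEL",
    "cl": "CLEAN",
    "sid": "SID",
    "att": "ATTACK",
}

# Assumption: month and day appear as one or two digits each.
_NAME_RE = re.compile(r"biblio-2017-(\d{1,2})-(\d{1,2})\.(raw|lbl|cl|sid|att)")


class DailyBin(NamedTuple):
    """Hypothetical container for the information encoded in a file name."""
    day: date
    file_set: str


def parse_bin_name(name: str) -> Optional[DailyBin]:
    """Return the date and set of a daily bin file name, or None if it does not match."""
    m = _NAME_RE.fullmatch(name)
    if m is None:
        return None
    month, day, ext = int(m.group(1)), int(m.group(2)), m.group(3)
    try:
        bin_date = date(2017, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a valid 2017 calendar date.
        return None
    return DailyBin(bin_date, EXT_TO_SET[ext])
```

A call such as parse_bin_name("biblio-2017-03-15.cl") would yield the date 2017-03-15 together with the set name "CLEAN", while names that do not follow the pattern yield None, which makes it easy to skip unrelated files when walking the directory tree.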

Abstract

This dataset contains anonymized and labeled HTTP requests (selected fields) taken from the logs of a real, in-production web server at the library of the University of Seville, collected over 6.5 months in 2017. The dataset has been sanitized using the supervised methodology proposed in:

- Díaz-Verdejo, Jesús E.; Estepa, Antonio; Estepa, Rafael; Madinabeitia, German; Muñoz-Calle, Javier, "A methodology for conducting efficient sanitization of HTTP training datasets", Future Generation Computer Systems, vol. 109, pp. 67–82, 2020. https://doi.org/10.1016/j.future.2020.03.033

Additional details

Created:
October 11, 2023
Modified:
November 30, 2023