Labeled HTTP requests dataset: Dataset Biblio-US17
- Others:
- Universidad de Sevilla. Departamento de Ingeniería Telemática
- Universidad de Sevilla. TIC154: Departamento de Ingeniería Telemática
- European Commission (EC). Fondo Europeo de Desarrollo Regional (FEDER)
- Ministerio de Ciencia e Innovación
- Junta de Andalucía (Consejería de Transformación Económica, Industria, Conocimiento y Universidades)
- Universidad de Sevilla
- Muñoz Calle, Francisco Javier
- Estepa Alonso, Rafael María
- Díaz Verdejo, Jesús
Description
The dataset is organized as a tree of subdirectories, each containing a different type of file (set). As provided, five sets of files and two partitioning schemes are considered. The partition files are not provided directly, but they can be generated from the included files using the provided script. The following sets of files (subdirectories) are included:
- RAW files: initial registers (obtained after preprocessing and anonymization of the real captured files).
- LABEL files: labels assigned during the analysis.
- CLEAN files: registers considered clean after sanitization. This is the full dataset to be used as normal traffic.
- SID files: information about the SIDs triggered by the SIDS tools used.
- ATTACK files: registers classified as attacks (only LVL1, i.e. indubitable, attacks).
Registers in each set are organized in daily bins (files) whose names start with biblio-2017-, followed by the month number, a hyphen, the day, and an extension that identifies the type of content:
- .raw for RAW files
- .lbl for LABEL files
- .cl for CLEAN files
- .sid for SID files
- .att for ATTACK files
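Because each daily bin encodes its set, month, and day in the file name, the files can be indexed programmatically, for example to pair the RAW, LABEL, and CLEAN bins of the same day. The following Python sketch is not part of the dataset or its provided script. It assumes names of the form biblio-2017-<month>-<day>.<extension> with decimal month and day (the exact zero padding is not stated in this record), and the helper names `parse_bin_name` and `index_dataset` are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Tuple

# Extension -> file set, as listed in the dataset description.
EXTENSION_TO_SET = {
    "raw": "RAW",
    "lbl": "LABEL",
    "cl": "CLEAN",
    "sid": "SID",
    "att": "ATTACK",
}

# ASSUMPTION: names look like biblio-2017-<month>-<day>.<ext>, with month and
# day written as one- or two-digit decimal numbers (padding not confirmed).
NAME_PATTERN = re.compile(r"^biblio-2017-(\d{1,2})-(\d{1,2})\.([a-z]+)$")


def parse_bin_name(name: str) -> Optional[Tuple[str, int, int]]:
    """Return (file_set, month, day) for a daily-bin file name, or None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    month, day, ext = int(match.group(1)), int(match.group(2)), match.group(3)
    file_set = EXTENSION_TO_SET.get(ext)
    if file_set is None:
        # Unknown extension: not one of the five documented sets.
        return None
    return file_set, month, day


def index_dataset(root: str) -> Dict[Tuple[int, int], Dict[str, Path]]:
    """Map (month, day) -> {file_set: path} for every recognised bin under root."""
    index: Dict[Tuple[int, int], Dict[str, Path]] = {}
    # Walk all subdirectories, since each set lives in its own subdirectory.
    for path in sorted(Path(root).rglob("biblio-2017-*")):
        if not path.is_file():
            continue
        parsed = parse_bin_name(path.name)
        if parsed is None:
            continue
        file_set, month, day = parsed
        index.setdefault((month, day), {})[file_set] = path
    return index
```

For instance, `parse_bin_name("biblio-2017-03-15.cl")` returns `("CLEAN", 3, 15)`, and `index_dataset(...)[(3, 15)]` would then hold the paths of every set available for that day. Files whose extension is not one of the five listed are skipped rather than guessed at.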
Abstract
This dataset contains a set of anonymized and labeled HTTP requests (selected fields) from the logs of a real, in-production web server at the library of the University of Seville, collected over 6.5 months in 2017. The dataset has been sanitized using the supervised methodology proposed in:
- Díaz-Verdejo, Jesús E.; Estepa, Antonio; Estepa, Rafael; Madinabeitia, German; Muñoz-Calle, Javier, "A methodology for conducting efficient sanitization of HTTP training datasets", Future Generation Computer Systems, vol. 109, pp. 67–82, 2020. https://doi.org/10.1016/j.future.2020.03.033.
Version: v.1
Additional details
- URL
- https://idus.us.es/handle/11441/148254
- URN
- urn:oai:idus.us.es:11441/148254
- Origin repository
- USE