Random Forests (RF) of tree classifiers are a popular ensemble method for classification. RF are usually preferred over other classification techniques because of their limited hyperparameter sensitivity, high numerical robustness, native capacity to deal with numerical and categorical features, and effectiveness in many real...
-
2017 (v1)Publication
Uploaded on: April 14, 2023
-
2019 (v1)Publication
In the current big data era, naive implementations of well-known learning algorithms cannot efficiently and effectively deal with large datasets. Random forests (RFs) are a popular ensemble-based method for classification. RFs have been shown to be effective in many different real-world classification problems and are commonly considered one of...
Uploaded on: March 27, 2023 -
2016 (v1)Publication
We present NG-DBSCAN, an approximate density-based clustering algorithm that operates on arbitrary data and any symmetric distance measure. The distributed design of our algorithm makes it scalable to very large datasets; its approximate nature makes it fast, yet capable of producing high quality clustering results. We provide a detailed...
Uploaded on: March 27, 2023 -
2019 (v1)Publication
We investigate the problem of analyzing train movements in Large-Scale Railway Networks for the purpose of understanding and predicting their behaviour. We focus on several important aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, and the Penalty Costs associated with a...
Uploaded on: March 27, 2023 -
2015 (v1)Publication
Clustering items using textual features is an important problem with many applications, such as root-cause analysis of spam campaigns, as well as identifying common topics in social media. Due to the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present our approach for text clustering that builds an...
Uploaded on: March 27, 2023 -
2019 (v1)Publication
We investigate the problem of analyzing train movements in Large-Scale Railway Networks for the purpose of understanding and predicting their behaviour. We focus on several important aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, and the Penalty Costs associated with a...
Uploaded on: February 14, 2024 -
2020 (v1)Publication
We investigate the problem of analysing train movements in large-scale railway networks for the purpose of understanding and predicting their behaviour. We focus on several important aspects: the Running Time of a train between two stations, the Dwell Time of a train in a station, the Train Delay, the Penalty Costs associated with a delay,...
Uploaded on: April 14, 2023 -
2016 (v1)Publication
Statistical authorities promote and safeguard the production and publication of official statistics that serve the public good. One of their duties is to monitor the presence of individuals region by region. Traditionally, this activity has been conducted by means of censuses and surveys. Nowadays, technologies open new possibilities such as a...
Uploaded on: April 14, 2023 -
2017 (v1)Publication
Mobile phones have an unprecedented rate of penetration across the world. Such devices produce a large amount of data that has been used in different domains. In this work, we make use of mobile calls to monitor the presence of individuals region by region. Traditionally, this activity has been conducted by means of censuses and surveys....
Uploaded on: April 14, 2023