A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is...
-
May 2022 (v1)Journal articleUploaded on: December 3, 2022
-
June 2019 (v1)Journal article
We consider a novel model of stochastic replicator dynamics for potential games that converts to a Langevin equation on a sphere after a change of variables. This is distinct from the models of stochastic replicator dynamics studied earlier. In particular, it is ill-posed due to non-uniqueness of solutions, but is amenable to a natural...
Uploaded on: December 4, 2022 -
March 2018 (v1)Journal article
We consider the task of scheduling a crawler to retrieve from several sites their ephemeral content. This is content, such as news or posts at social network groups, for which a user typically loses interest after some days or hours. Thus development of a timely crawling policy for ephemeral information sources is very important. We first...
Uploaded on: December 4, 2022 -
September 24, 2019 (v1)Conference paper
We revisit the Whittle index policy for scheduling web crawlers for ephemeral content proposed in Avrachenkov and Borkar, IEEE Trans. Control of Network Systems 5(1), 2016, and develop a reinforcement learning scheme for it based on LSPE(0). The scheme leverages the known structural properties of the Whittle index policy.
Uploaded on: December 4, 2022 -
December 12, 2017 (v1)Conference paper
We consider the Generalized Additive Increase Multiplicative Decrease (G-AIMD) dynamics for resource allocation with alpha fairness utility function. This dynamics has a number of important applications such as internet congestion control, charging electric vehicles, and smart grids. We prove indexability for the special case of MIMD model and...
Uploaded on: March 25, 2023 -
December 20, 2021 (v1)Conference paper
A search engine uses a web crawler to crawl the pages from the world wide web (WWW) and aims to maintain its local cache as fresh as possible. Unfortunately, the rates at which different pages change in WWW are highly nonuniform and also, unknown in many real-life scenarios. In addition, the finite available bandwidth and possible server...
Uploaded on: December 3, 2022 -
June 14, 2023 (v1)Conference paper
We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems. We experimentally compare widely used RVI Q-learning with recently proposed Differential Q-learning in the neural function approximation setting with Full Gradient DQN and DQN....
Uploaded on: January 6, 2024 -
January 31, 2023 (v1)Publication
No description
Uploaded on: January 6, 2024 -
September 19, 2022 (v1)Conference paper
The Whittle index policy is a heuristic that has shown remarkable good performance (with guaranted asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABP). In this paper we present QWI and QWINN, two algorithms capable of learning the Whittle indices for the total discounted criterion....
Uploaded on: December 4, 2022 -
March 19, 2018 (v1)Journal article
Background: In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime,...
Uploaded on: December 4, 2022 -
June 5, 2021 (v1)Book section
We analyze the DQN reinforcement learning algorithm as a stochastic approximation scheme using the o.d.e. (for 'ordinary differential equation') approach and point out certain theoretical issues. We then propose a modified scheme called Full Gradient DQN (FG-DQN, for short) that has a sound theoretical basis and compare it with the original...
Uploaded on: December 3, 2022 -
September 19, 2022 (v1)Publication
- Whittle index policy is an asymptotically optimal heuristic for solving Restless Multi-Armed Bandit Problems (RMBAP). - We propose two algorithms, QWI and QWINN, for the computation of such indices. - Both employ a two timescale system for the computation of the indices and the Q-values of each state/action.
Uploaded on: December 3, 2022 -
June 14, 2021 (v1)Conference paper
The Whittle index policy is a heuristic that has shown remarkable good performance (with guaranted asymptotic optimality) when applied to the class of problems known as multi-armed restless bandits. In this paper we develop QWI, an algorithm based on Q-learning in order to learn the Whittle indices. The key feature is the deployment of two...
Uploaded on: December 3, 2022 -
September 19, 2022 (v1)Conference paper
The Whittle index policy is a heuristic that has shown remarkable good performance (with guaranted asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABP). In this paper we present QWI and QWINN, two algorithms capable of learning the Whittle indices for the total discounted criterion....
Uploaded on: February 22, 2023 -
September 21, 2021 (v1)Journal article
We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $\alpha$-homogeneous rewards. We show that its empirical process, which can be written as a stochastic approximation recursion with Markov noise, has the same probability law as a certain vertex reinforced random walk. We use this equivalence to...
Uploaded on: December 4, 2022