alphaXiv


CentraleSupélec

10 Dec 2025

computer-science computational-geometry multiagent-systems

On Mobile Ad Hoc Networks for Coverage of Partially Observable Worlds

CNRS

McGill University CentraleSupélec University Paris-Saclay Mila Institute L2S

This paper addresses the movement and placement of mobile agents to establish a communication network in initially unknown environments. We cast the problem in a computational-geometric framework by relating the coverage problem and line-of-sight constraints to the Cooperative Guard Art Gallery Problem, and introduce its partially observable variant, the Partially Observable Cooperative Guard Art Gallery Problem (POCGAGP). We then present two algorithms that solve POCGAGP: CADENCE, a centralized planner that incrementally selects 270-degree corners at which to deploy agents, and DADENCE, a decentralized scheme that coordinates agents using local information and lightweight messaging. Both approaches operate under partial observability and target simultaneous coverage and connectivity. We evaluate the methods in simulation across 1,500 test cases of varied size and structure, demonstrating consistent success in forming connected networks while covering and exploring unknown space. These results highlight the value of geometric abstractions for communication-driven exploration and show that decentralized policies are competitive with centralized performance while retaining scalability.
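As a rough illustration of the geometric abstraction mentioned in the abstract, the sketch below finds the 270-degree corners of a fully known rectilinear room. It is not the CADENCE planner: the abstract does not say how corners are selected or how partial observability is handled, so the vertex-list input, the function names `signed_area` and `reflex_corners`, and the L-shaped example room are all assumptions made for illustration. In a rectilinear polygon, the 270-degree interior corners are exactly the reflex vertices, which a cross-product test can detect.

```python
def signed_area(poly):
    """Shoelace formula; positive for counterclockwise vertex order."""
    n = len(poly)
    return 0.5 * sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )


def reflex_corners(poly):
    """Return vertices whose interior angle exceeds 180 degrees.

    Assumes a simple polygon given as a list of (x, y) vertices.
    For a rectilinear polygon these are the 270-degree corners.
    """
    n = len(poly)
    orientation = 1 if signed_area(poly) > 0 else -1
    corners = []
    for i in range(n):
        px, py = poly[i - 1]          # previous vertex (wraps around)
        cx, cy = poly[i]              # current vertex
        nx, ny = poly[(i + 1) % n]    # next vertex
        # z-component of (current - previous) x (next - current)
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        # A turn against the polygon's orientation marks a reflex vertex.
        if cross * orientation < 0:
            corners.append(poly[i])
    return corners


# Hypothetical L-shaped room, vertices in counterclockwise order.
room = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
corners = reflex_corners(room)
print(corners)
```

The orientation check makes the test independent of whether the vertices are listed clockwise or counterclockwise. A planner working under partial observability would have to apply a test like this to the portion of the boundary observed so far, which this sketch does not attempt.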

24 Jan 2020

agent-based-systems computer-science artificial-intelligence

Modified Actor-Critics

CNRS

Google Research Loria CentraleSupélec Université de Lorraine Austrian Institute of Technologies University of Applied Science FH Joanneum

Recent successful deep reinforcement learning algorithms, such as Trust Region Policy Optimization (TRPO) or Proximal Policy Optimization (PPO), are fundamentally variations of conservative policy iteration (CPI). These algorithms iterate policy evaluation followed by a softened policy improvement step. As such, they are naturally on-policy. In this paper, we propose to combine (any kind of) soft greediness with Modified Policy Iteration (MPI). The proposed abstract framework applies repeatedly: (i) a partial policy evaluation step that allows off-policy learning and (ii) any softened greedy step. Our contribution can be seen as a new generic tool for the deep reinforcement learning toolbox. As a proof of concept, we instantiate this framework with the PPO greediness. Comparison to the original PPO shows that our algorithm is much more sample efficient. We also show that it is competitive with the state-of-the-art off-policy algorithm Soft Actor Critic (SAC).
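The two-step loop described in the abstract can be sketched in a tabular setting. This is a minimal illustration under stated assumptions, not the paper's algorithm: the model is assumed known (so there is no sampling and the off-policy aspect is not shown), the softened greedy step is a softmax with temperature `tau` rather than the PPO greediness the authors use, and the toy two-state MDP and all function names are hypothetical.

```python
import math


def q_values(P, R, v, gamma):
    """q[s][a] = R[s][a] + gamma * sum over s2 of P[s][a][s2] * v[s2]."""
    return [
        [R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def soft_greedy(q, tau):
    """Softened greedy step: softmax over actions (an assumed choice)."""
    pi = []
    for row in q:
        top = max(row)  # subtract the max for numerical stability
        weights = [math.exp((x - top) / tau) for x in row]
        total = sum(weights)
        pi.append([w / total for w in weights])
    return pi


def partial_evaluation(P, R, pi, v, gamma, m):
    """Apply the Bellman operator of pi to v only m times."""
    for _ in range(m):
        q = q_values(P, R, v, gamma)
        v = [sum(pa * qa for pa, qa in zip(pi[s], q[s])) for s in range(len(v))]
    return v


def soft_mpi(P, R, gamma=0.9, tau=0.05, m=5, iters=200):
    """Alternate a softened greedy step with m-step partial evaluation."""
    v = [0.0] * len(R)
    pi = None
    for _ in range(iters):
        pi = soft_greedy(q_values(P, R, v, gamma), tau)
        v = partial_evaluation(P, R, pi, v, gamma, m)
    return pi, v


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],   # transitions from state 0
     [[0.0, 1.0], [1.0, 0.0]]]   # transitions from state 1
R = [[0.0, 0.0],                 # state 0 gives no reward
     [1.0, 0.0]]                 # staying in state 1 gives reward 1
pi, v = soft_mpi(P, R)
print(pi, v)
```

Setting `m = 1` makes the evaluation step a single Bellman backup, as in value iteration, while a large `m` approaches full policy evaluation, as in policy iteration; intermediate values give the partial evaluation that MPI is named for.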
