Intelligent buses in a loop service: Emergence of no-boarding and holding strategies

Abstract

We study how N intelligent buses serving a loop of M bus stops learn a no-boarding strategy and a holding strategy by reinforcement learning. The high level no-boarding and holding strategies emerge from the low level actions of stay or leave when a bus is at a bus stop and everyone who wishes to alight has done so. A reward that encourages the buses to strive towards a staggered phase difference amongst them whilst picking up people allows the reinforcement learning process to converge to an optimal Q-table within a reasonable amount of simulation time. It is remarkable that this emergent behaviour of intelligent buses turns out to minimise the average waiting time of commuters, in various setups where buses have identical natural frequency, or different natural frequencies during busy as well as lull periods. Cooperative actions are also observed, e.g. the buses learn to unbunch.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…