Utility Maximizing Sequential Sensing Over a Finite Horizon

Abstract

We consider the problem of optimally utilizing N resources, each in an unknown binary state. The state of each resource can be inferred from state-dependent noisy measurements. Depending on its state, utilizing a resource yields either a reward or a penalty per unit time. The objective is to find a sequential strategy that governs the sensing and exploitation decisions at each time so as to maximize the expected utility (i.e., total reward minus total penalty and sensing cost) over a finite horizon L. We formulate the problem as a Partially Observable Markov Decision Process (POMDP) and show that the optimal strategy is based on two time-varying thresholds for each resource and an optimal selection rule for which resource to sense. Since a full characterization of the optimal strategy is generally intractable, we develop a low-complexity policy that simulations show to offer near-optimal performance. This problem finds applications in opportunistic spectrum access, marketing strategies, and other sequential resource allocation problems.
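The abstract does not spell out the measurement model or the threshold values. The following is therefore only a minimal sketch of the general idea for a single resource, under illustrative assumptions. Those assumptions are:

- The binary sensor model (`BinaryObservationModel`, with hypothetical parameters `p_y1_given_good` and `p_y1_given_bad`).
- The Bayes belief update.
- The three-way labels "exploit", "sense" and "skip".
- The placeholder threshold functions `lower(t)` and `upper(t)`.

The thresholds shown are arbitrary, not the optimal time-varying thresholds characterized in the paper. The selection rule for which of the N resources to sense is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BinaryObservationModel:
    """Hypothetical sensor model: P(y = 1 | state) for each binary state."""
    p_y1_given_good: float  # P(y = 1 | resource is in the rewarding state)
    p_y1_given_bad: float   # P(y = 1 | resource is in the penalizing state)


def update_belief(belief: float, y: int, model: BinaryObservationModel) -> float:
    """Bayes' rule: posterior P(good) after observing y in {0, 1}."""
    if y == 1:
        like_good, like_bad = model.p_y1_given_good, model.p_y1_given_bad
    else:
        like_good, like_bad = 1.0 - model.p_y1_given_good, 1.0 - model.p_y1_given_bad
    num = belief * like_good
    den = num + (1.0 - belief) * like_bad
    if den == 0.0:
        raise ValueError("observation has zero probability under the model")
    return num / den


def expected_exploit_utility(belief: float, reward: float, penalty: float) -> float:
    """Expected one-step utility of using the resource: +reward if good, -penalty if bad."""
    return belief * reward - (1.0 - belief) * penalty


Threshold = Callable[[int], float]  # maps time index t to a threshold value


def threshold_action(belief: float, t: int, lower: Threshold, upper: Threshold) -> str:
    """Three-region rule with time-varying thresholds lower(t) <= upper(t)."""
    if belief >= upper(t):
        return "exploit"  # belief high enough: use the resource
    if belief <= lower(t):
        return "skip"     # belief low enough: leave the resource unused
    return "sense"        # uncertain: pay the sensing cost for another measurement


if __name__ == "__main__":
    horizon = 5  # hypothetical finite horizon L
    model = BinaryObservationModel(p_y1_given_good=0.8, p_y1_given_bad=0.3)
    # Placeholder thresholds that narrow toward the deadline (NOT the optimal ones).
    upper = lambda t: 0.9 - 0.05 * t
    lower = lambda t: 0.1 + 0.05 * t
    belief = 0.5  # prior P(good)
    observations = [1, 0, 1, 1, 1]  # fixed sequence for a reproducible demo
    for t in range(horizon):
        action = threshold_action(belief, t, lower, upper)
        print(t, round(belief, 3), action,
              round(expected_exploit_utility(belief, reward=1.0, penalty=2.0), 3))
        if action != "sense":
            break  # for brevity, the demo stops at the first non-sensing decision
        belief = update_belief(belief, observations[t], model)
```

Sensing is worthwhile only while the belief sits between the two thresholds. That middle region is where another measurement is most likely to change the exploit or skip decision. Letting the thresholds depend on t is one way to capture that the value of further sensing changes as the horizon L approaches.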
