Arbitrage-free Data Pricing
Abstract
Driven by the rising value of data in applications such as advertising, finance, and machine learning, markets for data products have become increasingly important. Data markets mainly sell two kinds of products: datasets and machine learning models. Since these products can be replicated at negligible marginal cost, sellers naturally version them through query access and noisy model releases. Versioning immediately raises an arbitrage problem: a buyer may combine cheaper purchases to recover a more informative product at a lower total price. Existing work on query and model pricing studies arbitrage-freeness while treating buyer values as exogenous, whereas the literature on selling information derives value from the buyer's decision problem but ignores arbitrage-freeness. Accordingly, we study the seller's optimal data pricing problem in which buyers value data through Bayesian decision making, subject to arbitrage-freeness constraints. We first interpret query and model pricing as special cases of information pricing and formulate the general arbitrage-free information selling problem. We then show that this problem is computationally hard and give a branch-and-bound algorithm based on McCormick relaxations. Next, we consider threshold utilities, where a buyer has positive value if and only if the experiment is sufficiently informative. Under this condition, we find that arbitrage-freeness can be characterized by Blackwell dominance, which in turn unifies the arbitrage-free conditions for query pricing [deep2017design] and model pricing [chen2019towards]. Finally, we characterize the revenue-maximizing pricing under restricted query and model menus.
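The arbitrage problem described in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes, purely for illustration, that each version is a noisy model release with independent Gaussian noise, so that a buyer who averages two releases with inverse-variance weights obtains a release whose precision (inverse variance) is the sum of the two precisions. The function name `find_arbitrage` and both menus are hypothetical, and only bundles of two purchases are checked.

```python
from itertools import combinations_with_replacement


def find_arbitrage(menu):
    """Return (target, bundle) pairs where a two-item bundle undercuts a menu item.

    menu: list of (precision, price) tuples, with precision = 1 / noise variance.
    Assumption: noise is independent across purchases, so precisions add
    when the buyer combines two releases by inverse-variance weighting.
    """
    violations = []
    for target_precision, target_price in menu:
        for a, b in combinations_with_replacement(menu, 2):
            combined_precision = a[0] + b[0]
            total_price = a[1] + b[1]
            # Arbitrage: at least as informative as the target, strictly cheaper.
            if combined_precision >= target_precision and total_price < target_price:
                violations.append(((target_precision, target_price), (a, b)))
    return violations


# Hypothetical menus: price linear in precision vs. price convex in precision.
linear_menu = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
convex_menu = [(1.0, 10.0), (2.0, 25.0), (4.0, 70.0)]

print(find_arbitrage(linear_menu))
print(find_arbitrage(convex_menu))
```

Under these assumptions the linear menu admits no two-item arbitrage, while the convex menu does: buying the cheapest release twice is at least as informative as the middle release and costs less. A full check would also consider larger bundles.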