On $\ell_p$-hyperparameter Learning via Bilevel Nonsmooth Optimization

Abstract

We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0 < p \le 1$. The bilevel optimization problem in question has a nonsmooth, possibly nonconvex, $\ell_p$-regularized problem as its lower-level problem. Despite the recent popularity of the nonconvex $\ell_p$ regularizer and the usefulness of bilevel optimization for selecting hyperparameters, algorithms for such bilevel problems have not been studied because of the difficulties posed by the $\ell_p$ regularizer. Our contribution is the first algorithm with a theoretical guarantee for finding the best hyperparameter of $\ell_p$-regularized supervised learning problems. Specifically, we propose a smoothing-type algorithm for these bilevel optimization problems and provide a theoretical convergence guarantee for it. Because no optimality conditions were previously known for such bilevel problems, we derive new necessary optimality conditions, called the SB-KKT conditions, and show that, under some mild assumptions, a sequence generated by the proposed algorithm accumulates at a point satisfying the SB-KKT conditions. The proposed algorithm is simple and scalable, as our numerical comparison with Bayesian optimization and grid search indicates.
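The abstract does not state which smoothing function the algorithm uses, nor its update rules. As a minimal sketch under stated assumptions, the code below illustrates the bilevel structure only: it replaces $|w|^p$ by the common smooth surrogate $(w^2 + \mu^2)^{p/2}$ (an assumption, not necessarily the paper's choice), uses a hypothetical linear least-squares model as the lower-level problem, takes validation loss as the upper-level objective, and solves the upper level with the grid-search baseline mentioned in the abstract. It does not implement the proposed SB-KKT-based smoothing algorithm; all function names, parameters, and data are hypothetical.

```python
# Minimal sketch under assumptions: smoothed l_p penalty plus a grid-search
# baseline for a toy bilevel hyperparameter problem. This is NOT the paper's
# SB-KKT-based smoothing algorithm; names and data are hypothetical.


def smoothed_lp(w, p, mu):
    """Assumed smooth surrogate (w^2 + mu^2)^(p/2) of |w|^p; tends to |w|^p as mu -> 0."""
    return (w * w + mu * mu) ** (p / 2)


def smoothed_lp_grad(w, p, mu):
    """Derivative of smoothed_lp in w; well defined at w = 0 whenever mu > 0."""
    return p * w * (w * w + mu * mu) ** (p / 2 - 1)


def predict(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def mse(w, xs, ys):
    """Half mean squared error of the linear model y ~ <w, x>."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def lower_level(lam, xs, ys, p=0.5, mu=0.1, lr=0.01, steps=5000):
    """Hypothetical lower-level problem: approximately minimise
    mse(w) + lam * sum_j smoothed_lp(w_j) by plain gradient descent from w = 0.
    For p < 1 the objective is nonconvex, so this returns a stationary point,
    not necessarily a global minimiser."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [predict(w, x) - y for x, y in zip(xs, ys)]
        grad = [
            sum(r * x[j] for r, x in zip(residuals, xs)) / n
            + lam * smoothed_lp_grad(w[j], p, mu)
            for j in range(d)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def grid_search(lams, train, val, **kwargs):
    """Upper-level problem solved by exhaustive search over a lambda grid:
    keep the lambda whose lower-level solution has the smallest validation loss."""
    best = None
    for lam in lams:
        w = lower_level(lam, *train, **kwargs)
        loss = mse(w, *val)
        if best is None or loss < best[2]:
            best = (lam, w, loss)
    return best


# Hypothetical toy data: y is roughly 2 * x_1, so the second weight should be near 0.
train = (
    [(1.0, 0.5), (0.5, 1.0), (1.0, 1.0), (2.0, 0.3), (0.2, 1.5)],
    [2.1, 0.9, 2.05, 3.95, 0.5],
)
val = ([(1.5, 0.5), (0.3, 1.2)], [3.0, 0.6])

lam, w, loss = grid_search([0.0, 0.01, 0.1, 0.5, 1.0, 5.0], train, val)
print(f"best lambda = {lam}, w = {[round(v, 3) for v in w]}, val loss = {loss:.4f}")
```

Grid search must re-solve the lower-level problem once per candidate value, which is the cost that a dedicated bilevel algorithm aims to avoid. The step size here is chosen small enough that gradient descent is stable for the largest $\lambda$ in the grid, since the curvature of the smoothed penalty near $w = 0$ grows like $p\,\mu^{p-2}$ as $\mu$ shrinks.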
