Stochastic Learning of Semiparametric Monotone Index Models with Large Sample Size

Abstract

I study the estimation of semiparametric monotone index models in the scenario where the number of observation points n is extremely large and conventional approaches fail to work due to heavy computational burdens. Motivated by the mini-batch gradient descent algorithm (MBGD) that is widely used as a stochastic optimization tool in the machine learning field, I proposes a novel subsample- and iteration-based estimation procedure. In particular, starting from any initial guess of the true parameter, I progressively update the parameter using a sequence of subsamples randomly drawn from the data set whose sample size is much smaller than n. The update is based on the gradient of some well-chosen loss function, where the nonparametric component is replaced with its Nadaraya-Watson kernel estimator based on subsamples. My proposed algorithm essentially generalizes MBGD algorithm to the semiparametric setup. Compared with full-sample-based method, the new method reduces the computational time by roughly n times if the subsample size and the kernel function are chosen properly, so can be easily applied when the sample size n is large. Moreover, I show that if I further conduct averages across the estimators produced during iterations, the difference between the average estimator and full-sample-based estimator will be 1/n-trivial. Consequently, the average estimator is 1/n-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves the computational speed, while at the same time maintains the estimation accuracy.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…