Solving Regularized Exp, Cosh and Sinh Regression Problems

Abstract

In modern machine learning, attention computation is a fundamental task for training large language models such as Transformer, GPT-4 and ChatGPT. In this work, we study the exponential regression problem, which is inspired by the softmax/exp unit in the attention mechanism of large language models. The standard exponential regression problem is non-convex. We study a regularized version of exponential regression, which is convex, and we use an approximate Newton method to solve it in input-sparsity time. Formally, one is given a matrix $A \in \mathbb{R}^{n \times d}$, vectors $b \in \mathbb{R}^n$ and $w \in \mathbb{R}^n$, and any one of the functions $\exp$, $\cosh$ and $\sinh$, denoted $f$. The goal is to find the optimal $x$ that minimizes $0.5 \| f(Ax) - b \|_2^2 + 0.5 \| \mathrm{diag}(w) A x \|_2^2$. The straightforward method is to use the naive Newton's method. Let $\mathrm{nnz}(A)$ denote the number of non-zero entries in the matrix $A$. Let $\omega$ denote the exponent of matrix multiplication; currently, $\omega \approx 2.373$. Let $\epsilon$ denote the accuracy error. In this paper, we make use of the input sparsity and propose an algorithm that uses $\log(\| x_0 - x^* \|_2 / \epsilon)$ iterations and $O(\mathrm{nnz}(A) + d^{\omega})$ time per iteration to solve the problem.
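To make the objective concrete, the following is a minimal sketch of the straightforward dense Newton baseline for the $f = \exp$ case only. It is not the paper's input-sparsity approximate Newton method: it forms the full Hessian $A^\top \mathrm{diag}(h) A$ with $h_i = u_i (2 u_i - b_i) + w_i^2$ and $u = \exp(Ax)$, then solves a dense $d \times d$ system each iteration. The function names (`loss`, `grad_and_hessian`, `damped_newton`, `make_toy_problem`), the backtracking line search, and the toy data are all assumptions introduced for illustration. The toy weights are chosen so that $w_i^2 > b_i^2 / 8$, which is this sketch's own sufficient condition for every $h_i$ to be positive (since $u(2u - b) \ge -b^2/8$); the paper's exact condition on $w$ may differ.

```python
import numpy as np


def loss(A, b, w, x):
    """0.5*||exp(Ax) - b||_2^2 + 0.5*||diag(w) A x||_2^2 (the f = exp case)."""
    z = A @ x
    return 0.5 * np.sum((np.exp(z) - b) ** 2) + 0.5 * np.sum((w * z) ** 2)


def grad_and_hessian(A, b, w, x):
    """Exact gradient and dense Hessian of the loss above."""
    z = A @ x
    u = np.exp(z)
    # Gradient: A^T (u * (u - b)) + A^T diag(w^2) A x.
    grad = A.T @ (u * (u - b) + w**2 * z)
    # Hessian: A^T diag(h) A with h_i = u_i * (2 u_i - b_i) + w_i^2.
    h = u * (2.0 * u - b) + w**2
    hess = A.T @ (h[:, None] * A)
    return grad, hess


def damped_newton(A, b, w, x0, max_iter=50, tol=1e-8):
    """Dense Newton iteration with backtracking (hypothetical baseline)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        g, H = grad_and_hessian(A, b, w, x)
        if np.linalg.norm(g) <= tol:
            break
        step = np.linalg.solve(H, g)  # dense d x d solve
        f0, t = loss(A, b, w, x), 1.0
        # Armijo backtracking: an addition for robustness, not from the paper.
        while t > 1e-12 and loss(A, b, w, x - t * step) > f0 - 0.25 * t * (g @ step):
            t *= 0.5
        x = x - t * step
    return x


def make_toy_problem(n=50, d=5, seed=0):
    """Small synthetic instance (assumed data, for illustration only)."""
    rng = np.random.default_rng(seed)
    A = 0.3 * rng.standard_normal((n, d))
    b = np.exp(A @ rng.standard_normal(d))
    w = 0.5 * np.abs(b) + 0.1  # guarantees w_i^2 > b_i^2 / 8
    return A, b, w
```

Calling `damped_newton(A, b, w, np.zeros(A.shape[1]))` on the output of `make_toy_problem()` returns an approximate minimizer. Forming the Hessian this way costs on the order of $n d^2$ operations per iteration, which is the per-iteration cost the abstract's $O(\mathrm{nnz}(A) + d^{\omega})$ bound is meant to improve on.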
