
1.2. Notations

In this section we collect the notation used throughout this book.

  • Generic math related symbols

    • Real: \(\RR\); complex: \(\CC\); integer: \(\ZZ\).

    • Scalars: \(a, b, d, \ldots\) (\(c\) is reserved for constants)

    • Vectors: \(\va, \vb, \ldots\), which are always column vectors. The vector \(\vn\) is reserved for noise.

    • Expectation: \(\EE\)

    • \(\vzero = [0,\cdots, 0]^T\) and \(\vone = [1,\cdots, 1]^T\)

    • \(\sign(x)\in\{-1,0,1\}\) is the sign of \(x\); applied elementwise to a vector, \(\sign(\vx) = [\sign(x_1),\ldots,\sign(x_N)]^T\)

    • The transpose of a matrix \(\vA\) is \(\vA^T\); the conjugate transpose is \(\vA^*\).

    • The set of \(N\)-by-\(N\) symmetric matrices: \(\SS^N\); symmetric positive semi-definite: \(\SSp^N\); symmetric positive definite: \(\SSpp^N\)

    • Identity matrix: \(\vI\)

    • Subscripts \(i, j, k\) denote the entries of vectors and matrices, e.g. \(\vx_i = \vx[i]\) and \(\vX_{i,j} = \vX[i,j]\)

    • Boldfaced Greek letters are also used, e.g. \(\valpha\)

  • Notation specific to optimization

    • Learning rate: \(\eta\)

    • Weight decay: \(\lambda\)

    • Momentum: \(\beta\)

    • \(\vw\) and \(\vW\) denote the weight vectors and matrices (the optimization variables), conforming to the existing ML/AI convention

    • Use \(\ell(\cdot)\) to denote the loss function

    • Superscript \(t\) denotes the iteration number; when another superscript is needed, such as a square, we write \((\vx^t)^2\) (see the worked example after this list)

  • Notation specific to LLMs

    • Use \(\pi(\cdot)\) to denote the LLM policy

    • Use \(x\) to denote the prompt and \(y\) the continuation

    • Use \(\mathcal{D}\) to denote the dataset (see the example after this list)
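
As a concrete illustration of the optimization notation, a gradient step with weight decay on a loss \(\ell\) can be written in these symbols as

\[
\vw^{t+1} = \vw^t - \eta\left(\nabla \ell(\vw^t) + \lambda \vw^t\right),
\]

and a momentum variant keeps an extra buffer, written here as \(\vm^t\) (a symbol introduced only for this example, not part of the list above):

\[
\vm^{t+1} = \beta\, \vm^t + \nabla \ell(\vw^t), \qquad \vw^{t+1} = \vw^t - \eta\, \vm^{t+1}.
\]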
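Similarly, in the LLM notation the policy \(\pi\) assigns a probability to a continuation \(y\) given a prompt \(x\); for an autoregressive model this factorizes token by token as

\[
\pi(y \mid x) = \prod_{k=1}^{|y|} \pi\!\left(y_k \mid x,\, y_{<k}\right),
\]

and a dataset of prompt-continuation pairs may be written as \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}\); this pair form is only an illustration of the symbols, not an additional definition.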