1.2. Notations
This section collects the notation used throughout this book.
Generic math-related symbols
The reals: \(\RR\); the complex numbers: \(\CC\); the integers: \(\ZZ\).
Scalars: \(a, b, d, \ldots\) (\(c\) is reserved for constants)
Vectors: \(\va, \vb, \ldots\); all vectors are *column* vectors. The vector \(\vn\) is reserved for noise.
Expectation: \(\EE\)
\(\vzero = [0, \ldots, 0]^T\) and \(\vone = [1, \ldots, 1]^T\)
\(\sign(x)\in\{-1,0,1\}\) is the sign of \(x\); applied to a vector it acts entrywise: \(\sign(\vx) = [\sign(x_1),\ldots,\sign(x_N)]^T\) (see the worked example after this list)
The transpose of a matrix \(\vA\) is \(\vA^T\); its conjugate transpose is \(\vA^*\).
The set of \(N\)-by-\(N\) symmetric matrices: \(\SS^N\); symmetric positive semi-definite: \(\SSp^N\); symmetric positive definite: \(\SSpp^N\)
Identity matrix: \(\vI\)
Subscripts \(i,j,k\) denote the indices of vectors and matrices, e.g. \(\vx_i = \vx[i]\) and \(\vX_{ij} = \vX[i,j]\)
Boldfaced Greek letters, e.g. \(\valpha\), likewise denote vectors
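As a quick worked example of the entrywise and indexing conventions above (illustration only; assuming 1-based indexing): for \(\vx = [2, 0, -3]^T\),

\[
\sign(\vx) = [1, 0, -1]^T, \qquad \vx_1 = 2, \quad \vx_3 = -3.
\]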
Notation specific to optimization
Learning rate: \(\eta\)
Weight decay: \(\lambda\)
Momentum: \(\beta\)
\(\vw\) (a vector) and \(\vW\) (a matrix) denote the weights, i.e. the optimization variables, conforming to the existing ML/AI convention
Loss function: \(\ell(\cdot)\)
Superscript \(t\) denotes the iteration number; when another superscript is needed, e.g. a square, we write \((\vx^t)^2\) (see the update sketch after this list)
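To illustrate how these optimization symbols compose, here is one common heavy-ball-style update written in this notation; this is a generic sketch of a standard convention, not necessarily the exact algorithm used later in the book:

\[
\vw^{t+1} = \vw^t - \eta \left( \nabla \ell(\vw^t) + \lambda \vw^t \right) + \beta \left( \vw^t - \vw^{t-1} \right),
\]

with learning rate \(\eta\), weight decay \(\lambda\), and momentum \(\beta\).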
Notation specific to LLMs
LLM policy: \(\pi(\cdot)\)
Prompt: \(x\); continuation: \(y\)
Dataset: \(\mathcal{D}\)
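As an example of this notation in use (a standard autoregressive convention, assumed here rather than stated in the text), for a prompt–continuation pair \((x, y) \sim \mathcal{D}\) the policy assigns the continuation the probability

\[
\pi(y \mid x) = \prod_{k=1}^{|y|} \pi(y_k \mid x, y_{<k}),
\]

where \(y_k\) is the \(k\)-th token of \(y\) and \(y_{<k}\) its first \(k-1\) tokens.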