1.2. Notations
This section collects the notation used throughout this book.
Generic math-related symbols
The reals: \(\RR\); the complex numbers: \(\CC\); the integers: \(\ZZ\).
Scalars: \(a, b, d, \ldots\) (\(c\) is reserved for constants)
Vectors: \(\va, \vb, \ldots\); all vectors are *column* vectors. The vector \(\vn\) is reserved for noise.
Expectation: \(\EE\)
\(\vzero = [0, \ldots, 0]^T\) and \(\vone = [1, \ldots, 1]^T\)
\(\sign(x)\in\{-1,0,1\}\) is the sign of \(x\); applied to a vector it acts entrywise: \(\sign(\vx) = [\sign(x_1),\ldots,\sign(x_N)]^T\) (see the worked example after this list)
The transpose of a matrix \(\vA\) is \(\vA^T\); its conjugate transpose is \(\vA^*\).
The set of \(N\)-by-\(N\) symmetric matrices: \(\SS^N\); symmetric positive semi-definite: \(\SSp^N\); symmetric positive definite: \(\SSpp^N\)
Identity matrix: \(\vI\)
Subscripts \(i,j,k\) denote the indices of vectors and matrices, e.g. \(\vx_i = \vx[i]\) and \(\vX_{ij} = \vX[i,j]\)
Boldfaced Greek letters, e.g. \(\valpha\), likewise denote vectors
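As a quick worked example of the entrywise and indexing conventions above (illustration only; assuming 1-based indexing): for \(\vx = [2, 0, -3]^T\),

\[
\sign(\vx) = [1, 0, -1]^T, \qquad \vx_1 = 2, \quad \vx_3 = -3.
\]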
Notation specific to optimization
Learning rate: \(\eta\)
Weight decay: \(\lambda\)
Momentum: \(\beta\)
\(\vw\) (a vector) and \(\vW\) (a matrix) denote the weights, i.e. the optimization variables, conforming to the existing ML/AI convention
Loss function: \(\ell(\cdot)\)
Superscript \(t\) denotes the iteration number; when another superscript is needed, e.g. a square, we write \((\vx^t)^2\) (see the update sketch after this list)
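To illustrate how these optimization symbols compose, here is one common heavy-ball-style update written in this notation; this is a generic sketch of a standard convention, not necessarily the exact algorithm used later in the book:

\[
\vw^{t+1} = \vw^t - \eta \left( \nabla \ell(\vw^t) + \lambda \vw^t \right) + \beta \left( \vw^t - \vw^{t-1} \right),
\]

with learning rate \(\eta\), weight decay \(\lambda\), and momentum \(\beta\).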
Notation specific to LLMs
LLM policy: \(\pi(\cdot)\)
Prompt: \(x\); continuation: \(y\)
Dataset: \(\mathcal{D}\)
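As an example of this notation in use (a standard autoregressive convention, assumed here rather than stated in the text), for a prompt–continuation pair \((x, y) \sim \mathcal{D}\) the policy assigns the continuation the probability

\[
\pi(y \mid x) = \prod_{k=1}^{|y|} \pi(y_k \mid x, y_{<k}),
\]

where \(y_k\) is the \(k\)-th token of \(y\) and \(y_{<k}\) its first \(k-1\) tokens.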