Compiler Tokens


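When a C++ compiler analyses a program, it breaks the program into tokens. A token is the smallest unit of source text that the compiler recognises as meaningful. There are five categories of tokens: identifiers, keywords, literals, operators and other separators (punctuators).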

White space includes blanks, horizontal and vertical tabs, newlines, form feeds and comments.


Some white space is required within a program to separate adjacent keywords, literals and identifiers that would otherwise run together. White space may also be used to document the program in the form of comments and to make it more readable (e.g. by indenting statements).
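As a small illustration of where white space is and is not needed, the sketch below writes the same function twice. The function names total and total_readable are made up for this example and do not come from the text above.

```cpp
// White space is required between the keyword `int` and the identifier
// `total`; without it the compiler would see a single identifier, `inttotal`.
// Operators and punctuators such as `=`, `+`, `(` and `;` need no white space
// around them, and a comment can stand in for a blank between two tokens.
int total(int a,int b){int/* a comment separates these */sum=a+b;return sum;}

// The same function laid out for readability. The extra blanks, newlines and
// indentation do not change the tokens the compiler sees.
int total_readable(int a, int b) {
    int sum = a + b;
    return sum;
}
```

Both definitions produce the same sequence of tokens, so they behave identically; only the second is pleasant to read.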

A C++ compiler scans the source code from the beginning of the program towards the end. At each point it takes the next token to be the longest sequence of characters that can form a token. This rule is often called "maximal munch".
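The longest-token rule can be seen in an expression written without any white space. The sketch below uses a hypothetical helper, add_post_increment, that is not part of the original text. Because `++` is a longer token than `+`, the characters `a+++b` are split into `a`, `++`, `+`, `b`.

```cpp
// Hypothetical helper illustrating the longest-token rule.
// The expression a+++b is tokenised as  a ++ + b,  i.e. (a++) + b,
// and not as  a + ++b,  because the compiler takes the longest token first.
int add_post_increment(int& a, int b) {
    return a+++b;  // uses the old value of a, then increments a
}
```

Called with a equal to 1 and b equal to 2, the function returns 3 and leaves a equal to 2. Had the expression been read as a + (++b), the result would have been 4 and a would have been left unchanged.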