In computer programming languages, a token is a small number that stands in for a longer word (usually a keyword) in the source code.

Many early versions of BASIC tokenized their source code as you typed it in. Since there is a one-to-one correspondence between tokens and keywords, listing a program is simply a matter of emitting the keyword each token stands for. Many interpreted languages likewise tokenize their source code before interpreting it.
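A minimal sketch of the idea in C follows. The keyword table is illustrative only (a real BASIC used a much larger table, and its byte values varied between dialects; the ones below loosely follow the Commodore convention of using codes of 0x80 and above for tokens):

    #include <stdio.h>
    #include <string.h>

    /* Illustrative keyword table; the byte values are for demonstration. */
    static const struct { unsigned char token; const char *keyword; } table[] = {
        { 0x80, "END"   },
        { 0x81, "FOR"   },
        { 0x89, "GOTO"  },
        { 0x99, "PRINT" },
    };
    #define NKEYWORDS (sizeof table / sizeof table[0])

    /* Tokenize one line: replace each keyword with its one-byte token,
       copying everything else through unchanged.  (A real tokenizer
       would also skip string literals, so that keywords inside quotes
       are left alone.) */
    static void tokenize(const char *src, unsigned char *out) {
        while (*src) {
            size_t i;
            for (i = 0; i < NKEYWORDS; i++) {
                size_t len = strlen(table[i].keyword);
                if (strncmp(src, table[i].keyword, len) == 0) {
                    *out++ = table[i].token;
                    src += len;
                    break;
                }
            }
            if (i == NKEYWORDS)            /* not a keyword: copy it through */
                *out++ = (unsigned char)*src++;
        }
        *out = '\0';
    }

    /* LIST a tokenized line: the one-to-one mapping makes this trivial. */
    static void detokenize(const unsigned char *src) {
        for (; *src; src++) {
            if (*src >= 0x80) {
                for (size_t i = 0; i < NKEYWORDS; i++)
                    if (table[i].token == *src) {
                        fputs(table[i].keyword, stdout);
                        break;
                    }
            } else {
                putchar(*src);
            }
        }
        putchar('\n');
    }

    int main(void) {
        unsigned char line[128];
        tokenize("PRINT \"HELLO\":GOTO 10", line);
        detokenize(line);   /* prints: PRINT "HELLO":GOTO 10 */
        return 0;
    }

Note that the stored line is both shorter than the source text and faster to interpret, since the interpreter can dispatch on a single byte instead of re-scanning keyword spellings.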

While tokens can be considered a form of bytecode, the two are distinct: tokens directly represent elements of the source syntax and can be mapped back to it, whereas bytecode is closer to machine language. Bytecode may be optimized, and it typically does not preserve the original symbol names or other surface syntax, which makes it difficult to reverse back into source code.

Compilers also use tokens in their early stages. For example, the first pass of a compiler might be written with lex (or flex), which generates a lexical analyzer. The lexical analyzer recognizes simple units of syntax (such as strings and numbers) and passes back a token, which is then fed to the parser that recognizes the grammar. In this application, a token may also carry an associated value (e.g., token: NUMBER, value: 42; token: STRING, value: "stuff in quotes").
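As a sketch (not a complete compiler front end), a flex specification along the following lines returns NUMBER and STRING tokens together with their values. The token codes and the lval slot are assumptions made for this example; in a real compiler a parser generator such as yacc or bison would define them in a shared header:

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical token codes; a parser generator would normally
       define these.  Values above 255 avoid clashing with the
       single-character tokens returned below. */
    enum { NUMBER = 256, STRING = 257 };

    /* Hypothetical slot for the value associated with a token. */
    static union { int num; char *str; } lval;
    %}

    %option noyywrap

    %%
    [0-9]+      { lval.num = atoi(yytext); return NUMBER; }
    \"[^\"]*\"  { lval.str = strdup(yytext); return STRING; }
    [ \t\n]+    ;  /* skip whitespace */
    .           { return yytext[0]; /* single-character token */ }
    %%

    int main(void) {
        int tok;
        while ((tok = yylex()) != 0) {
            if (tok == NUMBER)
                printf("token: NUMBER value: %d\n", lval.num);
            else if (tok == STRING)
                printf("token: STRING value: %s\n", lval.str);
            else
                printf("token: '%c'\n", tok);
        }
        return 0;
    }

Building this with flex and a C compiler (e.g., flex scan.l && cc lex.yy.c -o scan, with the file name assumed) yields a scanner that, given the input 42 "stuff in quotes", prints the token/value pairs from the example above.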