Specification & Recognition of Tokens


First, a recap of lexical analysis:

  1. The lexical analyzer breaks the source code into a series of tokens, discarding whitespace and comments along the way.
  2. It reads the character stream of the source code, checks for legal tokens, and passes them to the syntax analyzer on demand. If it finds an invalid token, it reports an error.

What is a Token?

In a programming language, keywords, constants, identifiers, strings, numbers, operators and punctuation symbols are all tokens.
For example, in C, the variable declaration
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
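
The split of the declaration into tokens can be sketched with a small regular-expression scanner. This is a minimal illustration, not a full C lexer: the keyword list, the category names and the helper `tokenize` are assumptions made for this example.

```python
import re

# Hypothetical token classes for a tiny C-like subset. Order matters:
# the keyword pattern must be tried before the identifier pattern.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:int|float|char|return)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("constant",   r"[0-9]+"),
    ("operator",   r"[=+\-*/]"),
    ("symbol",     r"[;,()]"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, kind) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token class matches here, so the input is lexically invalid.
            raise ValueError(f"invalid token at position {pos}: {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens

print(tokenize("int value = 100;"))
```

Whitespace is matched like any other pattern but never emitted, which mirrors how the lexical analyzer removes it before the syntax analyzer sees the tokens.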

Lexeme    Token
=         EQUAL_OP
*         MULT_OP
,         COMMA
(         LEFT_PAREN

Specifications of Tokens:

Let us see how language theory defines the following terms:
  1. Alphabets
  2. Strings
  3. Special symbols
  4. Language
  5. Longest match rule
  6. Operations
  7. Notations
  8. Representing valid tokens of a language in regular expression
  9. Finite automata
1. Alphabets: Any finite set of symbols is called an alphabet.
  • {0,1} is the binary alphabet,
  • {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is the hexadecimal alphabet,
  • {a-z, A-Z} is the alphabet of English letters.
2. Strings: Any finite sequence of symbols drawn from an alphabet is called a string.
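
As a small illustration of these two definitions, an alphabet can be modelled as a set of characters and a string as a sequence whose symbols all come from that set. The names `BINARY`, `HEX` and `is_string_over` are made up for this sketch.

```python
# Alphabets modelled as finite sets of one-character symbols.
BINARY = set("01")
HEX = set("0123456789ABCDEF")

def is_string_over(text, alphabet):
    """True if every symbol of text belongs to the alphabet.

    The empty string qualifies for any alphabet.
    """
    return all(symbol in alphabet for symbol in text)

print(is_string_over("1011", BINARY))  # True
print(is_string_over("1A2F", BINARY))  # False
print(is_string_over("1A2F", HEX))     # True
```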
3. Special symbols: A typical high-level language contains the following symbols:

Category              Symbols
Arithmetic symbols    Addition (+), Subtraction (-), Multiplication (*), Division (/)
Punctuation           Comma (,), Semicolon (;), Dot (.)
Assignment            =
Special assignment    +=, -=, *=, /=
Comparison            ==, !=, <, <=, >, >=
Preprocessor          #

4. Language: A language is a set of strings over some finite alphabet. The set itself may be finite or infinite.
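5. Longest match rule: When the lexical analyzer reads the source code, it scans it character by character, and when it encounters whitespace, an operator symbol, or a special symbol, it decides that a word is complete. When more than one token could match at the current position, the rule says to take the longest possible lexeme; for example, <= is scanned as one comparison operator rather than as < followed by =.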
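
The longest match rule can be sketched for operator lexemes as follows. This is a simplified illustration under stated assumptions: it only handles the operator symbols listed in this section, and the function name `scan_operators` is hypothetical.

```python
# Operator lexemes taken from the symbol lists in this section.
OPERATORS = {
    "=", "==", "!=", "<", "<=", ">", ">=",
    "+", "+=", "-", "-=", "*", "*=", "/", "/=",
}
MAX_LEN = max(len(op) for op in OPERATORS)

def scan_operators(text):
    """Split a run of operator characters using the longest match rule."""
    lexemes = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(MAX_LEN, 0, -1):
            candidate = text[pos:pos + length]
            if candidate in OPERATORS:
                lexemes.append(candidate)
                pos += len(candidate)
                break
        else:
            raise ValueError(f"no operator matches at position {pos}")
    return lexemes

print(scan_operators("<= == ="))  # ['<=', '==', '=']
print(scan_operators("<=="))      # ['<=', '=']
```

For the input `<==` the scanner prefers `<=` over `<`, so the remaining `=` becomes a separate assignment operator.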
6. Operations: The various operations on languages are:
  1. The union of two languages L and M is written L ∪ M = {s | s is in L or s is in M}.
  2. The concatenation of two languages L and M is written LM = {st | s is in L and t is in M}.
  3. The Kleene closure of a language L is written L* and contains every string formed by concatenating zero or more strings of L.
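
These three operations can be tried out on small finite languages represented as Python sets. The sketch below is illustrative only; since L* is generally infinite, the hypothetical helper `kleene_closure` takes a bound on the number of repetitions.

```python
def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M

def concatenation(L, M):
    """LM: every string st with s in L and t in M."""
    return {s + t for s in L for t in M}

def kleene_closure(L, max_repeats):
    """Strings built from at most max_repeats strings of L.

    The true closure L* is generally infinite, so this is a bounded approximation.
    """
    result = {""}   # zero occurrences gives the empty string
    current = {""}
    for _ in range(max_repeats):
        current = concatenation(current, L)
        result |= current
    return result

L = {"a", "b"}
M = {"0", "1"}
print(sorted(union(L, M)))           # ['0', '1', 'a', 'b']
print(sorted(concatenation(L, M)))   # ['a0', 'a1', 'b0', 'b1']
print(sorted(kleene_closure(L, 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```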
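7. Notations: If r and s are regular expressions denoting the languages L(r) and L(s), then
  1. Union: (r)|(s) is a regular expression denoting L(r) ∪ L(s)
  2. Concatenation: (r)(s) is a regular expression denoting L(r)L(s)
  3. Kleene closure: (r)* is a regular expression denoting (L(r))*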
8. Representing valid tokens of a language in regular expression:
If x is a regular expression, then:
  • x* means zero or more occurrences of x.
  • x+ means one or more occurrences of x.
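
The difference between the two operators, and how they combine into token patterns, can be checked with Python's `re` module. The identifier and integer patterns below are assumptions chosen for a C-like language; they are not taken from these notes.

```python
import re

# x* accepts zero or more occurrences of x; x+ requires at least one.
print(bool(re.fullmatch(r"a*", "")))     # True
print(bool(re.fullmatch(r"a+", "")))     # False
print(bool(re.fullmatch(r"a+", "aaa")))  # True

# Hypothetical token patterns for a C-like language.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # a letter or _, then zero or more letters, digits or _
INTEGER = re.compile(r"[0-9]+")                     # one or more digits

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_integer(text):
    return INTEGER.fullmatch(text) is not None

print(is_identifier("value"))   # True
print(is_identifier("9lives"))  # False
print(is_integer("100"))        # True
```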
9. Finite automata: A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly.
If the input string is processed completely and the automaton ends in a final state, the string is accepted.
The mathematical model of a finite automaton consists of:
  • Finite set of states (Q)
  • Finite set of input symbols (Σ)
  • One Start state (q0)
  • Set of final states (qf)
  • Transition function (δ)
The transition function (δ) maps each pair of a state and an input symbol to a next state: δ : Q × Σ → Q
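
The five components of the model can be written down directly as a table-driven recognizer. The sketch below is one possible deterministic automaton for identifiers; the state names, the grouping of input symbols into classes and the function `accepts` are all assumptions made for this illustration.

```python
import string

# Q: the finite set of states.
STATES = {"start", "in_id", "dead"}
# q0: the start state.
START = "start"
# qf: the set of final (accepting) states.
FINAL = {"in_id"}

def symbol_class(ch):
    """Group the input symbols (Σ) into classes to keep the table small."""
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"

# δ: Q × Σ → Q, stored as a dictionary keyed by (state, symbol class).
DELTA = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "dead",
    ("start", "other"): "dead",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_id", "other"): "dead",
    ("dead", "letter"): "dead",
    ("dead", "digit"): "dead",
    ("dead", "other"): "dead",
}

def accepts(text):
    """Run the automaton over text and report whether it ends in a final state."""
    state = START
    for ch in text:
        state = DELTA[(state, symbol_class(ch))]
    return state in FINAL

print(accepts("value"))  # True
print(accepts("x1"))     # True
print(accepts("1x"))     # False
```

The explicit "dead" state keeps δ total: every pair of state and symbol class has a next state, so the loop never needs a special case for a missing transition.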
