This article explain the techniques to read and evaluate the result of an arithmetic expression string. The tokenizer, expression validator (both syntax validation and data-type validation) and expression evaluator are implemented in C# and Silverlight and source code is available for download below.
Expression Evaluation Demo in Silverlight
The techniques of expression-evaluation are illustrated in the following demo. The source code of the demo can be downloaded below. The input expression accepts variable names, numbers and operators.
This demo doesn't support unary operators and functions, as of now.
Expression Evaluation Process
Evaluating a free-text expression string is a three-step process:
- Split the input expression string into tokens (tokenization).
- Convert the expression from infix to postfix notation (or RPN - Reverse Polish Notation).
- Evaluate the postfix expression to find result.
All these steps are illustrated with a live demo (below) in Silverlight. The user can enter a free text expression with numbers, variables and operators and see the evaluation process step by step.
Data Structures for this Article
An expression is ordered collection of tokens. A token is a string of characters that represents a symbol of the programming language in context. The symbol can be a number constant, text constant, arithmetic operators, variable names, function names, etc.
For illustration purposes, let us consider three types of tokens, namely number constants, variables and arithmetic operators. First, we create an interface ITokenObject and then create classes for each type of token, namely Constant, Variable and Operator.
public interface ITokenObject
{
}
public class Constant : ITokenObject
{
public string Value { get; set; }
}
public class Variable : ITokenObject
{
public string Name { get; set; }
public string Value { get; set; }
}
public class Operator : ITokenObject
{
public OperatorSymbol Symbol { get; set; }
public string SymbolText { get; set; }
public int OperandCount { get; set; }
public int PrecedenceLevel { get; set; }
public OperatorAssociativity Associativity { get; set; }
}
The above classes act as data-structures for each token type. The OperatorSymbol enum indicates the operator symbol; we support these arithmetic operators: add (+), subtract (-), multiply (*), divide (/) and modulus (%). The open and close parenthesis are also considered as operators; the parenthesis are used to control the precedence of operators. The OperatorAssociativity enum indicates the associativity of the corresponding operator. The associativity controls how operands are chosen for operators in the absence of parenthesis for controlling precedence. For more information, see http://en.wikipedia.org/wiki/Operator_associativity.
The following Token class is the data-structure for an instance of a token in expressions. The Expression class contains a single property, the list of tokens.
public class Token
{
public TokenType Type { get; set; }
public ITokenObject TokenObject { get; set; }
public int Index { get; set; }
}
public class Expression
{
public List<Token> Tokens { get; set; }
}
The TokenType enum defines three enum members, namely Constant, Variable and Operator.
Tokenization
The process of building token objects from input expression string is called tokenization or lexical analysis. For more information on tokens and tokenization, see http://en.wikipedia.org/wiki/Tokenizing.
The following is the tokenizer class for data-structures defined above.
public class Tokenizer
{
public static Expression Split(string expressionString)
{
List<Token> tokens = new List<Token>();
//read the input-expression letter-by-letter and build tokens
string alphaNumericString = string.Empty;
for (int index = 0; index < expressionString.Length; index++)
{
char currentChar = expressionString[index];
if (AllOperators.Find("" + currentChar) != null) //if operator
{
if (alphaNumericString.Length > 0)
{
tokens.Add(Token.Resolve(alphaNumericString));
alphaNumericString = string.Empty; //reset token string
}
tokens.Add(ExpressionUtility.CreateOperatorToken(currentChar));
}
else if (Char.IsLetterOrDigit(currentChar) || currentChar == '.')
{
//if alphabet or digit or dot
alphaNumericString += currentChar;
}
}
//check if any token at last
if (alphaNumericString.Length > 0)
{
tokens.Add(Token.Resolve(alphaNumericString));
}
return (new Expression() { Tokens = tokens });
}
}
The AllOperators class contains instances of Operator class for all supported operators. The Resolve() method of Token class creates instances of corresponding token class. For example, it creates a Variable class for "a", Constant class for "52" and Operator class for "+", "-", etc.
Infix, Postfix and Prefix Notations
Infix, postfix and prefix notations are three different ways of writing an expression.
- In infix notation, the operators are written in-between their operands. Infix notation needs extra information (operator precedence, associativity and parenthesis) to control the order of evaluation. Example: (3 + 5 ) * 7
- In postfix notation, the operators are written after their operands. The operators act on operands immediately to the left of them. Example: 3 5 + 7 *
- In prefix notation, the operators are written before their operands. The operators act on operands to their right. Example: * 7 + 3 5
Postfix and prefix notations do not need any extra information like parenthesis or rules of operator precedence and associativity. Evaluating an infix expression (with its complex rules) will be a daunting task, unless it is converted to postfix. Evaluating an expression in postfix notation is trivially easy while using stacks.
Converting Infix to Postfix
The shunting yard algorithm, invented by Edsger Dijkstra is a stack-based algorithm used to convert an expression from infix notation to postfix notation. It maintains a stack to hold operators that are not yet added to postfix notation. The following is a summary of this algorithm:
- Read a token (left to right).
- If token is a variable or constant, push it onto the stack.
- If token is an operator,
- If open parenthesis, push it onto stack.
- If close parenthesis, pop all tokens (until but not including a left parenthesis) from stack and add them to output.
- If current token and top token in stack are arithmetic operators, then push the later onto output if its precedence is greater than the former.
For full explanation of shunting yard algorithm, see http://en.wikipedia.org/wiki/Shunting-yard_algorithm. The following function demonstrates the shunting yard algorithm for data-structures defined above.
public static bool InfixToPostfix(Expression inputExpression, out Expression postfixExpression)
{
List<Token> postfixTokens = new List<Token>();
Stack<Token> postfixStack = new Stack<Token>();
//process all tokens in input-expression, one by one
foreach (Token token in inputExpression.Tokens)
{
if (token.Type == TokenType.Constant) //handle constants
{
postfixTokens.Add(token);
}
else if (token.Type == TokenType.Variable) //handle variables
{
postfixTokens.Add(token);
}
else if (token.Type == TokenType.Operator) //handle operators
{
if (ExpressionUtility.IsOpenParenthesis(token)) //handle open-parenthesis
{
postfixStack.Push(token);
}
else if (ExpressionUtility.IsCloseParenthesis(token)) //handle close-parenthesis
{
//pop all operators off the stack onto the output (until left parenthesis)
while (true)
{
if (postfixStack.Count == 0)
{
postfixExpression = null; //error: mismatched parenthesis
return (false);
}
Token top = postfixStack.Pop();
if (ExpressionUtility.IsOpenParenthesis(top)) break;
else postfixTokens.Add(top);
}
}
else //handle arithmetic operators
{
Operator currentOperator = AllOperators.Find(token.LinearToken);
if (postfixStack.Count > 0)
{
Token top = postfixStack.Peek();
if (ExpressionUtility.IsArithmeticOperator(top))
{
//'>' operator implies less precedence
Operator stackOperator = AllOperators.Find(top.LinearToken);
if ((currentOperator.Associativity == OperatorAssociativity.LeftToRight &&
currentOperator.PrecedenceLevel >= stackOperator.PrecedenceLevel) ||
(currentOperator.Associativity == OperatorAssociativity.RightToLeft &&
currentOperator.PrecedenceLevel > stackOperator.PrecedenceLevel))
{
postfixStack.Pop();
postfixTokens.Add(top);
}
}
}
postfixStack.Push(token); //push operator to stack
}
}
}
//after reading all tokens, pop entire stack to output
while (postfixStack.Count > 0)
{
Token top = postfixStack.Pop();
if (ExpressionUtility.IsOpenParenthesis(top) || ExpressionUtility.IsCloseParenthesis(top))
{
postfixExpression = null; //error: mismatched parenthesis
return (false);
}
else
{
postfixTokens.Add(top);
}
}
postfixExpression = new Expression() { Tokens = postfixTokens };
return (true);
}
Downloads for this article
File |
Language |
Tools |
ExpressionEvaluator-Demo-Source 55.54 kb (368 downloads) |
C#, Silverlight 3 |
Visual Studio 2008 |