It's been a long time since the internet was cut off at work. I tried to give the title a hard name that seems not to be well received in the current Qiita
Due to the performance required by the requirements, I created an analyzer that performs four arithmetic operations using the recursive descent parsing method. I'll leave it as it may be helpful to someone
Since the formula is obtained as a character string in the batch application, it seems that I wanted to calculate it and return the result, but the number was huge. Originally, it seems that he intended to call JavaScript from Java and pass the formula as it is for calculation, but it seems that it did not finish in one day because the number of target formulas is tens of thousands and hundreds of thousands.
Moreover, it seems that the number will increase steadily in the future!
Of course, there is also the next job, so we have to make sure that the process is completed by the start time.
So I was talking about wanting me to do something about it. When I first heard about it, I thought that it was not possible to perform high-speed calculation processing with JavaScript by calling an external program.
You just have to create an analyzer that can perform high-speed four arithmetic operations. It seems that you also want to control the rounding error during division, so you need to think about that as well.
Well, I expected that if I implemented it all in Java, there would be no problem in terms of speed, so I decided to go with a reasonably powerful recursive descent parsing method instead of a simple implementation.
The following is implemented in Java 8
FormulaParser.java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class FormulaParser {
private static final int HIGH_PRIORITY = 3;
private static final int MID_PRIORITY = 2;
private static final int LOW_PRIORITY = 1;
private static final Pattern EMPTY_BRACKET = Pattern.compile("\\(\\)");
private static final Pattern NOT_CLOSE_BRACKET = Pattern.compile("\\)\\(");
private static final int NO_EXIST = -1;
private static final String ZERO = "0";
private static final String BRACKET_AND_ZERO = "(0";
private String expression; //formula
private FormulaParser left; //Left child node
private FormulaParser right; //Right child node
private int scale;
/**
*Constructor with scale
** Consider scale only in the case of division
*
* @param expression formula
* @param scale Decimal point position when dividing
*/
public FormulaParser(String expression, int scale) {
this.expression = format(expression);
this.scale = scale;
}
/**
*Divide the formula into binary trees and calculate
*
* @throws FormulaParseException
*/
public String execute() throws FormulaParseException {
//Remove the outermost parentheses
String exp = delMostOuterBrackets(this.expression);
//Find the operator from the expression and get the position
int opePos = getOpePos(exp);
//If the expression does not contain an operator, it is considered as a term.
if (opePos < 0) {
left = null;
right = null;
return new BigDecimal(exp).toPlainString();
}
//Illegal treatment if operator position is at the beginning or end of an expression
if (opePos == 0 || opePos == exp.length() - 1) {
throw new FormulaParseException(exp);
}
//Create a node with the left side of the operator as the left subexpression
left = new FormulaParser(exp.substring(0, opePos), scale);
//Create a node with the right side of the operator as the right subexpression
right = new FormulaParser(exp.substring(opePos + 1), scale);
//Set the remaining operators in this note
expression = exp.substring(opePos, opePos + 1);
return calculate(left.execute(), OPERATOR.getEnum(expression), right.execute(), scale);
}
/**
*Get the position of the operator
*
* @param exp expression
* @return operator position(0 beginning)/If not-Returns 1
*/
private static int getOpePos(String exp) {
int opePos = NO_EXIST;
int curPriority = Integer.MAX_VALUE;
int nest = 0;
for (int i = 0, len = exp.length(); i < len; i++) {
int priority = 0;
switch (OPERATOR.getEnum(String.valueOf(exp.charAt(i)))) {
case PLUS:
case MINUS:
priority = MID_PRIORITY;
break;
case MULTIPLY:
case DIVIDE:
priority = HIGH_PRIORITY;
break;
case LEFT_BRACKET:
nest++;
continue;
case RIGHT_BRACKET:
nest--;
continue;
default:
continue;
}
if (nest == 0 && priority <= curPriority) {
curPriority = priority;
opePos = i;
}
}
return opePos;
}
/**
*Calculation process
*
* @param lExp Left term
* @param ope operator
* @param rExp Right term
* @param scale scale
*/
private String calculate(String lExp, OPERATOR ope, String rExp, int scale) throws FormulaParseException {
BigDecimal left = new BigDecimal(lExp);
BigDecimal right = new BigDecimal(rExp);
switch (ope) {
case PLUS:
return left.add(right).toPlainString();
case MINUS:
return left.subtract(right).toPlainString();
case MULTIPLY:
return left.multiply(right).toPlainString();
case DIVIDE:
return left.divide(right, scale, RoundingMode.DOWN).toPlainString();
default:
throw new FormulaParseException(left + ope.val + rExp);
}
}
/**
*format
*Half-width space is excluded
*Since it is difficult to distinguish between positive and negative symbols and addition / subtraction operators, fill in 0 to make it easier.
*
* @param exp expression
* @return formatted expression ex) -3 + 1 → 0-3+1
*/
private static String format(String exp) {
//Eliminate half-width spaces
StringBuilder e = new StringBuilder(exp.replaceAll(" ", ""));
int cur = 0;
for (; ; ) {
int plusIndex = e.indexOf(OPERATOR.PLUS.val, cur);
int minusIndex = e.indexOf(OPERATOR.MINUS.val, cur);
if (plusIndex == NO_EXIST && minusIndex == NO_EXIST) {
break;
}
//Get index of which operator exists and precedes
if (plusIndex < minusIndex) {
cur = (plusIndex == NO_EXIST) ? minusIndex : plusIndex;
} else {
cur = (minusIndex == NO_EXIST) ? plusIndex : minusIndex;
}
if (cur == 0) {
e.insert(cur, ZERO);
cur = cur + ZERO.length() + 1;
continue;
}
//Add 0 if the character immediately before the operator is not a number and is not a closing brace, multiplication, or division operator
char preChar = e.charAt(cur - 1);
if (!Character.isDigit(preChar)
&& preChar != OPERATOR.RIGHT_BRACKET.cVal) {
if (preChar == OPERATOR.MULTIPLY.cVal
|| preChar == OPERATOR.DIVIDE.cVal
|| preChar == OPERATOR.MINUS.cVal) {
int endDigits = 0;
for (int i = cur + 1, len = e.length(); i < len; i++) {
char c = e.charAt(i);
if (!Character.isDigit(c) && c != '.') {
break;
}
endDigits = i;
}
e.insert(cur, BRACKET_AND_ZERO).insert(endDigits + BRACKET_AND_ZERO.length() + 1, OPERATOR.RIGHT_BRACKET.cVal);
cur = endDigits + BRACKET_AND_ZERO.length();
} else {
e.insert(cur, ZERO);
cur = cur + ZERO.length();
}
}
cur++;
if (cur > e.length()) break;
}
return e.toString();
}
/**
*Parenthesis removal process
*Remove the outermost parenthesis from the expression
* (1+2)+(3+4)Returns an expression like
*Throw an exception as an invalid expression if the parentheses are not closed
*
* @param exp expression
* @return Expression ex with parentheses removed) (4*2) → 4*2, ((3+4)) → 3+4, (1+2)+(3+4) → (1+2)+(3+4)
* @throws FormulaParseException
*/
private static String delMostOuterBrackets(String exp) throws FormulaParseException {
if (matchFirst(exp, EMPTY_BRACKET).isPresent()) throw new FormulaParseException(exp);
if (matchFirst(exp, NOT_CLOSE_BRACKET).isPresent()) throw new FormulaParseException(exp);
if (exp.charAt(0) == OPERATOR.RIGHT_BRACKET.cVal) throw new FormulaParseException(exp);
boolean hasMostOuterBrackets = false;
int nest = 0;
//Verify the first character
if (exp.charAt(0) == OPERATOR.LEFT_BRACKET.cVal) {
hasMostOuterBrackets = true;
nest++;
}
//Verify the first and subsequent characters character by character
for (int i = 1, len = exp.length(); i < len; i++) {
if (exp.charAt(i) == OPERATOR.LEFT_BRACKET.cVal) {
nest++;
} else if (exp.charAt(i) == OPERATOR.RIGHT_BRACKET.cVal) {
nest--;
if (nest == 0 && i < len - 1) {
hasMostOuterBrackets = false;
}
}
}
//Illegal expression if parentheses are not closed
if (nest != 0) throw new FormulaParseException(exp);
//If there are no parentheses, return it as it is
if (!hasMostOuterBrackets) return exp;
//Remove the first and last parentheses
String ret = exp.substring(1, exp.length() - 1);
//If there are still parentheses, process again
if (ret.charAt(0) == OPERATOR.LEFT_BRACKET.cVal
&& ret.charAt(ret.length() - 1) == OPERATOR.RIGHT_BRACKET.cVal) {
return delMostOuterBrackets(ret);
}
return ret;
}
/**
*Search
*
* @param s Search target
* @param p regular expression
* @return search results
*/
private static Optional<String> matchFirst(String s, Pattern p) {
Matcher m = p.matcher(s);
return m.find() ? Optional.of(m.group(0)) : Optional.empty();
}
/**
*operator
*/
private enum OPERATOR {
PLUS("+", '+'),
MINUS("-", '-'),
MULTIPLY("*", '*'),
DIVIDE("/", '/'),
LEFT_BRACKET("(", '('),
RIGHT_BRACKET(")", ')'),
NO_OPE(" ", ' ');
public String val;
public char cVal;
private final static Map<String, OPERATOR> m = new HashMap<>();
static {
for (OPERATOR o : OPERATOR.values()) {
m.put(o.val, o);
}
}
private OPERATOR(String val, char cVal) {
this.val = val;
this.cVal = cVal;
}
public static OPERATOR getEnum(String val) {
return m.getOrDefault(val, NO_OPE);
}
}
}
FormulaParseException.java
public class FormulaParseException extends Exception {
public FormulaParseException(String msg) {
super(msg);
}
}
The exception implemented something like that Analysis by parser and calculation processing are not separated and are combined into one class. Since it is troublesome to distinguish between positive and negative symbols and addition / subtraction operators, it is filled with 0. It is also a point to add parentheses as needed
Since analysis also costs a certain amount, the original application has a certain amount of pre-checking.
Also, I am making it possible to process Math functions and variables, but I omitted it because it is difficult to remember and write code.
The matchFirst ()
method is a remnant of that
I implemented other useful utilities and did various things
All you have to do is replace the variables appropriately
As for the Math function, just to give you a hint, the Math function is treated as a term in the code above.
So, in the code below, you should determine whether it is a Math function and call the method that processes it if it is a Math function.
return new BigDecimal(exp).toPlainString();
The method that processes the Math function should be processed by checking the characters steadily like delMostOuterBrackets ()
and format ()
.
In other words, you can apply the code written in the article, so if necessary, try making it yourself.
For the time being, I implemented it and tried it, and it met the requirements, so it ends there. It is suspicious if the number of cases goes from tens of millions to 100 millions, but it seems that it will not go that far, so it seems okay
I wrote a lot of tests, but I will omit it because it is difficult to remember and port Well, I don't even need JUnit because I just check the expected result of the expression.
I'm learning by myself, not by majoring in CS, so it was quite helpful to think carefully about this kind of design and implementation in practice.
Also, I hope that there will be projects that can not be matched without knowledge of algorithms and data structures, and the article is over.
Noshi
Recommended Posts