Math Engine: an eval()-free expression interpreter for Python

The challenge

The obvious way to evaluate an expression like 3 + 4 * 2 in Python is a single line: eval("3 + 4 * 2"). That very line is the problem. eval() executes arbitrary Python code: a string disguised as numeric input, such as __import__('os').system('rm -rf …'), runs without complaint. For any application that takes expressions from a file, a form field, an API or a configuration string, eval() is therefore a direct code-execution vector, not a calculator.

The second, quieter defect is correctness. eval() hands the arithmetic to Python's binary float: 0.1 + 0.2 yields 0.30000000000000004, 1/3 is cut off after roughly 16 significant digits, and large float results tip over into scientific notation. For a calculator, a financial formula or an educational context, that is not "almost right"; it is wrong.

The third defect is diagnostics. Hand eval() a broken expression and you get a Python traceback at an internal line number, not the spot in the input string where the problem sits. For a tool that processes end-user input, that is useless.

The task, then: a complete evaluation engine from scratch that (1) never executes foreign code, (2) computes exactly rather than binary-approximately, (3) pinpoints every error to the exact character, and (4) does all of that at library quality: tested, documented, versioned and installable from PyPI. Not a weekend parser, but an engine with the discipline of a small compiler.

The implementation

eval()-free by construction

The library never calls Python's eval(), exec() or compile() anywhere. This is not an after-the-fact filter but the architecture itself. Input strings pass through a closed pipeline (Input → Tokenizer → Parser → Evaluator/Solver → Formatter → Output Converter) whose alphabet is a finite set of numbers, operators, parentheses and a whitelist of function names. At worst, an attacker-controlled string can trigger a typed MathError, never code execution. Even the single place that parses a user-supplied data structure uses the safe ast.literal_eval, which accepts literals only.
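To illustrate the last point, here is a minimal sketch of how ast.literal_eval behaves on a literal versus a hostile string. The helper name parse_literal and the sample dictionary keys are assumptions made for this example; they are not taken from the library.

```python
import ast


def parse_literal(text):
    """Hypothetical helper: parse a user-supplied literal without executing it.

    Returns (True, value) for a plain literal, (False, error_name) otherwise.
    """
    try:
        return True, ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # Function calls, attribute access, names etc. are rejected here.
        return False, type(exc).__name__


# A plain dict literal is accepted and returned as a Python object.
safe = parse_literal("{'word_size': 8, 'signed': True}")

# A call expression is refused; nothing is imported or run.
hostile = parse_literal("__import__('os').getcwd()")

print(safe)
print(hostile)
```

The hostile string is rejected before anything runs, which is the same failure mode the pipeline aims for: an error object, not an executed side effect.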

Recursive-descent parser with a 10-level precedence chain

Operator precedence is not hacked in via regex or a shunting-yard table, but encoded structurally as ten nested parser closures, each responsible for exactly one precedence level: from parse_gleichung (=; "Gleichung" is German for "equation") through bitwise operators, shift operations, sum and term, down to parse_power (**) and parse_factor. Left- vs. right-associativity falls out of the structure: whatever consumes in a loop is left-associative (a - b - c = (a - b) - c), while parse_power recurses to the right and so makes ** correctly right-associative. A deliberate decision: ^ is bitwise XOR, not exponentiation, exactly as in C and Python.
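The structural idea can be sketched with four of the levels. This is a reduced illustration, not the library's parser: it evaluates directly instead of building an AST, omits unary minus, and uses a small regex tokenizer purely for brevity. Only parse_power and parse_factor are names from the text above; parse_sum, parse_term, tokenize and evaluate are assumed for the example.

```python
import re
from decimal import Decimal

# Sketch-only tokenizer: numbers, '**' and single-character operators.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|\*\*|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near column {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    index = 0

    def peek():
        return tokens[index] if index < len(tokens) else None

    def advance():
        nonlocal index
        if index >= len(tokens):
            raise SyntaxError("unexpected end of input")
        index += 1
        return tokens[index - 1]

    def parse_sum():
        # Loop => left-associative: a - b - c is (a - b) - c.
        value = parse_term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += parse_term()
            else:
                value -= parse_term()
        return value

    def parse_term():
        value = parse_power()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= parse_power()
            else:
                value /= parse_power()
        return value

    def parse_power():
        # Recursion on the right => right-associative: a ** b ** c is a ** (b ** c).
        base = parse_factor()
        if peek() == "**":
            advance()
            return base ** parse_power()
        return base

    def parse_factor():
        token = advance()
        if token == "(":
            value = parse_sum()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            advance()
            return value
        if token[0].isdigit():
            return Decimal(token)
        raise SyntaxError(f"unexpected token {token!r}")

    result = parse_sum()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("3 + 4 * 2"))    # multiplication binds tighter than addition
print(evaluate("10 - 4 - 3"))   # left-associative subtraction
print(evaluate("2 ** 3 ** 2"))  # right-associative power
```

Each function only ever calls the next-tighter level, so precedence is a property of the call graph rather than of a lookup table.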

Decimal precision with dynamic scaling

Every number is a decimal.Decimal from the tokenizer through to the output, never a float, which is why 0.1 + 0.2 is exactly 0.3. The precision of the Decimal context is determined anew for each calculation (between 100 and 10,000 digits, depending on the input), plus a hard input ceiling of 20,000 digits. The point: a long result is never silently truncated, and a short one never wastes memory. This is exactly the class of correctness that float-based calculators quietly lose.
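A minimal sketch of per-calculation precision follows. The bounds (100, 10,000, 20,000) come from the text; the sizing heuristic in choose_precision (sum of the operands' digit counts) and the function names are assumptions for illustration, and the engine's actual rule may differ.

```python
from decimal import Decimal, localcontext

MIN_PRECISION = 100       # lower bound named in the text
MAX_PRECISION = 10_000    # upper bound named in the text
MAX_INPUT_DIGITS = 20_000  # hard input ceiling named in the text


def choose_precision(*operands):
    """Assumed heuristic: enough digits to hold an exact product of the operands."""
    digits = sum(len(op.as_tuple().digits) for op in operands)
    return max(MIN_PRECISION, min(MAX_PRECISION, digits))


def exact_product(a_text, b_text):
    if max(len(a_text), len(b_text)) > MAX_INPUT_DIGITS:
        raise ValueError("input exceeds the digit ceiling")
    # Constructing a Decimal from a string is exact, independent of context.
    a, b = Decimal(a_text), Decimal(b_text)
    with localcontext() as ctx:
        ctx.prec = choose_precision(a, b)  # applies only inside this block
        return a * b


# Binary float drifts; Decimal does not.
print(0.1 + 0.2 == 0.3)
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# A 150-digit operand squared needs 300 digits; the default context (28) would round.
big = "9" * 150
print(len(str(exact_product(big, big))))
```

Using localcontext keeps the raised precision scoped to one calculation, so a short expression afterwards runs with a small context again.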

Character-exact error positioning

Alongside the token list, the tokenizer keeps a span list: for each token a (start_col, end_col, original_text) triple. Every AST node and every MathError carries position_start / position_end. The payoff: an error does not say "syntax error somewhere", it points at the exact character. This bookkeeping is the reason the engine is debuggable across an API. Via a single setting (readable_error), the same position info switches between two contracts: typed exceptions for the library, a visual diagnostic with a ^ pointer under the faulty column for the console.
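The span bookkeeping might look roughly like the sketch below. The function names, the regex and the choice of an exclusive end_col are assumptions for this example; the text only specifies the (start_col, end_col, original_text) triple and the ^ pointer.

```python
import re

# Sketch-only lexer. A number may contain several dots so that a malformed
# lexeme such as "1.2.3" stays one token and can be reported as a whole.
TOKEN = re.compile(r"\d+(?:\.\d+)*|\*\*|[-+*/()^=]|[A-Za-z_]\w*")


def tokenize_with_spans(text):
    """Return (tokens, spans); each span is (start_col, end_col, original_text)."""
    tokens, spans, pos = [], [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}")
        tokens.append(match.group())
        spans.append((match.start(), match.end(), match.group()))  # end exclusive
        pos = match.end()
    return tokens, spans


def render_pointer(text, start, end):
    """Console-style diagnostic: the input with carets under columns start..end-1."""
    return text + "\n" + " " * start + "^" * max(1, end - start)


source = "3 + 1.2.3"
tokens, spans = tokenize_with_spans(source)

# Find the first number with more than one '.' and point at it.
for start, end, lexeme in spans:
    if lexeme.count(".") > 1:
        print(render_pointer(source, start, end))
        break
```

Because the columns are recorded against the original string, whitespace the tokenizer skipped does not shift the pointer.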

Typed, catalogued error system

A base class MathError plus exactly seven domain subclasses, together with a catalogue of 78 unique four-digit error codes across nine families. The digits are structured: first digit = family, second = component, the rest = sequence number. Code 3008 therefore means "Calculator family, core parser, more than one '.' in a number". These codes are deliberately never renumbered; they are a contract toward the UI and external log parsers. The public calculate() function wraps the whole pipeline in a layered except block, so that no raw ZeroDivisionError or ValueError ever reaches the caller: everything lands typed in the MathError hierarchy.
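As a rough sketch of both ideas (the digit structure and the layered except block), under stated assumptions: MathError, position_start / position_end and code 3008 are from the text; the subclass name CalculationError, the helper split_code, the wrapper guarded and the placeholder code "0000" are hypothetical and not part of the real catalogue.

```python
class MathError(Exception):
    """Base class; carries a catalogue code and a character span."""

    def __init__(self, message, code, position_start=-1, position_end=-1):
        super().__init__(message)
        self.code = str(code)
        self.position_start = position_start
        self.position_end = position_end


class CalculationError(MathError):
    """Hypothetical domain subclass, named only for this sketch."""


PLACEHOLDER_CODE = "0000"  # not a real catalogue entry


def split_code(code):
    """Split a four-digit code into (family, component, sequence)."""
    text = str(code)
    if len(text) != 4 or not text.isdigit():
        raise ValueError("error codes are four digits")
    return int(text[0]), int(text[1]), int(text[2:])


def guarded(compute):
    """Layered except: typed errors pass through, raw ones are converted."""
    try:
        return compute()
    except MathError:
        raise  # already typed, keep code and position
    except ZeroDivisionError as exc:
        raise CalculationError("division by zero", PLACEHOLDER_CODE) from exc
    except ValueError as exc:
        raise CalculationError(str(exc), PLACEHOLDER_CODE) from exc


print(split_code(3008))  # family 3, component 0, sequence 8

try:
    guarded(lambda: 1 / 0)
except MathError as err:
    print(type(err).__name__, err.code)
```

Ordering the handlers from most to least specific is what keeps an already-typed error from being re-wrapped and losing its original code.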

More than a calculator

Two further capabilities sit on the same AST. If an expression contains an = and a variable, the engine solves the linear equation symbolically: each node returns a (factor, constant) pair, and the solver brings both sides into the form A·x + B = C·x + D and computes x. Non-linearity is caught structurally (variable·variable, variable in the denominator, variable in the exponent), and degenerate cases are named cleanly ("No Solution", "Inf. Solutions"). On top of that sits a programmer's-calculator mode with fixed word width (8/16/32/64 bit), two's complement and bitwise operators, so that 127 + 1 in 8-bit signed mode correctly overflows to -128. A prefix-driven output system (dec:, int:, hex: …) determines the Python return type and refuses lossy conversions instead of silently truncating.
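Both mechanisms are small enough to sketch. The code below models a node's result as a (factor, constant) pair meaning factor·x + constant, and wraps integers into a signed word. All function names are assumptions for this illustration, and raising ValueError for non-linearity stands in for the engine's typed error.

```python
from decimal import Decimal

X = (Decimal(1), Decimal(0))  # the variable itself: 1*x + 0


def const(value):
    return (Decimal(0), Decimal(value))


def add(left, right):
    return (left[0] + right[0], left[1] + right[1])


def mul(left, right):
    # variable * variable would produce an x**2 term: structurally non-linear.
    if left[0] != 0 and right[0] != 0:
        raise ValueError("non-linear: variable multiplied by variable")
    return (left[0] * right[1] + right[0] * left[1], left[1] * right[1])


def solve(left, right):
    """Solve A*x + B = C*x + D, i.e. (A - C)*x = D - B."""
    (a, b), (c, d) = left, right
    if a == c:
        return "Inf. Solutions" if b == d else "No Solution"
    return (d - b) / (a - c)


def wrap_signed(value, bits):
    """Two's-complement wrap of an int into a signed word of the given width."""
    value &= (1 << bits) - 1          # keep the low `bits` bits
    if value >= 1 << (bits - 1):      # sign bit set => negative
        value -= 1 << bits
    return value


# 2*x + 3 = x + 7
lhs = add(mul(const(2), X), const(3))
rhs = add(X, const(7))
print(solve(lhs, rhs))

print(solve(add(X, const(1)), add(X, const(2))))  # parallel: no solution
print(wrap_signed(127 + 1, 8))                     # 8-bit signed overflow
```

Division by a constant and subtraction follow the same pattern as mul and add; a variable in the denominator or exponent would be rejected in the same structural way as variable·variable.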

Engineering highlights & test discipline

Reliability was not a feature here but the reason for being: a safe engine you cannot trust is useless.

399 pytest tests, 90% coverage. The suite grew from 234 to 399 tests, and coverage rose from 69% to 90%. A dedicated helper assert_error_location(expr, code, start, end) checks not only that an expression fails, but that it fails with the exact error code at the exact character position; the position data is itself part of the test contract.
CI matrix across five Python versions. GitHub Actions runs the full suite on every push and pull request against Python 3.8, 3.9, 3.10, 3.11 and 3.12; the coverage report goes to Codecov. Dead and work-in-progress code is honestly excluded from coverage rather than padding the number.
Clean layering, broken cycles. Clearly separated modules (calculator / utility / cli / plugins); circular imports are resolved via deliberately deferred imports. Every class and function carries a docstring; a standalone DOCUMENTATION.md captures the architecture, the full API, parser internals and the complete error-code catalogue.
Library quality on delivery. Pure-Python wheel, three console entry points, exactly two runtime dependencies (rich, prompt_toolkit). The interactive REPL offers persistent history and tab completion. Six minor release lines (0.1.0 → 0.6.7) in roughly five months, following Semantic Versioning throughout.
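The position-checking helper mentioned above could be sketched as follows. Only the signature assert_error_location(expr, code, start, end), the MathError name and code 3008 come from the text; the stand-in calculate() below detects just one error class and exists only to make the example self-contained, and the exclusive end column is an assumption.

```python
import re


class MathError(Exception):
    def __init__(self, message, code, position_start, position_end):
        super().__init__(message)
        self.code = code
        self.position_start = position_start
        self.position_end = position_end


def calculate(expression):
    """Stand-in engine: only detects numbers with more than one '.'."""
    for match in re.finditer(r"\d+(?:\.\d+)+", expression):
        if match.group().count(".") > 1:
            raise MathError(
                "more than one '.' in a number", "3008", match.start(), match.end()
            )
    return None  # real evaluation is out of scope for this sketch


def assert_error_location(expr, code, start, end):
    """Fail unless expr raises MathError with this code at exactly this span."""
    try:
        calculate(expr)
    except MathError as err:
        assert err.code == code, f"expected code {code}, got {err.code}"
        assert (err.position_start, err.position_end) == (start, end), (
            f"expected span {(start, end)}, "
            f"got {(err.position_start, err.position_end)}"
        )
    else:
        raise AssertionError(f"{expr!r} did not raise MathError")


def test_double_dot_is_located():
    # pytest would collect this by name; it is also callable directly.
    assert_error_location("3 + 1.2.3", "3008", 4, 9)


test_double_dot_is_located()
```

A test written this way breaks if the error moves by a single column, which is what makes the position data part of the contract rather than a best-effort hint.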

The result

Live on PyPI as math-engine, installable via pip install math-engine, MIT-licensed, pure-Python wheel for Python 3.8+, with three console commands out of the box.
eval()-free by construction. Closed input alphabet: the worst case for a hostile input is a typed error, never code execution.
399 tests, 90% coverage, green across five Python versions (3.8–3.12) on every push, with test cases that pin exact error codes to exact character positions.
Exact Decimal arithmetic with adaptive precision (100 … 10,000 digits) and a 20,000-digit input limit: no silent float drift, no silent truncation.
Character-exact diagnostics: 78 error codes across nine families, an eight-class typed exception hierarchy, position_start / position_end on every error.
Roughly 4,200 LOC of production code in cleanly layered modules, backed by ~2,400 LOC of tests, plus full technical documentation and a catalogued error system.