This document is part of the scratchgen project — a Python library that generates Scratch (.sb3) files from code. Example 33 is an algebraic expression simplifier: a Scratch program that takes mathematical expressions and simplifies them step by step, showing each transformation visually.
The project has several files (all in docs/examples/):
33_algebra_simplifier.py — full version with graphical UI (keyboard, stamps)33_algebra_simplifier_bare_engine.py — engine only, minimal UI (this doc describes it)33_algebra_ui.py — UI shell with engine stubbed out33_algebra_prototype_recursive.py — Python prototype (clean, recursive)33_algebra_prototype_iterative.py — Python prototype (iterative, Scratch-compatible)33_how_the_python_prototypes_work.md — companion doc explaining the Python prototypesThis document explains how the math engine works inside Scratch — how it’s structured, what each procedure does, and how to follow the code. If you’ve opened the .sb3 in the Scratch editor and are looking at ~1800 blocks thinking “where do I even start?” — this is your map.
Takes an algebraic expression like 2*(x+3) or (a+b)*(a-b) and simplifies it step by step:
2*(x+3) → 2*x + 2*3 → 2*x + 6
It knows: distribute, combine like terms, constant fold, cancel fractions, GCD, and more.
expression (text)
│
▼
parse ← turns text into a tree (AST)
│
▼
format ← turns tree back into text for display
│
▼
┌─────────────────────┐
│ simplify_step │ ← press Space: applies one rule
│ format │ ← shows the new expression
│ (repeat until done)│
└─────────────────────┘
expression variable to your input text.parse is called — builds the tree in lists nT, nV, nK. Sets root.format is called — walks the tree, produces readable text in display.simplify_step — finds one rule to apply, transforms the tree, then format displays the result.Scratch has no objects or pointers, so the tree lives in three parallel lists:
| List | Contains | Example |
|---|---|---|
nT |
node type | "SUM" |
nV |
node value | "" (empty for SUM/PROD/NEG/FRAC) |
nK |
children (comma-separated indices) | "1,4" (children are nodes 1 and 4) |
Position = node index. If nT[5] = "SUM" and nK[5] = "1,2", that means node 5 is an addition of node 1 and node 2.
Node types:
NUM — a number (value in nV)VAR — a variable letter (value in nV)SUM — addition (2 or more children)PROD — multiplication (2 or more children)NEG — negation (1 child)FRAC — division (2 children: numerator, denominator)Nodes are append-only — once created, never modified. Simplification creates new nodes and a new root each time.
tokenizeReads expression character by character. Groups consecutive digits into multi-digit numbers. Outputs to the tokens list.
"2*(x+3)" → ["2", "*", "(", "x", "+", "3", ")"]
mark_unaryScans tokens. Any - at the start, after (, or after an operator is renamed to "NEG" (unary minus, not subtraction).
parseThe main parser. Uses the shunting-yard algorithm with two stacks:
opStk — operators waiting to be appliedoutStk — finished tree nodes (indices)Reads tokens left to right:
alloc_node creates a leaf node, pushes index to outStkopStk (calling apply_op), then pushes itself( → pushed to opStk as a marker) → pops and applies operators until ( is foundAfter all tokens: pops remaining operators. The single node on outStk becomes root.
apply_opPops operands from outStk, creates a new node:
+ → SUM (extends left child if it’s already a SUM — makes n-ary)- → SUM with right child wrapped in NEG* → PROD (extends left if already PROD)/ → FRACNEG → wraps one operand in NEG (or just negates a number directly)simplify_stepThe workhorse. Depth-first search through the tree:
root, call try_simplifyret_changed = 0): descend to first childUses pN/pC lists (path nodes / path child-index) to remember the descent.
try_simplifyThe brain. Checks the node type and tries all applicable rules:
%NEG rules%
-(a+b) → (-a)+(-b)%SUM rules%
NUM 0 childextract_coeff decomposes each child into (coefficient, variable-key). Matching keys → merge. Uses sort_varkey so a*b = b*a.%PROD rules%
2*3*x → 6*x)a*(b+c) → a*b + a*c)%FRAC rules%
x/1 → x0/x → 0x/x → 1extract_coeffSplits a term into coefficient + variable key:
3*x*y → coeff=3, varkey="xy"x → coeff=1, varkey="x"-2*a → coeff=-2, varkey="a"Calls sort_varkey to normalize: b*a and a*b both get varkey "ab".
make_termBuilds a node from coefficient + variable factors:
formatPost-order traversal using stack fStk. Entries: "nodeIndex,phase".
Uses fmt_lookup / fmt_store with cache lists fN/fS.
Formatting rules:
"-" + child (parentheses if child is complex)" + ", NEG children shown as " - "" * ", parentheses around SUM/NEG childrennumerator " / " denominator, parentheses where needed| Procedure | Input | Output | What it does |
|---|---|---|---|
alloc_node |
t5=type, t6=value, ret_str=kids | ret=new index | Appends to nT/nV/nK |
get_kid |
t1=node, t2=which (1-based) | ret=child index | Parses comma-separated nK |
kid_count |
t1=node | ret=count | Counts children |
kids_to_list |
t1=node | kT list filled | Splits nK into list |
list_to_kids |
kT list | ret_str=joined | Joins list back |
sort_varkey |
ret_varkey string | ret_varkey sorted | Bubble-sorts characters |
Scratch variables are global. If two procedures both use i as a loop counter, one clobbers the other. Each procedure gets dedicated variables:
ui — only kids_to_list and list_to_kidsgk — only get_kidsi, sj, sf — only sort_varkeyfi — only flatten operationsni — only neg-distribute and frac cancelga, gb, gt — only GCD computationGeneral temps t1–t8 are shared by simplifier rules, but ret2 saves the original node before try_simplify (which clobbers t1/t3).
"3+5", step through. Parse creates 3 nodes, simplify combines them into NUM 8."2*(x+3)". Watch PROD(2, SUM(x,3)) become SUM(PROD(2,x), PROD(2,3)), then constant fold gives 2*x + 6."(a+b)*(a-b)" — distribute, flatten, cancel like terms, end up with a*a - b*b.logs list in the variables panel. Shows which nodes are visited and which rules fire."t" — 17 built-in tests verify correctness.x/x → 1 — simplest example. Check condition, build new node with alloc_node, set ret_str and ret_changed=1.