algebra simplifier explanation, part 2: Scratch

Context

This document is part of the scratchgen project — a Python library that generates Scratch (.sb3) files from code. Example 33 is an algebraic expression simplifier: a Scratch program that takes mathematical expressions and simplifies them step by step, showing each transformation visually.

The project has several files (all in docs/examples/):

This document explains how the math engine works inside Scratch — how it’s structured, what each procedure does, and how to follow the code. If you’ve opened the .sb3 in the Scratch editor and are looking at ~1800 blocks thinking “where do I even start?” — this is your map.

What it does

Takes an algebraic expression like 2*(x+3) or (a+b)*(a-b) and simplifies it step by step:

2*(x+3)  →  2*x + 2*3  →  2*x + 6

It knows: distribute, combine like terms, constant fold, cancel fractions, GCD, and more.

Execution flow — bird’s eye view

expression (text)
     │
     ▼
   parse          ← turns text into a tree (AST)
     │
     ▼
   format         ← turns tree back into text for display
     │
     ▼
   ┌─────────────────────┐
   │  simplify_step      │  ← press Space: applies one rule
   │  format             │  ← shows the new expression
   │  (repeat until done)│
   └─────────────────────┘
  1. Set the expression variable to your input text.
  2. parse is called — builds the tree in lists nT, nV, nK. Sets root.
  3. format is called — walks the tree, produces readable text in display.
  4. Each Space press calls simplify_step — finds one rule to apply, transforms the tree, then format displays the result.
  5. When no rule applies, simplification is done.

The tree structure (AST)

Scratch has no objects or pointers, so the tree lives in three parallel lists:

List Contains Example
nT node type "SUM"
nV node value "" (empty for SUM/PROD/NEG/FRAC)
nK children (comma-separated indices) "1,4" (children are nodes 1 and 4)

Position = node index. If nT[5] = "SUM" and nK[5] = "1,2", that means node 5 is an addition of node 1 and node 2.

Node types:

Nodes are append-only — once created, never modified. Simplification creates new nodes and a new root each time.

The parser — procedure by procedure

tokenize

Reads expression character by character. Groups consecutive digits into multi-digit numbers. Outputs to the tokens list.

"2*(x+3)"["2", "*", "(", "x", "+", "3", ")"]

mark_unary

Scans tokens. Any - at the start, after (, or after an operator is renamed to "NEG" (unary minus, not subtraction).

parse

The main parser. Uses the shunting-yard algorithm with two stacks:

Reads tokens left to right:

After all tokens: pops remaining operators. The single node on outStk becomes root.

apply_op

Pops operands from outStk, creates a new node:

The simplifier — procedure by procedure

simplify_step

The workhorse. Depth-first search through the tree:

  1. Start at root, call try_simplify
  2. If nothing fires (ret_changed = 0): descend to first child
  3. If still nothing: backtrack to parent, try next child
  4. When a rule fires: rebuild the path from that node back to root

Uses pN/pC lists (path nodes / path child-index) to remember the descent.

try_simplify

The brain. Checks the node type and tries all applicable rules:

%NEG rules%

%SUM rules%

%PROD rules%

%FRAC rules%

extract_coeff

Splits a term into coefficient + variable key:

Calls sort_varkey to normalize: b*a and a*b both get varkey "ab".

make_term

Builds a node from coefficient + variable factors:

The formatter

format

Post-order traversal using stack fStk. Entries: "nodeIndex,phase".

Uses fmt_lookup / fmt_store with cache lists fN/fS.

Formatting rules:

Utility procedures

Procedure Input Output What it does
alloc_node t5=type, t6=value, ret_str=kids ret=new index Appends to nT/nV/nK
get_kid t1=node, t2=which (1-based) ret=child index Parses comma-separated nK
kid_count t1=node ret=count Counts children
kids_to_list t1=node kT list filled Splits nK into list
list_to_kids kT list ret_str=joined Joins list back
sort_varkey ret_varkey string ret_varkey sorted Bubble-sorts characters

Why so many variables?

Scratch variables are global. If two procedures both use i as a loop counter, one clobbers the other. Each procedure gets dedicated variables:

General temps t1t8 are shared by simplifier rules, but ret2 saves the original node before try_simplify (which clobbers t1/t3).

Tips for exploring

  1. Start simple: expression "3+5", step through. Parse creates 3 nodes, simplify combines them into NUM 8.
  2. Distribution: try "2*(x+3)". Watch PROD(2, SUM(x,3)) become SUM(PROD(2,x), PROD(2,3)), then constant fold gives 2*x + 6.
  3. Full experience: "(a+b)*(a-b)" — distribute, flatten, cancel like terms, end up with a*a - b*b.
  4. Enable logs: check the logs list in the variables panel. Shows which nodes are visited and which rules fire.
  5. Run tests: press "t" — 17 built-in tests verify correctness.
  6. Add a rule: look at FRAC x/x → 1 — simplest example. Check condition, build new node with alloc_node, set ret_str and ret_changed=1.