Example: Live Variable Analysis
The problem
A variable is live at a program point if its current value might still be read before
it is overwritten on at least one subsequent execution path. If a variable is dead —
never read before being overwritten — then any register holding it can be safely reused.
Compilers use this to allocate registers efficiently. Static analysis tools use it to
detect dead assignments (a write to a variable that is never subsequently read is
suspicious: either the write is dead code, or the variable was meant to be used and
wasn't). Both uses require knowing which variables are live at each point.
This page builds a complete live variable analysis from scratch, introducing each piece
of the dataflow analysis framework before the code that
implements it.
Step 1 — The target program
Here is the program we will analyse:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21 | public class Example {
/**
* A simple method used to illustrate live variable analysis.
*
* <p>At the join point after the if/else, both the true branch (y = c)
* and the false branch (y = x) have assigned y, but they use different variables.
* This makes the join point non-trivial: the set of live variables is the union
* of what is live on each incoming path.
*/
public static int compute(int a, int b, int c) {
int x = a + b; // stmt: x = a + b (uses a, b; defines x)
int y;
if (a > 0) {
y = c; // stmt: y = c (uses c; defines y)
} else {
y = x; // stmt: y = x (uses x; defines y)
}
return y; // stmt: return y (uses y)
}
}
|
The interesting point is the if/else: the true branch uses c and defines y;
the false branch uses x and defines y. At the join point
after the two branches, facts from both paths must be combined.
Step 2 — The five elements
Every dataflow analysis is fully described by five choices:
| Element |
This analysis |
| Direction |
Backward — liveness is about the future: will this value be read later? |
| Fact domain |
Set<Local> — the set of locals that are live at this point |
| Boundary condition |
∅ — nothing is live after the method returns |
| Initial fact |
∅ — start with the conservative assumption that nothing is live |
| Meet |
Union (∪) — this is a may analysis: a variable is live if it is live on at least one path |
| Transfer function |
IN[S] = use(S) ∪ (OUT[S] − def(S)) — gen/kill style |
Why backward? Because liveness is a property of the future. To know whether x is
live before statement S, you need to know whether x is used after S — which means
looking at what comes after. Facts therefore travel backward, from exit toward entry.
Why union at join points? Because we want soundness: if x might be used on any
path after a join, we must report it as live. Union ensures nothing is missed.
Step 3 — The framework interface
The DataflowAnalysis<F> interface captures the five elements above as method signatures.
Any analysis that implements it can be driven by the generic DataflowSolver:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31 | public interface DataflowAnalysis<F> {
/** True for a forward analysis (entry → exit); false for backward (exit → entry). */
boolean isForward();
/**
* The fact at the boundary node: the method entry for forward analyses, the method exit for
* backward analyses. Represents what the analysis knows before examining any code.
*/
F newBoundaryFact(Body body);
/**
* The fact used to initialise every non-boundary node at the start of the analysis. Typically the
* "bottom" element of the lattice (e.g. empty set, empty map).
*/
F newInitialFact();
/**
* Meet {@code fact} into {@code target} (i.e. {@code target = target ⊓ fact}). Mutates {@code
* target} in place so the solver can reuse the same object across iterations at a join point,
* avoiding repeated allocation.
*/
void meetInto(F fact, F target);
/**
* Apply the transfer function for {@code stmt}: read {@code in}, write {@code out}.
*
* @return true if {@code out} changed (so the solver knows whether to propagate further)
*/
boolean transferNode(Stmt stmt, F in, F out);
}
|
Step 4 — Building the analysis
Fact type and boundary
The analysis fact is Set<Local>. The
boundary condition is the empty set (nothing is live
past the return), and the initial fact is also empty:
| @Override
public Set<Local> newBoundaryFact(Body body) {
// Nothing is live after the method returns.
return new HashSet<>();
}
@Override
public Set<Local> newInitialFact() {
// Conservative start: assume nothing is live (we add facts as we discover uses).
return new HashSet<>();
}
|
Meet operator
The meet is union: if x is live on either incoming path,
it is live at the join point. meetInto mutates target in place so the solver can
reuse the same set object across iterations at each join point,
avoiding repeated allocation:
| @Override
public void meetInto(Set<Local> fact, Set<Local> target) {
// May analysis: a variable is live if it is live on at least one path → union.
target.addAll(fact);
}
|
Transfer function
The transfer function reads OUT[S] and writes IN[S]
(backward direction). It first kills the variable defined by S (if any), then gens
the variables used by S:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30 | @Override
public boolean transferNode(Stmt stmt, Set<Local> in, Set<Local> out) {
// We compute IN from OUT (backward direction).
// IN[S] = use(S) ∪ (OUT[S] − def(S))
Set<Local> newIn = new HashSet<>(out);
if (stmt instanceof JAssignStmt) {
JAssignStmt assign = (JAssignStmt) stmt;
// Kill: remove the defined variable (it is no longer "needed" upstream).
if (assign.getLeftOp() instanceof Local) {
newIn.remove(assign.getLeftOp());
}
// Gen: add any locals used on the right-hand side.
addLocals(assign.getRightOp(), newIn);
} else {
// For all other statements, gen the locals they use.
for (Value use : stmt.getUses()) {
if (use instanceof Local) {
newIn.add((Local) use);
}
}
}
if (newIn.equals(in)) {
return false; // unchanged — solver can stop propagating backward from here
}
in.clear();
in.addAll(newIn);
return true;
}
|
Step 5 — The worklist solver
The generic DataflowSolver drives the iteration. In the backward case it meets all
successor IN facts into OUT, applies the transfer, and re-schedules predecessors
whenever IN changed:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30 | // Seed the worklist with all nodes (conservative: assume every fact may change).
Deque<Stmt> worklist = new ArrayDeque<>(nodes);
while (!worklist.isEmpty()) {
Stmt node = worklist.poll();
if (analysis.isForward()) {
// Forward: meet all predecessor OUT facts into this node's IN fact.
F in = inFacts.get(node);
for (Stmt pred : cfg.predecessors(node)) {
analysis.meetInto(outFacts.get(pred), in);
}
// Apply the transfer function; if OUT changed, schedule successors.
if (analysis.transferNode(node, in, outFacts.get(node))) {
worklist.addAll(cfg.successors(node));
}
} else {
// Backward: meet all successor IN facts into this node's OUT fact.
F out = outFacts.get(node);
List<Stmt> succs = cfg.successors(node);
succs.addAll(cfg.exceptionalSuccessors(node).values());
for (Stmt succ : succs) {
analysis.meetInto(inFacts.get(succ), out);
}
// Apply the transfer function; if IN changed, schedule predecessors.
if (analysis.transferNode(node, inFacts.get(node), out)) {
worklist.addAll(cfg.predecessors(node));
}
}
}
|
The solver terminates at the fixed point — when the
worklist is empty because no fact changed in the last iteration.
Hand trace on Example.compute (simplified Jimple, abbreviated local names):
| Statement |
IN (live before) |
OUT (live after) |
return y |
{y} |
{} (boundary) |
y = x (else branch) |
{x} |
{y} |
y = c (if branch) |
{c} |
{y} |
if a > 0 |
{a, b, c, x} |
{c, x} (union of both branches) |
x = a + b |
{a, b, c} |
{a, b, c, x} |
(entry) |
{a, b, c} |
— |
At the if join (looking backward), the two branches bring {c} and {x}, so their
union is {c, x}. Then the if condition itself uses a, adding it.
Step 6 — Setting up and running the analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15 | AnalysisInputLocation inputLocation =
PathBasedAnalysisInputLocation.create(
Paths.get("src/test/resources/Liveness/binary"), SourceType.Application);
View view = new JavaView(inputLocation);
ClassType classType = view.getIdentifierFactory().getClassType("Example");
SootClass sootClass = view.getClass(classType).get();
MethodSignature methodSignature =
view.getIdentifierFactory().parseMethodSignature("<Example: int compute(int,int,int)>");
SootMethod method = sootClass.getMethod(methodSignature.getSubSignature()).get();
Body body = method.getBody();
// Run live variable analysis and retrieve the solver for querying results.
DataflowSolver<Set<Local>> solver = LiveVariableAnalysis.analyze(body);
|
Step 7 — Verifying the result
These assertions are what CI checks on every pull request. If the SootUp API changes
in a way that breaks the analysis, this test fails and the documentation-code gap is
caught before it reaches the docs site:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34 | for (Stmt stmt : cfg.getNodes()) {
Set<Local> liveIn = solver.getInFact(stmt);
if (stmt instanceof JReturnStmt) {
JReturnStmt ret = (JReturnStmt) stmt;
Immediate returnedValue = ret.getOp();
// The variable being returned must be live just before the return.
assertTrue(returnedValue instanceof Local, "return operand should be a Local");
assertTrue(
liveIn.contains((Local) returnedValue), "returned variable must be live before return");
// No other locals should be live at the return point (nothing else is used after).
assertFalse(
liveIn.stream().anyMatch(l -> l != returnedValue),
"no other variable should be live before return");
}
if (stmt instanceof JAssignStmt) {
JAssignStmt assign = (JAssignStmt) stmt;
// A variable that is defined and then immediately never used should NOT be live
// in its own OUT set (it was just killed).
// We spot-check: nothing defined here should appear in OUT (post-transfer).
Set<Local> liveOut = solver.getOutFact(stmt);
if (assign.getLeftOp() instanceof Local) {
Local defined = (Local) assign.getLeftOp();
// If this definition is the last use of the variable (i.e. it is dead code),
// the defined variable will not appear in liveOut. We cannot assert this
// universally because the variable might be used later. We just verify the
// result is non-null (the solver ran).
assertTrue(liveOut != null, "solver must produce a result for every stmt");
}
}
}
|
What to try next