Skip to content

Example: Live Variable Analysis

The problem

A variable is live at a program point if its current value might still be read before it is overwritten on at least one subsequent execution path. If a variable is dead — never read before being overwritten — then any register holding it can be safely reused.

Compilers use this to allocate registers efficiently. Static analysis tools use it to detect dead assignments (a write to a variable that is never subsequently read is suspicious: either the write is dead code, or the variable was meant to be used and wasn't). Both uses require knowing which variables are live at each point.

This page builds a complete live variable analysis from scratch, introducing each piece of the dataflow analysis framework before the code that implements it.


Step 1 — The target program

Here is the program we will analyse:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
public class Example {

    /**
     * A simple method used to illustrate live variable analysis.
     *
     * <p>At the join point after the if/else, both the true branch (y = c)
     * and the false branch (y = x) have assigned y, but they use different variables.
     * This makes the join point non-trivial: the set of live variables is the union
     * of what is live on each incoming path.
     */
    public static int compute(int a, int b, int c) {
        int x = a + b;   // stmt: x = a + b  (uses a, b; defines x)
        int y;
        if (a > 0) {
            y = c;       // stmt: y = c       (uses c; defines y)
        } else {
            y = x;       // stmt: y = x       (uses x; defines y)
        }
        return y;        // stmt: return y    (uses y)
    }
}

The interesting point is the if/else: the true branch uses c and defines y; the false branch uses x and defines y. At the join point after the two branches, facts from both paths must be combined.


Step 2 — The five elements

Every dataflow analysis is fully described by five choices:

Element This analysis
Direction Backward — liveness is about the future: will this value be read later?
Fact domain Set<Local> — the set of locals that are live at this point
Boundary condition — nothing is live after the method returns
Initial fact — start with the conservative assumption that nothing is live
Meet Union (∪) — this is a may analysis: a variable is live if it is live on at least one path
Transfer function IN[S] = use(S) ∪ (OUT[S] − def(S))gen/kill style

Why backward? Because liveness is a property of the future. To know whether x is live before statement S, you need to know whether x is used after S — which means looking at what comes after. Facts therefore travel backward, from exit toward entry.

Why union at join points? Because we want soundness: if x might be used on any path after a join, we must report it as live. Union ensures nothing is missed.


Step 3 — The framework interface

The DataflowAnalysis<F> interface captures the five elements above as method signatures. Any analysis that implements it can be driven by the generic DataflowSolver:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
public interface DataflowAnalysis<F> {

  /** True for a forward analysis (entry → exit); false for backward (exit → entry). */
  boolean isForward();

  /**
   * The fact at the boundary node: the method entry for forward analyses, the method exit for
   * backward analyses. Represents what the analysis knows before examining any code.
   */
  F newBoundaryFact(Body body);

  /**
   * The fact used to initialise every non-boundary node at the start of the analysis. Typically the
   * "bottom" element of the lattice (e.g. empty set, empty map).
   */
  F newInitialFact();

  /**
   * Meet {@code fact} into {@code target} (i.e. {@code target = target ⊓ fact}). Mutates {@code
   * target} in place so the solver can reuse the same object across iterations at a join point,
   * avoiding repeated allocation.
   */
  void meetInto(F fact, F target);

  /**
   * Apply the transfer function for {@code stmt}: read {@code in}, write {@code out}.
   *
   * @return true if {@code out} changed (so the solver knows whether to propagate further)
   */
  boolean transferNode(Stmt stmt, F in, F out);
}

Step 4 — Building the analysis

Fact type and boundary

The analysis fact is Set<Local>. The boundary condition is the empty set (nothing is live past the return), and the initial fact is also empty:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
  @Override
  public Set<Local> newBoundaryFact(Body body) {
    // Nothing is live after the method returns.
    return new HashSet<>();
  }

  @Override
  public Set<Local> newInitialFact() {
    // Conservative start: assume nothing is live (we add facts as we discover uses).
    return new HashSet<>();
  }

Meet operator

The meet is union: if x is live on either incoming path, it is live at the join point. meetInto mutates target in place so the solver can reuse the same set object across iterations at each join point, avoiding repeated allocation:

1
2
3
4
5
  @Override
  public void meetInto(Set<Local> fact, Set<Local> target) {
    // May analysis: a variable is live if it is live on at least one path → union.
    target.addAll(fact);
  }

Transfer function

The transfer function reads OUT[S] and writes IN[S] (backward direction). It first kills the variable defined by S (if any), then gens the variables used by S:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
  @Override
  public boolean transferNode(Stmt stmt, Set<Local> in, Set<Local> out) {
    // We compute IN from OUT (backward direction).
    // IN[S] = use(S) ∪ (OUT[S] − def(S))
    Set<Local> newIn = new HashSet<>(out);

    if (stmt instanceof JAssignStmt) {
      JAssignStmt assign = (JAssignStmt) stmt;
      // Kill: remove the defined variable (it is no longer "needed" upstream).
      if (assign.getLeftOp() instanceof Local) {
        newIn.remove(assign.getLeftOp());
      }
      // Gen: add any locals used on the right-hand side.
      addLocals(assign.getRightOp(), newIn);
    } else {
      // For all other statements, gen the locals they use.
      for (Value use : stmt.getUses()) {
        if (use instanceof Local) {
          newIn.add((Local) use);
        }
      }
    }

    if (newIn.equals(in)) {
      return false; // unchanged — solver can stop propagating backward from here
    }
    in.clear();
    in.addAll(newIn);
    return true;
  }

Step 5 — The worklist solver

The generic DataflowSolver drives the iteration. In the backward case it meets all successor IN facts into OUT, applies the transfer, and re-schedules predecessors whenever IN changed:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    // Seed the worklist with all nodes (conservative: assume every fact may change).
    Deque<Stmt> worklist = new ArrayDeque<>(nodes);

    while (!worklist.isEmpty()) {
      Stmt node = worklist.poll();

      if (analysis.isForward()) {
        // Forward: meet all predecessor OUT facts into this node's IN fact.
        F in = inFacts.get(node);
        for (Stmt pred : cfg.predecessors(node)) {
          analysis.meetInto(outFacts.get(pred), in);
        }
        // Apply the transfer function; if OUT changed, schedule successors.
        if (analysis.transferNode(node, in, outFacts.get(node))) {
          worklist.addAll(cfg.successors(node));
        }
      } else {
        // Backward: meet all successor IN facts into this node's OUT fact.
        F out = outFacts.get(node);
        List<Stmt> succs = cfg.successors(node);
        succs.addAll(cfg.exceptionalSuccessors(node).values());
        for (Stmt succ : succs) {
          analysis.meetInto(inFacts.get(succ), out);
        }
        // Apply the transfer function; if IN changed, schedule predecessors.
        if (analysis.transferNode(node, inFacts.get(node), out)) {
          worklist.addAll(cfg.predecessors(node));
        }
      }
    }

The solver terminates at the fixed point — when the worklist is empty because no fact changed in the last iteration.

Hand trace on Example.compute (simplified Jimple, abbreviated local names):

Statement IN (live before) OUT (live after)
return y {y} {} (boundary)
y = x (else branch) {x} {y}
y = c (if branch) {c} {y}
if a > 0 {a, b, c, x} {c, x} (union of both branches)
x = a + b {a, b, c} {a, b, c, x}
(entry) {a, b, c}

At the if join (looking backward), the two branches bring {c} and {x}, so their union is {c, x}. Then the if condition itself uses a, adding it.


Step 6 — Setting up and running the analysis

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
    AnalysisInputLocation inputLocation =
        PathBasedAnalysisInputLocation.create(
            Paths.get("src/test/resources/Liveness/binary"), SourceType.Application);
    View view = new JavaView(inputLocation);

    ClassType classType = view.getIdentifierFactory().getClassType("Example");
    SootClass sootClass = view.getClass(classType).get();

    MethodSignature methodSignature =
        view.getIdentifierFactory().parseMethodSignature("<Example: int compute(int,int,int)>");
    SootMethod method = sootClass.getMethod(methodSignature.getSubSignature()).get();
    Body body = method.getBody();

    // Run live variable analysis and retrieve the solver for querying results.
    DataflowSolver<Set<Local>> solver = LiveVariableAnalysis.analyze(body);

Step 7 — Verifying the result

These assertions are what CI checks on every pull request. If the SootUp API changes in a way that breaks the analysis, this test fails and the documentation-code gap is caught before it reaches the docs site:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
    for (Stmt stmt : cfg.getNodes()) {
      Set<Local> liveIn = solver.getInFact(stmt);

      if (stmt instanceof JReturnStmt) {
        JReturnStmt ret = (JReturnStmt) stmt;
        Immediate returnedValue = ret.getOp();

        // The variable being returned must be live just before the return.
        assertTrue(returnedValue instanceof Local, "return operand should be a Local");
        assertTrue(
            liveIn.contains((Local) returnedValue), "returned variable must be live before return");

        // No other locals should be live at the return point (nothing else is used after).
        assertFalse(
            liveIn.stream().anyMatch(l -> l != returnedValue),
            "no other variable should be live before return");
      }

      if (stmt instanceof JAssignStmt) {
        JAssignStmt assign = (JAssignStmt) stmt;
        // A variable that is defined and then immediately never used should NOT be live
        // in its own OUT set (it was just killed).
        // We spot-check: nothing defined here should appear in OUT (post-transfer).
        Set<Local> liveOut = solver.getOutFact(stmt);
        if (assign.getLeftOp() instanceof Local) {
          Local defined = (Local) assign.getLeftOp();
          // If this definition is the last use of the variable (i.e. it is dead code),
          // the defined variable will not appear in liveOut. We cannot assert this
          // universally because the variable might be used later. We just verify the
          // result is non-null (the solver ran).
          assertTrue(liveOut != null, "solver must produce a result for every stmt");
        }
      }
    }

What to try next