Jimple
What is Jimple? Jimple is the intermediate representation (IR) of Soot, and thus of SootUp. Soot aims to provide a simplified way to analyze JVM bytecode. JVM bytecode is stack-based, which makes program analysis difficult. Java source code, on the other hand, is not well suited to program analysis either, because of its nested structures. Jimple therefore aims to combine the best of both worlds: a flat (non-nested), non-stack-based representation. To that end, Jimple was designed as a human-readable representation of JVM bytecode.
Info
To learn more about Jimple, refer to the thesis by Raja Vallée-Rai.
Let's have a look at the following Jimple code representing the Java code of a HelloWorld class.
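As a rough illustration, a minimal HelloWorld class could look like this in Java. The comments sketch the approximate shape of the Jimple that would be derived from its bytecode. The local names (`this`, `args`, `$stack1`) and the exact layout are assumptions and may differ from what a given SootUp version prints.

```java
public class HelloWorld {

    // Approximate Jimple for the implicit default constructor
    // (local names are assumptions):
    //
    //   public void <init>()
    //   {
    //       HelloWorld this;
    //
    //       this := @this: HelloWorld;
    //       specialinvoke this.<java.lang.Object: void <init>()>();
    //       return;
    //   }

    // Approximate Jimple for main (local names are assumptions):
    //
    //   public static void main(java.lang.String[])
    //   {
    //       java.lang.String[] args;
    //       java.io.PrintStream $stack1;
    //
    //       args := @parameter0: java.lang.String[];
    //       $stack1 = <java.lang.System: java.io.PrintStream out>;
    //       virtualinvoke $stack1.<java.io.PrintStream: void println(java.lang.String)>("Hello World!");
    //       return;
    //   }
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
```

Note how the single Java statement becomes several Jimple statements: the static field `System.out` is first loaded into a Local, and only then is `println` invoked on that Local.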
Why not work directly on source or bytecode?
This is a natural question. Each representation has a fundamental problem for analysis:
Source code is readable, but it is not always available: you often only have a
compiled .jar. Even when you do have source, nested expressions like
a.foo(b.bar() + c) require you to invent temporary variables mentally before you can
reason about intermediate values.
Bytecode is always available in the .class file, but the JVM is a stack machine:
instructions push and pop an implicit operand stack. There are no named variables between
instructions. Tracking "what value is currently at position 2 on the stack after this
branch?" is surprisingly painful to reason about automatically.
Jimple is the best of both worlds. It is derived from bytecode (no source needed),
but expressed as a register machine: every value is held in a named Local, every
operation touches at most three operands (three-address code), and there are no nested
expressions. What the JVM handles implicitly, Jimple makes explicit. The result is a
representation where "which local variables flow into this assignment?" has a direct,
mechanical answer.
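The flattening described above can be sketched in plain Java. This is only an illustration of the three-address idea, not Jimple itself: the classes `A` and `B`, their method bodies, and the temporaries `tmp1` to `tmp3` are hypothetical names chosen for this example.

```java
public class ThreeAddressDemo {

    // Hypothetical receiver types for the expression a.foo(b.bar() + c).
    static class A {
        int foo(int x) { return x * 2; }
    }

    static class B {
        int bar() { return 5; }
    }

    // Nested form, as it would be written in source code.
    static int nested(A a, B b, int c) {
        return a.foo(b.bar() + c);
    }

    // Flattened form: each statement performs a single operation and
    // stores its result in a named local, similar in spirit to Jimple.
    static int flattened(A a, B b, int c) {
        int tmp1 = b.bar();     // intermediate result of the inner call
        int tmp2 = tmp1 + c;    // the addition, on named operands only
        int tmp3 = a.foo(tmp2); // the outer call receives a local
        return tmp3;
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        System.out.println(nested(a, b, 3));    // prints 16
        System.out.println(flattened(a, b, 3)); // prints 16
    }
}
```

In the flattened version, the question "which values flow into the argument of `foo`?" can be answered by reading the statement that defines `tmp2`, without evaluating any nested expression.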
Java source code is the easiest representation to read, so why all the fuss instead of simply using that?
Sometimes we have no access to the source code and only have a binary containing the bytecode.
For most people, reading bytecode is not very intuitive, so SootUp generates Jimple from the bytecode.
Jimple is very verbose, but it makes explicit everything the JVM does implicitly, and it replaces the stack-machine strategy with a register-machine strategy, i.e. values are handled through variables (Locals).
Jimple Grammar Structure
Jimple mimics the JVM's class file structure and is therefore object-oriented. Each file contains a single class (or interface). Statements are written as three-address code, which means there are no nested expressions. Nested expressions are instead modeled with Locals that store intermediate results.
Signatures and ClassTypes
Signatures are used to identify classes, methods, or fields uniquely and globally. Side note: Locals do not have a signature, since they are only referenced within the boundaries of a method.
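To make the textual shape of signatures concrete, here is a minimal sketch in plain Java that renders a field signature and a method signature as strings. It deliberately does not use the SootUp API: the class `SignatureSketch` and its helper methods are hypothetical, and separating parameter types with a comma and no space is an assumption based on common Jimple output.

```java
public class SignatureSketch {

    // Renders a field signature: <declaring class: field type field name>
    static String fieldSignature(String declaringClass, String type, String name) {
        return "<" + declaringClass + ": " + type + " " + name + ">";
    }

    // Renders a method signature:
    // <declaring class: return type method name(parameter types)>
    static String methodSignature(String declaringClass, String returnType,
                                  String name, String... parameterTypes) {
        // Assumption: parameter types are joined with "," and no space.
        String params = String.join(",", parameterTypes);
        return "<" + declaringClass + ": " + returnType + " " + name + "(" + params + ")>";
    }

    public static void main(String[] args) {
        // A class is identified by its fully qualified name alone.
        System.out.println("java.lang.String");
        System.out.println(fieldSignature("java.lang.String", "int", "hash"));
        System.out.println(methodSignature("java.lang.Object", "java.lang.String", "toString"));
    }
}
```

Because each signature contains the fully qualified name of the declaring class, it stays unambiguous across the whole program, which a plain method or field name would not.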
SootClass
A SootClass consists of SootFields and SootMethods.
It is referenced by its global identifier, the ClassType, for example java.lang.String.
SootField
A SootField is a piece of memory which can store a value that is accessible according to its visibility modifier.
It is referenced by its FieldSignature like <java.lang.String: int hash>.
SootMethod and its Body
The interesting part is a method. A method is a "piece of code" that can be executed.
It is referenced by its MethodSignature like <java.lang.Object: java.lang.String toString()>.
More about the Body of the SootMethod.