First steps using Soot 2.3.0 as a command-line tool
Eric | August 21, 2008
This is the first of what will be a series of blog posts answering frequently asked questions about using Soot. I will try to cover different topics, such as using Soot from the command line, as a framework, and in the form of its Eclipse plug-in. This is basically a user-friendly, digested version of all the fabulous Soot tutorials that we already have online. Today’s topic is Soot’s command line and phase options.
Obtaining Soot
You can always download the latest release version of Soot from the official Soot download page. There are a bunch of different options to choose from, but usually you will need the following:
- sootclasses-x.y.z.jar (the main Soot distribution)
- jasminclasses-x.y.z.jar (the bytecode assembler that Soot uses to create .class files)
- polyglotclasses-a.b.c.jar (the compiler front-end that Soot uses to parse .java files)
We download these three files and now we are ready to give Soot a try:
mucuna /tmp/soot $ java -cp sootclasses-2.3.0.jar:jasminclasses-2.3.0.jar:polyglotclasses-1.3.5.jar soot.Main
Soot version 2.3.0
Copyright (C) 1997-2008 Raja Vallee-Rai and others.
All rights reserved.
...
Bleeding-edge version (nightly build)
For the really brave among you, Ondrej Lhotak provides a nightly build that is drawn directly from our Subversion repository. Usually the latest nightly build is the most stable version of Soot because we tend to test code before we commit it. However, this may not always be true.
Soot’s command line
Ok, so it seems to be working but what can we do with it now? Let’s have a look at the command line options:
mucuna /tmp/soot $ java -cp sootclasses-2.3.0.jar:jasminclasses-2.3.0.jar:polyglotclasses-1.3.5.jar soot.Main -help
General Options:
  -h -help                       Display help and exit
  -pl -phase-list                Print list of available phases
  -ph PHASE -phase-help PHASE    Print help for specified PHASE
  -version                       Display version information and exit
  -v -verbose                    Verbose mode
...
The full list of command line options is always available here and I encourage every Soot beginner to have a look at this document.
Processing single files
Soot in general processes a bunch of classes. These classes can come in one of three formats:
- Java source code, i.e. .java files,
- Java bytecode, i.e. .class files, and
- Jimple source, i.e. .jimple files.
In case you don’t know yet, Jimple is Soot’s primary intermediate representation, a three-address code that is basically a simplified version of Java that only requires around 15 different kinds of statements. You can instruct Soot to convert .java or .class files to .jimple files or the other way around. You can even have Soot generate .jimple from .java, modify the .jimple with a normal text editor and then convert your .jimple to .class, virtually hand-optimizing your program. But we are getting off-track here…
For brevity, in the following, I will abbreviate the classpath…
sootclasses-2.3.0.jar:jasminclasses-2.3.0.jar:polyglotclasses-1.3.5.jar
by just <soot>.
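If you are typing along, one way to get the same abbreviation in your own shell is a variable. This is only a sketch: the name SOOT_JARS is my own choice here and nothing Soot knows about, and the command is printed rather than run so that the snippet works even without the JARs in place.

```shell
# SOOT_JARS is a hypothetical variable name chosen for this sketch.
SOOT_JARS="sootclasses-2.3.0.jar:jasminclasses-2.3.0.jar:polyglotclasses-1.3.5.jar"

# Print the command instead of running it; remove "echo" to actually invoke Soot.
echo java -cp "$SOOT_JARS" soot.Main -help
```

With that in place, every "-cp <soot>" in the examples would read -cp "$SOOT_JARS".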
The principal way to have Soot process two classes A and B is just to add them to the command line, which makes them “application classes”:
mucuna /tmp/soot $ ls *.java
A.java B.java
mucuna /tmp/soot $ java -cp <soot> soot.Main A B
Soot started on Thu Aug 21 08:26:41 GMT-05:00 2008
Exception in thread "main" java.lang.RuntimeException: couldn't find class: A (is your soot-class-path set properly?)
Whooops, what went wrong there? Well, I omitted an important detail: Soot has its own classpath!
Soot’s classpath
Soot has its own classpath and will load files only from JAR files or directories on that path. By default, this path is empty and therefore in the above example Soot does not “see” the classes A and B although they exist. So let’s just add the current directory “.”:
mucuna /tmp/soot $ java -cp sootclasses-2.3.0.jar:jasminclasses-2.3.0.jar:polyglotclasses-1.3.5.jar soot.Main -cp . A B
Soot started on Thu Aug 21 08:32:13 GMT-05:00 2008
Exception in thread "main" java.lang.RuntimeException: couldn't find class: java.lang.Object (is your soot-class-path set properly?)
What’s wrong now? Apparently Soot was able to find A and B (at least it doesn’t complain about these any more) but now it’s missing java.lang.Object.
Why does Soot care about java.lang.Object anyway? To do anything meaningful with your program, Soot needs typing information. In particular, it needs to reconstruct types for local variables, and to do so it needs to know the complete type hierarchy of the classes you want to process.
Regarding the exception, there are three ways to resolve it:
- add rt.jar to your classpath
- add the -pp option, given your CLASSPATH variable comprises rt.jar or JAVA_HOME is set correctly
- use the -allow-phantom-refs option (not recommended)
In the first option you add your JDK’s rt.jar to Soot’s classpath (not the JVM’s classpath!). This JAR file contains the class java.lang.Object:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp .:/home/user/ebodde/bin/sun-jdk1.6.0_05/jre/lib/rt.jar A B
Soot started on Thu Aug 21 08:42:09 GMT-05:00 2008
Transforming B...
Transforming A...
Writing to sootOutput/B.class
Writing to sootOutput/A.class
Soot finished on Thu Aug 21 08:42:12 GMT-05:00 2008
Soot has run for 0 min. 3 sec.
Eureka! This seems to have worked. Soot successfully processed the two .java files and placed the resulting .class files into the sootOutput folder. Note that in general, Soot will process all classes you name on the command line and all classes referenced by those classes.
Beware, though, a common mistake is the following:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp .:~/bin/sun-jdk1.6.0_05/jre/lib/rt.jar A B
Soot started on Thu Aug 21 08:43:43 GMT-05:00 2008
Exception in thread "main" java.lang.RuntimeException: couldn't find class: java.lang.Object (is your soot-class-path set properly?)
What went wrong? Well, you tried to use “~” because that points to your home directory, no? Well, yes, but the shell only expands “~” at the start of a word, and here it sits in the middle of the classpath argument, right after the “.:”. Soot therefore gets the raw “~” string as a command line option, and currently Soot is unable to expand that string into the path of your home directory. So always use full or relative paths in Soot’s classpath.
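You can see the difference without involving Soot at all. The following sketch assumes a POSIX-style shell such as bash and reuses the JDK path from the example above; the variable names literal and expanded are just for illustration.

```shell
# A "~" in the middle of a word is not expanded, so it stays a literal "~".
literal=$(echo .:~/bin/sun-jdk1.6.0_05/jre/lib/rt.jar)

# $HOME is a variable, which the shell expands anywhere inside the word.
expanded=".:$HOME/bin/sun-jdk1.6.0_05/jre/lib/rt.jar"

echo "$literal"    # still contains the raw "~"
echo "$expanded"   # contains the full path of your home directory
```

So if you do not want to type out your home directory, writing $HOME instead of “~” in the value you pass to Soot’s -cp should do the trick.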
The second option is to use -pp:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp . -pp A B
Soot started on Thu Aug 21 08:47:42 GMT-05:00 2008
Transforming A...
Transforming B...
Writing to sootOutput/A.class
Writing to sootOutput/B.class
Soot finished on Thu Aug 21 08:47:46 GMT-05:00 2008
Soot has run for 0 min. 3 sec.
Wow, that was much easier than adding this damn classpath all the time, wasn’t it? Exactly, and that’s why we added this option. -pp stands for “prepend path” and it means that Soot automatically adds the following to its own classpath (in that order):
- the contents of your current CLASSPATH variable,
- ${JAVA_HOME}/lib/rt.jar, and
- if you are in whole-program mode (i.e. the -w option is enabled; more to come) then it also adds ${JAVA_HOME}/lib/jce.jar
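To make the ordering concrete, here is a small shell sketch that assembles such a path by hand. It is not how Soot implements -pp; pp_path is a hypothetical helper, and I am assuming that the three entries simply end up in front of whatever you passed via -cp, as the name “prepend path” suggests. The values for JAVA_HOME and CLASSPATH are examples only.

```shell
# pp_path USER_CP [whole]
# Hypothetical helper (not part of Soot): mimics the order described above.
# Assumption: the entries are placed in front of the user's -cp value.
pp_path() {
    user_cp=$1
    mode=${2:-}
    result=""
    # 1. the contents of the CLASSPATH variable, if it is set
    if [ -n "${CLASSPATH:-}" ]; then
        result="${CLASSPATH}:"
    fi
    # 2. rt.jar below JAVA_HOME
    result="${result}${JAVA_HOME}/lib/rt.jar"
    # 3. jce.jar, only in whole-program mode
    if [ "$mode" = "whole" ]; then
        result="${result}:${JAVA_HOME}/lib/jce.jar"
    fi
    printf '%s\n' "${result}:${user_cp}"
}

JAVA_HOME=/opt/jdk        # example value only
CLASSPATH=lib/foo.jar     # example value only
pp_path .          # prints lib/foo.jar:/opt/jdk/lib/rt.jar:.
pp_path . whole    # prints lib/foo.jar:/opt/jdk/lib/rt.jar:/opt/jdk/lib/jce.jar:.
```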
The third way (not recommended) to make Soot sort of happy is the option -allow-phantom-refs:
mucuna /tmp/soot $ java -cp <soot> soot.Main -allow-phantom-refs -cp . A B
Soot started on Thu Aug 21 08:52:35 GMT-05:00 2008
Warning: java.lang.Short is a phantom class!
Warning: java.lang.Class is a phantom class!
Warning: java.lang.Character is a phantom class!
...
Transforming B...
Transforming A...
Writing to sootOutput/B.class
Writing to sootOutput/A.class
Soot finished on Thu Aug 21 08:52:37 GMT-05:00 2008
Soot has run for 0 min. 1 sec.
So what does that do? Basically this option tells Soot: “Well, I really don’t want to give you the classes you are missing (maybe because you just don’t have those classes) but please make a best effort even without them.” Soot creates a “phantom class” for each class that it cannot resolve and tells you about it. Note that this approach is very limited and in many cases does not lead to the results you need. Only use this option if you know what you are doing.
Processing entire directories
You can also process entire directories or JAR files using Soot, using the -process-dir option:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp . -pp -process-dir .
Soot started on Thu Aug 21 09:01:12 GMT-05:00 2008
Transforming A...
Transforming B...
Writing to sootOutput/A.class
Writing to sootOutput/B.class
Soot finished on Thu Aug 21 09:01:15 GMT-05:00 2008
Soot has run for 0 min. 3 sec.
To process a JAR file, just use the same option but provide a path to a JAR instead of a directory. Nice, eh? Be careful, though: If you apply the very same command again to the very same folder you will run into a problem now:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp . -pp -process-dir .
Soot started on Thu Aug 21 09:02:29 GMT-05:00 2008
Exception in thread "main" java.lang.RuntimeException: Error: class A read in from a classfile in which sootOutput.A was expected.
What happened? Well, as I noted earlier, Soot places the generated .class files into the folder sootOutput, which resides in the current directory “.”. Therefore Soot now also processed the previously generated files, complaining that a class named “A” resides at ./sootOutput/A and therefore should actually have the name sootOutput.A, i.e. be in the sootOutput package. So when using the -process-dir option, also use the -d option to redirect Soot’s output:
mucuna /tmp/soot $ java -cp <soot> soot.Main -cp . -pp -process-dir . -d /tmp/sootout
Soot started on Thu Aug 21 09:06:29 GMT-05:00 2008
Transforming A...
Transforming B...
Writing to /tmp/sootout/A.class
Writing to /tmp/sootout/B.class
Soot finished on Thu Aug 21 09:06:32 GMT-05:00 2008
Soot has run for 0 min. 2 sec.
This redirects Soot’s output to /tmp/sootout, which is not a sub-directory of the current directory. Voila.
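The naming mismatch behind that error can be illustrated with a few lines of shell. This is only a sketch of the idea that a file’s location below the processed directory implies a package name; expected_class_name is a hypothetical helper of mine, not something Soot provides, and Soot’s actual lookup may well differ in its details.

```shell
# expected_class_name PATH
# Hypothetical helper: derives the class name that a .class file's location
# (relative to the processed directory) implies.
expected_class_name() {
    rel=${1#./}          # drop a leading "./"
    rel=${rel%.class}    # drop the ".class" extension
    printf '%s\n' "$rel" | tr '/' '.'   # directories become package segments
}

expected_class_name ./A.class              # prints A
expected_class_name ./sootOutput/A.class   # prints sootOutput.A
```

The second call shows the name that the location of the generated file implies, while the class inside that file is still called plain A.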
Processing certain types of files (.class / .java / .jimple)
Assume you have a directory that contains both A.java and A.class and you invoke Soot as before. In this case Soot will load the definition of A from the file A.class. This may not always be what you want. The -src-prec option tells Soot which input type it should prefer over others. There are four options:
- c or class (default): favour class files as Soot source,
- only-class: use only class files as Soot source,
- J or jimple: favour Jimple files as Soot source, and
- java: favour Java files as Soot source.
So e.g. -src-prec java will load A.java in the above example.
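As a quick reference, the list above can be written down as a lookup from the option value to the file type that Soot favours. The function preferred_ext is a hypothetical name for this sketch; it models only the first preference and says nothing about what Soot falls back to when the favoured file is missing.

```shell
# preferred_ext VALUE
# Hypothetical helper: maps a -src-prec value to the favoured file extension,
# exactly as listed above. Fallback behaviour is not modelled.
preferred_ext() {
    case $1 in
        c|class|only-class) echo class ;;
        J|jimple)           echo jimple ;;
        java)               echo java ;;
        *) echo "unknown -src-prec value: $1" >&2; return 1 ;;
    esac
}

echo "A.$(preferred_ext class)"   # prints A.class
echo "A.$(preferred_ext java)"    # prints A.java
```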
Application classes vs. library classes
Classes that Soot actually processes are called “application classes”. This is opposed to “library classes”, which Soot does not process but only uses for type resolution. Application classes are usually those explicitly stated on the command line or those classes that reside in a directory referred to via -process-dir.
When you use the -app option, however, Soot also processes all classes referenced by these classes. It will not, however, process any classes in the JDK, i.e. classes in one of the java.* and com.sun.* packages. If you wish to include those too, you have to use the special -i option, e.g. “-i java.”. See the guide for this and other command line options.
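Here is a rough sketch of that rule for a class that is merely referenced while -app is in effect. It is a simplification under my own assumptions: classify_referenced is a hypothetical helper, it only knows the two package prefixes named above, and it treats the -i argument as a plain name prefix.

```shell
# classify_referenced CLASS_NAME [INCLUDE_PREFIX]
# Hypothetical helper: says whether a referenced class would count as an
# application class or a library class in -app mode. INCLUDE_PREFIX stands
# for the argument of -i and is treated as a simple prefix (an assumption).
classify_referenced() {
    name=$1
    include=${2:-}
    if [ -n "$include" ]; then
        case $name in
            "$include"*) echo application; return 0 ;;
        esac
    fi
    case $name in
        java.*|com.sun.*) echo library ;;      # JDK classes are left alone
        *)                echo application ;;  # everything else is processed
    esac
}

classify_referenced B                        # prints application
classify_referenced java.lang.String         # prints library
classify_referenced java.lang.String java.   # prints application
```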
Output of .jimple or .java files
Soot can not only produce .class files, it can also produce .jimple and .java files and others. You can select the output format using the -f option. If you use -f dava to decompile to Java, please make sure that the file <jre>/lib/jce.jar is on Soot’s classpath.
Phase options
Soot supports hundreds of very fine-grained options that allow you to tune all the analyses and optimizations to your needs, directly from the command line.
The general format of these command line options is -p PHASE OPT:VAL. A complete list of all phase options is available here. For instance, let’s say that we want to preserve the names of local variables (if possible) when performing an analysis within Soot. Then we can add the command line option -p jb use-original-names:true. A shortcut is -p jb use-original-names, where the true is implicitly assumed.
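The shortcut rule is easy to spell out in shell. The helper below is hypothetical and only illustrates the OPT versus OPT:VAL convention described above; it does not reflect how Soot parses its arguments internally.

```shell
# normalize_phase_opt OPT[:VAL]
# Hypothetical helper: expands the shortcut form OPT into OPT:true and
# leaves an explicit OPT:VAL untouched.
normalize_phase_opt() {
    case $1 in
        *:*) printf '%s\n' "$1" ;;
        *)   printf '%s:true\n' "$1" ;;
    esac
}

printf '%s\n' "-p jb $(normalize_phase_opt use-original-names)"
# prints -p jb use-original-names:true
printf '%s\n' "-p jb $(normalize_phase_opt use-original-names:false)"
# prints -p jb use-original-names:false
```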
That’s all for today!