// NOTE: free encode after usage to avoid leaking memory
ks_free(encode);

// close Keystone instance when done
ks_close(ks);
To compile this file, we need a Makefile like below.
Readers can get this sample code in a zip file here. Compile and run it as follows.
The C sample is intuitive, but just in case, readers can find below the explanation for each line of test1.c.
Line 3: Include header file keystone.h before we do anything.
Line 6: Assembly string we want to compile. The code in this sample is X86 32-bit, in Intel format. Separate the assembly instructions in this string with either “;” or “\n”.
Line 10: Declare a handle variable that is a pointer to the data type ks_engine. This handle is passed to every Keystone API.
Line 11: Declare an error variable of the data type ks_err. This variable is used to verify the result returned from each API call.
Line 12: Declare a variable to hold the number of statements this program will compile (line 22).
Line 13: Declare encode, a pointer to unsigned char, which points to an array containing the encoding of the compiled instructions.
Line 14: Declare size, a variable to hold the size (in bytes) of the array that encode points to.
Line 16 ~ 20: Initialize Keystone with the function ks_open. This API accepts 3 arguments: the hardware architecture, the hardware mode and a pointer to the Keystone handle. In this sample, we want to assemble 32-bit code for the X86 architecture. In return, the handle is stored in the variable ks. This API can fail in extreme cases, so our sample verifies the returned result against the error code KS_ERR_OK.
Line 22: Compile the input assembly string using the API ks_asm with the handle we got from ks_open. The 2nd argument of ks_asm() is the assembly string we want to compile. The 3rd argument is the address of the first instruction, which can be ignored on some architectures such as X86. In return, this API gives back dynamically allocated memory in the next argument encode, as well as its size in size. Keystone also reports how many statements of the input assembly were handled during this process, which gives us a hint of where it stopped in case the input has an error.
Line 25 ~ 34: Print out the instruction encoding of the input assembly, returned in the memory array that encode points to.
Line 37: Use the API ks_free() to free the memory that encode points to, which was allocated by ks_asm().
Line 40: Close the handle with the API ks_close() when we are done.
By default, Keystone accepts X86 assembly in Intel syntax. Keystone has an API named ks_option to customize its engine at run-time. Before running ks_asm, we can switch to X86 AT&T syntax by calling ks_option like below.
Sample code test2.c demonstrates X86 AT&T support.
2. Tutorial for Python language
The following code presents the same example as above, but in Python, to compile 32-bit assembly code of X86.
from keystone import *

# separate assembly instructions by ; or \n
CODE = b"INC ecx; DEC edx"

try:
    # Initialize engine in X86-32bit mode
    ks = Ks(KS_ARCH_X86, KS_MODE_32)
    encoding, count = ks.asm(CODE)
    print("%s = %s (number of statements: %u)" % (CODE, encoding, count))
except KsError as e:
    print("ERROR: %s" % e)
Readers can get this sample code here. Run it with Python as follows.
This Python sample is intuitive, but just in case, readers can find below the explanation for each line of test1.py.
Line 1: Import Keystone module before using it.
Line 4: Assembly string we want to compile. The code in this sample is X86 32-bit, in Intel format. Separate the assembly instructions in this string with either “;” or “\n”.
Line 8: Initialize Keystone with class Ks. This class accepts 2 arguments: the hardware architecture and hardware mode. This sample deals with 32-bit code for X86 architecture. In return, we have a variable of this class in ks.
Line 9: Compile the assembly instructions using the method asm. In return, we have a list of encoding bytes and the number of input statements that Keystone handled during the compilation process, which gives us a hint of where it stopped in case the input has an error.
Line 10: Print out the instruction encoding and number of assembly statements processed.
Line 11 ~ 12: Handle exceptions of the type KsError, which are raised when something goes wrong.
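The list returned by asm holds one integer per encoded byte, so printing it directly shows decimal values. As a minimal sketch, the hypothetical helper format_encoding below (not part of Keystone) renders such a list as hex. The sample list is the standard X86 32-bit encoding of "INC ecx; DEC edx", hard-coded here as an assumption so the snippet runs without Keystone installed.

```python
def format_encoding(encoding):
    """Render a list of byte values (0-255) as space-separated hex pairs."""
    return " ".join("%02x" % b for b in encoding)


# Hard-coded stand-in for the list that ks.asm(CODE) returns for
# "INC ecx; DEC edx" in X86 32-bit mode (assumption: no Keystone call here).
sample_encoding = [0x41, 0x4A]
print(format_encoding(sample_encoding))  # prints "41 4a"
```

In the sample program, the same helper could be applied to the encoding variable before it is passed to print.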
By default, Keystone accepts X86 assembly in Intel syntax. To handle X86 AT&T syntax, we can simply switch to syntax AT&T like below.
Sample code test2.py demonstrates X86 AT&T support.
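A minimal sketch of that switch, assuming the Python binding's syntax property and the KS_OPT_SYNTAX_ATT constant exported by the keystone module. The helper name assemble_att and the AT&T input string in the usage note are illustrative and are not taken from test2.py.

```python
def assemble_att(code):
    """Assemble X86 32-bit code written in AT&T syntax.

    Returns the (encoding, count) pair produced by Ks.asm.
    """
    # Imported here so that defining the helper does not require Keystone.
    from keystone import Ks, KS_ARCH_X86, KS_MODE_32, KS_OPT_SYNTAX_ATT

    ks = Ks(KS_ARCH_X86, KS_MODE_32)
    # Switch from the default Intel syntax to AT&T before assembling.
    ks.syntax = KS_OPT_SYNTAX_ATT
    return ks.asm(code)
```

With Keystone installed, assemble_att(b"inc %ecx; dec %edx") should assemble the same two instructions as the Intel-syntax sample. Any KsError raised by the engine propagates to the caller.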
3. More examples
This tutorial does not yet explain all the Keystone APIs.
For C samples, see the code in the directory samples/ in the Keystone source.
IDA Pro is the de-facto binary analysis tool widely used in the security community. While browsing the assembly code in IDA, we may want to modify the original code to change the behavior of the executable file. IDA offers this functionality in its menu “Edit \ Patch program \ Assemble”, in which we can type in new assembly to overwrite the existing code, as in the screenshot below.
However, this built-in assembler suffers from several significant issues, as follows.
It supports no architecture other than X86. As a result, when we open the menu on an ARM binary, IDA refuses with the message “Sorry, this processor module doesn’t support the assembler”.
Even on X86, the IDA assembler fails to handle many simple X86_64 instructions. For example, the instruction “PUSH RAX” is refused with the error “Invalid operand”.
We expected the IDA assembler to miss the latest X86 instructions (such as those from the SGX extension), but it also fails on many not-so-modern X86 instructions. For example, the AVX instruction “VDIVSS XMM2, XMM6, XMM4” is (wrongly) considered illegal, with the error “Invalid mnemonic”.
The X86 assembler seems quite buggy, with many minor issues here and there. Example: if you enter the invalid code “PUSH ESI” on an X86_64 binary, the IDA assembler happily accepts it, but then overwrites the existing code with the single byte “56”, which is actually the encoding of “PUSH RSI”.
If the new patched code is shorter than the original code, the orphan bytes after the new code are kept intact, which is mostly undesired. In the example below, the 3 original bytes “48 89 FB” (for “MOV RBX, RDI”) are overwritten with the 2 bytes “31 C0” (for “XOR EAX, EAX”). The orphan byte “FB” is still there, and is decoded as the instruction “STI”. Because IDA does not clear the orphan code, we need to perform one more step to patch this left-over byte with the “NOP” opcode.
The IDA assembler does not log any changes, making it hard to track what code was modified and where. We would have to keep notes on what we patched, which is cumbersome.
Unfortunately, there was no solution for all the above problems of the IDA assembler. We decided to accept the challenge and build a new assembler plugin for IDA, named Keypatch, to solve all the existing issues.
Our tool offers some nice features as follows.
Keypatch leverages the power of Keystone assembler engine, so it can support 8 CPUs: X86, ARM, ARM64, Hexagon, Mips, PowerPC, Sparc & SystemZ. On each architecture, Keystone is able to handle the latest CPU instruction sets.
Our GUI makes it much easier to see what you would do: it shows the original code (before modifying), and new code that will patch your binary.
We have an option to automatically pad all the orphan bytes with NOP opcode.
Keypatch can understand & accept IDA symbols, so you can conveniently use them in assembly code, without having to convert them to immediates beforehand.
We make it easier to track what code was modified and where by logging all the changes in the “Output” window of IDA, with content like:
Keypatch has another functionality in its own menu “Edit \ Keypatch \ Assembler”, in which you can experimentally assemble arbitrary code for any architecture supported by Keystone. This convenient tool does not modify the original binary under analysis, so it can be an extra weapon in the reversing process.
Last but not least, Keypatch is open source, so it is easy to fix bugs & add more features.
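The NOP-padding option mentioned in the list above can be sketched as follows. This is an illustration of the idea only, not Keypatch's actual implementation: the helper pad_patch and the constant X86_NOP are hypothetical names, and the sketch assumes a single-byte filler such as the X86 NOP opcode 0x90. The byte values come from the “MOV RBX, RDI” example earlier.

```python
X86_NOP = b"\x90"  # single-byte NOP opcode on X86


def pad_patch(original, patch, filler=X86_NOP):
    """Return `patch` padded with `filler` up to the length of `original`.

    Raises ValueError if the patch is longer than the code it replaces,
    since that would overwrite the instructions that follow.
    Assumes `filler` is exactly one byte long.
    """
    if len(patch) > len(original):
        raise ValueError("patch is longer than the original code")
    orphan = len(original) - len(patch)  # number of left-over bytes
    return patch + filler * orphan


original = bytes.fromhex("4889fb")  # MOV RBX, RDI
patch = bytes.fromhex("31c0")       # XOR EAX, EAX
print(pad_patch(original, patch).hex())  # prints "31c090"
```

With the padding applied, the left-over byte “FB” becomes “90”, so it is decoded as NOP rather than STI.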
To summarize, Keypatch has everything needed to replace the internal IDA assembler because it can do more, and do it better. We believe that this little IDA plugin will be indispensable in your reverse engineering toolset.
Learn quickly from this tutorial how to program with Keystone in C & Python.
This version fixes some important bugs inside the core of Keystone (especially the X86 assembler), adds some new bindings & makes some minor improvements, without breaking compatibility. All users of Keystone are encouraged to upgrade to v0.9.1.
NOTE: Keystone is now available on PyPI in the keystone-engine package. This package includes the core, so CMake is required to build the shared library. Python users can then easily install Keystone with:
$ sudo pip install keystone-engine
See below for the changelog.
Core & tool
Fix a segfault in kstool (on missing assembly input).
kstool now allows specifying the instruction address.
Build Mac libraries in universal format by default.
Add “lib32” option to cross-compile to 32-bit *nix (on 64-bit system).
Add “lib_only” option to only build libraries (skip kstool).
Learn quickly from this tutorial how to program with Keystone in C & Python.
We would like to show our gratitude to all the Indiegogo supporters, who financially contributed to the development of Keystone. We will never forget all the testers for their incredible bug reports & code contributions during the beta phase! Without the invaluable help of the community, our project would not have gone this far!
Keystone aims to lay the ground for innovative works. We look forward to seeing much advanced research & development in the security area built on this engine. Let the fun begin!
We are very excited to announce that we already released Keystone source code to some early adopters! Together, we will work hard to find and clean as many bugs as possible before making it public later.
We would like to thank those who are willing to put valuable time & effort into helping us in this phase! Believe us, the code is in good hands right now :-)
Our Indiegogo fundraising for the Keystone project was successful, with 165% funded! Thanks a lot to all 99 awesome backers: you are the motivation and the very reason why Keystone sees the light of day and will become available to the public soon!
Since our stretch goal was also met, we will have support for GNU Gas & Nasm syntax in the first release of Keystone.
Here is our quick plan:
We will ship stickers to all the backers from level-32 and up, starting next week.
We are still collecting T-shirt requests at the moment. After that, we will order the T-shirt printing, then post the shirts to all the backers from level-128 and up. We hope to start shipping next week.
We are cleaning up code and fixing some issues. All the backers from level-512 and up will get the source code in about 2 weeks.
We will send the source code to all the backers of level-256 sometime in May.
If we can fix all the major issues, the first release of Keystone to the public will be out in May or June.
More updates about our project will be shared soon.
We have passed the initial IndieGogo funding goal of $10000 in just 1 week! Thanks a lot to everybody who believed in this project and supported us, you are awesome!
With about 10 more days to go, we decided to set out a stretch goal of $15000 to support more types of assembly syntax.
Our motivation is that Keystone is based on LLVM, which only supports LLVM syntax. If we only compile simple instructions, we will be fine as we are. But if the code has directives, macros, comments and so on, then the assembly syntax matters, because each assembler has its own way of expressing them.
If the stretch goal of $15000 is reached, the first public version of Keystone will support GNU Gas & Nasm syntaxes, while leaving the support for other assemblers open through a plugin interface.
These are the things we must do to support Gas & Nasm for this stretch goal:
Investigate the syntaxes of these assemblers.
Refactor the assembly parser of Keystone to support external syntaxes.
Design a plugin interface for external assembly syntaxes, so that it is easy to add more syntaxes in the future.
Implement Gas & Nasm support, and allow choosing a non-default syntax at run-time.
When ready, we can enable these syntaxes when setting up the engine, like below.
With Python, this can be done simply like below.
We think the option of freely choosing the assembly syntax is important. Please help to spread the news of this stretch goal, and do back us so we will finally have a nice, full-featured assembler when Keystone is released!
We are very excited to launch the crowd-funding campaign for Keystone assembler engine on IndieGogo!
A multi-architecture, multi-platform open source assembler framework is a missing piece in a chain of fundamental engines for reverse engineering. After Capstone & Unicorn, Keystone is the latest of our on-going effort to bring better tools to the security community.
Keystone involves a lot of hard work, however. Therefore, we hope to have community support via this IndieGogo campaign, so we can push this to the end goal, and all of us can finally have a nice assembler engine!
Get behind the Keystone project, so together we can solve the problem of the missing assembler framework once and for all: https://igg.me/at/keystone/.
Update: if you do not want to use IndieGogo, we accept donations via PayPal & Bitcoin. All the perks from IndieGogo still apply.