pyasm User's Guide V. 0.3
Grant Olson <kgo at grant-olson dot net>
Pyasm is a full-featured dynamic assembler written entirely in Python. By
dynamic, I mean that it can be used to generate and execute machine code in
python at runtime without requiring the generation of object files and
linking. It essentially allows 'inline' assembly in python modules on x86.
Pyasm can also generate object files (for windows) like a traditional
standalone assembler, although you're probably better off using one of the
freely available assemblers if this is your primary goal.
Pyasm currently requires python 2.6.
A simple Windows version of a hello_world.py program is as follows:
# Hello World in assembly: pyasm/examples/hello_World.py
from pyasm import pyasm
!PROC hello_world PYTHON
!CALL PySys_WriteStdout "Hello, World!\n\0"
ADD ESP, 0x4
A brief description of what is happening during the pyasm call:
1. The globals() call tells pyasm where to bind newly created python
functions.
2. The !CHARS directive creates a string constant.
3. The !PROC and !ARG directives create a procedure that matches the
standard CPythonFunction signature [PyObject* hello_world(PyObject* self,
PyObject* args)] and create procedure initialization code.
4. The procedure calls python's PySys_WriteStdout function. Since python
functions use CDECL calling conventions, we:
- PUSH the parameters onto the stack from right to left
- CALL the function
- Clean up the stack ourselves
5. PyCFunctions must return some sort of python object, so we:
- Load PyNone into the EAX register, which will become the return value
- Add one to the reference count
6. The !ENDPROC directive ends the procedure and creates function cleanup
code. This creates a procedure called hello_world that would have the C
signature of PyObject* hello_world(PyObject* self, PyObject* args). The
procedure loads hello_str onto the stack and calls the python interpreter's
PySys_WriteStdout function.
7. Calling hello_world() executes the newly created function.
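For comparison, the same interpreter function can be reached from plain
python through the standard ctypes module. This is only a sketch of what the
assembly above does (call PySys_WriteStdout with one argument, then return
None). It is not part of pyasm, and the name hello_world_ctypes is made up
for illustration.

```python
import ctypes

# PySys_WriteStdout is exported by the CPython runtime; ctypes.pythonapi
# exposes it without any extra linking.
write_stdout = ctypes.pythonapi.PySys_WriteStdout
write_stdout.argtypes = [ctypes.c_char_p]  # we only pass the format string
write_stdout.restype = None                # the C function returns void


def hello_world_ctypes():
    # Hypothetical wrapper, not a pyasm function.
    # ctypes passes the argument and restores the stack for us, which is
    # what the PUSH / CALL / ADD ESP sequence does by hand.
    write_stdout(b"Hello, World!\n")
    # Returning None lets the interpreter manage the reference count that
    # the assembly version increments explicitly.
    return None


hello_world_ctypes()
```

The assembly version has to do the argument passing, stack cleanup and
reference counting itself, which is why the walkthrough spells out each step.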
The rest of this document assumes that you know x86 assembly language. A
tutorial is beyond the scope of this document. If you don't know assembly
language, you'll want to read an introductory text (such as The Art of
Assembly Language) as well as downloading Volumes 2 and 3 of the IA-32 Intel
Architecture Software Developer's Manual for reference.
Like most assemblers, the command-line assembler contains a very
There are two basic kinds of statement: instruction statements and
assembler directives. Assembler directives contain information that makes
your assembly a little easier to read than raw assembly code, such as the
beginning and ending of a function; declaration of parameters, variables,
constants and data; and so on. Instruction statements consist of real
assembly instructions such as MOV [EAX+4],value
Additional notes specific to this assembler are as follows:
- Numbers use python's formatting scheme, so hex is represented as
0xFF and not FFh.
- Instructions and Registers must be in all caps. mov eax,0x0 is invalid.
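As a quick illustration of the number format, python's own int() accepts
0xFF-style literals when asked to infer the base, and rejects the FFh style
used by some other assemblers. The helper name parse_number is hypothetical
and this is not pyasm's tokenizer, just the same literal rule shown in
isolation.

```python
def parse_number(text):
    # Hypothetical helper, not part of pyasm.
    # Base 0 tells int() to honour python's literal prefixes such as 0x.
    return int(text, 0)


print(parse_number("0xFF"))  # 255
print(parse_number("42"))    # 42

try:
    parse_number("FFh")      # trailing-h hex is not a python literal
except ValueError as exc:
    print("rejected: %s" % exc)
```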
Instruction statements are reasonably straightforward if you know x86
assembly.
Assembler directives begin with an exclamation mark, followed by the
directive itself, and then any applicable parameters. Keep in mind that
directives are provided for the programmer's convenience. Anything that is
done via a directive could be translated into raw assembly.
|!CALL proc [arg arg] ||Procedure call framework
|!CHARS name value ||Create a character array (aka a string)
|!CONST name value ||Create a constant value.
|!LABEL name ||Provide a symbolic label for later ref.
|!PROC name [type] ||Begin a procedure.
|!ARG argname [size] ||Add an argument to a procedure def.
|!LOCAL varname [size] ||Add a local var to a procedure def.
|!ENDPROC ||End a procedure
- !CALL proc [arg arg arg]
- A convenience function for procedure calling. PUSHes arguments onto the
stack from right to left and calls the appropriate procedure. Stack cleanup
(if any) is still the programmer's responsibility.
- !CHARS name value
- Create a character array (aka a string)
- !COMMENT text
- Ignore this line.
- !CONST name value
- Just declares a constant that is replaced in subsequent occurrences. Keep
in mind that this is resolved at compile time, so the values should really
be numbers. !CONST hello_world "hello world\n\0" is invalid.
- !LABEL name
- Provide a symbolic label to the current memory address. Primarily used for
loops, if-then logic, etc. You can use a label and hand-roll a procedure,
but you probably want to use the !PROC directive instead.
- !PROC name [type]
- Begin a procedure. This will emit the boilerplate code to start a
procedure. Arguments and local variables can be declared with the !ARG and
!LOCAL directives listed below. These declarations must occur before any
instruction statements or an error will occur. This will generate the
boilerplate function startup code, which consists of PUSHing the EBP
register, saving the current location of ESP, and translating arguments and
local variables into references via the offset of the EBP pointer. If the
previous sentence didn't make any sense to you, just remember that the EBP
register must not be manipulated in your code here or things will get
screwed up.
- !ARG argname [size]
- An argument passed to a procedure via the stack. By default, we assume the
size is 4 bytes, although you can specify it if you need to.
- !LOCAL varname [size]
- A local variable maintained on the procedure's stack frame.
- !ENDPROC
- End a procedure. Emit the cleanup code as the caller's
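To make the !CALL and !PROC descriptions above concrete, here is a small
python sketch of the two transformations they describe: expanding a !CALL
line into PUSH and CALL statements, and mapping argument and local names to
EBP-relative operands. Both helpers (expand_call and frame_offsets) are
hypothetical and are not pyasm's implementation. The offsets assume the
conventional 32-bit frame layout with 4-byte slots, and the naive split()
does not handle quoted strings that contain spaces.

```python
def expand_call(directive):
    """Expand '!CALL proc arg ...' into PUSH/CALL lines (illustration only)."""
    parts = directive.split()
    if len(parts) < 2 or parts[0] != "!CALL":
        raise ValueError("not a !CALL directive: %r" % directive)
    proc, args = parts[1], parts[2:]
    # Arguments go onto the stack from right to left.
    lines = ["PUSH %s" % arg for arg in reversed(args)]
    lines.append("CALL %s" % proc)
    return lines


def frame_offsets(arg_names, local_names, size=4):
    """Map names to EBP-relative operands, assuming a conventional frame."""
    offsets = {}
    # [EBP] holds the saved EBP and [EBP+4] the return address,
    # so the first argument sits at [EBP+8].
    for i, name in enumerate(arg_names):
        offsets[name] = "[EBP+%d]" % (8 + i * size)
    # Locals are allocated below EBP.
    for i, name in enumerate(local_names):
        offsets[name] = "[EBP-%d]" % ((i + 1) * size)
    return offsets


for line in expand_call("!CALL foo bar baz bot"):
    print(line)
print(frame_offsets(["self", "args"], ["tmp"]))
```

Note that expand_call emits no stack cleanup: as with the real directive,
adjusting ESP after the call is left to the programmer.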
Typically, usage is as simple as the hello world example listed above:
import the pyasm function from the pyasm package and call it, passing
globals() so that pyasm knows where to bind the new functions.
Assembly via the
Calling pyasm is fine if you're just trying to inline some assembly,
but if you're trying to dynamically generate assembly (such as writing a
compiler) you're better off accessing the api directly. This involves a
few steps:
- Import the assembler class from x86 asm and instantiate it.
- Add instructions either as strings that need to be preprocessed or
directly via the api.
- Generate an intermediate 'codepackage' by calling the .Compile() method.
- Transform the codePackage to runtime memory via CpToMemory.
NOT IMPLEMENTED YET
If you really want to, a command-line assembler is available:
python pyassemble.py asmfile.asm
This will generate an object file asmfile.o that can be used by your
If you write assembly, chances are that you are going to crash your app at
one point or another.
On Linux, you obviously have gdb.
Contrary to popular belief, there is a built-in command-line debugger on
Windows NT/2000/XP called ntsd.exe that can be used in a bind. If you're
doing serious work though, do yourself a favor and download the 18MB
"Debugging Tools for Windows." It includes an updated version of ntsd.exe
and a version with a simple Windows interface called WinDBG. You'll really
want to download this if you're getting serious about assembly debugging.
Actual usage is beyond the scope of this document, but read up on setting up
a symbol server.
After installing, you may want to register WinDBG as the default debugger
by cd'ing to the program directory and issuing 'windbg -I'. This will cause
WinDBG to spawn automatically when any program crashes or executes an INT 3
instruction. It also has the added benefit of making friends think that
you're a much more hardcore programmer than you really are. The jury is
still out as to whether this impresses the ladies or not.
And yes, there is the Visual Studio .NET debugger. This is a great tool
when you're debugging C or VB code in an existing project. But it is
designed to work as part of an IDE. It gets a little weird when debugging
raw assembly or compiled code without the source floating around. However
WinDBG's GUI looks by today's standards, it is a lot more convenient.
Source output - *not implemented yet
I plan to provide a hook via the logging module so you can obtain the
source at runtime.
Any and all patches will be considered. If you're planning on doing
anything serious you may want to run it by me so you don't end up wasting
time. There is some low-hanging fruit out there though.
I haven't added all of the x86 instructions yet. Most of it involves cutting
and pasting from the IA-32 Intel Architecture Software Developer's Manual,
Volumes 2 and 3.
For standard instructions, you should just be able to add the instruction
to x86inst.py and create a test in test_instructions.py. SIMD and FPU
operations will probably require some additional hacking.
There is currently code that converts windows COFF objects to an
object model and vice versa. This allows you to create standard object files
for traditional linking. An equivalent for ELF files would allow you to do
the same thing in Linux. Refer to the coff*.py files to see how this format
is handled.
New in version 0.3
- You can now run the test cases via mingw as well as msvc. Adjust
test/linkCmd.py appropriately. Thanks to Markus Lall for figuring out
how to do this.
- Updated to python 2.6.
- Updated MSVC project files to VC 2008.
- Python structure values are loaded automatically if desired. For
assuming EAX is a pointer to a string MOV
will change the first four letters of the string to B's.
- Preliminary debugging console to view generation of assembly at various
stages in the compilation pipeline.
- Implicit string variable creation is now possible. e.g. "PUSH 'foo\n\0'"
now works instead of requiring "!CHARS foo 'foo\n\0'" and "PUSH foo"
- New !CALL assembler directive handles throwing arguments onto the stack.
e.g. "!CALL foo bar baz bot" instead of "PUSH bot" "PUSH baz" "PUSH bar"
"CALL foo"
- Fixed tokenizer for instruction definitions with numbers in them
- Now includes an 'examples' directory that should be easier for users to
read than the test directory.
- Show symbol name in disassembly if it exists.