RMasm, a new generation of macro assembler

This blog post was originally posted on my previous blog, code4k.
I have not updated this website for a while... that's because I have been actively working on an exciting new project called RMasm!

What is RMasm? It's a new generation of macro assembler that uses the Ruby language as its main "macro" language and, to begin with, supports the x86 assembler syntax. You may ask, "Why develop another assembler when there are plenty of them around (look at Nasm, Fasm, Masm, HLA...)?" Well, while I was coding the new softsynth for FRequency in assembler (a project that is temporarily suspended because of RMasm, oops, sorry ulrick for the delay!), I found current assemblers to be very limited by their macro languages. Although I was using Masm, which is recognized as having a "strong" macro language, I found that language very ugly, not really powerful, hard to learn and, ultimately, quite limited.

So I started to think about a new kind of assembler...


An assembler is... an interpreter


I first tried to prototype something with Irony, a great C# library for writing BNF grammars directly inside the C# language instead of going through the old Yacc/Lex/Bison/ANTLR tool chains. I was trying to mimic the Masm syntax, but I had some problems developing a macro expansion system. I almost decided to abandon the idea... but then I noticed an interesting characteristic common to assemblers: a macro assembler is in fact also an interpreted language (the "macro" language used to develop macros). Let's look at the following Masm example:

; MASM assembler example file
; --- CStr(): macro function to define a text constant
CStr macro text
local text_var
.const ; Open the const section
text_var db text,0 ; Add the text to text_var
.code ; Reopen the code section
exitm <offset text_var> ; Return the offset of text_var
endm

; Try to call CStr macro
.code ; <-- Original text 1
mov [eax], CStr("Column 1") ; <-- Original text 2
When assembled with Masm, this file is interpreted by Masm and, after one preprocessing pass, the following file is generated:

.code      ; <-- Original text 1
.const ; <-- text inserted by macro CStr
text_var0001 db "Column 1",0 ; <-- text inserted by macro CStr
.code ; <-- text inserted by macro CStr
mov [eax],offset text_var0001 ; <-- Modified text 2
The CStr macro allocates a string in the const section and returns the address of the string to the caller, which makes string declarations easier to write in assembler. As you can see, the CStr macro is literally called by the preprocessor. What I find uncommon compared with, say, the C preprocessor is that the macro can output text on the lines before the one where it is called (see <-- text inserted) while also returning a value from the macro call (offset text_var0001).

So that's why I'm saying that a macro language is in fact an interpreted language (there is nothing new here, but it was at least new to my mind! ;) ). Before interpreting the mov [eax],... instruction to translate it to its binary representation, the assembler performs several passes to expand the macros in order to get a plain file containing only processor instructions.

Using Ruby as the main interpreter language...

Now, suppose that we use a well-known interpreted language to mimic this feature. Let's take Ruby :

eax = "eax"
def mov(dst,src)
  puts "We are in mov #{dst},#{src}"
  # Here generate the binary representation of the instruction
end

def CStr(str)
  puts "We are in CStr #{str}"
  # open the section and add the str to a db * declaration
  label_str = "this_is_the_label" # here create a label from (str)
  return label_str # should return an .offset
end

mov [eax], CStr("Column 1")
This Ruby program (you can test it directly with an online Ruby interpreter like IronRuby) will output the following result:
We are in CStr Column 1
We are in mov ["eax"],this_is_the_label
And here is the main concept of RMasm: it's built with Ruby as the main interpreter language, and it extends the language itself to allow assembler-specific declarations.

An RMasm assembler file is in fact a Ruby file that uses the rmasm Ruby module to allow new syntax inside the language itself.
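
To make the parallel with the Masm CStr macro a bit more concrete, here is a minimal sketch of how a plain Ruby method could both emit text into another section and return a value to its caller. The TinyAssembler class, its cstr and mov methods and the label format are all hypothetical names invented for this illustration; they are not the RMasm API.

```ruby
# Hypothetical sketch: none of these names come from the real rmasm module.
class TinyAssembler
  attr_reader :sections

  def initialize
    @sections = { const: [], code: [] } # section name => emitted lines
    @label_count = 0
  end

  # Like the Masm CStr macro: emits a declaration into the const section
  # and returns an operand that the caller can use.
  def cstr(text)
    @label_count += 1
    label = format("text_var%04d", @label_count)
    @sections[:const] << %(#{label} db "#{text}",0)
    "offset #{label}"
  end

  # Records a mov instruction as text; a real assembler would encode bytes here.
  def mov(dst, src)
    @sections[:code] << "mov #{dst},#{src}"
  end
end

asm = TinyAssembler.new
# Ruby evaluates the arguments first, so cstr runs before mov.
asm.mov "[eax]", asm.cstr("Column 1")

asm.sections.each do |name, lines|
  puts ".#{name}"
  lines.each { |line| puts "  #{line}" }
end
```

Because the argument is evaluated before the call, the string lands in the const section before the mov is recorded, which is the same ordering the Masm preprocessor produced above.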

... to build a new language and compiler : RMasm

Now, look at the resulting RMasm assembler syntax example for x86:

# Specify that we want to use the x86 assembler
use :x86

# Define a structure
struct :MyStructure do
db :my_field1
dw :my_field2
dd :my_field3
end

# Open the data section
section:data

MyStructure :my_structure_var # Declare a structure
db :this_is_a_text << "This is a text directive" # Declare a text

# Open the code section
section:code

__:MyProcedure.global # Declare the label MyProcedure and make it global (public)
xor eax,eax
mov [esi],eax
ret
With RMasm, it will be possible to compile this file to a .obj or .out object! RMasm does not rely on a complex Lex/Yacc parser; it leverages the power of Ruby to create a new Domain Specific Language (DSL). RMasm is indeed a DSL based on Ruby, nothing else.
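
As a rough idea of how such a DSL can work without a parser, the sketch below implements a struct block using instance_eval. Everything here is an assumption made for illustration (the StructBuilder class, the STRUCTS registry, and a packed layout with no alignment); it is not how RMasm itself is implemented.

```ruby
# Hypothetical sketch of a struct block; names and layout rules are assumptions.
class StructBuilder
  SIZES = { db: 1, dw: 2, dd: 4 }.freeze # directive => size in bytes

  attr_reader :fields, :size

  def initialize
    @fields = {} # field name => byte offset
    @size = 0
  end

  # Define db, dw and dd as methods so they can be used as bare directives.
  SIZES.each do |directive, bytes|
    define_method(directive) do |name|
      @fields[name] = @size # packed layout: no padding between fields
      @size += bytes
    end
  end
end

STRUCTS = {} # struct name => builder holding its layout

def struct(name, &block)
  builder = StructBuilder.new
  builder.instance_eval(&block) # run the block with the builder as self
  STRUCTS[name] = builder
end

struct :MyStructure do
  db :my_field1
  dw :my_field2
  dd :my_field3
end

layout = STRUCTS[:MyStructure]
puts "my_field2 is at offset #{layout.fields[:my_field2]}"
puts "MyStructure is #{layout.size} bytes"
```

The key point is that db, dw and dd inside the block are ordinary Ruby method calls, resolved against the builder object that instance_eval installs as self.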

What are the unique features of RMasm?

You may ask, "So you are designing a new assembler on top of an existing interpreted language to speed up development, but how could RMasm help me when there are already a bunch of existing assemblers?"

The short answer is: RMasm could improve your experience of writing assembler. Moreover, you could enhance the language itself to meet your needs. Not only does RMasm provide raw assembler as well as HLL (High Level Language) constructs, it also provides a genuine way to extend the language itself, using operator overloading in Ruby or whatever!

With RMasm, it will be possible to extend the language to accept the following syntax :

eax << 0 
ebx << eax
which would be equivalent to :
xor eax, eax
mov ebx, eax
with the following code to extend RMasm :

# Reopen the RMasm::Register class to add a new operator
class RMasm::Register
  def <<(arg)
    if arg == 0
      xor self, self # Generate xor instruction if arg is 0
    else
      mov self, arg # Generate plain mov instruction otherwise
    end
  end
end
What we have done is extend the class RMasm::Register (part of the RMasm framework) with a new operator << that generates instructions when it is applied to a register.
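
Since RMasm cannot be run yet, here is a self-contained sketch of the same idea that can be tried with any Ruby interpreter. The TinyAsm module, its emit helper and its OUTPUT list are hypothetical stand-ins for the framework, and instructions are recorded as text rather than encoded; only the << operator mirrors the extension shown above.

```ruby
# Hypothetical stand-in for the RMasm framework, for illustration only.
module TinyAsm
  OUTPUT = [] # emitted instructions, kept as text

  def self.emit(mnemonic, *operands)
    OUTPUT << "#{mnemonic} #{operands.join(', ')}"
  end

  class Register
    def initialize(name)
      @name = name
    end

    def to_s
      @name
    end

    # Same logic as the RMasm::Register extension in the post.
    def <<(arg)
      if arg == 0
        TinyAsm.emit(:xor, self, self) # zeroing a register
      else
        TinyAsm.emit(:mov, self, arg)  # plain move otherwise
      end
    end
  end
end

eax = TinyAsm::Register.new("eax")
ebx = TinyAsm::Register.new("ebx")

eax << 0
ebx << eax

puts TinyAsm::OUTPUT
```

Running it prints the xor and mov instructions in the order the operators were applied.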

The expected features of RMasm are:
  • Debug your macro assembler file: not the binary exe, but the file being generated! Because an RMasm file is in fact a Ruby file, you can debug your asm file. This is one of the major features that makes RMasm really exciting! Think of it: you have developed a complex macro. Instead of adding several "print" statements to debug your assembler file (which is what we do with current assemblers), you can debug the file with a classic Ruby debugger. This is what I'm using to develop RMasm. Now I'm able to step into macros and see them in action!
  • Extend the language with operator overloading, blocks of code, etc. Because RMasm extends Ruby to add assembler syntax, RMasm can be extended as well to add your own high level assembler syntax. RMasm can be considered a meta-assembler.
  • Enhanced data declaration: with RMasm, data declaration is going to be much easier than in current assemblers or even in the C language. For example, with RMasm, the following declaration performs a complex data initialization (leveraging Ruby data structures: hashes {}, ranges (0..15), arrays []... and so on!):

    # initialize an array of 50 words (2 bytes each) with the values
    # [0] = 0, [1] = 1, [2] = 2, [3] = 3
    # [5] = 9, [10] = 9
    # [11..49] = 65535
    dw :my_label[50] << [0,1,2,3] << { 5 => 9, 10 => 9} << { (11..-1) => 65535 }

  • Namespace support: with RMasm, namespaces are supported (through Ruby's modules), which allows any existing API to be wrapped in a nice way (here inside a Win32 module):
    Win32.MessageBox "I'm here from an assembler program"
  • Support for advanced calling conventions: with RMasm, it will be possible to simplify calls to COM objects using the object-oriented nature of Ruby. This is really interesting for improving the language experience, since any COM interface can be called the way we would in a "high level language" like C++, while still being able to check, modify or write your own generated assembler directives.
    # A call to
    directx.Draw(arg1, arg2)

    # will be translated by RMasm to something like
    push arg2
    push arg1
    push directx
    mov eax, directx.vtable
    call [eax + DirectX_Draw]
  • Support for "precompiled" headers: with RMasm, it is going to be possible to have some kind of precompiled header, most notably for all the Windows includes, speeding up compilation time.
  • Support for multiple architectures: RMasm is also a framework for building new assemblers. See the "use :x86" directive in the RMasm example. I expect to be able to write "use :z80", or "use :m68020"... while still sharing the same syntax for struct, data, section and procedure declarations!
  • And much more! The possible extensions to RMasm will be so wide that it's impossible to anticipate what we will be able to achieve with it until we try it!
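
To illustrate the enhanced data declaration feature from the list above, here is a sketch of how arrays, hashes and ranges could be combined to fill a fixed-size table. The build_table helper is a hypothetical function written for this example, not RMasm syntax, and it assumes that entries which are never mentioned default to 0.

```ruby
# Hypothetical helper: fills a fixed-size table from arrays, hashes and ranges.
# Assumption: entries that no initializer touches are left at 0.
def build_table(size, *initializers)
  table = Array.new(size, 0)
  all_indices = (0...size).to_a

  initializers.each do |init|
    case init
    when Array
      # An array sets consecutive entries starting at index 0.
      init.each_with_index { |value, index| table[index] = value }
    when Hash
      init.each do |key, value|
        # A range key such as (11..-1) is resolved against the table size,
        # so -1 means "up to the last entry"; any other key is a single index.
        indices = key.is_a?(Range) ? all_indices[key] : [key]
        indices.each { |index| table[index] = value }
      end
    end
  end

  table
end

table = build_table(50, [0, 1, 2, 3], { 5 => 9, 10 => 9 }, { (11..-1) => 65535 })
puts table.first(12).inspect
puts table.length
```

Resolving the range against a list of valid indices keeps the table at its declared size, whereas assigning through the range directly on the array could resize it.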

Next?

Well, you may have noticed that I'm really excited about this project. The thing is, to make it possible, I need to write a usable assembler... and even if the development is going to be much easier than for a regular assembler, it will still take some time to build a fully functional assembler (preferably supporting the x86 assembler syntax)!

Of course, because Ruby runs on several platforms, RMasm will be able to run on the same platforms. Currently, RMasm is in pre-alpha release, version 0.1.1, with a growing architecture that is not yet complete and is not able to generate a .obj or anything similar... so it will take some time to have a workable version. I hope to be able to publish an alpha release in Q1 2010.

I believe that this assembler could somehow "revolutionize" the little assembler community... but I might be wrong... although, I feel that with this kind of assembler, demo size coding is going to be a bit easier!

But before going much further, I would like to get some feedback about this starting project. What do you think about this? As an assembler programmer? As a C programmer?

____________

I have just opened the RMasm Website. This is a basic structure just to settle things... hope that this project will go up to the 1.0 version!
