java.lang.Object
com.pnfsoftware.jeb.core.units.code.asm.decompiler.ir.compiler.ECompiler

public class ECompiler extends Object
Compiler of IR expressions, IR statements, IR CFG, IR routines, and IR programs (code and data). The compiler takes a string representation of some IR and generates IR code for it. The IR compiler is mostly used for testing purposes.

Current limitations:

  • The compilation of IEUntranslatedInstruction is not supported.
  • Wildcard types are supported in << ... >> brackets. Leave the type info empty (<<>>) for EImm to specify that they should be mutable but not carry any type information (this is different from <<?>>, which specifies a wildcard type of 1 slot with no information).

Rules and syntax:

  • The syntax is similar to the IR formatting, except that bitsizes must be present and etypes must be absent (to be supported at a later time), eg s32:var1 = i32:01h
  • Unless a wanted size is specified, statements are auto-assigned a size of 1
  • Optional wanted size for statements: "/SIZE: ...", eg /2: nop will create an ENop of size 2 instead of the default size 1
  • Optional wanted native address (hex) for statements: "ADDRESS: ...", eg 1A00: nop will create an ENop statement whose mapping to a hypothetical native address is set to 0x1A00
  • Literal labels for EJump are supported, eg: @label1 ... goto @label1
  • Spaces between tokens and parentheses around expressions are mostly mandatory; to avoid compilation errors, it is better to use them systematically
  • Comments: // or ;
  • EVar are created automatically upon first encounter unless they already exist in the routine-context (checked first), or the global-context (checked second)
  • The class of EVar used for creation depends on its name prefix; by default, special locals (negative range [2, 0x10000[) are used
  • Standardized prefixes:
       global-context
          rID            -> physical register EVar if possible
          RID            -> virtual register EVar if possible
          gADDR          -> memory-mapped global EVar if possible
          ptr_gAAA       -> global symbol EVar if possible
       routine-context
          vID            -> virtual routine-context EVar if possible (similar to global-context's R..)
          varADDR        -> memory-mapped local stack EVar variable (negative stack offset rel.to SP0)
          parADDR        -> memory-mapped local stack EVar variable (positive or null stack offset rel.to SP0)
          ptr_varADDR    -> pointer (reference) to a local memory-mapped stack variable
          ptr_parADDR    -> pointer (reference) to a local memory-mapped stack parameter
          $r..           -> copy of var
          $r..$N         -> additional copy of var (N>=1)
          $r.._r..       -> copy of var pair
          $r.._r..$N     -> additional copy of var pair (N>=1)
          $r..loX        -> copy of var, truncated (LSB part)
          $r..hiX        -> copy of var, truncated (MSB part)
          $r..loX$N      -> additional copy of var, truncated (LSB part)
          $r..hiX$N      -> additional copy of var, truncated (MSB part)
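
The prefix table above can be read as a lookup on the variable name, with longer prefixes tested first. The following is a minimal standalone sketch of that reading, not the compiler's actual resolution logic: the class `EVarPrefixSketch`, the method `classify`, and the matching order are hypothetical, and the check ignores the "if possible" conditions and the format of the ID/ADDR suffix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the JEB API.
class EVarPrefixSketch {
    // Insertion order matters: longer prefixes come first so that
    // "ptr_var8" is not matched by a shorter prefix.
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("ptr_var", "pointer to local stack variable");
        PREFIXES.put("ptr_par", "pointer to local stack parameter");
        PREFIXES.put("ptr_g", "global symbol");
        PREFIXES.put("var", "local stack variable (negative offset)");
        PREFIXES.put("par", "local stack variable (positive or null offset)");
        PREFIXES.put("$r", "copy of a variable");
        PREFIXES.put("g", "memory-mapped global");
        PREFIXES.put("r", "physical register");
        PREFIXES.put("R", "virtual register (global-context)");
        PREFIXES.put("v", "virtual variable (routine-context)");
    }

    /** Returns a description of the EVar kind suggested by the name prefix. */
    static String classify(String name) {
        for (Map.Entry<String, String> e : PREFIXES.entrySet()) {
            String prefix = e.getKey();
            // Assumption: a prefix only counts if something follows it.
            if (name.length() > prefix.length() && name.startsWith(prefix)) {
                return e.getValue();
            }
        }
        return "special local (default)";
    }

    public static void main(String[] args) {
        String[] names = {"r0", "R3", "g401000", "ptr_var8", "var8", "par4", "$r0", "tmp"};
        for (String n : names) {
            System.out.println(n + " -> " + classify(n));
        }
    }
}
```

Testing `var` before `v` and `ptr_g` before `g` is what keeps the overlapping prefixes apart in this sketch.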
     

Specific rules for expression and statement compilation:
- PC-assigns can receive additional information, to be provided as end-of-line tags enclosed in brackets:
  - [BRANCH] -> means the PC-assign should be generated as if it came from a normal branching instruction
  - [SUB] -> means the PC-assign should be generated as if it came from a call-to-sub instruction
  - [BRANCH_HINTS:offsets] -> provides pseudo-native target hints for the branching instruction; offsets must be a comma-separated list of pseudo-native offsets (not IR offsets)
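
As a small illustration of the tag rules above, the sketch below appends one end-of-line tag to a statement string and builds a BRANCH_HINTS tag from a list of offsets. It is a hypothetical standalone helper, not part of the JEB API: the class and method names are made up, the offsets are passed as already-formatted strings because the expected number format is not stated here, and the statement text is an opaque placeholder.

```java
// Hypothetical helper, not part of the JEB API.
class PcAssignTagSketch {
    /** Builds "[BRANCH_HINTS:a,b,c]" from pre-formatted pseudo-native offsets. */
    static String branchHintsTag(String... offsets) {
        return "[BRANCH_HINTS:" + String.join(",", offsets) + "]";
    }

    /** Appends a bracketed tag at the end of a PC-assign source line. */
    static String withTag(String pcAssign, String tag) {
        return pcAssign + " " + tag;
    }

    public static void main(String[] args) {
        // "<pc-assign>" stands in for a real PC-assign statement string.
        System.out.println(withTag("<pc-assign>", "[BRANCH]"));
        System.out.println(withTag("<pc-assign>", "[SUB]"));
        System.out.println(withTag("<pc-assign>", branchHintsTag("10", "20")));
    }
}
```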

Specific rules for CFG compilation:
- N/A

Specific rules for routine compilation:
- routines may or may not be enclosed in PROC/ENDP

Specific rules for program compilation:
- routines must be enclosed in PROC/ENDP. The wanted name, wanted pseudo start address (native), and IR prototype are all optional:
- data elements: see below.

 PROC Name @NativeAddress :Prototype
 ...
 ...
 ENDP
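
The PROC/ENDP template can be produced programmatically when generating test inputs. The sketch below is a hypothetical standalone helper, not part of the JEB API: `ProcSourceSketch` and `proc` are invented names, each optional header field is omitted when null, and the address and prototype are passed through as opaque strings since their exact syntax is not described here. Treating each list element as one source line is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JEB API.
class ProcSourceSketch {
    /** Wraps body statements in PROC/ENDP; null header fields are left out. */
    static List<String> proc(String name, String nativeAddress, String prototype, List<String> body) {
        StringBuilder header = new StringBuilder("PROC");
        if (name != null) {
            header.append(' ').append(name);
        }
        if (nativeAddress != null) {
            header.append(" @").append(nativeAddress);
        }
        if (prototype != null) {
            header.append(" :").append(prototype);
        }
        List<String> lines = new ArrayList<>();
        lines.add(header.toString());
        lines.addAll(body);
        lines.add("ENDP");
        return lines;
    }

    public static void main(String[] args) {
        // Statement strings taken from the syntax rules of this page.
        List<String> body = Arrays.asList("s32:var1 = i32:01h", "/2: nop", "1A00: nop");
        for (String line : proc("f1", "1000", null, body)) {
            System.out.println(line);
        }
    }
}
```

A list built this way has the shape expected by the `compileProgram(List<String> slist)` overload, under the one-line-per-element assumption stated above.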
 

Defining references: simulate dynamically resolved references to routines and data imported into the module, but physically located in an external component.

 IMPORT CODE MethodName [:OptionalPrototype]
 IMPORT DATA FieldName [:OptionalType]
 

Defining data elements (native memory):

 - syntax for raw bytes (does not create variable object, just memory init.)
     DB/DW/DD/DQ/DS @Address Value
     B,W,D,Q=BYTE, WORD, DWORD, QWORD (1, 2, 4, 8 bytes), hex or decimal value
     endianness for memory-encoding matches the processor's (referenced in the native context, held by global context provided to the compiler)
     DB can also be used for byte sequences: Value is an arbitrarily long hex-encoded byte sequence or an escaped string; no zero terminator is appended
     
   examples:
     DB @100 0x11
     DW @100 0x1122
     DD @100 0x11223344
     DQ @100 0x1122334455667788
     DB @100 '11aabb660099ff414141141'  <--- hex-encoded string (note the single-quotes, vs double-quotes for strings)
     DB @100 "Hello World!"             <--- encode to ASCII
     [NOT SUPPORTED YET] DB @100 U"Hello World!"            <--- encode to UTF8
     [NOT SUPPORTED YET] DB @100 L"Hello World!"            <--- encode to UTF16LE (note little-endian)
 
 - syntax for regular data items:
     DV Name @Address :TypeName [OptionalValue]
   where OptionalValue is a hex-encoded string whose length must be less than or equal to the size of the variable's type
   
 - syntax for string (ascii-encoded, 0-terminated) data items:
     DS Name @Address "Hello"
   the zero terminator is added implicitly, so the above string would translate to 6 bytes, not 5 
   
 - syntax for imported references:
     DR Name @Address &ImportName
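
To make the memory layout of the data directives concrete, the sketch below shows the bytes one would expect for DW/DD/DQ values under a given endianness, and the difference between a DB string (no terminator) and a DS string (implicit zero terminator). It is a standalone illustration, not the compiler's implementation: the class and method names are hypothetical, and the byte order is passed explicitly, whereas the compiler takes it from the processor referenced by the native context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the JEB API.
class DataDirectiveSketch {
    /** Encodes the low 'size' bytes (1, 2, 4 or 8) of value in the given byte order. */
    static byte[] encodeInt(long value, int size, ByteOrder order) {
        if (size != 1 && size != 2 && size != 4 && size != 8) {
            throw new IllegalArgumentException("size must be 1, 2, 4 or 8");
        }
        byte[] all = ByteBuffer.allocate(8).order(order).putLong(value).array();
        // Little-endian keeps the low bytes first; big-endian keeps them last.
        int from = (order == ByteOrder.LITTLE_ENDIAN) ? 0 : 8 - size;
        return Arrays.copyOfRange(all, from, from + size);
    }

    /** DB "text": ASCII bytes, no zero terminator appended. */
    static byte[] encodeDbString(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** DS Name @Address "text": ASCII bytes followed by one zero byte. */
    static byte[] encodeDsString(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, raw.length + 1); // copyOf pads with 0
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeInt(0x11223344L, 4, ByteOrder.LITTLE_ENDIAN)));
        System.out.println(hex(encodeInt(0x11223344L, 4, ByteOrder.BIG_ENDIAN)));
        System.out.println(encodeDsString("Hello").length);
    }
}
```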
 
 
  • Constructor Details

  • Method Details

    • cc

      public static IEGeneric cc(String s, IEGlobalContext gctx)
      Convenience method to parse an IR expression or statement.
      Parameters:
      s - IR expression or statement string
      gctx - global IR context
      Returns:
      the compiled IR element
    • cc

      public static <T extends IEGeneric> T cc(String s, IEGlobalContext gctx, Class<T> clazz)
      Convenience method to parse an IR expression or statement.
      Type Parameters:
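      T - expected type of the compiled IR element
      Parameters:
      s - IR expression or statement string
      gctx - global IR context
      clazz - class object of the expected type T
      Returns:
      the compiled IR element, typed as T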
    • reset

      public void reset()
      Reset this compiler's state. Note that the global IR context (IEGlobalContext) is not reset.
    • compileExpression

      public ECompiler.CompiledExpression compileExpression(String s)
      Compile a non-statement expression.
      Parameters:
      s - pure expression string (not a statement)
      Returns:
      the compiled expression
    • compileExpression

      public ECompiler.CompiledExpression compileExpression(IERoutineContext ctx, String s)
      Compile a non-statement expression.
      Parameters:
      ctx - optional routine context to be used; if null, a fresh context will be created
      s - pure expression string (not a statement)
      Returns:
      the compiled expression
    • compileStatement

      public ECompiler.CompiledStatement compileStatement(String s)
      Compile a single statement.
      Parameters:
      s - statement string
      Returns:
      the compiled statement
    • compileStatement

      public ECompiler.CompiledStatement compileStatement(IERoutineContext ctx, String s)
      Compile a single statement.
      Parameters:
      ctx - optional routine context to be used; if null, a fresh context will be created
      s - statement string
      Returns:
      the compiled statement
    • compileCfg

      public CFG<IEStatement> compileCfg(String... slist)
      Compile a sequence of statements and return the CFG.
      Parameters:
      slist - statement list
      Returns:
      IR CFG
    • compileCfg

      public CFG<IEStatement> compileCfg(IERoutineContext ctx, String... slist)
      Compile a sequence of statements and return the CFG.
      Parameters:
      ctx - optional routine context to be used; if null, a fresh context will be created
      slist - statement list
      Returns:
      IR CFG
    • compileRoutine

      public ECompiler.CompiledRoutine compileRoutine(String... slist)
      Compile an IR routine.
      Parameters:
      slist - routine source
      Returns:
      the compiled routine
    • compileProgram

      public ECompiler.CompiledProgram compileProgram(File file) throws IOException
      Compile an IR program made of 1 or more routines.
      Parameters:
      file - UTF8 encoded source file
      Returns:
      the compiled program
      Throws:
      IOException
    • compileProgram

      public ECompiler.CompiledProgram compileProgram(String... slist)
      Compile an IR program made of 1 or more routines.
      Parameters:
      slist - program source strings
      Returns:
      the compiled program
    • compileProgram

      public ECompiler.CompiledProgram compileProgram(List<String> slist)
      Compile an IR program made of 1 or more routines.
      Parameters:
      slist - program source strings
      Returns:
      the compiled program