Last week was the release of JEB 2.3.7 with a brand new parser for Mach-O, the executable file format of Apple’s macOS and iOS operating systems. This file format, like its cousins PE and ELF, contains a lot of technical peculiarities and implementing a reliable parser is not a trivial task.
During the journey leading to this first Mach-O release, we encountered some interesting executables. This short blog post is about one of them, which uses some Mach-O features to make reverse-engineering harder.
Recon
The executable in question belongs to a well-known adware family dubbed InstallCore, which is usually bundled with others applications to display ads to the users.
The sample we will be using in this post is the following:
57e4ce2f2f1262f442effc118993058f541cf3fd: Mach-O 64-bit x86_64 executable
Let’s first take a look at the Mach-O sections:
Interestingly, there are some sections related to the Objective-C language (“__objc_…”). Roughly summarized, Objective-C was the main programming language for OS X and iOS applications prior the introduction of Swift. It adds some object-oriented features around C, and it can be difficult to analyze at first, in particular because of its way to call methods by “sending messages”.
Nevertheless, the good news is that Objective-C binaries usually come with a lot of meta-data describing methods and classes, which are used by Objective-C runtime to implement the message passing. These metadata are stored in the “__objc_…” sections previously mentioned, and the JEB Mach-O parser process them to find and properly name Objective-C methods.
After the initial analysis, JEB leaves us at the entry point of the program (the yellow line below):
Wait a minute… there is no routine here and it is not even correct x86-64 machine code!
Most of the detected routines do not look good either; first, there are a few objective-C methods with random looking names like this one:
Again the code makes very little sense…
Then comes around 50 native routines, whose code can also clearly not be executed “as is”, for example:
Moreover, there are no cross-references on any of these routines! Why would JEB disassembler engine – which follows a recursive algorithm combined with heuristics – even think there are routines here?!
Time for a Deep Dive
Code Versus Data
First, let’s deal with the numerous unreferenced routines containing no correct machine code. After some digging, we found that they are declared in the LC_FUNCTION_STARTS Mach-O command – “command” being Mach-O word for an entry in the file header.
This command provides a table containing function entry-points in the executable. It allows for example debuggers to know function boundaries without symbols. At first, this may seem like a blessing for program analysis tools, because distinguishing code from data in a stripped executable is usually a hard problem, to say the least. And hence JEB, like other analysis tools, uses this command to enrich its analysis.
But this gift from Mach-O comes with a drawback: nothing prevents miscreants to declare function entry points where there are none, and analysis tools will end up analyzing random data as code.
In this binary, all routines declared in LC_FUNCTION_STARTS command are actually not executable. Knowing that, we can simply remove the command from the Mach-O header (i.e. nullified the entry), and ask JEB to re-analyze the file, to ease the reading of the disassembly. We end up with a much shorter routine list:
The remaining routines are mostly Objective-C methods declared in the metadata. Once again, nothing prevents developers to forge these metadata to declare method entry points in data. For now, let’s keep those methods here and focus on a more pressing question…
Where Is the Entry Point?
The entry point value used by JEB comes from the LC_UNIXTHREAD command contained in the Mach-O header, which specifies a CPU state to load at startup. How could this program be even executable if the declared entry point is not correct machine code (see Figure 2)?
Surely, there has to be another entry point, which is executed first. There is one indeed, and it has to do with the way the Objective-C runtime initializes the classes. An Objective-C class can implement a method named “+load” — the + means this is a class method, rather than an instance method –, which will be called during the executable initialization, that is before the program main() function will be executed.
If we look back at Figure 5, we see that among the random looking method names there is one class with this famous +load method, and here is the beginning of its code:
Finally, some decent looking machine code! We just found the real entry point of the binary, and now the adventure can really begin…
That’s it for today, stay tuned for more technical sweetness on JEB blog!