As I mentioned in my earlier post, I will try to share useful tips and tricks for improving the reverse engineering experience whenever possible. Today, I would like to start by pointing readers to a post Cody Pierce made on TippingPoint's website over a year ago:
http://dvlabs.tippingpoint.com/blog/2008/10/09/mindshare-first-things-first
In that post he discussed the top-down approach they take when analyzing binaries. Even though the post is geared more towards finding vulnerabilities, the same approach can also be applied when analyzing malicious samples. Anyway, this is the part of the post we will be discussing today:
"With that said, one of the first things we do is look at the most cross referenced functions. By doing so, our efforts will always help future endeavors. Let's say we have a function that is called 4000 times throughout the binary. If we identify that it is a memory allocation routine, we have just made those other 4000 functions that much easier to understand. These common functions are the building blocks for more complex code we may encounter later."
This is very true, especially for new analysts. I still remember how I struggled the first time I analyzed a live sample; not knowing where to start was not a fun experience. Another interesting post is Dion's "Differential Reversing", which was posted on his website not too long ago. You can find the link below:
http://dion.t-rexin.org/notes/2009/09/29/differential-reversing/
But we will leave that for a future discussion.
The ability to cross-reference function calls is one of the great features offered by IDA Pro. It is really helpful to analysts, especially when deciding which functions should be prioritized for analysis. Combined with differential analysis, it becomes even more powerful in helping analysts do their work faster.
My only problem is that, instead of going through the functions list in the dialogue box and manually counting how many cross-references each function has, I would sometimes prefer to have IDA just give me the total. It would be even better if I could control what kind of information about a function IDA displays. I tried to Google for a plugin that does this but couldn't find one (maybe I just didn't try hard enough). The good news is that this can all be done with the provided IDA SDK, and Gergely Erdélyi's IDAPython makes things even easier and more fun.
Let me first list the features that I would like to have in this plugin:
1) Print some basic information about the function: its address, its name and the segment where it is located.
2) Tell me how many locations reference this function. I would like the total number of code references, the total number of data references, and the sum of both.
OK, now let's take a look at the quick hack I put together:
Warning: This is a quick hack that I put together for this post and might not be as clean as you expect (after all, I am not a good coder :p).
# Zarul Shahrin 2010, zarulshahrin - at - hackinthebox.org
from idaapi import *
from idautils import *

def reportRefsTo(func):
    funcName = get_func_name(func.startEA) # Get the name of the function from the given address
    segName = get_segm_name(func.startEA) # Get the name of the segment it is located in
    print "=" * 72
    print ''
    print "Function Name : ", funcName
    print "Address : 0x%x" % func.startEA
    print "Segment : ", segName
    print "Total code references to this function : ",
    cTotal = sum(1 for ref in CodeRefsTo(func.startEA, 1)) # Get the total number of code references to this function
    print cTotal
    print "Total data references to this function : ",
    dTotal = sum(1 for ref in DataRefsTo(func.startEA)) # Get the total number of data references to this function
    print dTotal
    print "Total code + data references to this function : ", cTotal + dTotal
    print "\n"

def getFuncInfo(dOption = "all",fImport = 0):
    if dOption == "all":
        qty = get_func_qty() # Get the number of functions
        fCount = 0
        while ( fCount < qty):
            func = getn_func(fCount) # Get pointer to function structure
            reportRefsTo(func)
            fCount += 1
    else:
        func = get_func(dOption) # Get pointer to function structure
        reportRefsTo(func) # Print the report

def main():
    getFuncInfo()

if __name__ == "__main__":
    main()
(Blogger messed up the code formatting. You can get the final, updated code here.)
Below are sample results from the above IDAPython script (not the full listing; the target application is GNU Patch.exe):
========================================================================
Function Name : sub_415510
Address : 0x415510
Segment : _text
Total code references to this function : 0
Total data references to this function : 2
Total code + data references to this function : 2
========================================================================
Function Name : sub_4155A0
Address : 0x4155a0
Segment : _text
Total code references to this function : 1
Total data references to this function : 0
Total code + data references to this function : 1
========================================================================
Function Name : sub_415610
Address : 0x415610
Segment : _text
Total code references to this function : 6
Total data references to this function : 0
Total code + data references to this function : 6
========================================================================
Function Name : sub_415810
Address : 0x415810
Segment : _text
Total code references to this function : 15
Total data references to this function : 0
Total code + data references to this function : 15
========================================================================
Function Name : sub_415970
Address : 0x415970
Segment : _text
Total code references to this function : 1
Total data references to this function : 0
Total code + data references to this function : 1
========================================================================
Function Name : _write
Address : 0x4159a0
Segment : _text
Total code references to this function : 4
Total data references to this function : 0
Total code + data references to this function : 4
========================================================================
Function Name : _read
Address : 0x4159b0
Segment : _text
Total code references to this function : 5
Total data references to this function : 0
Total code + data references to this function : 5
========================================================================
Function Name : _open
Address : 0x4159c0
Segment : _text
Total code references to this function : 7
Total data references to this function : 0
Total code + data references to this function : 7
========================================================================
Great! So it does what I wanted it to do. My only problem with the above result is that I would like to get rid of the functions that do nothing more than jump to a system API. To do this, I added a quick filter to the original code:
# Zarul Shahrin 2010, zarulshahrin - at - hackinthebox.org
from idaapi import *
from idautils import *

def isImport(ea):
    ref = get_first_dref_from(ea) # Get the first data reference from this address (a jump stub refers to its import entry)
    return get_segm_name(ref) == "_idata" # True if it refers to an entry in the import section

def reportRefsTo(func):
    funcName = get_func_name(func.startEA) # Get the name of the function from the given address
    segName = get_segm_name(func.startEA) # Get the name of the segment it is located in
    print "=" * 72
    print ''
    print "Function Name : ", funcName
    print "Address : 0x%x" % func.startEA
    print "Segment : ", segName
    print "Total code references to this function : ",
    cTotal = sum(1 for ref in CodeRefsTo(func.startEA, 1)) # Get the total number of code references to this function
    print cTotal
    print "Total data references to this function : ",
    dTotal = sum(1 for ref in DataRefsTo(func.startEA)) # Get the total number of data references to this function
    print dTotal
    print "Total code + data references to this function : ", cTotal + dTotal
    print "\n"

def getFuncInfo(dOption = "all",fImport = 0):
    if dOption == "all":
        qty = get_func_qty() # Get the number of functions
        fCount = 0
        while ( fCount < qty):
            func = getn_func(fCount) # Get pointer to function structure
            if (fImport):
                if isImport(func.startEA): # Skip functions that only jump to an import
                    fCount += 1
                    continue
            reportRefsTo(func)
            fCount += 1
    else:
        func = get_func(dOption) # Get pointer to function structure
        reportRefsTo(func) # Print the report

def main():
    getFuncInfo(fImport = 1) # Enable the import filter

if __name__ == "__main__":
    main()
(Blogger messed up the code formatting. You can get the final, updated code here.)
So now we have successfully got rid of those functions from our results:
========================================================================
Function Name : sub_4144A4
Address : 0x4144a4
Segment : _text
Total code references to this function : 16
Total data references to this function : 0
Total code + data references to this function : 16
========================================================================
Function Name : sub_4144EC
Address : 0x4144ec
Segment : _text
Total code references to this function : 2
Total data references to this function : 0
Total code + data references to this function : 2
========================================================================
Function Name : sub_4146F4
Address : 0x4146f4
Segment : _text
Total code references to this function : 1
Total data references to this function : 0
Total code + data references to this function : 1
========================================================================
Function Name : sub_4148C4
Address : 0x4148c4
Segment : _text
Total code references to this function : 2
Total data references to this function : 0
Total code + data references to this function : 2
========================================================================
Function Name : sub_414C24
Address : 0x414c24
Segment : _text
Total code references to this function : 2
Total data references to this function : 0
Total code + data references to this function : 2
========================================================================
Function Name : sub_414D80
Address : 0x414d80
Segment : _text
Total code references to this function : 1
Total data references to this function : 0
Total code + data references to this function : 1
========================================================================
Function Name : sub_4152B0
Address : 0x4152b0
Segment : _text
Total code references to this function : 12
Total data references to this function : 0
Total code + data references to this function : 12
========================================================================
From the results, you can clearly see the most cross-referenced functions. This is very helpful in deciding which function to analyze first.
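If you would rather not eyeball the listing, the ranking itself is easy to sketch. The snippet below is only an illustration and runs outside IDA: `rank_by_xrefs` and the `(name, code_refs, data_refs)` tuple layout are names I made up for this example, not part of IDAPython. The counts are copied from the filtered sample output; inside IDA they would come from CodeRefsTo() and DataRefsTo() as in the script.

```python
# Standalone sketch (hypothetical helper, not an IDAPython API).
# Each record is (function name, code reference count, data reference count).

def rank_by_xrefs(records, top=None):
    """Sort records by total references (code + data), highest first."""
    ranked = sorted(records, key=lambda r: r[1] + r[2], reverse=True)
    return ranked if top is None else ranked[:top]

# Counts taken from the filtered sample results in this post.
sample = [
    ("sub_4144A4", 16, 0),
    ("sub_4144EC", 2, 0),
    ("sub_4146F4", 1, 0),
    ("sub_4148C4", 2, 0),
    ("sub_414C24", 2, 0),
    ("sub_414D80", 1, 0),
    ("sub_4152B0", 12, 0),
]

# Show the three most referenced functions.
for name, code_refs, data_refs in rank_by_xrefs(sample, top=3):
    print("%-12s %3d" % (name, code_refs + data_refs))
```

For this sample, sub_4144A4 (16 references) and sub_4152B0 (12 references) come out on top, so those would be the first two functions I look at.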
This is just a simple quick hack to demonstrate the kind of powerful things you can do with IDA and its SDK to help speed up the reverse engineering process (even if it only saves 10 minutes of analysis time, that is more than good enough). Of course, you can add more to it so that it gives you more information about a function. Some ideas:
- Check whether a function contains any loops (loop detection) and, if it does, check for signs of it being an encryption or decryption routine.
- Combined with differential analysis, create a filter that only outputs the functions called within the execution flow.
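To give a rough idea of the first item, here is a minimal sketch of loop detection. It is not the code from this post and it runs outside IDA: it assumes the function's control flow graph has already been turned into a plain dictionary that maps a basic block id to its successors, and `find_back_edges` is a name I picked for the example. Inside IDA you would have to build that dictionary from the function's basic blocks yourself (IDAPython's FlowChart helper may be a starting point, if your version provides it). A loop shows up as a "back edge", that is, an edge pointing to a block that is still being visited by the depth-first search.

```python
# Standalone sketch (hypothetical helper, not an IDAPython API).

def find_back_edges(cfg, entry):
    """Return the (src, dst) edges that close a cycle reachable from entry.

    cfg maps a basic block id to a list of its successor block ids.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the DFS path / finished
    color = {entry: GREY}
    back_edges = []
    # Each stack entry keeps the block and an iterator over its successors,
    # so the search resumes where it left off after visiting a child.
    stack = [(entry, iter(cfg.get(entry, ())))]
    while stack:
        node, successors = stack[-1]
        for succ in successors:
            state = color.get(succ, WHITE)
            if state == GREY:
                # Successor is still on the current path: this edge forms a loop.
                back_edges.append((node, succ))
            elif state == WHITE:
                color[succ] = GREY
                stack.append((succ, iter(cfg.get(succ, ()))))
                break
        else:
            # All successors handled: this block is finished.
            color[node] = BLACK
            stack.pop()
    return back_edges

# Block 2 jumps back to block 1, so this function contains one loop.
looping = {0: [1], 1: [2, 3], 2: [1], 3: []}
print(find_back_edges(looping, 0))
```

A function with at least one back edge contains a loop. Whether that loop is an encryption or decryption routine still needs a closer look at the instructions inside it.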
I will try to discuss more about this in the future.
I really hope those who are new to reverse engineering will find this post useful. The code is not perfect and there are many ways it can be improved. Feel free to improve it and share with us if you wish :-D
Finally, I would like to wish everyone who is celebrating the Chinese New Year Gong Xi Fa Cai!!
P/S: I would like to thank Esteban Guillardoy from RibadeoHacklab for the help with proof-reading. Thanks Bro!