AST encoding not working

The AST generation works and I get this database for an example file:

As you can see, the ast_encode column is NULL.

How do you create the function encoding?

Possible issue:

In application.py there is no function encode_ast_in_db().

In fact, the main block in application.py doesn't do anything except assign a database path; every call is commented out:

Lines 187 to 194 in 3a036af

    
    if __name__ == '__main__':
        # end_to_end_evaluation()
        # time_consumption_statistics()
        # db_names = [""]
        # app.encode_ast_in_db("/root/data/firmwares/vul.sqlite")
        # app.encode_ast_in_db("/root/data/firmwares/Dlinkfirmwares.sqlite")
        db_path = "/root/data/firmwares/Netgearfirmwares.sqlite"
        # app.encode_ast_in_db(db_path, table_name='NormalTreeLSTM', where_suffix=" where elf_file_name like 'libcrypto%' ")

application.py is invoked from Asteria_ida_plugin.py (self.application holds the path to application.py):

Asteria/ASTExtraction/Asteria_ida_plugin.py

Lines 228 to 232 in 3a036af

    
    # 4.
    ida_kernwin.replace_wait_box("AST Encoding...")
    cmd = 'python "{}" --dbpath "{}"'.format(self.application, self._sqlitefilepath)
    idaapi.msg("[Asteria] >>> AST Encoding...[{}]\n".format(cmd))
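For this command to have any effect, application.py would need an entry point that reads `--dbpath` and passes it to an encoder. A minimal sketch of what that could look like, assuming the flag name from the plugin's command line; `parse_args`, `main` and the `encode` hook are hypothetical names, not part of the repository:

```python
import argparse


def parse_args(argv=None):
    """Parse the single option that the plugin passes on the command line."""
    parser = argparse.ArgumentParser(description="Encode the ASTs stored in a database")
    parser.add_argument("--dbpath", required=True,
                        help="path to the sqlite file written by the plugin")
    return parser.parse_args(argv)


def main(argv=None, encode=print):
    """Hand the database path to `encode` (any callable taking one path)."""
    args = parse_args(argv)
    encode(args.dbpath)
    return 0


# In application.py the wiring might look like this (names assumed, not confirmed):
#   app = Application(load_path="saved_model.pt")
#   raise SystemExit(main(encode=app.encode_ast_in_db))
```

Keeping the encoder as a parameter makes the argument handling easy to check without loading the model.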

So neither the IDA plugin nor running application.py directly on a database produces any encodings.
Any recommended fixes?
@yangshouguo @shouguoyang @Asteria-BCSD

I wrote a function that creates the encoding. If anyone else stumbles across this problem, here is the function:

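# Add to the Application class in application.py; needs `import sqlite3, pickle, json`.
def encode_ast_in_db(self, dbPath):
    '''
    :param dbPath: path to database
    '''
    conn = sqlite3.connect(dbPath)
    c = conn.cursor()
    encodings = []
    # Encode every stored AST first, then write the results back.
    for row in c.execute("SELECT ast_pick_dump FROM function"):
        encoded_ast = self.encode_ast(pickle.loads(row[0])).tolist()
        encodings.append((json.dumps(encoded_ast), row[0]))
    # Rows are matched on the pickled AST itself, so identical ASTs share one encoding.
    for encoding in encodings:
        c.execute("UPDATE function SET ast_encode = ? WHERE ast_pick_dump = ?", encoding)
    conn.commit()
    conn.close()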

It gets called like this:

db_path = "default.sqlite"
app = Application(load_path='saved_model.pt')
app.encode_ast_in_db(db_path)
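The function above matches rows by comparing the whole pickled blob, which can get slow on large tables. A possible variant keys the update on SQLite's implicit `rowid` and only touches rows that are still NULL. This is a sketch under assumptions: the table is an ordinary rowid table named `function`, and `encode_ast` is passed in as a callable (in application.py it would be `app.encode_ast`).

```python
import json
import pickle
import sqlite3


def encode_ast_in_db(db_path, encode_ast, table="function"):
    """Fill ast_encode for every row that has none yet; return the number of rows updated."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT rowid, ast_pick_dump FROM {} WHERE ast_encode IS NULL".format(table)
        ).fetchall()
        updates = []
        for rowid, dump in rows:
            vector = encode_ast(pickle.loads(dump))
            # Tensors and arrays expose .tolist(); plain sequences are copied as-is.
            vector = vector.tolist() if hasattr(vector, "tolist") else list(vector)
            updates.append((json.dumps(vector), rowid))
        conn.executemany(
            "UPDATE {} SET ast_encode = ? WHERE rowid = ?".format(table), updates
        )
        conn.commit()
        return len(updates)
    finally:
        conn.close()
```

Because already-encoded rows are skipped, the function can be re-run after an interruption without redoing finished work.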

Resulting table:

Hi @Adalsteinnjons,

I tried creating the setup with IDA Pro 7.7.
I am not able to generate ASTs; the command fails with return code 1. Can you please specify the steps you followed to generate the ASTs?
As per the README, I copied the file and directory to their required places in the plugins and python directories inside IDA Pro 7.7. When running the command, I also provide the absolute path to the 'idat64' binary. The 'TVHEADLESS=1' flag was not recognized either.

It would be really helpful if you can share the steps you followed along with the python and IDA versions used in the process.

Thanks.

Hi @kgautam01
I used IDA Pro 7.5. The first thing I did was to update the Asteria_ida_plugin.py file according to this link, as well as fixing the exceptions.
Here is the updated file:

# encoding=utf-8
'''
This is a ida plugin script
## 1
To dump all/one AST feature(s) of function(s) to a database file.
## 2
To Calculate Similarity between all functions and functions in another database file.
'''

import idaapi
from idaapi import IDA_SDK_VERSION, IDAViewWrapper
import idc
import idautils
import logging
from ida_kernwin import Form, PluginForm, show_wait_box, replace_wait_box, hide_wait_box, user_cancelled
import os, sys
import json
import subprocess
import time

try:
    if IDA_SDK_VERSION < 690:
        # In versions prior to IDA 6.9 PySide is used...
        from PySide import QtGui

        QtWidgets = QtGui
        is_pyqt5 = False
    else:
        logging.error("IDA version is %s", IDA_SDK_VERSION)
        # ...while in IDA 6.9, they switched to PyQt5
        from PyQt5 import QtCore, QtGui, QtWidgets

        is_pyqt5 = True
except ImportError:
    pass

l = logging.getLogger("ast_generator.py")
l.setLevel(logging.ERROR)
logger = logging.getLogger('Asteria')
logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler("Asteria.log"))
logger.handlers[0].setFormatter(
    logging.Formatter("[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)"))
logger.handlers[1].setFormatter(
    logging.Formatter("[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)"))
logger.setLevel(logging.ERROR)


# logging.basicConfig(format='[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)',
#                     filename="Asteria.log",
#                     filemode='a')

# view to show the similarity results
class CHtmlViewer(PluginForm):

    def OnCreate(self, form):
        if is_pyqt5:
            self.parent = self.FormToPyQtWidget(form)
        else:
            self.parent = self.FormToPySideWidget(form)
        self.PopulateForm()

        self.browser = None
        self.layout = None
        self.text = ""

        return 1

    def PopulateForm(self):
        self.layout = QtWidgets.QVBoxLayout()
        self.browser = QtWidgets.QTextBrowser()
        self.browser.setLineWrapMode(QtWidgets.QTextEdit.NoWrap)
        self.browser.setHtml(self.text)
        self.browser.setReadOnly(True)
        self.browser.setFontWeight(18)
        self.layout.addWidget(self.browser)
        self.parent.setLayout(self.layout)

    def ScoreToColor(self, score):
        '''
        return the color rgb value according to score
        1. score ranges 0.8~1.0
        2. color format #ff{A}{A} , A is calculated according to Score
        '''
        score = float(score)
        if score < 0.8:
            return "#ffffff"  # white

        ColorScore = int(125 - (score - 0.8) * 10 * 62)
        return "#ff%.2x%.2x" % (ColorScore, ColorScore)

    def AssembleHtml(self, jsondata):
        '''
        jsondata :dic: {'vulFuncName':{'info':['funcName','BinPath','BinName'], "rank":[[[like info], score], [[like info], score]]}}
        '''
        _html_template = """
                              <html>
                              <head>
                              <style>%(style)s</style>
                              </head>
                              <body>
                              <table class="diff_tab"  cellspacing="3px">
                              %(rows)s
                              </table>
                              </body>
                              </html>
        """
        _style = """
                 table.diff_tab {
                    font-family: Courier monospace;
                    border: solid black;
                    font-size: 15px;
                    text-align: left;
                    width: 100%;
                  }
                  td {
                    border: 2px;
                    border-bottom-style:solid;
                  }
                  """

        head_row_temp = """
                    <tr> 
                    <td class="VulFuncName" rowspan="%(rowNumber)d"> %(vulnerableFunc)s </td>
                    <td class="CandidateName"> %(candidateFunc)s </td>
                    <td class="score" bgcolor=%(bgcolor)s> %(score)s </td>
                    </tr>
        """
        other_row_temp = """
                    <tr> 
                    <td > %(candidateFunc)s </td>
                    <td bgcolor=%(bgcolor)s> %(score)s </td>
                    </tr>
        """
        table_header = """
            <thead> 
                <tr> 
                <th>Functions In Chosen File</th> <th>Functions In IDA</th>  <th>Similarity Scores</th> 
                </tr> 
            </thead>
        """
        table_content = ""
        for vulFunc in jsondata:
            candidates = jsondata[vulFunc]['rank']
            rowNumber = len(candidates)
            if rowNumber == 0:
                continue
            vulInfo = jsondata[vulFunc]['info']
            vulFuncName = vulFunc
            vulFuncBin = vulInfo[2]

            headerRow = head_row_temp % {"rowNumber": rowNumber, "vulnerableFunc": vulFuncName,
                                         "candidateFunc": candidates[0][0][0], "score": candidates[0][1],
                                         "bgcolor": self.ScoreToColor(candidates[0][1])}
            table_content += headerRow
            if rowNumber > 1:
                for candidate in candidates[1:]:
                    table_content += other_row_temp % {"candidateFunc": candidate[0][0], "score": candidate[1],
                                                       "bgcolor": self.ScoreToColor(candidate[1])}

        src = _html_template % {"style": _style, "rows": table_header + table_content}
        logger.debug(src)
        return src

    def Show(self, jsondata, title):
        """Creates the form if it is not created, or focuses it if it was"""
        self.text = self.AssembleHtml(jsondata)
        return PluginForm.Show(self, title, options=PluginForm.WOPN_PERSIST)


# function menus
class MyForm(Form):
    # Main Class for functionality of plugin
    def __init__(self):
        self.invert = False
        F = Form
        F.__init__(
            self,
            r"""STARTITEM 0
            Please choose your option.
            <##Function Feature Generation:{iButton1}> <##Function Similarity:{iButton2}>
            """,
            {
                'iButton1': F.ButtonInput(self.db_generate),
                'iButton2': F.ButtonInput(self.sim_cal)
            }
        )
        self._sqlitefilepath = idc.get_idb_path() + ".sqlite"
        idadir = idaapi.idadir("python")
        self.mainapp_path = os.path.join(idadir, "ASTExtraction",
                                         "main_app.py")  # path to the script of the ast encoding and similarity
        if not os.path.exists(self.mainapp_path):
            logger.error("Python File {} not exists!".format(self.mainapp_path))
        self.application = os.path.join(idadir, "ASTExtraction",
                                        "application.py")  # path to the script of the ast encoding and similarity
        if not os.path.exists(self.application):
            logger.error("Python File {} not exists!".format(self.application))

    # When you click on feature generation button
    def db_generate(self, code=0):
        logger.debug("ast generating...")
        # 1. set file to save; 2. show waiting box and progress 3. invoke ida script for ast generation 4. show success box.
        # 1.
        logger.debug("sqlite file path {}".format(self._sqlitefilepath))

        # 2.
        show_wait_box("Processing...")

        # 3.
        # 3.1 extract asts
        try:
            idaapi.require("ASTExtraction")
            idaapi.require("ASTExtraction.ast_generator")
            idaapi.require("ASTExtraction.DbOp")
            g = ASTExtraction.ast_generator.AstGenerator()
            dbop = ASTExtraction.DbOp.DBOP(self._sqlitefilepath)
            g.run(g.get_info_of_func)
            # 3.2 save asts to database
            replace_wait_box("Processing. Saving to Database.")
            recordsNo = g.save_to(dbop)
            logger.info("%d records are inserted into database." % recordsNo)
            del dbop
        except Exception as e:
            replace_wait_box(str(e))
            logger.error("ASTExtraction import error! {}".format(e))
            time.sleep(2)
        # 4.
        replace_wait_box("AST Encoding...")
        logger.error("ast encoding 0")
        cmd = 'python "{}" --dbpath "{}"'.format(self.application, self._sqlitefilepath)
        idaapi.msg("[Asteria] >>> AST Encoding...[{}]\n".format(cmd))
        logger.error("ast encoding 1")
        returncode = self.invoke_system_python(cmd, hidden_window=False)
        if returncode:
            idaapi.msg("[Asteria] >>> AST Encoding failed\n")
        hide_wait_box()
        idaapi.msg("[Asteria] >>> AST Encoding Finished\n")

    def invoke_system_python(self, cmd, hidden_window=True):
        '''
        cmd : str: command to be executed
        hidden_window: bool:
        return : subprocess.returncode
        '''
        startupinfo = subprocess.STARTUPINFO()
        # to hidden the cmd window
        if hidden_window:
            if 'win32' in str(sys.platform).lower():
                startupinfo.dwFlags = subprocess.CREATE_NEW_CONSOLE | subprocess.STARTF_USESHOWWINDOW
                startupinfo.wShowWindow = subprocess.SW_HIDE

            # p = subprocess.Popen(cmd,startupinfo= startupinfo, stdout=subprocess.PIPE)#, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0)
            # while p.poll() == None:
            #     idaapi.msg(p.stdout.readline())
            # return p.returncode

        p = subprocess.Popen(cmd,
                             startupinfo=startupinfo)  # , stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0)
        # while p.poll() == None:
        #     idaapi.msg(p.stdout.readline().strip())
        # p.stdout.flush()
        return p.wait()

    def sim_cal(self, code=0):
        logger.debug("Similarity Calculation...")
        # : 1. make sure current idb has generated sqlite db file 2. Choose sqlite db file to compare with 3. do calculation 4. show similarity results.
        # 1.
        if os.path.exists(self._sqlitefilepath):
            # 2.
            sqlite_database2, _ = QtWidgets.QFileDialog.getOpenFileName()  # another sqlite db file for similarity calculation with the sqlite db file generated by current idb
            idaapi.msg("[Asteria] >>> The sqlite file to be calculated is {}\n".format(sqlite_database2))
            # 3. 3.1: TODO: should check the python environment first
            # 3.2: do the calculation and save the results to file

            result_path = self._sqlitefilepath.split(".")[0] + ".json"
            cmd = 'python "{}" --result "{}" "{}" "{}"'.format(self.mainapp_path, result_path, sqlite_database2,
                                                               self._sqlitefilepath)
            logger.info("Calculation CMD: {}".format(cmd))
            show_wait_box("Calculating... Please wait.")
            returncode = self.invoke_system_python(cmd, hidden_window=True)
            hide_wait_box()
            if returncode:
                # cmd execution fails
                idaapi.msg("[Asteria] >>> Calculation failed, {}.\n".format(returncode))
            # 3.3 load the results
            if not os.path.exists(result_path):
                idaapi.msg("[Asteria] >>> Result file does not exist after the calculation, please see the log.\n")
                return

            with open(result_path, 'r') as f:
                results = json.load(f)
            # logger.info(str(results))
            # show window of result
            window = CHtmlViewer()
            title = "Similarity Scores"
            window.Show(results, title)
            self.Close(1)
            # self.Free()
        else:
            show_wait_box("Please run AST Generation first.")
            while not user_cancelled():
                pass
            hide_wait_box()
            return


class AsteriaPlugin(idaapi.plugin_t):
    """
    This is the main class of the plugin. It subclasses plugin_t as required
    by IDA. It holds the modules of plugin, which themselves provides the
    functionality of the plugin.
    """
    # plugin information
    flags = idaapi.PLUGIN_UNL
    comment = "This is a plugin for the Asteria (https://github.com/Asteria-BCSD/Asteria)"
    wanted_name = "Asteria"  # name of plugin
    wanted_hotkey = "Alt-F1"  # hot key
    help = "Please see README in https://github.com/Asteria-BCSD/Asteria"

    def init(self):
        if not self.load_plugin_decompiler():
            idaapi.msg("[Asteria]Failed to load hexray plugin.")
            return idaapi.PLUGIN_SKIP

        # To load the modules in /path/to/ida/python/Asteria

        try:
            idaapi.require("ASTExtraction")
            # idaapi.require("ASTExtraction.ast_generator")
            # idaapi.require("ASTExtraction.DbOp")
        except Exception as e:
            idaapi.msg(
                "[Asteria] Plugin initialization failed. Please read the README file and copy 'ASTExtraction' dir of the project to '/path/to/ida/python/'\n")
            logger.error(e.args)
            return idaapi.PLUGIN_SKIP

        idaapi.msg("[Asteria] >>> Asteria plugin initialization finished!\n")
        return idaapi.PLUGIN_OK  # return PLUGIN_KEEP

    def run(self, arg):
        self.form = MyForm()
        self.form.Compile()
        self.form.Execute()
        logger.debug(">>> Asteria plugin exists")

    def term(self):
        idaapi.msg("[Asteria] >>> Asteria plugin ends.\n")

    def load_plugin_decompiler(self):
        '''
        load the hexray plugins
        :return: success or not
        '''
        is_ida64 = idc.get_idb_path().endswith(".i64")
        if not is_ida64:
            idaapi.load_plugin("hexrays")
            idaapi.load_plugin("hexarm")
        else:
            idaapi.load_plugin("hexx64")
        if not idaapi.init_hexrays_plugin():
            logger.error('[+] decompiler plugins load failed. IDAdb: %s' % idc.get_idb_path())
            return False
        return True


def PLUGIN_ENTRY():
    # this function returns an instance of an idapython plugin

    return AsteriaPlugin()
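One caveat with `invoke_system_python` as posted: `subprocess.STARTUPINFO` only exists on Windows, so the method would raise an AttributeError if IDA's Python runs natively on Linux or macOS. A hedged sketch of a platform-neutral variant follows; it is an illustration, not the plugin's code, and it takes the command as an argument list instead of a single string.

```python
import subprocess
import sys


def invoke_system_python(cmd, hidden_window=True):
    """Run cmd (a list of arguments) and return its exit code."""
    kwargs = {}
    if hidden_window and sys.platform.startswith("win"):
        # STARTUPINFO and the SW_* constants are only defined on Windows.
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
        startupinfo.wShowWindow = subprocess.SW_HIDE
        kwargs["startupinfo"] = startupinfo
    return subprocess.call(cmd, **kwargs)
```

In the plugin, the call would then look something like `invoke_system_python(["python", self.application, "--dbpath", self._sqlitefilepath])`, which also avoids manual quoting of paths that contain spaces.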

You also need a working IDAPython installation in IDA Pro.
I don't use the plugin; I mainly run it from my terminal, since I use it on a large folder of binaries.
I am running it through Wine on Linux with the command:
python API_ast_generator.py --ida_path /home/ajonsson/.wine/drive_c/IDAPro7.5/idat --directory /home/../../..//data/01_raw/malicious --database malicious.sqlite
The directory malicious is where I store my binary files that I want to analyze.

And in the code I use:

if platform == 'linux':
    Header = "wine"
cmd_list = [Header, self.args.ida_path, "-c", "-A", '-S"%s %s"' % (self.Script, IDA_ARGS), binary_path]
cmd = " ".join(cmd_list)
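The same command can also be built as an argument list, which sidesteps shell quoting and leaves out the `wine` prefix on other platforms. This is a sketch, not the code from API_ast_generator.py; `build_ida_command` and its parameters are hypothetical names.

```python
import sys


def build_ida_command(ida_path, script, script_args, binary_path, platform=sys.platform):
    """Return the argument list for a headless IDA run, prefixed with wine on Linux."""
    cmd = ["wine"] if platform.startswith("linux") else []
    # -c: disassemble a new file, -A: autonomous mode, -S: script followed by its arguments.
    # Assumption: the script path contains no spaces; otherwise it needs inner quotes.
    cmd += [ida_path, "-c", "-A", "-S{} {}".format(script, script_args), binary_path]
    return cmd
```

The resulting list can be passed directly to `subprocess.call` without `shell=True`.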

That gives me an sqlite database with the ASTs stored as pickle dumps. When I open the malicious.sqlite database, I see that the encoding column is still empty at this point. Do you get to this point?
To get the encoding, I run application.py with the function that I provided in my comment above (comment no. 2). Simply add that function and then run:

db_path = "malicious.sqlite"
app = Application(load_path='saved_model.pt')
app.encode_ast_in_db(db_path)
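To check whether the encodings were actually written, the NULL entries can be counted straight from Python. A small sketch, assuming the table is called `function` and the column `ast_encode`, as in the queries earlier in this thread; `count_encodings` is a hypothetical helper name.

```python
import sqlite3


def count_encodings(db_path, table="function"):
    """Return (encoded, missing) row counts for the ast_encode column."""
    conn = sqlite3.connect(db_path)
    try:
        # COUNT(column) skips NULLs, COUNT(*) counts every row.
        encoded, total = conn.execute(
            "SELECT COUNT(ast_encode), COUNT(*) FROM {}".format(table)
        ).fetchone()
    finally:
        conn.close()
    return encoded, total - encoded
```

Calling `count_encodings("malicious.sqlite")` before and after the encoding step should show the second number dropping to zero if everything worked.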

I later switched from SQLite to Feather because it is much faster and takes less space. So I first convert the database from SQLite to Feather format. Then this is my function to do the encodings:

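# Needs `import pandas as pd` and `import pickle` at the top of application.py.
def encode_ast_in_feather(self, file):
    '''
    :param file: path to feather file
    '''
    import swifter  # registers the .swifter accessor on DataFrames
    df = pd.read_feather(file)
    print('Encoding ASTs in {}'.format(file))
    print('Number of rows: {}'.format(df.shape[0]))
    # Unpickle and encode each AST row by row
    encodings = df.swifter.apply(lambda x: self.encode_ast(pickle.loads(x["ast_pick_dump"])), axis=1)
    df["ast_encode"] = encodings
    # Overwrites the input file, now including the ast_encode column
    df.to_feather(file)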

Hope that works for you. Otherwise let me know.

Hi @kgautam01 I used IDA pro 7.5. The first thing I did was to upgrade the Asteria_ida_plugin.py file according to this link . As well as fixing the Exceptions. Here is the updated file:

# encoding=utf-8
'''
This is a ida plugin script
## 1
To dump all/one AST feature(s) of function(s) to a database file.
## 2
To Calculate Similarity between all functions and functions in another database file.
'''

import idaapi
from idaapi import IDA_SDK_VERSION, IDAViewWrapper
import idc
import idautils
import logging
from ida_kernwin import Form, PluginForm, show_wait_box, replace_wait_box, hide_wait_box, user_cancelled
import os, sys
import json
import subprocess
import time

try:
    if IDA_SDK_VERSION < 690:
        # In versions prior to IDA 6.9 PySide is used...
        from PySide import QtGui

        QtWidgets = QtGui
        is_pyqt5 = False
    else:
        logging.error("IDA version is ", IDA_SDK_VERSION)
        # ...while in IDA 6.9, they switched to PyQt5
        from PyQt5 import QtCore, QtGui, QtWidgets

        is_pyqt5 = True
except ImportError:
    pass

l = logging.getLogger("ast_generator.py")
l.setLevel(logging.ERROR)
logger = logging.getLogger('Asteria')
logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler("Asteria.log"))
logger.handlers[0].setFormatter(
    logging.Formatter("[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)"))
logger.handlers[1].setFormatter(
    logging.Formatter("[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)"))
logger.setLevel(logging.ERROR)


# logging.basicConfig(format='[%(filename)s][%(levelname)s] %(message)s\t(%(module)s:%(funcName)s)',
#                     filename="Asteria.log",
#                     filemode='a')

# view to show the similaity results
class CHtmlViewer(PluginForm):

    def OnCreate(self, form):
        if is_pyqt5:
            self.parent = self.FormToPyQtWidget(form)
        else:
            self.parent = self.FormToPySideWidget(form)
        self.PopulateForm()

        self.browser = None
        self.layout = None
        self.text = ""

        return 1

    def PopulateForm(self):
        self.layout = QtWidgets.QVBoxLayout()
        self.browser = QtWidgets.QTextBrowser()
        self.browser.setLineWrapMode(QtWidgets.QTextEdit.NoWrap)
        self.browser.setHtml(self.text)
        self.browser.setReadOnly(True)
        self.browser.setFontWeight(18)
        self.layout.addWidget(self.browser)
        self.parent.setLayout(self.layout)

    def ScoreToColor(self, score):
        '''
        return the color rgb value according to score
        1. score ranges 0.8~1.0
        2. color format #ff{A}{A} , A is calculated according to Score
        '''
        score = float(score)
        if score < 0.8:
            return "#ffffff"  # white

        ColorScore = int(125 - (score - 0.8) * 10 * 62)
        return "#ff%.2x%.2x" % (ColorScore, ColorScore)

    def AssembleHtml(self, jsondata):
        '''
        jsondata :dic: {'vulFuncName':{'info':['funcName','BinPath','BinName'], "rank":[[[like info], score], [[like info], score]]}}
        '''
        _html_template = """
                              <html>
                              <head>
                              <style>%(style)s</style>
                              </head>
                              <body>
                              <table class="diff_tab"  cellspacing="3px">
                              %(rows)s
                              </table>
                              </body>
                              </html>
        """
        _style = """
                 table.diff_tab {
                    font-family: Courier monospace;
                    border: solid black;
                    font-size: 15px;
                    text-align: left;
                    width: 100%;
                  }
                  td {
                    border: 2px;
                    border-bottom-style:solid;
                  }
                  """

        head_row_temp = """
                    <tr> 
                    <td class="VulFuncName" rowspan="%(rowNumber)d"> %(vulnerableFunc)s </td>
                    <td class="CandidateName"> %(candidateFunc)s </td>
                    <td class="score" bgcolor=%(bgcolor)s> %(score)s </td>
                    </tr>
        """
        other_row_temp = """
                    <tr> 
                    <td > %(candidateFunc)s </td>
                    <td bgcolor=%(bgcolor)s> %(score)s </td>
                    </tr>
        """
        table_header = """
            <thead> 
                <tr> 
                <th>Functions In Chosen File</th> <th>Functions In IDA</th>  <th>Similarity Scores</th> 
                </tr> 
            </thead>
        """
        table_content = ""
        for vulFunc in jsondata:
            candidates = jsondata[vulFunc]['rank']
            rowNumber = len(candidates)
            if rowNumber == 0:
                continue
            vulInfo = jsondata[vulFunc]['info']
            vulFuncName = vulFunc
            vulFuncBin = vulInfo[2]

            headerRow = head_row_temp % {"rowNumber": rowNumber, "vulnerableFunc": vulFuncName,
                                         "candidateFunc": candidates[0][0][0], "score": candidates[0][1],
                                         "bgcolor": self.ScoreToColor(candidates[0][1])}
            table_content += headerRow
            if rowNumber > 1:
                for candidate in candidates[1:]:
                    table_content += other_row_temp % {"candidateFunc": candidate[0][0], "score": candidate[1],
                                                       "bgcolor": self.ScoreToColor(candidate[1])}

        src = _html_template % {"style": _style, "rows": table_header + table_content}
        logger.debug(src)
        return src

    def Show(self, jsondata, title):
        self.text = self.AssembleHtml(jsondata)
        """Creates the form is not created or focuses it if it was"""
        return PluginForm.Show(self, title, options=PluginForm.WOPN_PERSIST)


# function menus
class MyForm(Form):
    # Main Class for functionality of plugin
    def __init__(self):
        self.invert = False
        F = Form
        F.__init__(
            self,
            r"""STARTITEM 0
            Please choose your option.
            <##Function Feature Generation:{iButton1}> <##Function Similarity:{iButton2}>
            """,
            {
                'iButton1': F.ButtonInput(self.db_generate),
                'iButton2': F.ButtonInput(self.sim_cal)
            }
        )
        self._sqlitefilepath = idc.get_idb_path() + ".sqlite"
        idadir = idaapi.idadir("python")
        self.mainapp_path = os.path.join(idadir, "ASTExtraction",
                                         "main_app.py")  # path to the script of the ast encoding and similarity
        if not os.path.exists(self.mainapp_path):
            logger.error("Python File {} not exists!".format(self.mainapp_path))
        self.application = os.path.join(idadir, "ASTExtraction",
                                        "application.py")  # path to the script of the ast encoding and similarity
        if not os.path.exists(self.application):
            logger.error("Python File {} not exists!".format(self.application))

    # When you click on feature generation button
    def db_generate(self, code=0):
        logger.debug("ast generating...")
        # 1. set file to save; 2. show waiting box and progress 3. invoke ida script for ast generation 4. show sucess box.
        # 1.
        logger.debug("sqlite file path {}".format(self._sqlitefilepath))

        # 2.
        show_wait_box("Processing...")

        # 3.
        # 3.1 extract asts
        try:
            idaapi.require("ASTExtraction")
            idaapi.require("ASTExtraction.ast_generator")
            idaapi.require("ASTExtraction.DbOp")
            g = ASTExtraction.ast_generator.AstGenerator()
            dbop = ASTExtraction.DbOp.DBOP(self._sqlitefilepath)
            g.run(g.get_info_of_func)
            # 3.2 save asts to database
            replace_wait_box("Processing. Saving to Database.")
            recordsNo = g.save_to(dbop)
            logger.info("%d records are inserted into database." % recordsNo)
            del dbop
        except Exception as e:
            replace_wait_box(str(e))
            logger.error("ASTExtraction import error! {}".format(e))
            time.sleep(2)
        # 4.
        replace_wait_box("AST Encoding...")
        logger.error("ast encoding 0")
        cmd = 'python "{}" --dbpath "{}"'.format(self.application, self._sqlitefilepath)
        idaapi.msg("[Asteria] >>> AST Encoding...[{}]\n".format(cmd))
        logger.error("ast encoding 1")
        returncode = self.invoke_system_python(cmd, hidden_window=False)
        if returncode:
            idaapi.msg("[Asteria] >>> AST Encoding failed\n")
        hide_wait_box()
        idaapi.msg("[Asteria] >>> AST Encoding Finished\n")

    def invoke_system_python(self, cmd, hidden_window=True):
        '''
        cmd : str: command to be executed
        hidden_window: bool:
        return : subprocess.returncode
        '''
        startupinfo = subprocess.STARTUPINFO()
        # to hidden the cmd window
        if hidden_window:
            if 'win32' in str(sys.platform).lower():
                startupinfo.dwFlags = subprocess.CREATE_NEW_CONSOLE | subprocess.STARTF_USESHOWWINDOW
                startupinfo.wShowWindow = subprocess.SW_HIDE

            # p = subprocess.Popen(cmd,startupinfo= startupinfo, stdout=subprocess.PIPE)#, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0)
            # while p.poll() == None:
            #     idaapi.msg(p.stdout.readline())
            # return p.returncode

        p = subprocess.Popen(cmd,
                             startupinfo=startupinfo)  # , stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=0)
        # while p.poll() == None:
        #     idaapi.msg(p.stdout.readline().strip())
        # p.stdout.flush()
        return p.wait()

    def sim_cal(self, code=0):
        logger.debug("Similarity Calculation...")
        # : 1. make sure current idb has generated sqlite db file 2. Choose sqlite db file to compare with 3. do calculation 4. show similarity results.
        # 1.
        if os.path.exists(self._sqlitefilepath):
            # 2.
            sqlite_database2, _ = QtWidgets.QFileDialog.getOpenFileName()  # another sqlite db file for similarity calculation with the sqlite db file generated by current idb
            idaapi.msg("[Asteria] >>> The sqlite file to be calculated is {}\n".format(sqlite_database2))
            # 3. 3.1: TODO: should check the python environment first
            # 3.2: do the calculation and save the results to file

            result_path = self._sqlitefilepath.split(".")[0] + ".json"
            cmd = 'python "{}" --result "{}" "{}" "{}"'.format(self.mainapp_path, result_path, sqlite_database2,
                                                               self._sqlitefilepath)
            logger.info("Calculation CMD: {}".format(cmd))
            show_wait_box("Calculating... Please wait.")
            returncode = self.invoke_system_python(cmd, hidden_window=True)
            hide_wait_box()
            if returncode:
                # cmd execution fails
                idaapi.msg("[Asteria] >>> Calcualtion failed, {}.\n".format(returncode))
            # 3.3 load the results
            if not os.path.exists(result_path):
                idaapi.msg("[Asteria] >>> Result file does not exists after the calculation, please see the log.\n")
                return

            results = json.load(open(result_path, 'r'))
            # logger.info(str(results))
            # show window of result
            window = CHtmlViewer()
            title = "Similarity Scores"
            window.Show(results, title)
            self.Close(1)
            # self.Free()
        else:
            show_wait_box("Please run AST Generation first.")
            while not user_cancelled():
                pass
            hide_wait_box()
            return


class AsteriaPlugin(idaapi.plugin_t):
    """
    This is the main class of the plugin. It subclasses plugin_t as required
    by IDA. It holds the modules of plugin, which themselves provides the
    functionality of the plugin.
    """
    # plugin information
    flags = idaapi.PLUGIN_UNL
    comment = "This is a plugin for the Asteria (https://github.com/Asteria-BCSD/Asteria)"
    wanted_name = "Asteria"  # name of plugin
    wanted_hotkey = "Alt-F1"  # hot key
    help = "Please see README in https://github.com/Asteria-BCSD/Asteria"

    def init(self):
        if not self.load_plugin_decompiler():
            idaapi.msg("[Asteria]Failed to load hexray plugin.")
            return idaapi.PLUGIN_SKIP

        # To load the modules in /path/to/ida/python/Asteria

        try:
            idaapi.require("ASTExtraction")
            # idaapi.require("ASTExtraction.ast_generator")
            # idaapi.require("ASTExtraction.DbOp")
        except Exception as e:
            idaapi.msg(
                "[Asteria] Plugin initialization failed. Please read the README file and copy 'ASTExtraction' dir of the project to '/path/to/ida/python/'\n")
            logger.error(e.args)
            return idaapi.PLUGIN_SKIP

        idaapi.msg("[Asteria] >>> Asteria plugin initialization finished!\n")
        return idaapi.PLUGIN_OK  # return PLUGIN_KEEP

    def run(self, arg):
        self.form = MyForm()
        self.form.Compile()
        self.form.Execute()
        logger.debug(">>> Asteria plugin exits")

    def term(self):
        idaapi.msg("[Asteria] >>> Asteria plugin ends.\n")

    def load_plugin_decompiler(self):
        '''
        load the hexray plugins
        :return: success or not
        '''
        is_ida64 = idc.get_idb_path().endswith(".i64")
        if not is_ida64:
            idaapi.load_plugin("hexrays")
            idaapi.load_plugin("hexarm")
        else:
            idaapi.load_plugin("hexx64")
        if not idaapi.init_hexrays_plugin():
            logger.error('[+] decompiler plugins load failed. IDAdb: %s' % idc.get_input_file_path())
            return False
        return True


def PLUGIN_ENTRY():
    # this function returns an instance of an idapython plugin

    return AsteriaPlugin()

You have to have IDAPython working in IDA Pro as well. I don't use the plugin; I mainly run it from my terminal, since I use it on a large folder of binaries. I am running it through wine on Linux with the command:

    python API_ast_generator.py --ida_path /home/ajonsson/.wine/drive_c/IDAPro7.5/idat --directory /home/../../..//data/01_raw/malicious --database malicious.sqlite

The directory malicious is where I store the binary files that I want to analyze.

And in the code I use:

if platform == 'linux':
        Header = "wine"
cmd_list = [Header, self.args.ida_path, "-c" ,"-A", '-S"%s %s"' % (self.Script, IDA_ARGS), binary_path]
cmd = " ".join(cmd_list)
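
A minimal sketch of that command construction, assuming the same switches as above (-c, -A, -S) and paths without spaces. build_ida_command is a hypothetical helper, not part of the project. It returns an argument list for subprocess instead of a joined string, so no shell quoting around the -S value is needed:

```python
import sys


def build_ida_command(ida_path, script, script_args, binary_path, platform=sys.platform):
    """Return the idat invocation as an argument list (hypothetical helper)."""
    cmd = []
    if platform.startswith("linux"):
        # Assumption: IDA is the Windows build and is started through wine on Linux.
        cmd.append("wine")
    # -c: disassemble a new file, -A: autonomous mode, -S: script plus its arguments.
    cmd += [ida_path, "-c", "-A", "-S%s %s" % (script, script_args), binary_path]
    return cmd


if __name__ == "__main__":
    # Placeholder names, only to show the resulting list.
    print(build_ida_command("idat", "ast_generator.py", "-d malicious.sqlite", "sample.bin"))
```

The list can be passed to subprocess.run(cmd) directly; joining it into one string is only needed when the command goes through a shell.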

That gives me an sqlite database with the ast_trees as pickle dump. When I open the malicious.sqlite database, I see that the encoding column is still empty at this point. Do you get to this point? To get the encoding, I run application.py with the function that I provided in my comment above (comment nr. 2). Simply add that function and then run:

db_path = "malicious.sqlite"
app = Application(load_path='saved_model.pt')
app.encode_ast_in_db(db_path)
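
To check whether the encoding step actually filled the column, a quick count can help. This is a sketch that assumes the table is called function and the column ast_encode, as described earlier in this thread; count_missing_encodings is a hypothetical helper:

```python
import sqlite3


def count_missing_encodings(db_path, table="function", column="ast_encode"):
    """Return (rows without an encoding, total rows) for the given table."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        # Table and column names cannot be bound as SQL parameters, so they are
        # interpolated here; only pass trusted names.
        cur.execute("SELECT COUNT(*) FROM {} WHERE {} IS NULL".format(table, column))
        missing = cur.fetchone()[0]
        cur.execute("SELECT COUNT(*) FROM {}".format(table))
        total = cur.fetchone()[0]
    finally:
        conn.close()
    return missing, total
```

Running it before and after encode_ast_in_db should show the first number dropping to 0 if every row was encoded.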

I later changed to using feather instead of sqlite because it is much faster and takes less space. So I first change the database from sqlite to feather format. Then this is my function to do the encodings:

def encode_ast_in_feather(self, file):
        '''
        :param file: path to feather file
        '''
        import swifter
        df = pd.read_feather(file)
        print('Encoding ASTs in {}'.format(file))
        print('Number of rows: {}'.format(df.shape[0]))
        encodings = df.swifter.apply(lambda x: self.encode_ast(pickle.loads(x["ast_pick_dump"])), axis=1)
        df["ast_encode"] = encodings
        df.to_feather(file)

Hope that works for you. Otherwise let me know.

Hi @Adalsteinnjons,

Thanks for sharing this information. I am going to test it right away. Also, can you please mention which Python version you used with IDA, 2 or 3?
Based on the changes made in the plugin, it seems to be Python 3 now. Please confirm.

Hi @Adalsteinnjons,
I tried the above file but the SQLite file is not generated anywhere. I am not sure what is going wrong in the backend. The command run via Popen seems to be working fine with return code 0 (OK).

Can you please share the behavior of running the API_ast_generator.py script on your machine?
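
A return code of 0 only says that the launched process exited cleanly, not that the script inside IDA wrote anything. One way to narrow this down is to capture the output and check for the database file afterwards. A sketch, with run_and_check as a hypothetical helper and the command and path supplied by the caller:

```python
import os
import subprocess


def run_and_check(cmd, expected_file):
    """Run cmd (an argument list) and report whether expected_file exists afterwards."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "file_created": os.path.exists(expected_file),
    }
```

If returncode is 0 but file_created is False, the next place to look would be the log file that the IDA script writes, since the process itself reported no failure.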

Hi @Adalsteinnjons,

Just a gentle reminder about the issue. I am still stuck with the thing. I would really appreciate your help.

Thanks.

Hi @Asteria-BCSD,
It will be really helpful if you can help with this.

Thanks

Hi, I will respond tomorrow.

Sure, thanks!

Hi, I used IDA version 7.5 and python 3.9.
Do you have any error outputs? What do you see when you run
python API_ast_generator.py --ida_path <<path to your idat dir>> --directory <<path to your binary folder>> --database <<dbname>>.sqlite ?

I don’t see any errors, that’s the issue. The command runs fine. I tried to see via putting print statements. No luck!

Ok I see. Not sure what could be the problem on your end. I don't think I changed anything in ast_generator.py.
Here is what it looks like when I execute it:

I think to debug you have to use l.error('debug message') in the ast_generator.py file. Then you should see it in the command line. The print statements don't work.
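
One detail of the logging setup in ast_generator.py may matter here: the two handlers have levels (ERROR for the console, INFO for the file), but the logger itself has none. Unless something else configures logging, it inherits WARNING from the root logger, so l.info(...) records are dropped before they reach either handler, while l.error(...) gets through. A small sketch of that behaviour, using in-memory streams in place of the console and ast_generator.log, with make_logger as a hypothetical helper:

```python
import io
import logging


def make_logger(name, console, logfile, logger_level=None):
    """Build a logger with an ERROR console handler and an INFO file-like handler."""
    log = logging.getLogger(name)
    log.handlers = []        # start clean if the name was used before
    log.propagate = False    # keep root handlers out of the picture
    if logger_level is not None:
        log.setLevel(logger_level)
    console_handler = logging.StreamHandler(console)
    console_handler.setLevel(logging.ERROR)
    file_handler = logging.StreamHandler(logfile)  # stand-in for logging.FileHandler
    file_handler.setLevel(logging.INFO)
    log.addHandler(console_handler)
    log.addHandler(file_handler)
    return log


if __name__ == "__main__":
    console, logfile = io.StringIO(), io.StringIO()
    log = make_logger("demo", console, logfile, logging.INFO)
    log.info("info message")
    log.error("error message")
    print("console:", console.getvalue().split())
    print("logfile:", logfile.getvalue().split())
```

In the script itself, adding l.setLevel(logging.INFO) right after getLogger would presumably let the INFO records reach the log file as well.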

@kgautam01
Here is my ast_generator.py
Try using this one:

# encoding=utf-8
"""
Python2.7
IDAPython script running with Hexray plugin !!!
usage: idat -S ast_generator.py binary|binary.idb
Extracting the asts of all functions in the binary file and save the function information along with ast to the database file
"""

import idautils
import idaapi
from idc import *
from idaapi import *
from idautils import *
import logging, os, sys

root = os.path.dirname(os.path.abspath(__file__))
sys.path.append(root)
import pickle
from DbOp import DBOP

l = logging.getLogger("ast_generator.py")
l.addHandler(logging.StreamHandler())
l.addHandler(logging.FileHandler("ast_generator.log"))
l.handlers[0].setFormatter(logging.Formatter("%(filename)s : %(message)s"))
l.handlers[0].setLevel(logging.ERROR)
l.handlers[1].setLevel(logging.INFO)
IDA700 = False
l.info("IDA Version {}".format(IDA_SDK_VERSION))
if IDA_SDK_VERSION >= 700:
    # IDAPro 6.x To 7.x (https://www.hex-rays.com/products/ida/support/ida74_idapython_no_bc695_porting_guide.shtml)
    l.info("Using IDA7xx API")
    IDA700 = True
    GetOpType = get_operand_type
    GetOperandValue = get_operand_value
    SegName = get_segm_name
    autoWait = auto_wait
    GetFunctionName = get_func_name
    GetIdbPath = get_idb_path
    GetInputFile = get_root_filename
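    GetInputFilePath = get_input_file_path
    # Old names that are also used further down in this script
    GetString = get_strlit_contents
    GetFunctionAttr = get_func_attr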
    import ida_pro

    Exit = ida_pro.qexit


# ---- prepare environment
def wait_for_analysis_to_finish():
    '''
    :return:
    '''
    l.info('[+] waiting for analysis to finish...')
    autoWait()
    l.info('[+] analysis finished')


def load_plugin_decompiler():
    '''
    load the hexray plugins
    :return: success or not
    '''
    is_ida64 = GetIdbPath().endswith(".i64")
    if not is_ida64:
        idaapi.load_plugin("hexrays")
        idaapi.load_plugin("hexarm")
    else:
        idaapi.load_plugin("hexx64")
    if not idaapi.init_hexrays_plugin():
        l.error('[+] decompiler plugins load failed. IDAdb: %s' % GetInputFilePath())
        Exit(0)


wait_for_analysis_to_finish()
load_plugin_decompiler()

# -----------------------------------

# --------------------------
spliter = "************"


class Visitor(idaapi.ctree_visitor_t):
    # preorder traversal tree
    def __init__(self, cfunc):
        idaapi.ctree_visitor_t.__init__(self, idaapi.CV_FAST | idaapi.CV_INSNS)
        self.cfunc = cfunc
        self._op_type_list = []
        self._op_name_list = []
        self._tree_struction_list = []
        self._id_list = []
        self._statement_num = 0
        self._callee_set = set()
        self._caller_set = set()
        self.root = None  # root node of tree

    # Generate the sub tree
    def GenerateAST(self, ins):

        self._statement_num += 1
        AST = Tree()
        try:
            l.info("[insn] op  %s" % (ins.opname))
            AST.op = ins.op
            AST.opname = ins.opname

            if ins.op == idaapi.cit_block:
                self.dump_block(ins.ea, ins.cblock, AST)
            elif ins.op == idaapi.cit_expr:
                AST.add_child(self.dump_expr(ins.cexpr))

            elif ins.op == idaapi.cit_if:
                l.info("[if]" + spliter)
                cif = ins.details
                cexpr = cif.expr
                ithen = cif.ithen
                ielse = cif.ielse

                AST.add_child(self.dump_expr(cexpr))
                if ithen:
                    AST.add_child(self.GenerateAST(ithen))
                if ielse:
                    AST.add_child(self.GenerateAST(ielse))

            elif ins.op == idaapi.cit_while:
                cwhile = ins.details
                self.dump_while(cwhile, AST)

            elif ins.op == idaapi.cit_return:
                creturn = ins.details
                AST.add_child(self.dump_return(creturn))

            elif ins.op == idaapi.cit_for:
                l.info('[for]' + spliter)
                cfor = ins.details
                AST.add_child(self.dump_expr(cfor.init))
                AST.add_child(self.dump_expr(cfor.step))
                AST.add_child(self.dump_expr(cfor.expr))
                AST.add_child(self.GenerateAST(cfor.body))
            elif ins.op == idaapi.cit_switch:
                l.info('[switch]' + spliter)
                cswitch = ins.details
                cexpr = cswitch.expr
                ccases = cswitch.cases  # Switch cases: values and instructions.
                cnumber = cswitch.mvnf  # Maximal switch value and number format.
                AST.add_child(self.dump_expr(cexpr))
                self.dump_ccases(ccases, AST)
            elif ins.op == idaapi.cit_do:
                l.info('[do]' + spliter)
                cdo = ins.details
                cbody = cdo.body
                cwhile = cdo.expr
                AST.add_child(self.GenerateAST(cbody))
                AST.add_child(self.dump_expr(cwhile))
            elif ins.op == idaapi.cit_break or ins.op == idaapi.cit_continue:
                pass
            elif ins.op == idaapi.cit_goto:
                pass
            else:
                l.warning('[error] not handled op type %s' % ins.opname)

        except:
            l.warning("[E] exception here ! ")

        return AST

    def visit_insn(self, ins):
        # pre-order visit ctree Generate new AST
        # ins maybe None , why ?

        if not ins:
            return 1
        # l.info("[AST] address and op %s %s" % (hex(ins.ea), ins.opname))
        self.root = self.GenerateAST(ins)
        l.info(self.root)
        return 1

    def dump_return(self, creturn):
        '''
        return an expression?
        '''
        return self.dump_expr(creturn.expr)

    def dump_while(self, cwhile, parent):
        '''
        visit while statement
        return:
            condition: expression tuple
            body : block
        '''
        expr = cwhile.expr
        parent.add_child(self.dump_expr(expr))
        whilebody = None
        body = cwhile.body
        if body:
            parent.add_child(self.GenerateAST(body))

    def dump_ccases(self, ccases, parent_node):
        '''
        :param ccases:
        :return: return a list of cases
        '''
        for ccase in ccases:
            AST = Tree()
            AST.opname = 'case'
            AST.op = ccase.op
            l.info('case opname %s, op %d' % (ccase.opname, ccase.op))
            value = 0  # default
            size = ccase.size()  # number of case values; if empty, this is the 'default' case
            if size > 0:
                value = ccase.value(0)
            AST.value = value
            block = self.dump_block(ccase.ea, ccase.cblock, AST)
            parent_node.add_child(AST)

    def dump_expr(self, cexpr):
        '''
        l.info the expression
        :return: AST with two nodes op and oprand : op Types.NODETYPE.OPTYPE, oprand : list[]
        '''
        # l.info "dumping expression %x" % (cexpr.ea)

        oprand = []  # a list of Tree()
        l.info("[expr] op %s" % cexpr.opname)

        if cexpr.op == idaapi.cot_call:
            # oprand = args
            # get the function call arguments
            self._get_callee(cexpr.ea)
            l.info('[call]' + spliter)
            args = cexpr.a
            for arg in args:
                oprand.append(self.dump_expr(arg))
        elif cexpr.op == idaapi.cot_idx:
            l.info('[idx]' + spliter)
            oprand.append(self.dump_expr(cexpr.x))
            oprand.append(self.dump_expr(cexpr.y))

        elif cexpr.op == idaapi.cot_memptr:
            l.info('[memptr]' + spliter)
            # TODO
            AST = Tree()
            AST.op = idaapi.cot_num  # consider the mem size pointed by memptr
            AST.value = cexpr.ptrsize
            AST.opname = "value"
            oprand.append(AST)
            # oprand.append(cexpr.m) # cexpr.m : member offset
            # oprand.append(cexpr.ptrsize)
        elif cexpr.op == idaapi.cot_memref:

            offset = Tree()
            offset.op = idaapi.cot_num
            offset.opname = "offset"
            offset.addr = cexpr.ea
            offset.value = cexpr.m
            oprand.append(offset)

        elif cexpr.op == idaapi.cot_num:
            l.info('[num]' + str(cexpr.n._value))
            AST = Tree()
            AST.op = idaapi.cot_num  # consider the mem size pointed by memptr
            AST.value = cexpr.n._value
            AST.opname = "value"
            oprand.append(AST)

        elif cexpr.op == idaapi.cot_var:

            var = cexpr.v
            entry_ea = var.mba.entry_ea
            idx = var.idx
            ltree = Tree()
            ltree.op = idaapi.cot_memptr
            ltree.addr = cexpr.ea
            ltree.opname = 'entry_ea'
            ltree.value = entry_ea
            oprand.append(ltree)
            rtree = Tree()
            rtree.value = idx
            rtree.op = idaapi.cot_num
            rtree.addr = cexpr.ea
            rtree.opname = 'idx'
            oprand.append(rtree)

        elif cexpr.op == idaapi.cot_str:
            # string constant
            l.info('[str]' + cexpr.string)
            AST = Tree()
            AST.opname = "string"
            AST.op = cexpr.op
            AST.value = cexpr.string
            oprand.append(AST)

        elif cexpr.op == idaapi.cot_obj:
            l.info('[cot_obj]' + hex(cexpr.obj_ea))
            # oprand.append(cexpr.obj_ea)
            # Many strings are defined as 'obj'
            # I wonder if 'obj' still points to other types of data?
            # notice that the address of 'obj' is not in .text segment
            if get_segm_name(getseg(cexpr.obj_ea)) not in ['.text']:
                AST = Tree()
                AST.opname = "string"
                AST.op = cexpr.op
                AST.value = GetString(cexpr.obj_ea)
                oprand.append(AST)

        elif cexpr.op <= idaapi.cot_fdiv and cexpr.op >= idaapi.cot_comma:
            # All binary operators
            oprand.append(self.dump_expr(cexpr.x))
            oprand.append(self.dump_expr(cexpr.y))

        elif cexpr.op >= idaapi.cot_fneg and cexpr.op <= idaapi.cot_call:
            # All unary operators
            l.info('[single]' + spliter)
            oprand.append(self.dump_expr(cexpr.x))
        else:
            l.warning('[error] %s not handled ' % cexpr.opname)
        AST = Tree()
        AST.opname = cexpr.opname
        AST.op = cexpr.op
        for tree in oprand:
            AST.add_child(tree)
        return AST

    def dump_block(self, ea, b, parent):
        '''
        :param ea: block address
        :param b:  block_structure
        :param parent: parent node
        :return:
        '''
        # iterate over all block instructions
        for ins in b:
            if ins:
                parent.add_child(self.GenerateAST(ins))

    def get_pseudocode(self):
        sv = self.cfunc.get_pseudocode()
        code_lines = []
        for sline in sv:
            code_lines.append(tag_remove(sline.line))
        return "\n".join(code_lines)

    def get_caller(self): # perhaps change the 0 to 1 here to get callers in the flow as well
        call_addrs = list(idautils.CodeRefsTo(self.cfunc.entry_ea, 0))
        return len(set(call_addrs))

    def get_callee(self):
        return len(self._callee_set)

    def _get_callee(self, ea):
        '''
        :param ea:  where the call instruction points to
        :return: None
        '''
        l.info('analyse addr %s callee' % hex(ea))
        addrs = list(idautils.CodeRefsFrom(ea, 0))
        for addr in addrs:
            if addr == GetFunctionAttr(addr, 0):
                self._callee_set.add(addr)


class AstGenerator():

    def __init__(self, optimization_level="default", compiler='gcc'):
        '''
        :param optimization_level: the level of optimization when compile
        :param compiler: the compiler name like gcc
        '''
        if optimization_level not in ["O0", "O1", "O2", "O3", "Os", "default"]:
            l.warning("No specific optimization level !!!")
        self.optimization_level = optimization_level
        self.bin_file_path = GetInputFilePath()  # path to binary
        self.file_name = GetInputFile()  # name of binary
        # get process info
        self.bits, self.arch, self.endian = self._get_process_info()
        self.function_info_list = list()
        # Save the information of all functions; the AST of each one is serialized with pickle.dumps.
        # Each function is saved as a tuple (func_name, func_addr, ast_pick_dump, pseudocode, callee, caller)

    def _get_process_info(self):
        '''
        :return: 32 or 64 bit, arch, endian
        '''
        info = idaapi.get_inf_structure()
        bits = 32
        if info.is_64bit():
            bits = 64
        try:
            is_be = info.is_be()
        except:
            is_be = info.mf
        endian = "big" if is_be else "little"
        return bits, info.procName, endian

    def progreeBar(self, i):
        sys.stdout.write('\r%d%% [%s]' % (int(i), "#" * i))
        sys.stdout.flush()

    def run(self, fn, specical_name=""):
        '''
        :param fn: a function to handle the functions in binary
        :param specical_name: specific function name while other functions are ignored
        :return:
        '''
        if specical_name != "":
            l.info("specific function name %s" % specical_name)
        for i in range(0, get_func_qty()):
            func = getn_func(i)
            self.progreeBar(int((i * 1.0) / get_func_qty() * 100))
            segname = get_segm_name(getseg(func.start_ea))
            if segname[1:3] not in ["OA", "OM", "te", "_t"]:
                continue
            func_name = GetFunctionName(func.start_ea)
            if len(specical_name) > 0 and specical_name != func_name:
                continue
            try:
                ast_tree, pseudocode, callee_num, caller_num = fn(func)
                self.function_info_list.append(
                    (func_name, hex(func.start_ea), pickle.dumps(ast_tree), pseudocode, callee_num, caller_num))
            except Exception as e:
                l.error("%s error" % fn)
                l.error(str(e))

    def save_to(self, db):
        '''
        :param db: DBOP instance
        :return:
        '''
        N = 0
        l.info("%s records to be inserted" % len(self.function_info_list))
        for info in self.function_info_list:
            try:
                db.insert_function(info[0], info[1], self.file_name, self.bin_file_path,
                                   self.arch + str(self.bits), self.endian, self.optimization_level, info[2], info[3],
                                   info[4], info[5])
                N += 1
            except Exception as e:
                l.error("insert operation exception when insert %s" % self.bin_file_path + " " + info[0])
                l.error(str(e))
                # l.error(e.message)
        return N

    @staticmethod
    def get_info_of_func(func):
        '''
        :param func:
        :return:
        '''
        # l.error("get_info_of_func")
        try:
            cfunc = idaapi.decompile(func.start_ea)
            vis = Visitor(cfunc)
            vis.apply_to(cfunc.body, None)
            # l.error(vis.root)
            # l.error(vis.get_callee())
            # l.error(vis.get_caller())
            # l.error('generate ast tree')
            # ast = vis.GenerateAST(vis.root)
            # ast.print_tree()
            return vis.root, vis.get_pseudocode(), vis.get_callee(), vis.get_caller()
        except:
            l.error("Function %s decompilation failed" % (GetFunctionName(func.start_ea)))
            raise


class Tree(object):
    def __init__(self):
        self.parent = None
        self.num_children = 0
        self.children = list()
        self.op = None  #
        self.value = None  #
        self.opname = ""  #

    def add_child(self, child):
        child.parent = self
        self.num_children += 1
        self.children.append(child)

    def size(self):
        try:
            if getattr(self, '_size'):
                return self._size
        except AttributeError as e:
            count = 1
            for i in range(self.num_children):
                count += self.children[i].size()
            self._size = count
            return self._size

    def depth(self):
        # use a default so the first call does not raise AttributeError
        if getattr(self, '_depth', None) is not None:
            return self._depth
        count = 0
        if self.num_children > 0:
            for i in range(self.num_children):
                child_depth = self.children[i].depth()
                if child_depth > count:
                    count = child_depth
            count += 1
        self._depth = count
        return self._depth

    def __str__(self):
        return self.opname


if __name__ == '__main__':
    import argparse

    ap = argparse.ArgumentParser()
    ap.add_argument("-o", "--optimization", default="default", help="optimization level when compilation")
    ap.add_argument("-f", "--function", default="", help="extract the specific function info")
    ap.add_argument("-g", "--compiler", default="gcc", help="compiler name adopted during compilation")
    ap.add_argument("-d", "--database", default="default.sqlite", type=str, help="path to database")
    args = ap.parse_args(idc.ARGV[1:])
    astg = AstGenerator(args.optimization, compiler=args.compiler)
    astg.run(astg.get_info_of_func, specical_name=args.function)
    # astg.run(astg.get_info_of_func, specical_name="SSL_get_ciphers") # this line code for test
    dbop = DBOP(args.database)
    astg.save_to(dbop)
    del dbop  # free to call dbop.__del__() , flush database
    Exit(0)

Hi,
I have made all the changes in the required files.
Can this process be run remotely, i.e., without a GUI?
I am not sure whether that is causing the issue.

I am using IDA Pro 7.7, Python3.9, and Ubuntu20.04.

Also, if it is possible for you to meet online then that would be great to resolve the issue. Please let me know.

Does the GUI open up on your end? The IDA pro GUI?
It doesn't have to open up. On my side everything is on the command line.
If you have the GUI then you might have the path to ida and not idat.
IDA can be launched with one of the following command lines:

    ida input-file        (Start graphical interface)
    idat input-file       (Start text interface)

For me it's this path:
.wine/drive_c/IDAPro7.5/idat
Make sure you have that and not the 'ida'.
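
A small guard for this, as a sketch: the helper below is hypothetical (not part of the project), only looks at the file name, and accepts idat, idat64 and their .exe variants.

```python
def is_text_mode_ida(ida_path):
    """Return True if the path names idat/idat64 rather than the GUI binary."""
    # Accept both / and \ as separators, since the path may be Windows-style under wine.
    name = ida_path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name.endswith(".exe"):
        name = name[:-4]
    return name in ("idat", "idat64")


if __name__ == "__main__":
    print(is_text_mode_ida(".wine/drive_c/IDAPro7.5/idat"))
    print(is_text_mode_ida(".wine/drive_c/IDAPro7.5/ida"))
```

A check like this could run before the command is assembled, so that a GUI path is reported right away instead of failing silently later.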

No GUI opened at my end. I was following the same steps.
For me, the only difference is that the paths of wine and idat64 are different.
I just rechecked and changed the path to the idat64 binary. It is now .wine/drive_/idat64. Using this as the path, with the rest of the command the same as yours, I get a wine usage error.
Command I used:
python ASTExtraction/API_ast_generator.py --ida_path ~/.wine/drive_c/idat64 --directory <path to dir> --database demo.sqlite

Did you set your environment variables in wine such that idaPython works? This was the first thing I had to do. To test if idapython works you can start the GUI of IDA Pro and then look in the lower left corner to see if you can use python.

Hi,
Apologies for the delayed response!

I tried what you suggested. I have not added a path in wine, but Python still seems to work. Please find the images below. I did receive a warning message when loading a binary file, though.

Ok I see. I had a couple of errors with the plugin in the beginning as well. I fixed most of them by updating the IDAPython commands to the newest version, so that I was able to launch the Asteria plugin in IDA Pro. Perhaps try to get that to work first, since you get useful debugging messages there and not when you launch it from the command line. Good luck.

Sure, let me try that once then!
Just to confirm, there is no need to set env variables in wine, right?

No, since python works you don't have to set it.

Can you share the location where your 'hexx64' plugin is present in the tool?

Hi @Adalsteinnjons,

I tried creating the setup with IDA Pro 7.7. I am not able to generate ASTs, as it returns an error with return code 1. Can you please specify the steps you followed to generate the ASTs? As per the README, I copied the file and directory to their required places in the plugins and python directories inside IDA Pro 7.7. While running the command, I am providing the absolute path to the 'idat64' binary as well. The 'TVHEADLESS=1' flag was also not recognized.

It would be really helpful if you can share the steps you followed along with the python and IDA versions used in the process.

Thanks.

Hello, I also ran into the same problem. Did you solve it?

The original AST generation script was written for IDA Pro versions from 7.0 up to, but not including, 7.4.
If you wish to use it in IDA Pro 7.4 or later, please upgrade the IDA API calls by following the procedures in "Porting from IDAPython 6.x-7.3, to 7.4".
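
The alias block near the top of the ast_generator.py posted earlier in this thread already follows this approach for a few names. A generic version of the same idea, as a sketch: alias_legacy_names is a hypothetical helper, the mapping lists only renames that appear in that script, and a stand-in object replaces the idc module so the snippet runs outside IDA.

```python
from types import SimpleNamespace

# Old name -> new name, taken from the alias block in the posted script.
LEGACY_TO_NEW = {
    "GetFunctionName": "get_func_name",
    "GetIdbPath": "get_idb_path",
    "GetInputFile": "get_root_filename",
    "GetInputFilePath": "get_input_file_path",
    "SegName": "get_segm_name",
    "autoWait": "auto_wait",
}


def alias_legacy_names(module, namespace, mapping=LEGACY_TO_NEW):
    """Bind each old name in namespace to the new function found on module.

    Returns the old names whose replacement was not found, so that they can
    be ported by hand.
    """
    missing = []
    for old, new in mapping.items():
        func = getattr(module, new, None)
        if func is None:
            missing.append(old)
        else:
            namespace[old] = func
    return missing


if __name__ == "__main__":
    # Stand-in for the real idc module, with only one new-style function.
    fake_idc = SimpleNamespace(get_func_name=lambda ea: "sub_%X" % ea)
    names = {}
    print(sorted(alias_legacy_names(fake_idc, names)))
    print(names["GetFunctionName"](0x401000))
```

Inside IDA the call would presumably look like alias_legacy_names(idc, globals()); any name returned as missing still needs a manual look in the porting guide.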
