How to traverse the ast tree?
Closed this issue · 12 comments
Say I have two source code file1.cpp and file2.cpp written in C++, I want to know the difference code between them and locate this difference code to the function. I kind of get that this cppparser is able to get ast of the source code, but i'm still confused how to traverse the ast tree and compare two ast trees from different source code? Could anyone help me with this?
Thanks for your wonderful reply! The example is very clear how the code is structed. However, do you have any idea on how to destructualize the code? What i'm saying is for a certain line of code within a function, I want to collect the all of the components, e.g, int a = b + c, then i need to collect int,a,=,b,+,c. It confuses me how to use this parser to traverse the data structure. I'd be very appreciated if you can give me any hints, a short example code might be the best :)
@dandelion915
Sure thing mate.
As I understand it, you want to get familiar with how AST expresses a piece of code.
If so, I highly recommend checking out this example file from my fork.
It goes through manually creating for-loops, arrays, functions, pointers, expressions (chained or single), ... . It should provide you with the initial insight into the AST structure needed to write your own code.
So for a basic int a = b+c;
, you have a CppVar
instance as in below:
auto* myVar = new CppVar(
new CppVarType("int"),
CppVarDecl(
"a",
new CppExpr(new CppExprAtom("b"), CppOperator::kPlus, new CppExprAtom("c"));,
AssignType::kUsingEqual)
);
Now, as you can see, b
and c
are expressed as CppExprAtom
instances. All expressions whether they are singular or chained, are expressed using nested CppExpr
instances. Variables can have their initial value assignments in CppVar
using CppVarDecl
.
There are much more details about the AST structure that cannot be covered here. Basically, you need a good debugging environment like MS VS or CLion to inspect the AST of a piece of code generated by CppParser
and then work your way up from there.
{ .... }
is called a block and is represented by a CppCompound
of type CppCompoundType::kBlock
. Its children are stored in CppCompund::members_
. CppCompund::add_member()
could be used to add them one by one.
Also, have a look at cppast.
Thank for your speedy reply. I have seen your wonderful example.But what i'm saying is a reverse process of your example. To be specific, let's say i have an input hello-world.cpp file like this:
#include <iostream>
int add(int a, int b)
{
int c = a + b;
return c;
}
int main()
{
std::cout << "Hello World\n";
int res = add(1, 2);
return 0;
}
The purpose is to get the function body with in add() and main() using cppparser:
CppParser parser;
CppCompoundPtr ast = parser.parseFile(std::string
("hello-world.cpp"));
const auto& members = ast->members();
CppIncludeEPtr hashInclude = members[0];
CppFunctionEPtr addFunc = members[1];
const auto& mainBodyMembers = addFunc->defn()->members();
CppVarEPtr abPlus = mainBodyMembers[0];
std::cout << abPlus->name() << std::endl; //c
AssignType atype = abPlus->assignType();
std::cout << (atype == AssignType::kUsingEqual) << std::endl; // =,kUsingEqual
const auto& avalue = abPlus->assignValue();
std::cout << (avalue->oper_ == CppOperator::kPlus) << std::endl; // +,kPlus
std::cout << *(avalue->expr1_.expr->expr1_.atom) << std::endl; //a
std::cout << *(avalue->expr2_.expr->expr1_.atom) << std::endl; //b
CppExprEPtr returnC = mainBodyMembers[1];
std::cout << addFunc->retType_->baseType() << std::endl; //return int
std::cout << *(returnC->expr1_.atom) << std::endl; //c
CppFunctionEPtr mainFunc = members[2];
const auto& mainBodyMembers2 = mainFunc->defn()->members();
CppExprEPtr coutHelloWorld = mainBodyMembers2[0];
CppExprAtom expr1 = coutHelloWorld->expr1_;
CppExprAtom expr2 = coutHelloWorld->expr2_;
std::cout << *(expr1.expr->expr1_.atom) << std::endl; //std::cout
std::cout << *(expr2.expr->expr1_.atom) << std::endl; //"Hello World\n"
CppVarEPtr callAdd = mainBodyMembers2[1];
std::cout << callAdd->name() << std::endl; //res
AssignType atype2 = callAdd->assignType();
std::cout << (atype2 == AssignType::kUsingEqual) << std::endl; // =
const auto& avalue2 = callAdd->assignValue();
std::cout << (avalue2->oper_ == CppOperator::kFunctionCall) << std::endl; // kFunctionCall
std::cout << *(avalue2->expr1_.atom) << std::endl; //add
const auto& expr3 = avalue2->expr2_.expr;
std::cout << *(expr3->expr1_.expr->expr1_.atom) << std::endl; //1
std::cout << (expr3->oper_ == CppOperator::kComma) << std::endl; //, kComma
std::cout << *(expr3->expr2_.expr->expr1_.atom) << std::endl; 2
You see, I did debug's hardwork to traverse the data structure from root to the leaf node to get the function body(such as "a","b","Hello World\n"). It confuses me how to use this parser to automatically traverse the data structure of different phrases. Or is there any solution to stop parsing below the function level and just save all the function body as strings? I'd be very appreciated if you can give me any hints, a short example code might be the best :)
I have been skimming through the codes for the past few days and I believe no such functionality has been implemented yet. There are some related codes in the *-info-accessor.h
headers but they are not what you want.
CppAST has visitor.cpp. For CppParser
we can use CppWriter
as a starting point for our visitor with some call back funtions.
I am planning to use CppParser in my own projects and eventually I am going to have to implement visitors and AST matchers as well, but i am not sure about the timeline though. @dandelion915
Hi @satya-das, are you accepting PRs?
Thanks for your reply. I have figured out the exposure of function body by a different thought. First using cppparser to get all the function name and return type name, then use string match algorithm to find the beginning of the function, then math the {
}
. I dont know if this can be any use of you, but I'll provide this code anyway :)
CppCompoundPtr ast = parser.parseFile(filename);
const auto& members = ast->members();
for (const auto& member : members) {
if (member->objType_ == CppObjType::kFunction) {
CppFunctionEPtr func = member;
const string& funcName = func->name_;
const string& retType = func->retType_->baseType();
cout << funcName << endl;
string functionBody = findFunctionBody(filename, funcName,retType);
string findFunctionBody(const string& filename, const string& functionName, const string& retType) {
ifstream file(filename);
string line;
string functionBody;
bool foundFunction = false;
bool insideFunction = false;
int openBrackets = 0;
int count = 0;
int found_count = -2;
while (getline(file, line)) {
count++;
if (!foundFunction && line.find(functionName) != string::npos && line.find(retType) != string::npos) {
foundFunction = true;
insideFunction = true;
found_count = count;
}
if (count == found_count + 1) {
found_count = count;
functionBody += line;
for (char c : line) {
if (c == '{') {
openBrackets++;
}
else if (c == '}') {
openBrackets--;
}
}
if (openBrackets == 0) {
break;
}
}
}
return functionBody;
But this temporary string-match based solution has varies problems. Like comments may also contain function name or {}, so I'm still looking for parsers to get the function body. For producers for cppparser, maybe only adding tostring function to the project can lead to my solution, but sadly there isn't one.
By the way, are you familiar with Clang? I searched and found Clang has such abilities, but there's no tutorial document to get start...
@dandelion915 Oh wonderful. Thank you.
Right now I am working on the visitor pattern to implement a generic visitor with the matching capability.
Clang is quite capable. Unfortunately, there is a catch to it :) It does not expose the entire AST with stable API which is a big "no" for me.
Also, the data structure of the clang AST is way more complex compared to cppparser
, but I guess it's not a good idea to compare these two.
Anyways, in case you need the Doxygen docs, you can access them at this link now.
Dear @dandelion915,
Please have a look at my fork's visitor and AST matcher classes. Everything seems to be working.
CppVisitorMatcher matcher({CppObjType::kExpression, CppObjType::kHashInclude});
ast->accept(&matcher);
This example matches only the expressions and hashInclude nodes. Here, ast
is the root node.
I have not merged it to my fork's main branch (called my-master
) yet, but the code is ready.
Hi @salehjg !
I have another question that I think you must know the answer.
I want to use parser.parseStream
to parse string/stream based code, but somehow this doesn't work.
I wonder which part could possibly go wrong. I'd be very appreciated if u could give me some hint.
int main() {
std::string input = string("int add(int a, int b);");
CppParser parser;
CppCompoundPtr ast = parser.parseStream(&input[0], input.size());
const auto& members = ast->members();
return 0;
}
The parseStream
accepts stream that ends with double null characters.
I have documented this and also calling parseStream
with non conforming stream will throw an exception.
Closing this issue for now.
A lot of improvements have been done and if you need any help please start another issue.