LJClang — A LuaJIT-based interface to libclang

Introduction

LJClang is an interface to libclang for LuaJIT, modeled after and mostly API-compatible with luaclang-parser by Michal Kottman.

Requirements

  • LuaJIT 2.0 (latest Git HEAD of the master branch recommended)

  • LLVM/Clang — read the getting started guide to find out how to obtain Clang from source. libclang is built and installed along with the Clang compiler.

Building and usage

Most of LJClang is written in Lua (extensively using LuaJIT’s FFI), but due to currently existing limitations, a support C library has to be built.

In the provided Makefile, adjust the libclang include path, and issue make to build libljclang_support.so.

Note
LJClang has been tested on Ubuntu Linux and Windows (using Clang-Win32), but only minor modifications to the build process should be necessary to get it working with other OSes or configurations.

From here on, LJClang can be used with LuaJIT by issuing a require for "ljclang". One likely wants to use LJClang from its development directory without installing it to a system-wide path. Because it expects to find libljclang_support.so and several supporting Lua files, one approach is to wrap client programs into scripts starting LuaJIT with an environment containing appropriate LD_LIBRARY_PATH and LUA_PATH entries. For example, given the following function in .bashrc,

# "LuaJIT with added path of the script directory"
ljwp ()
{
    # Quote the path so that script directories containing spaces work.
    local scriptdir=$(cd "$(dirname "$1")" && pwd)
    LUA_PATH=";;$scriptdir/?.lua" LD_LIBRARY_PATH="$scriptdir" luajit "$@"
}

and assuming that LJClang resides in ~/dl/ljclang, the extractdecls.lua program described below could be run from anywhere like this:

$~/some/other/dir: ljwp ~/dl/ljclang/extractdecls.lua [args...]

Overview

LJClang provides a cursor-based, callback-driven API to the abstract syntax tree (AST) of C/C++ source files. These are the main classes:

  • Index — represents a set of translation units that could be linked together

  • TranslationUnit — a source file together with everything included by it either directly or transitively

  • Cursor — an element in the AST in a translation unit such as a typedef declaration or a statement

  • Type — the type of an element (for example, that of a variable, structure member, or a function’s input argument or return value)

To make something interesting happen, you usually create a single Index object, parse into it one or many translation units, and define a callback function to be invoked on each visit of a Cursor by libclang.
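
The workflow just described might look roughly like the following sketch. It uses only functions documented in the Reference section below; the file name example.h and the include path are hypothetical, and error handling is omitted.

```lua
local cl = require("ljclang")

-- One Index for the whole program.
local index = cl.createIndex()

-- Parse a (hypothetical) header; the second argument holds compiler options.
local tu = index:parse("example.h", { "-I/usr/include" })

-- Callback invoked by libclang on each visit of a child cursor.
local visitor = cl.regCursorVisitor(function(cur, parent)
    print(cur:kind(), cur:name())
    -- Continue: go on with the next sibling without descending further.
    return cl.ChildVisitResult.Continue
end)

-- Visit the direct children of the translation unit's root cursor.
tu:cursor():children(visitor)
```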

Example program

The extractdecls.lua script shipped with LJClang extracts various kinds of C declarations from (usually) headers and prints them in forms usable as FFI C declarations or as descriptive tables with LuaJIT.

Usage: ./extractdecls.lua [our_options...] <file.h> [clang_options...]
  -p <filterPattern>
  -x <excludePattern1> [-x <excludePattern2>] ...
  -s <stripPattern>
  -1 <string to print before everything>
  -2 <string to print after everything>
  -C: print lines like
       static const int membname = 123;  (enums/macros only)
  -R: reverse mapping, only if one-to-one. Print lines like
       [123] = "membname";  (enums/macros only)
  -f <formatFunc>: user-provided body for formatting function (enums/macros only)
       Accepts args `k', `v'; `f' is string.format. Must return a formatted line.
       Example: "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)"
       Incompatible with -C or -R.
  -Q: be quiet
  -w: extract what? Can be
       EnumConstantDecl (default), TypedefDecl, FunctionDecl, MacroDefinition
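
To illustrate the -f option, the following sketch wraps the example body from the usage text into a function of k, v and f and calls it by hand. The wrapping shown here is an assumption made for illustration and not necessarily how extractdecls.lua compiles the body; the sample keys and values are made up.

```lua
-- loadstring exists in Lua 5.1/LuaJIT; newer Lua versions call it load.
local loadstring = loadstring or load

-- The function body exactly as given in the usage text for -f.
local body = "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)"

-- Assumed wrapper: bind the three documented names to the chunk's arguments.
local formatLine = assert(loadstring("local k, v, f = ...\n" .. body))

print(formatLine("KEY_A", "30", string.format))      --> KEY_A = 65536+30,
print(formatLine("BTN_LEFT", "272", string.format))  --> BTN_LEFT = 272,
```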

In fact, the file ljclang_cursor_kind.lua is generated by this program and is used by LJClang to map values of the enumeration enum CXCursorKind to their names. The bootstrap target in the Makefile extracts the relevant information using these options:

-R -p '^CXCursor_' -x '_First' -x '_Last' -x '_GCCAsmStmt' -x '_MacroInstantiation' -s '^CXCursor_' \
    -1 'return { name={' -2 '}, }' -Q

Thus, the enumeration constants are filtered to those whose names begin with “CXCursor_”, and all “secondary” names aliasing the one considered the main one are rejected. (For example, CXCursor_AsmStmt and CXCursor_GCCAsmStmt have the same value.) Finally, the prefix is stripped (-s) to yield lines like

[215] = "AsmStmt";
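
Together with the -1 and -2 strings, such lines form a Lua chunk that returns a table. A minimal sketch of how a file of this shape can be consumed follows; the chunk is held in a string here instead of a file, and it contains only two entries (FunctionDecl, with value 8 in enum CXCursorKind, is added for illustration).

```lua
-- loadstring exists in Lua 5.1/LuaJIT; newer Lua versions call it load.
local loadstring = loadstring or load

-- Abbreviated stand-in for the generated file contents.
local generated = [[
return { name={
[8] = "FunctionDecl";
[215] = "AsmStmt";
}, }
]]

-- Running the chunk yields the table; its `name` field maps values to names.
local kinds = assert(loadstring(generated))()

print(kinds.name[215])  --> AsmStmt
print(kinds.name[8])    --> FunctionDecl
```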

Reference

The module returned by require("ljclang") contains the following:

createIndex([excludePch : boolean [, showDiagnostics : boolean]]) → Index

Binding for clang_createIndex. Will create an Index into which you can parse TranslationUnits. Both input arguments are optional and default to false.

Note
Loading pre-compiled translation units is not implemented.

ChildVisitResult

An object containing a mapping of names to values permissible as values returned from cursor visitor callbacks: Break, Continue, Recurse.

regCursorVisitor(visitorfunc) → vf_handle

Registers a child visitor callback function visitorfunc with LJClang, returning a handle which can be passed to Cursor:children(). The callback function receives two input arguments, (cursor, parent) — with the cursors of the currently visited entity as well as its parent, and must return a value from the ChildVisitResult enumeration to indicate whether or how libclang should carry on AST visiting.

Caution
The cursor passed to the visitor callback is only valid during one particular callback invocation. If it is to be used after the function has returned, it must be copied using the Cursor constructor mentioned below.

Cursor([cur : Cursor]) → Cursor

A constructor to create a permanent cursor from that received by the visitor callback.
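
As a sketch of the caution above, the visitor below stores copies made with the Cursor constructor instead of the cursors it receives, so that they can still be used once the traversal has finished. The file name example.h is hypothetical.

```lua
local cl = require("ljclang")

local saved = {}  -- permanent copies of the cursors of interest

local visitor = cl.regCursorVisitor(function(cur, parent)
    if cur:haskind("FunctionDecl") then
        -- `cur` is only valid during this invocation: copy before storing.
        saved[#saved + 1] = cl.Cursor(cur)
    end
    -- Recurse: also visit the children of the current cursor.
    return cl.ChildVisitResult.Recurse
end)

local index = cl.createIndex()
local tu = index:parse("example.h", {})
tu:cursor():children(visitor)

-- The copies remain usable after the callback has returned.
for _, fn in ipairs(saved) do
    print(fn:name(), fn:resultType())
end
```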

Index

Index:parse(sourceFile : string, args : table [, opts : table]) → TranslationUnit

Binding for clang_parseTranslationUnit. This will parse a given source file sourceFile with the command line arguments args, which would be given to the compiler for compilation, containing e.g. include paths or defines. If sourceFile is the empty string, the source file is expected to be named in args.

The last optional argument opts is expected to be a sequence containing CXTranslationUnit_* enum names without the "CXTranslationUnit_" prefix, for example { "DetailedPreprocessingRecord" }.

Note
Both args and opts (if given) must not contain an element at index 0.
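
One case where an element at index 0 can slip in is Lua's global arg table, which keeps the script name there. The helper below is a hypothetical illustration (not part of LJClang) that copies only the elements from index 1 upwards.

```lua
-- Copy elements 1..#t into a fresh table, dropping index 0 and any
-- negative indices such as those found in the global `arg` table.
local function sequenceOnly(t)
    local seq = {}
    for i = 1, #t do
        seq[i] = t[i]
    end
    return seq
end

-- Shaped like `arg`: script name at index 0, options from index 1 on.
local raw = { [0] = "myscript.lua", "-I/usr/include", "-DNDEBUG" }
local args = sequenceOnly(raw)

print(args[0], #args)  --> nil	2
```

The resulting table could then be passed as the args argument of Index:parse.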

TranslationUnit

TranslationUnit:cursor() → Cursor

Binding for clang_getTranslationUnitCursor. Returns the Cursor representing a given translation unit, which provides access to information about e.g. functions and types defined in a given file.

TranslationUnit:file(fileName : string) → string

Binding for clang_getFile. Returns the absolute file path of fileName.

Note
Unlike in luaclang-parser, the last modification date is currently not returned.

TranslationUnit:diagnostics() → { Diagnostic* }

Binding for clang_getDiagnostic. Returns a table array of Diagnostic, which represent warnings and errors. Each diagnostic is a table indexable by these keys: text — the diagnostic message, and category — a diagnostic category (also a string).
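
A short sketch of parsing with an opts table and then listing the diagnostics follows. The file name example.c and the -Wall argument are hypothetical, and the table returned by diagnostics() is assumed to be a sequence starting at index 1.

```lua
local cl = require("ljclang")

local index = cl.createIndex()

-- The third argument names CXTranslationUnit_* options without the prefix.
local tu = index:parse("example.c", { "-Wall" },
                       { "DetailedPreprocessingRecord" })

-- Each diagnostic is indexable by the keys `category` and `text`.
for _, diag in ipairs(tu:diagnostics()) do
    print(diag.category, diag.text)
end
```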

Cursor

You can compare whether two Cursors represent the same element using the standard == Lua operator. Comparisons with any other type yield false.

Cursor:children() → { Cursor* }
Cursor:children(vf_handle) → boolean

Binding over clang_visitChildren. This is the main function for AST traversal. The first form collects the direct descendants of the given cursor in a table, returning an empty one if none are found. The second, preferred form accepts a handle of a visitor function previously registered with regCursorVisitor() instead. Here, the returned value indicates whether the traversal was aborted prematurely due to the callback returning ChildVisitResult.Break.
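
The two forms could be used as in the following sketch, in which the visitor stops at the first typedef among the direct children of the translation unit cursor. The file name example.h is hypothetical.

```lua
local cl = require("ljclang")

local firstTypedef  -- set by the visitor when a typedef is found

local findTypedef = cl.regCursorVisitor(function(cur, parent)
    if cur:haskind("TypedefDecl") then
        firstTypedef = cl.Cursor(cur)        -- keep a permanent copy
        return cl.ChildVisitResult.Break     -- abort the traversal
    end
    return cl.ChildVisitResult.Continue
end)

local index = cl.createIndex()
local tu = index:parse("example.h", {})
local root = tu:cursor()

-- First form: a table holding the direct descendants.
print("direct children:", #root:children())

-- Second form: true means the callback returned Break.
local aborted = root:children(findTypedef)
if aborted then
    print(firstTypedef:name(), firstTypedef:typedefType())
end
```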

Note
Currently, the recommended procedure is to encapsulate the logic of one particular “analysis” into one visitor callback, which may run different portions of code depending on, for example, the cursor’s kind. This is preferable to calling Cursor:children(visitor_function_handle) with a different visitor function while another invocation of it is active.

Cursor:parent() → Cursor

Binding for clang_getCursorSemanticParent. Returns a cursor to the semantic parent of a given element. For example, for a method cursor, returns its class. For a global declaration, returns the translation unit cursor.

Cursor:lexicalParent() → Cursor

Binding for clang_getCursorLexicalParent. Returns a cursor to the lexical parent of a given element.

Cursor:name() → string

Binding over clang_getCursorSpelling. Returns the name of the entity referenced by cursor. Cursor also has __tostring set to this method.

Cursor:displayName() → string

Binding over clang_getCursorDisplayName. Returns the display name of the entity, which for a function, for example, also includes its parameter types.

Cursor:kind() → string

Returns the cursor kind without the CXCursor_ prefix, e.g. "FunctionDecl".

Cursor:haskind(kind : string) → boolean

Checks whether the cursor has kind given by kind, which must be a string of enum CXCursorKind names without the CXCursor_ prefix. For instance, if (cur:haskind("TypedefDecl")) then --[[ do something ]] end .

Cursor:arguments() → { Cursor* }

Binding of clang_Cursor_getArgument. Returns a table array of Cursors representing arguments of a function or a method. Returns an empty table if a cursor is not a method or function.

Cursor:translationUnit() → TranslationUnit

Binding for clang_Cursor_getTranslationUnit. Returns the translation unit that a cursor originated from.

Cursor:resultType() → Type

Binding for clang_getCursorResultType. For a function or a method cursor, returns the return type of the function.

Cursor:typedefType() → Type

If the cursor references a typedef declaration, returns its underlying type.

Cursor:type() → Type

Returns the Type of a given element or nil if not available.

Cursor:location([linesfirst : boolean]) → string, number, number, number, number [, number, number]

Binding for clang_getCursorExtent and clang_getSpellingLocation. Returns the file name, starting line, starting column, ending line and ending column of the given cursor. If the optional argument linesfirst is true, the numbers are ordered like starting line, ending line, starting column, ending column, starting offset, ending offset instead. If linesfirst has the string value 'offset', only starting offset, ending offset are returned.
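
The default ordering of the return values can be unpacked as in this sketch. The helper formatLocation is hypothetical (not part of LJClang), and it assumes that the line and column values are plain Lua numbers, as the signature suggests.

```lua
-- Render a cursor's extent as "file:startLine:startCol-endLine:endCol".
-- `cur` is any object providing the documented location() method.
local function formatLocation(cur)
    local file, startLine, startCol, endLine, endCol = cur:location()
    return string.format("%s:%d:%d-%d:%d",
                         file, startLine, startCol, endLine, endCol)
end
```

Inside a visitor callback, print(formatLocation(cur)) would then print one such line per visited cursor.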

Cursor:presumedLocation([linesfirst : boolean]) → string, number, number, number, number

Cursor:definition() → Cursor

Binding for clang_getCursorDefinition. For a reference or declaration, returns a cursor to the definition of the entity, otherwise returns nil.

Cursor:referenced() → Cursor

Binding for clang_getCursorReferenced. For a reference type, returns a cursor to the element it references, otherwise returns nil.

Cursor:access() → string

When cursor kind is "AccessSpecifier", returns one of "private", "protected" and "public".

Cursor:isDefinition() → boolean

Binding for clang_isCursorDefinition. Determine whether the declaration pointed to by this cursor is also a definition of that entity.

Cursor:isVirtual() → boolean

For a C++ method, returns whether the method is virtual.

Cursor:isStatic() → boolean

For a C++ method, returns whether the method is static.

Cursor:enumValue([unsigned : boolean]) → enum cdata

If the cursor represents an enumeration constant (CXCursor_EnumConstantDecl), returns its numeric value as a signed 64-bit integer, or as an unsigned 64-bit integer if unsigned is true.

Note
In C99, an enumeration constant must be in the range of values representable by an int (6.7.2.2#2). LJClang does not check for this constraint.

Cursor:enumval([unsigned : boolean]) → number

Returns the cdata obtained from enumValue() as a Lua number, converted using tonumber(). Again, no checking of any kind is carried out.
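
What the unchecked tonumber() conversion means in practice can be seen with plain LuaJIT FFI values. The sketch below uses stand-in int64_t cdata instead of values obtained from a real cursor: anything in the range of an int converts exactly, while 64-bit values beyond 2^53 are rounded to the nearest representable double.

```lua
local ffi = require("ffi")

-- Stand-in for a value as returned by enumValue(): a signed 64-bit cdata.
local small = ffi.new("int64_t", 123)
print(tonumber(small))            --> 123

-- Beyond 2^53, a double cannot represent every integer, so the
-- conversion rounds.
local big = 9007199254740993LL    -- 2^53 + 1, LuaJIT 64-bit literal
print(tonumber(big) == 2^53)      --> true
```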

Type

You can compare whether two Types represent the same type using the standard == Lua operator. Comparisons with any other type yield false.

Type:name() → string

Binding of clang_getTypeKindSpelling. Returns one of CXTypeKind as a string without the CXType_ prefix. Type also has __tostring set to this method.

Type:canonical() → Type

Binding of clang_getCanonicalType. Returns the underlying type with all typedefs removed.

Note
Unlike luaclang-parser, LJClang does not dispatch to clang_getPointeeType() for pointer types.

Type:pointee() → Type

Binding of clang_getPointeeType. For a pointer type, returns the type of the pointee.
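
Because canonical() does not follow pointers, code that wants the innermost pointee has to call pointee() itself. A minimal sketch follows; the helper stripPointers is hypothetical (not part of LJClang) and relies on name() returning "Pointer" for pointer types, which is libclang's kind spelling for CXType_Pointer.

```lua
-- Follow pointer types down to the innermost non-pointer type.
-- Returns the kind name of that type and the number of pointer levels.
local function stripPointers(ty)
    local depth = 0
    local canon = ty:canonical()
    while canon:name() == "Pointer" do
        canon = canon:pointee():canonical()
        depth = depth + 1
    end
    return canon:name(), depth
end
```

For the type of a declaration such as int **p, this would be expected to return "Int" and 2.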

Type:isPod() → boolean

Binding of clang_isPODType. Returns true if the type is a “Plain Old Data” type.

Type:isConst() → boolean
Type:isConstQualified() → boolean

Binding of clang_isConstQualifiedType. Returns true if the type has a const qualifier.

Type:declaration() → Cursor

Binding of clang_getTypeDeclaration. Returns a Cursor to the declaration of a given type, or nil.

Type:arrayElementType() → Type

Binding of clang_getArrayElementType.

Type:arraySize() → Type

Binding of clang_getArraySize.

License

Copyright © 2013 Philipp Kutin

(Portions of the documentation copied or adapted from luaclang-parser, Copyright © 2012 Michal Kottman)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.