Prev Next

Code Miner Query Language (mFQL)

The Code Miner system provides fast and comprehensive access to the information in existing source code.  By parsing all source code and storing the resulting AST (Abstract Syntax Tree) in a read-optimized database, the system provides complete access to all aspects of the original source code, in a machine understandable format.

The core goal behind the system is to provide access to the data hidden within source code in a timely and effective manner.  Great pains have been taken to ensure maximal performance, while providing the simplest interfaces possible.  As a result the system can be used to analyze program structure, calculate metrics, trace relationships and even perform re factoring.

mFQL

mFQL is the query language of the Code Miner. The language itself is reasonably simple, providing a small number of commands. Simple as the language is, it supports queries of arbitrary size and complexity. The design provides extreme performance for all queries, great and small.

The language is set-based; it operates primarily on sets of abstract data obtained through discrete vertical indices. For our purposes, a set is an ordered array of numbers, each of which is a pointer to a node in the AST Store. A discrete vertical index provides a mechanism to retrieve sets by discrete value.

The language includes the three basic set-joining operations. These are 'intersect', 'union', and' except'. The 'except' join is, more precisely, a 'symmetric difference' join.  A 'complement' join can be achieved by using a short sub-query; this is detailed in the 'except' join documentation. The 'offsetIntersect' join is also discussed in detail there.

The Code Miner database provides three discrete vertical indices in its AST Store. These indices are 'node name', 'attribute name', and 'attribute value'.  Each vertical index can be queried for a discrete value, which will return a set of all nodes where that value is present.  The three vertical indices are queried using the functions 'getByNode', 'getByName' and 'getByValue', respectively.

Set 'traversal routines' provide mechanisms to filter sets based on patterns in the AST. The traversal routines are either destructive (move) or non-destructive (filter).  Destructive traversals modify the set member values to point to the target node; non-destructive traversals ensure the target node exists.  In both cases, nodes that cannot complete the traversal are removed.

Please note that all traversals in mFQL are upwards.  Downwards traversals are technically complex, as a node could have any number of child nodes. Conversely, upward traversals are much simpler, with every node having zero or one parent node.  For these reasons, downward traversals are not supported in the query language.

Although there are only a small number of operations in mFQL, the language is capable of expressing very finely grained and complex queries. The language is functional in design, and supports arbitrary nesting calls.

mFQL queries execute at lightning speed. The backend database was designed from the ground up for read performance.  The query parser was hand optimized.  Knowing that it always has pure ordered sets, the low-level code takes several shortcuts to perform joins with minimal work effort.

In order to use nBNF effectively one must possess a working knowledge of the target language, and an intimate knowledge of the grammar used to parse it.

Learn more