Dynamic syntax tree

Dynamic Syntax Tree (DST), was created by M. Barzanti [1] as an alternative to Abstract syntax tree and Parse tree representation of the structure of source code written in a programming language. Unlike Statically typed programming languages, the analysis of a Dynamic programming language has to estimate the Type system (types, variables, functions) from the code fragment semantics, as well as all Configuration files and Binary files. Dynamic Syntax Tree implementation also takes into account non-static and non-typed implicit and explicit Declaration (computer programming)s, so that the resulting information covers most of readable dynamic code fragments.
Mapping of language constructs from a dynamic language into a Dynamic Syntax Tree (DST) is a kind of semantic analysis (compilers). Information implied from the syntax is analyzed and the results are inserted back into the same tree, but in the form of complete static information. Preprocessing the source code will create a specialized Syntax Tree for each Class found. That can be applied to traditional programming languages too, like COBOL, where Classes can be represented by Programs, Methods by calls/Performs, Parameters by Using, etc.
Application in SAST tools
Dynamic syntax trees are mostly used in program analysis and program transformation systems.[2] Dynamic syntax trees are data structures used in some Static program analysis tools, due to their property of fast representing an optimized structure of program code and its related Binary file.[3]
Motivation
- Compared to the source code, a DST does not include certain elements, such as inessential punctuation and delimiters (braces, semicolons, parentheses, etc.).
- The whole source file, in a parsed form, must be processed and every expression or construct has to be analyzed. As a result, it extends some existing node information or adds new declaration nodes (implicit declarations). In static languages, it is typically sufficient to scan declaration constructs only (class declaration, variable declaration, etc.); single expressions are not needed for collecting available symbols and their descriptions.
- In addition to statically defined symbols, other tree nodes are added using the semantic analysis of the DST. Also, some existing information can be modified. Particular expressions are scanned to find, e.g., implicitly declared variables and dynamically added object members. Also, possible variable values, estimated types, and behavior influenced from configuration are discovered. The so far built DST is used for that. The process of mapping the Assignment Expression, containing Variable Usage (including the one declared in web pages or JavaScript) into the static declaration of used variables and Configuration files influencing the Behavior, is depicted on the Figure at the right side.
- The mapping collects known information from the expressions and stores it into the static representation of the source code (DST). The process of single Dynamic Syntax Tree mapping is performed every time the corresponding source code changes; hence, it is suitable to implement some optimizations. Also, there could be trees containing symbols from large static language libraries, e.g., from .NET assemblies, which manage typically thousands of symbols. Initializing all of them may be too slow.[4]
Optimization
Differently from Abstract syntax tree and CST Parse tree, in case of huge Classes having more than 65,535 objects, the DST Object Dictionary structure (68,083 nodes) will be paged into some small XML files, about 575 KB sized each. The same will be done with the Dynamic Syntax Tree itself: only 4,096 Classes at a time will be processed, max 135 KB each. There will be no case of RAM Random access memory usage over 700 MB, that means it can be able to perform a Static Analysis using a low-end Windows XP notebook with 1 GB of RAM and a single-core processor, and up to 5 simultaneously static analyses of different applications at a time can be achieved using only a 4 GB RAM, dual-core processor machine. So, performances will be scalable depending on hardware architecture, with the guarantee of performing at least a complete static analysis starting from a notebook and going up to a multi-core machine. An important optimization compared to AST and CST will be the number of tree nodes. For example, GCC 2.52 from SPEC 95 Standard Performance Evaluation Corporation benchmark suite comprises 604,000 AST nodes vs only 127,067 nodes paged in 2 DST, and Word 1.1A (1990) comprises 607,700 AST nodes vs only 204,249 nodes, paged in 3 DST. Interestingly, GCC 2.52 is about 140,000 lines of code and Word 1.1A is about 322,000. Static Analysis processing times using DST were half an hour for GCC and 1 hour and a half for Word 1.1A. Word 1.1A source was downloaded from the Computer History Museum site. Some of the additional information initialization can be postponed until it is requested, because not all of the source elements must be analyzed and mapped immediately.
See also
- Abstract semantic graph (ASG)
- Composite pattern
- Control flow graph
- Document Object Model (DOM)
- Semantic resolution tree (SRT)
- Symbol table
- Term graph
References
- ↑ D. Syman, M. Barzanti (1999). "Static Analysis of Applications written in modern languages".
- ↑ D. Syman, M. Barzanti (2001). "Static Analysis: a Dynamic Syntax Tree implementation".
- ↑ D. Syman, M. Barzanti (2013). "Static Analysis: new emerging algorithms".
- ↑ D. Syman, M. Barzanti (2012). "Dynamic_Syntax_Tree_Implementation_Results".
Further reading
- Syman, David; Barzanti, Marco. "Dynamic Syntax Tree: Optimized Binary Sandboxing".
- Syman, David; Barzanti, Marco. "Dynamic Syntax Tree: Implementation Results Q1-2016".
This article "Dynamic syntax tree" is from Wikipedia. The list of its authors can be seen in its historical. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
