PE Grammar Tree, after ExprMode Computation =========================================== This file documents the augmentations to a Normalized PE Grammar Tree as inserted by the transformational package 'pg::peg::emodes'. The input to the transformation is assumed to be a 'Normalized PE Grammar Tree' as described in 'doc_normalize.txt', possibly with augmentations from other transformation, as long as they are not in conflict with the augmentations specified here. Background ---------- The purpose of the computations and transformations specified here is to determine as much static information as we can about the generation of semantic values from terminal and non-terminal matchers. Such information can then be used to reduce the places in a parser which generate and/or handle AST nodes to the bare minimum. I.e we wish to avoid the generation of AST nodes by the matcher for which we are sure that they will be thrown away by the matcher either immediately, or later on. The fundamental information from which everything else is derived are the optional 'mode's which can specified for non-terminal symbols (See nonterminal 'Attribute' in the PEG Grammar). They are the hints to the matcher on how semantic values are generated and used. The mode of a nonterminal symbol controls three things at the same time: What it does with semantic values generated by its associated expression, if it generates its own semantic value, and how. These three aspects are interelated to each other, hence using one piece of information to set all of them. For this discussion the aspect of "how" is not relevant, except where it intersects with the other two. For the first aspect, the nonterminal has two choices, to either keep the semantic value from its expression, or to discard it. I.e this is about accepting versus rejection of the value. This aspect is 'Acceptance'. For the second aspect the nonterminal has three choices, to either generate a value, to not generate a value at all, or to simply pass through whatever was done by its expression (a maybe). This aspect is 'Generation'. The interelation of these aspects shows up as impossible combinations when selecting a value from each set, as made explicit in the next table: Generation x Acceptance -> Mode ---------- ---------- ---- Maybe Yes value Yes Yes value No Yes -- impossible -- Maybe No -- impossible -- Yes No match, leaf [1] No No discard ---------- ---------- ---- [1] Here the third aspect, the "how" of the generation comes into play to distinguish these two modes. As said, this distinction is not relevant for the current discussion. What we now wish to achieve is to define the vales for these two aspects for all expression and definition modes, which, starting from the settings of the modes by the input grammar, is the most definite, restrictive and conservative as possible. - Most definite, try to remove as many Maybe's as possible in favor of Yes or No. - Most restrictive, prefer a No over Yes when trying to remove a Maybe. - Most conversative, do not try to remove a Maybe if we are not truly sure about its value. The two node properties computes are * gen(X): Node -> {Yes, No, Maybe} Yes: The node X definitely generates a semantic value. No: The node X definitely does not generate a semantic value. Maybe: The node X passes semantic values, should they be generated by a child of X. We do not know if values are generated or not. A derivative function is gen', defined as gen'(X,G): Node x {Yes, No, Pass} -> Bool gen'(X,G) = (gen(X) == G) Conversely we could define gen in terms of gen' as gen (X) = G, iff gen'(X,G) * acc(X): Node -> Bool True: The node X will keep semantic values generated by its children. False: The node X will throw any semantic values generated by its children away. General information ------------------- * The specification of 'doc_normalize.txt' applies fully. Structure --------- * The structure of the tree is unchanged. Attributes ---------- Name Type Details ---- ---- ------- pg::peg::emodes none Root node only. The presence of this attribute indicates that the emode computation has been been executed on the grammar and that we have valid results. Transformations whose execution invalidates the emode information have to remove this marker. ==== ==== ======= gen enum Expression and definition nodes only. Values are in {yes, no, maybe}. Contains the value of gen(X). ---- ---- ------- acc bool Expression and definition nodes only. Contains the value of acc(X). ---- ---- ------- Transformation -------------- It is possible to specify a mode transformation based on the mode computation. It would resolve discontinuities in the gen/acc stati (Use of expressions generating a value where none is accepted) by duplicating relevant nonterminal definitions and forcing them into specific modes (discard). If expressions to be duplicated contain calls to undefined nonterminal symbols then the new definitions will do so as well. This transformation will not be written for now. The reason for that is that it essentially defeats packrat parsing. The definitions which are duplicated are used in both contexts which accept and such which do not accept the generated value, otherwise the regular analysis would have mode the definition itself non-accepting and non-generating. While it is possible to put the match status of both the original and duplicated definition into the packrat cache under the same label there is one important distinction which cannot be avoided: The duplicated definition does not generate a semantic value. And we cannot exclude the possibilities that either a cached semantic value is used in a non-accepting context which is not prepared to handle this value (by discarding it), or that an accepting context uses cached information which is short of one expected semantic value. So the two definitions have to cache their information under different labels to avoid a mixup. But now it becomes possible that the match engine has to fully reparse a segment of input during backtracking despite actually having information about matchability, just under a different label in the cache. And this then costs us the O(n) of the packrat parser, pushing it back exponential time-complexity. Conclusion: The described transformation can be applied if and only if we have ensured that the matcher will never backtrack in the input for the grammar in question. In other words, other transformation like left-factorisation, eliminiation of left-recursion etc. have to be applied first. In even other words, the grammar has to be LL(1) (which implies that it will not use any lookahead operators either).