Delilah is a knowledge-based or 'gnostic' language machine for Dutch. It is driven by an explicit and detailed grammar of (a fragment of) Dutch, which can be characterised as rigid, combinatory, categorial and modal. Applying the grammar yields both a syntactic and a semantic representation of many of the major structures of Dutch, including verb clustering and related forms of discontinuity, questioning and relativization, co-ordination and anaphora.
The grammar essentially steers the unification of complex symbols, gathered from an extensive and highly detailed lexicon. The lexical and phrasal templates are HPSG-style, but the combinatorial restrictions are encoded in the modalities of the categorial grammar, rather than in the construction of the attribute-value matrices. The unification process is graph unification.
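As an illustration of what unifying two complex symbols involves, here is a minimal sketch in Python. It is not Delilah's implementation: it assumes feature structures are nested dictionaries with atomic leaves and leaves out reentrancy (structure sharing), so it is tree unification, a simplification of full graph unification. The feature names (head, cat, agr, num, pers) are hypothetical.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(left, right):
    """Unify two feature structures given as nested dicts with atomic leaves.

    Simplification: no reentrancy (structure sharing), so this is tree
    unification rather than full graph unification.
    """
    if isinstance(left, dict) and isinstance(right, dict):
        result = dict(left)
        for feature, value in right.items():
            if feature in result:
                # Both sides specify this feature: their values must unify.
                result[feature] = unify(result[feature], value)
            else:
                # Only the right side specifies it: take the value over.
                result[feature] = value
        return result
    if left == right:
        return left
    raise UnificationFailure(f"{left!r} conflicts with {right!r}")


# Hypothetical feature names and values, for illustration only.
verb = {"head": {"cat": "v", "agr": {"num": "sg"}}}
requirement = {"head": {"agr": {"num": "sg", "pers": 3}}}
unified = unify(verb, requirement)
```

Unification merges compatible information from both structures and fails as soon as two atomic values disagree, for instance a singular and a plural number feature.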
parsing and generation
The Delilah language machine both parses and generates Dutch, applying the same grammar and lexicon in both processes. Generation is initialised by offering or randomly selecting lexical features, e.g. concepts or phonological forms. The generator uses combinatory categories and the related feature-value matrices as agendas for the completion of sentences. As a consequence, the sentences produced are not guided by human reason, but are anchored solely in the grammar: they are machine-generated.
Delilah entertains different logical forms. First, any derivation produces, by sheer unification, an underspecified quasi-logical form (QLF). This is constructed as a layered storage of linked lambda terms. Secondly, this quasi-logical form is converted post-derivationally to a family of fully specified logical forms, dubbed applied logical form (ALF). From this fully specified applied logical form, two normalisations are derived. The first is simply called logical form (LF), and aims to capture the higher-order properties of the interpretation in a first-order predicate logic format. The second derivative is called flat logical form (FLF), and arises from a compilation of all global dependencies in logical form to a local level: for each occurrence of each variable its semantic dependencies are spelled out. Basically, FLF is a conjunction of small clauses. It is meant to serve easy propositional inference. Plurality of LFs is reflected at FLF as disjunctions of small clauses.
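To make the shape of FLF concrete, the following sketch models a flat form as a conjunction of disjunctions of small clauses and checks it against a set of facts. This is an assumed encoding for illustration, not Delilah's actual FLF format: the predicate names, the variable names and the `holds` function are all hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A small clause is a predicate name followed by its arguments.
SmallClause = Tuple[str, ...]
# A flat form is a conjunction (list) of disjunctions (sets) of small clauses.
FlatForm = List[FrozenSet[SmallClause]]


def holds(flat_form: FlatForm, facts: Set[SmallClause]) -> bool:
    """True if every disjunction contains at least one clause among the facts."""
    return all(any(clause in facts for clause in disjunction)
               for disjunction in flat_form)


# Hypothetical flat form; the last conjunct is a disjunction standing for
# two competing readings (the modifier attaches to the event e1 or to x3).
flat_form: FlatForm = [
    frozenset({("see", "e1")}),
    frozenset({("agent", "e1", "x1")}),
    frozenset({("with", "e1", "x2"), ("with", "x3", "x2")}),
]

facts = {("see", "e1"), ("agent", "e1", "x1"), ("with", "x3", "x2")}
satisfied = holds(flat_form, facts)
```

Because every conjunct is local, checking it needs no scope resolution: each disjunction is tested on its own against the facts.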
In the on-line representation, only LF and FLF surface. They are supposed to be weakly equivalent. They are computed for every single constituent and for the sentence as a whole. The semantic constants and concepts in LF and FLF are named mnemonically after English lemmata.
The main data structure computed is the unified attribute-value structure we call the template. This template holds all information gathered by the derivation. It reflects the overt and hidden argument structure of the phrase, and all properties of the constituents. In this demo, the template is presented as an XML hierarchy, and the full hierarchy of the data structure can be folded and unfolded.
The lexicon consists of detailed templates. Each template fixes a complex symbol, specifying combinatoric and semantic properties in a directed graph. Lexical templates are phrasal. In that sense, every lexical template is a construction. The templates as data structures are generated off-line by a complex set of lexical rules. These rules cover part of Delilah's specific knowledge of Dutch.
Presently, the lexicon is modest with respect to the number of lemmas, but extensive per lemma. We are planning a major increase of the lexical coverage.
While parsing, Delilah selects analyses by applying a semantic filter. It produces every feasible analysis, but ranks them with respect to the semantic complexity of their QLFs. Only the simplest QLFs pass, as they represent the highest level of lexical aggregation. In this way, lexical and phrasal constructions are exploited for solving ambiguity.
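The filtering step can be sketched as follows. The complexity measure is not specified here, so this sketch assumes, purely for illustration, that complexity is the number of terms in a QLF. The `Analysis` class, the `complexity` function and the example terms are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Analysis:
    label: str
    qlf_terms: Tuple[str, ...]  # stand-in for the linked lambda terms of a QLF


def complexity(analysis: Analysis) -> int:
    """Assumed measure: the number of terms in the QLF."""
    return len(analysis.qlf_terms)


def semantic_filter(analyses: List[Analysis]) -> List[Analysis]:
    """Rank analyses by QLF complexity and keep only the simplest ones."""
    if not analyses:
        return []
    ranked = sorted(analyses, key=complexity)
    lowest = complexity(ranked[0])
    return [a for a in ranked if complexity(a) == lowest]


# Hypothetical competing analyses of one phrase: a reading built from a single
# phrasal construction needs fewer terms than a word-by-word reading.
candidates = [
    Analysis("word-by-word", ("term_a", "term_b", "term_c")),
    Analysis("phrasal construction", ("term_ab",)),
]
selected = semantic_filter(candidates)
```

Under this assumed measure, a reading that uses one larger lexical construction outranks a reading composed from more, smaller pieces, which is the sense in which lexical aggregation resolves ambiguity.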
In parsing, Delilah is controlled by a chart parser. This parser is robust in the sense that it selects the best, least fragmented coverage of a sentence and represents the best fragments.
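One way to picture "least fragmented coverage" is as the smallest number of chart spans that tile the sentence. The sketch below finds such a tiling with dynamic programming. It is an assumption about how such a selection could work, not a description of Delilah's parser: the function name is hypothetical, spans are plain (start, end) token offsets, and any token not covered by a chart edge is treated as a one-token fragment.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def least_fragmented_cover(n_tokens: int, edges: Iterable[Span]) -> List[Span]:
    """Return a shortest sequence of spans tiling tokens 0..n_tokens."""
    # Keep well-formed edges and add one-token fallbacks so a tiling exists.
    spans = {(s, e) for (s, e) in edges if 0 <= s < e <= n_tokens}
    spans |= {(i, i + 1) for i in range(n_tokens)}

    # best[p] holds the shortest tiling found so far for tokens 0..p.
    best: List[object] = [None] * (n_tokens + 1)
    best[0] = []
    for start in range(n_tokens):
        for (s, e) in sorted(spans):
            if s != start:
                continue
            candidate = best[start] + [(s, e)]
            if best[e] is None or len(candidate) < len(best[e]):
                best[e] = candidate
    return best[n_tokens]


# Hypothetical chart for a five-token sentence.
chart_edges = [(0, 2), (2, 5), (0, 3), (1, 4)]
cover = least_fragmented_cover(5, chart_edges)
```

A real parser would presumably also score the fragments themselves; this sketch only minimises their number.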
main components of the representation
sentence: string parsed or generated
analysis: single analysis of the sentence
tree: syntactic derivation tree - only for parsing
ctree: derivation tree with semantic and linearization information - short version of template
template: main data structure - XML hierarchy
template-head: sub-template: properties of the head of the construction
template-arg: sub-template: properties of an argument of the construction
template-synsem: sub-template: semantic and syntactic properties
template-semantics: (set of) semantic representations
template-lf: (set of) standard fully specified logical form
template-flf: fully specified flat logical form - conjunction of disjunctions
template-type: the modalized categorial type, expressing combinatoric properties
template-phon: linearization encoding
semantics: separate representation of logical form and flat logical form - bundle of template-lf and template-flf