Content Model

PyXB’s content model is used to complete the link between the Component Model and the Binding Model. These classes are the ones that:

  • determine what Python class attribute is used to store which XML element or attribute;
  • distinguish those elements that can occur at most once from those that require an aggregation; and
  • ensure that the ordering and occurrence constraints imposed by the XML model group are satisfied, when XML is converted to Python instances and vice-versa.

Associating XML and Python Objects

Most of the classes involved in the content model are in the pyxb.binding.content module. The relations among these classes are displayed in the following diagram.

_images/ContentModel.jpg

In the standard code generation template, both element and attribute values are stored in Python class fields. As noted in Deconflicting Names, an attribute and an element that share a name in their containing complex type must be given distinct names in the Python class corresponding to that type. Use information for each of these is maintained in the type class. This use information comprises:

  • the original name of the element/attribute in the XML
  • its deconflicted name in Python
  • the private name by which the value is stored in the Python instance dictionary

Other information is specific to the type of use. The complexTypeDefinition retains maps from the component’s name to the attribute use or element use instance corresponding to the component’s use.
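The three names tracked for each use can be pictured with a small sketch. This is not PyXB code: the UseInfo record, the deconflict helper, the rule of appending an underscore on collision, and the private-name format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseInfo:
    """Hypothetical record of the names kept for one element or attribute use."""
    xml_name: str      # original name of the element/attribute in the XML
    python_name: str   # deconflicted name exposed on the Python class
    private_name: str  # key under which the value is stored in the instance


def deconflict(components):
    """Give each (kind, xml_name) pair a unique Python name.

    The collision rule (append '_') and the private-name format are
    illustrative assumptions, not the rules PyXB necessarily applies.
    """
    seen = set()
    uses = []
    for kind, xml_name in components:
        python_name = xml_name
        while python_name in seen:
            python_name += "_"
        seen.add(python_name)
        uses.append(UseInfo(xml_name, python_name, "__%s_%s" % (kind, python_name)))
    return uses


# An element and an attribute that share the XML name "size".
uses = deconflict([("element", "size"), ("attribute", "size")])
```

Both entries keep the same XML name, while the Python and private names differ, so the two values can coexist on one instance.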

Attribute Uses

The information associated with an attribute use is recorded in an AttributeUse instance. This class provides:

A map is used to map from expanded names to AttributeUse instances. This map is defined within the class definition itself.

Element Uses

The element analog to an attribute use is an element declaration, and the corresponding information is stored in an ElementDeclaration instance. This class provides:

A map is used to map from expanded names to ElementDeclaration instances. This map is defined within the class definition itself. As mentioned before, when the same element name appears at multiple places within the element content the uses are collapsed into a single attribute on the complex type; thus the map is to the ElementDeclaration, not the ElementUse.
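As a rough illustration of this collapsing, the sketch below builds a map keyed by expanded name, modelled here as a (namespace, local name) tuple. The build_element_map helper and the dictionary standing in for a declaration are hypothetical and do not reflect the layout of the generated binding classes.

```python
def build_element_map(element_uses):
    """Collapse element uses into one declaration record per expanded name.

    element_uses: iterable of (namespace, local_name) tuples, one for each
    place the element appears in the content model.
    """
    element_map = {}
    for expanded_name in element_uses:
        # The first use creates the declaration record; later uses share it.
        declaration = element_map.setdefault(
            expanded_name, {"name": expanded_name, "use_count": 0})
        declaration["use_count"] += 1
    return element_map


NS = "urn:example"  # placeholder namespace for the example
element_map = build_element_map([(NS, "item"), (NS, "note"), (NS, "item")])
```

The "item" element is used twice in the content but has a single entry in the map, which is why a lookup by name yields the declaration rather than an individual use.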

Validating the Content Model

As of PyXB 1.2.0, content validation is performed using the Finite Automata with Counters (FAC) data structure, as described in Regular Expressions with Numerical Constraints and Automata with Counters, Dag Hovland, Lecture Notes in Computer Science, 2009, Volume 5684, Theoretical Aspects of Computing - ICTAC 2009, Pages 231-245.

This structure allows accurate validation of occurrence and order constraints without the complexity of the original back-tracking validation solution from PyXB 1.1.1 and earlier. It also avoids the incorrect rejection of valid documents that (rarely) occurred with the greedy algorithm introduced in PyXB 1.1.2. Conversion to this data structure also enabled the distinction between element declaration and element use nodes, allowing diagnostics to trace back to the element references in context.
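The role of counters can be seen in a much-reduced sketch. It handles only a flat sequence of particles with occurrence bounds, and it assumes that no symbol could match two particles at the same point, so each step is deterministic. It is not the algorithm from the paper or PyXB's implementation; the validate function and its particle tuples are illustrative names.

```python
def validate(particles, symbols):
    """Check symbols against a sequence of (symbol, min_occurs, max_occurs).

    max_occurs of None means unbounded. A single counter tracks occurrences
    of the current particle, so no back-tracking or unrolling is needed.
    """
    index = 0  # current particle, playing the role of the automaton state
    count = 0  # counter value for the current particle
    for sym in symbols:
        while index < len(particles):
            name, lo, hi = particles[index]
            if sym == name and (hi is None or count < hi):
                count += 1  # stay in this state and increment its counter
                break
            if count < lo:
                return False  # cannot leave: minimum occurrence not met
            index += 1  # move to the next particle and reset the counter
            count = 0
        else:
            return False  # ran out of particles with input remaining
    # Accept only if the current and all remaining particles are satisfied.
    while index < len(particles):
        if count < particles[index][1]:
            return False
        index += 1
        count = 0
    return True


model = [("a", 2, 3), ("b", 1, 1)]  # corresponds to a{2,3} followed by b
```

With this model, the inputs "aab" and "aaab" are accepted, while "ab" (too few a) and "aaaab" (too many a) are rejected.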

The data structures for the automaton and the configuration structure that represents a processing automaton are:

_images/FACAutomaton.jpg

The implementation in PyXB generally follows the description in the ICTAC 2009 paper. Calculation of first/follow sets has been enhanced to support term trees with more than two children per node. In addition, support for unordered catenation, as required for the “all” model group, is implemented by a state that maintains a distinct sub-automaton for each alternative. This requires a layered approach in which execution of an automaton is suspended until the subordinate automaton has accepted and a transition out of it is encountered.
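The layered handling of unordered catenation might be pictured as follows. This is a simplified sketch under strong assumptions: each alternative is a fixed, non-empty sequence of symbols that must appear exactly once, and alternatives start with distinct symbols. The validate_all function is hypothetical and is not taken from the FAC module.

```python
def validate_all(alternatives, symbols):
    """Accept symbols if every alternative appears once, in any order.

    alternatives: list of non-empty symbol sequences. While one alternative
    is being matched, the outer layer is suspended until it completes.
    """
    remaining = list(alternatives)
    active = None  # (sequence, position) of the sub-match in progress
    for sym in symbols:
        if active is None:
            # Outer layer: choose an unused alternative that starts with sym.
            for alt in remaining:
                if alt[0] == sym:
                    remaining.remove(alt)
                    active = (alt, 0)
                    break
            else:
                return False  # no unused alternative can start here
        seq, pos = active
        if seq[pos] != sym:
            return False  # the subordinate match rejects the symbol
        pos += 1
        # Resume the outer layer once the subordinate match has accepted.
        active = (seq, pos) if pos < len(seq) else None
    return active is None and not remaining


group = [["a", "b"], ["c"]]
```

Here "abc" and "cab" are both accepted, while "acb" is rejected because the first alternative is interrupted before it completes.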

For more information on the implementation, please see the FAC module. This module has been written to be independent of PyXB infrastructure, and may be re-used in other code in accordance with the PyXB license.

FAC and the PyXB Content Model

As depicted in the Content Model class diagram, each complex type binding class has an _Automaton that encodes the content model of the type as a Finite Automaton with Counters. This representation models the occurrence constraints and sub-element orders, referencing the specific element and wildcard uses as they appear in the schema. Each instance of a complex binding supports an AutomatonConfiguration that is used to validate the binding content against the model.

An ElementUse instance is provided as the metadata for automaton states that correspond to an element declaration in the schema. Similarly, a WildcardUse instance is used as the metadata for automaton states that correspond to an instance of the xs:any wildcard schema component. Validation in the automaton delegates through the SymbolMatch_mixin interface to see whether content in the form of a complex type binding instance is conformant to the restrictions on symbols associated with a particular state.

When parsing, a transition taken results in the storage of the consumed symbol into the appropriate element attribute or wildcard list in the binding instance. In many cases, the transition from one state to a next is uniquely determined by the content; as long as this condition holds, the AutomatonConfiguration instance retains a single underlying FAC Configuration representing the current state.

To generate the XML corresponding to a binding instance, the element and wildcard content of the instance are loaded into a Python dictionary, keyed by the ElementDeclaration. These subordinate elements are appended to a list of child nodes as transitions that recognize them are encountered. As of PyXB 1.2.0 the first legal transition in the order imposed by the schema is taken, and there is no provision for influencing the order in the generated document when multiple orderings are valid.
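A minimal sketch of this generation step, assuming a content model that is a plain sequence of repeatable elements: the children are held in a dictionary keyed by declaration (plain strings here) and emitted in the order the schema imposes, regardless of the order in which they were stored. The order_children function is an illustrative stand-in, not PyXB's API.

```python
def order_children(schema_order, content):
    """Return (declaration, value) pairs in the order the schema imposes.

    schema_order: declaration keys in the order the model visits them.
    content: dict mapping a declaration key to its list of values.
    """
    pending = {name: list(values) for name, values in content.items()}
    children = []
    for name in schema_order:
        # Take the first legal transition: emit everything held for this key.
        for value in pending.pop(name, []):
            children.append((name, value))
    if pending:
        # Content was left over that no transition recognized.
        raise ValueError("content not accepted by model: %s" % sorted(pending))
    return children


children = order_children(
    ["title", "author"],
    {"author": ["A. Writer", "B. Writer"], "title": ["Example"]},
)
```

The title is emitted first even though the dictionary lists the authors first, which mirrors the point that the schema, not the caller, determines the order in the generated document.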