Module unicode

This module provides the Unicode character support required by the regular expression syntax defined in Annex F of the XML Schema definition.

In particular, we need to be able to identify character properties and block escapes, as defined in F.1.1, by name.

Block data: http://www.unicode.org/Public/3.1-Update/Blocks-4.txt
Property list data: http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.txt
Full dataset: http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt

The Unicode database active at the time XML Schema 1.0 was defined is archived at http://www.unicode.org/Public/3.1-Update/UnicodeCharacterDatabase-3.1.0.html, and refers to Unicode Standard Annex #27: Unicode 3.1.

Classes

CodePointSetError
Raised when some abuse of a CodePointSet is detected.

CodePointSet
Represent a set of Unicode code points.
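A set of code points can be stored compactly as a sorted list of inclusive ranges. The sketch below illustrates that idea only; `RangeSet`, `MAX_CODE_POINT`, and the method bodies are hypothetical stand-ins and are not taken from the `CodePointSet` implementation.

```python
import bisect

MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


class RangeSet:
    """Hypothetical stand-in for a code point set; not pyxb's CodePointSet."""

    def __init__(self, *items):
        # Each item is a single code point or an inclusive (first, last) pair.
        pairs = [(i, i) if isinstance(i, int) else tuple(i) for i in items]
        self.ranges = self._normalize(pairs)

    @staticmethod
    def _normalize(pairs):
        # Sort the ranges, then merge any that overlap or touch.
        merged = []
        for first, last in sorted(pairs):
            if merged and first <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], last))
            else:
                merged.append((first, last))
        return merged

    def __contains__(self, code_point):
        # Locate the last range that starts at or before code_point.
        index = bisect.bisect_right(self.ranges, (code_point, MAX_CODE_POINT)) - 1
        return index >= 0 and self.ranges[index][1] >= code_point

    def negate(self):
        # Build the complement by collecting the gaps between ranges.
        gaps, start = [], 0
        for first, last in self.ranges:
            if first > start:
                gaps.append((start, first - 1))
            start = last + 1
        if start <= MAX_CODE_POINT:
            gaps.append((start, MAX_CODE_POINT))
        result = RangeSet()
        result.ranges = gaps
        return result
```

With this representation, complementing a set that holds only the newline and carriage return code points yields three ranges covering every other code point, which is the shape of set that a regular expression wildcard needs.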

Variables

SupportsWideUnicode = False

_NameStartChar = CodePointSet(ord(':'), (ord('A'), ord('Z')), ...

_NameChar = CodePointSet(_NameStartChar).extend([ord('-'), ord...

SingleCharEsc = {'n': CodePointSet(0x0A), 'r': CodePointSet(0x...

WildcardEsc = CodePointSet(ord('\n'), ord('\r')).negate()

MultiCharEsc = {}

__package__ = 'pyxb.utils'

c = ']'

Variables Details

_NameStartChar

Value:

CodePointSet(ord(':'), (ord('A'), ord('Z')), ord('_'), (ord('a'), ord('z')),
             (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
             (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
             (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
             (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF))

_NameChar

Value:

CodePointSet(_NameStartChar).extend([ord('-'), ord('.'), (ord('0'), ord('9')),
                                     0xB7, (0x0300, 0x036F), (0x203F, 0x2040)])
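The two values above describe which code points may start a name and which may appear anywhere in it. As a rough illustration, the sketch below copies those ranges into plain tuples and tests a string against them. It does not use `CodePointSet`; the constant names and the helper functions `in_ranges`, `is_name_start_char`, `is_name_char`, and `is_xml_name` are hypothetical and are not part of this module.

```python
# Inclusive ranges copied from the _NameStartChar value listed above.
NAME_START_RANGES = [
    (ord(':'), ord(':')), (ord('A'), ord('Z')), (ord('_'), ord('_')),
    (ord('a'), ord('z')), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Additional ranges that _NameChar adds on top of the name-start set.
NAME_EXTRA_RANGES = [
    (ord('-'), ord('-')), (ord('.'), ord('.')), (ord('0'), ord('9')),
    (0xB7, 0xB7), (0x0300, 0x036F), (0x203F, 0x2040),
]


def in_ranges(code_point, ranges):
    # True when code_point falls inside any inclusive (first, last) range.
    return any(first <= code_point <= last for first, last in ranges)


def is_name_start_char(ch):
    return in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_xml_name(text):
    # Hypothetical helper: the first character must come from the start set,
    # every later character from the full name-character set.
    return bool(text) and is_name_start_char(text[0]) and all(
        is_name_char(ch) for ch in text[1:])
```

Under these ranges a string such as `'xs:element'` is accepted, while `'1abc'` is rejected because a digit belongs only to the extended set and so cannot start a name.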

SingleCharEsc

Value:

{'n': CodePointSet(0x0A), 'r': CodePointSet(0x0D), 't': CodePointSet(0x09)}
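Each key in this mapping is the letter that follows a backslash in a pattern, and each value holds the single code point that the escape denotes. A minimal sketch of that lookup follows, using plain integers in place of `CodePointSet` and only the three entries shown above; `SINGLE_CHAR_ESC` and `resolve_escape` are hypothetical names.

```python
# Code points taken from the SingleCharEsc value shown above.
SINGLE_CHAR_ESC = {'n': 0x0A, 'r': 0x0D, 't': 0x09}


def resolve_escape(letter):
    # Hypothetical helper: return the character denoted by "\" + letter.
    # Raises KeyError for letters that are not in the table.
    return chr(SINGLE_CHAR_ESC[letter])
```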