NIEM Embedding Schematron in XML Schema Documents

Webb Roberts, Georgia Tech Research Institute

This document specifies the data model, XML components, and XML data for use with the National Information Exchange Model (NIEM) version 3.0.

This document is a draft of the specification for NIEM-conformant XML Schema documents, components, and instances. It represents the design that has evolved from the collaborative work of the NIEM Business Architecture Committee (NBAC) and the NIEM Technical Architecture Committee (NTAC) and their predecessors.

This specification is a product of the NIEM Program Management Office (PMO).

Send comments on this specification via email to niem-comments@lists.gatech.edu.

1. Introduction

This Naming and Design Rules (NDR) document specifies XML Schema documents for use with the National Information Exchange Model (NIEM). NIEM is an information sharing framework based on the World Wide Web Consortium (W3C) Extensible Markup Language (XML) Schema standard. In February 2005, the U.S. Departments of Justice (DOJ) and Homeland Security (DHS) signed a cooperative agreement to jointly develop NIEM by leveraging and expanding the Global Justice XML Data Model (GJXDM) into multiple domains. NIEM is a result of a combined government and industry effort to improve information interoperability and exchange within the United States at federal, state, tribal, and local levels of government.

NIEM specifies a set of reusable information components for defining standard information exchange messages, transactions, and documents on a large scale: across multiple communities of interest and lines of business. These reusable components are rendered in XML Schema documents as type, element, and attribute definitions that comply with the W3C XML Schema specification. The resulting reference schemas are available to government practitioners and developers at http://niem.gov/.

The W3C XML Schema standard enables information interoperability and sharing by providing a common language for describing data precisely. The constructs it defines are basic metadata building blocks — baseline data types and structural components. Users employ these building blocks to describe their own domain-oriented data semantics and structures, as well as structures for specific information exchanges and components for reuse across multiple information exchanges. Rules that profile allowable XML Schema constructs and describe how to use them help ensure that those components are consistent and reusable.

This document specifies principles and enforceable rules for NIEM data components and schemas. Schemas and components that obey the rules set forth here are considered to be NIEM-conformant.

This document was developed to specify NIEM 3.0. Later releases of NIEM may be specified by later versions of this document. The document covers the following issues in depth:

This document does NOT address the following:

This document is intended as a technical specification. It is not intended to be a tutorial or a user guide.

1.2. Audience

This document targets practitioners and developers who employ NIEM for information exchange and interoperability. Such information exchanges may be between or within organizations. The NIEM reference schemas provide system implementers much content on which to build specific exchanges. However, there is a need for extended and additional content. The purpose of this document is to define the rules for such new content so that it will be consistent with the NIEM reference schemas. These rules are intended to establish and, more important, enforce a degree of standardization on a national level.

2. Document conventions and normative content

This document uses formatting and syntactic conventions to clarify meaning and avoid ambiguity.

2.1. Document references

This document relies on references to many outside documents. Such references are noted by bold, bracketed inline terms. For example, a reference to RFC 2119 is shown as [RFC 2119]. All reference documents are recorded in Appendix A, References, below.

2.2. Formatting

In addition to special formatting for definitions, principles, and rules, this document uses consistent formatting to identify NIEM components.

Courier: All words appearing in Courier font are values, objects, keywords, or literal XML text.

Italics: All words appearing in italics, when not titles or used for emphasis, are special terms with definitions appearing in this document.

Throughout the document, fragments of XML Schema or XML instances are used to clarify a principle or rule. These fragments are specially formatted in Courier font and appear in text boxes. An example of such a fragment follows:

Figure 2-1: Example of an XML fragment

2.3. Clark notation and qualified names

This document uses both Clark notation and QName notation to represent qualified names.

QName notation is defined by [XML Namespaces], §4, Qualified Names. A QName for the XML Schema string datatype is xs:string. Namespace prefixes used within this specification are listed in Section 2.4, Use of namespaces, below.

This document sometimes uses Clark notation to represent qualified names in normative text. Clark notation is described by [ClarkNS], and provides the information in an XML qualified name (as defined by [XML Namespaces]) without the need to define a namespace prefix and then reference that namespace prefix. A Clark notation representation for the qualified name for the XML Schema string datatype is {http://www.w3.org/2001/XMLSchema}string.

Each Clark notation value consists of a namespace URI surrounded by curly braces, concatenated with a local name. Clark notation is frequently used to represent the qualified name for an attribute with no namespace, which is ambiguous when represented using QName notation. For example, the attribute targetNamespace, which has no [namespace name] property, is represented in Clark notation as {}targetNamespace.
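The relationship between the two notations can be illustrated with a short Python sketch. It is not part of this specification: the helper name to_clark and the urn:example target namespace are hypothetical, and the prefix map is supplied by the caller rather than read from a document.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def to_clark(qname, prefixes):
    """Convert a QName such as 'xs:string' to Clark notation.

    An unprefixed name is treated as having no namespace. That reading is
    right for attributes, but not for elements under a default namespace.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        return "{}" + qname
    return "{" + prefixes[prefix] + "}" + local


print(to_clark("xs:string", {"xs": XS_NS}))  # {http://www.w3.org/2001/XMLSchema}string
print(to_clark("targetNamespace", {}))       # {}targetNamespace

# ElementTree reports element names in Clark notation. Note that it writes
# no-namespace attribute names without the leading "{}".
root = ET.fromstring(f'<xs:schema xmlns:xs="{XS_NS}" targetNamespace="urn:example"/>')
print(root.tag)
print(list(root.attrib))
```

Because a Clark name carries the namespace URI itself, two documents that bind different prefixes to the same namespace yield identical Clark names.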

2.4. Use of namespaces

The following namespace prefixes are used consistently within this specification. These prefixes are not normative; this document issues no requirement that these prefixes be used in any conformant artifact.

2.5. Normative and informative content

This document includes a variety of content. Some content is normative (binding and enforceable in implementations), while other content is informative (explanatory, but not part of the NIEM specification). In general, the informative material appears as supporting text and specific rationales for the normative material.

Conventions used within this document include:

[Definition: <term>]

A formal definition of a term associated with NIEM.

Definitions are normative.

[Principle <number>]

A guiding principle for NIEM.

The principles represent the requirements, concepts, and goals that have helped shape the NIEM. Principles are informative, not normative, but act as the basis on which the rules are defined.

Accompanying each principle is a short discussion section that justifies the application of the principle to NIEM design.

Principles are numbered in the order in which they appear in the document.

[Rule <section>-<number>]

An enforceable rule for NIEM.

Rules state specific requirements on artifacts, such as schemas and instances. Most rules apply to conformant schemas, while others apply to instances. The rules are normative.

Rules are stated using both XML InfoSet terminology (elements and attributes) and XML Schema terminology (schema components). The choice of terminology is driven by which standard best expresses the rule. Certain concepts are more clearly expressed using XML InfoSet information items, others using the XML Schema data model; still others are best expressed using a combination of terminology drawn from both standards.

Rules have rationales that justify the need for the rule. For clarity, there may be multiple rules that have the same rationale.

Rules and supporting text may use Extended Backus-Naur Form (EBNF) notation as defined by [XML] §6, Notation.

Rules are numbered according to the section in which they appear and the order in which they appear within that section. For example, Rule 5-1 is the first rule in Section 5.

Each rule is accompanied by a description of its applicability. This identifies the type of schema to which the rule applies or indicates whether the rule is applicable to XML documents or element information items. Each entry in the list is a code from Table 5-1, Codes representing conformance targets, below. If a code appears in the applicability list for a rule, then the rule applies to the corresponding conformance target. The conformance targets are defined in Section 4, Conformance targets, below.
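As an informal illustration of the numbering and applicability conventions, the following Python sketch parses a rule label of the form used later in this document, for example "[Rule 6-1] (REF, EXT) (Constraint)". The function names and the regular expression are assumptions made for this sketch; the codes are those listed in Table 5-1.

```python
import re

# Codes as listed in Table 5-1 of this document.
CONFORMANCE_TARGETS = {
    "REF": "conformant reference schema document",
    "EXT": "conformant extension schema document",
    "SET": "conformant schema document set",
    "INS": "conformant instance XML document",
}

# Matches labels such as "[Rule 6-1] (REF, EXT) (Constraint)".
RULE_LABEL = re.compile(
    r"\[Rule (?P<section>\d+)-(?P<index>\d+)\]\s*"
    r"\((?P<codes>[A-Z, ]+)\)\s*\((?P<kind>\w+)\)"
)


def parse_rule_label(label):
    """Split a rule label into section, index, applicability codes, and kind."""
    match = RULE_LABEL.search(label)
    if match is None:
        raise ValueError(f"not a rule label: {label!r}")
    codes = [code.strip() for code in match["codes"].split(",")]
    unknown = [code for code in codes if code not in CONFORMANCE_TARGETS]
    if unknown:
        raise ValueError(f"unknown conformance target codes: {unknown}")
    return {
        "section": int(match["section"]),
        "index": int(match["index"]),
        "codes": codes,
        "kind": match["kind"],
    }


def applies_to(label, code):
    """Return True if the rule named by `label` applies to the target `code`."""
    return code in parse_rule_label(label)["codes"]


rule = parse_rule_label("[Rule 6-3] (REF) (Constraint)")
print(rule["section"], rule["index"], rule["codes"])
print(applies_to("[Rule 6-1] (REF, EXT) (Constraint)", "EXT"))
```

A tool that checks only extension schema documents could use a filter like applies_to to select the rules whose applicability list contains EXT.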

2.6. Use of normative Schematron

This document defines many normative rules using Schematron rule-based validation syntax, as defined by [Schematron]. Effort has been made to make the rules precise and unambiguous. Very detailed text descriptions of rules can introduce ambiguity, and they are not directly executable by users. Providing NDR rules that are expressed as Schematron rules ensures that the rules are precise, and that they are directly executable through commercially-available and free tools.

Many rules herein do not have executable Schematron supporting them. Some are not fit for automatic validation, and others may be difficult or cumbersome to express in Schematron. In neither case are such rules any less normative. A rule that has no Schematron is just as normative as a rule that does have Schematron.

The Schematron rules are written using XPath 2.0 as defined by [XPath 2]. These executable rules are normative.

An execution of a Schematron pattern that issues a failed assert represents a validation error, and signifies that the assessed artifact violates a requirement of a conformance rule.

An execution of a Schematron pattern that issues a report indicates cause for concern. This may be:

In either case, the Schematron reporting mechanism may be used to identify specific locations within artifacts that need further attention.
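The distinction between a failed assert and a report can be sketched in Python. The sketch assumes that the validator writes its results as Schematron Validation Report Language (SVRL), which this specification does not require; the sample report, its locations, and its messages are invented for illustration.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Hypothetical validator output; locations and messages are made up.
SAMPLE_SVRL = f"""
<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:failed-assert test="@name" location="/xs:schema/xs:element[1]">
    <svrl:text>An element declaration MUST have a name.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report test="xs:choice" location="/xs:schema/xs:complexType[2]">
    <svrl:text>Use of xs:choice needs review.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
"""


def summarize(svrl_text):
    """Return (errors, concerns) as lists of (location, message) pairs."""
    root = ET.fromstring(svrl_text.strip())

    def collect(local_name):
        found = []
        for item in root.iter(f"{{{SVRL_NS}}}{local_name}"):
            text = item.findtext(f"{{{SVRL_NS}}}text", default="").strip()
            found.append((item.get("location"), text))
        return found

    # A failed assert is a validation error; a report only flags a concern.
    return collect("failed-assert"), collect("successful-report")


errors, concerns = summarize(SAMPLE_SVRL)
print("violations found" if errors else "no violations found")
for location, message in errors + concerns:
    print(location, "-", message)
```

Only the first list bears on conformance; entries in the second list point a reviewer at locations that deserve a closer look.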

2.7. Normative XPath functions

The Schematron within this document is supported by functions, to make the rules more comprehensible, and to abstract away process-specific operations. Each function has a normative XPath interface and a normative text definition. Any implementation provided for these functions should be considered informative, not normative, but may be useful for certain implementations of the rules.

The following XPath functions are defined normatively when used within Schematron by this specification:

nf:get-document-element($context as element()) as element()
nf:get-target-namespace($element as element()) as xs:anyURI
nf:resolve-element($context as element(), $qname as xs:QName) as element()*
nf:resolve-namespace($context as element(), $namespace-uri as xs:string) as element(xs:schema)*
nf:resolve-type($context as element(), $qname as xs:QName) as element()*
nf:has-effective-conformance-target($context as element(), $match as xs:anyURI*) as xs:boolean

2.8. Normative Schematron namespace declarations

The following Schematron namespace declarations are normative for the Schematron rules and supporting code within this specification:

Figure 2-2: Normative Schematron namespace declarations

Note that the binding of the prefix xml to the XML namespace (http://www.w3.org/XML/1998/namespace) is implicit.

3. Terminology

This document uses standard terminology to explain the principles and rules that describe NIEM.

3.1. RFC 2119 terminology

Within normative content (rules and definitions), the key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC 2119].

3.2. XML terminology

[Definition: XML document]

The term XML document is as defined by [XML], §2.

3.3. XML information set terminology

When discussing XML documents, this document uses terminology and language as defined by [XML Infoset]. The term information item is frequently omitted.

[Definition: element]

An element is an element information item, as defined by [XML Infoset], §2.2, Element Information Items.

Additional terms include:

[Definition: document element]

The term document element refers to the element information item that is the value of the [document element] property of a document information item for an [XML document], as defined by [XML Infoset] §2.1, The Document Information Item.

3.4. XML Schema terminology

This document uses many terms from [XML Schema Structures] and [XML Schema Datatypes] in a normative way.

[Definition: schema component]

The term schema component is as defined by [XML Schema Structures] §2.2, XML Schema Abstract Data Model, which states:

Schema component is the generic term for the building blocks that comprise the abstract data model of the schema.

[Definition: XML Schema]

The term XML Schema is as defined by [XML Schema Structures] §2.2, XML Schema Abstract Data Model, which states:

An XML Schema is a set of schema components.

[Definition: base type definition]

The term base type definition is as defined by [XML Schema Structures], §2.2.1.1.

[Definition: simple type definition]

The term simple type definition is as defined by [XML Schema Structures], §2.2.1.2.

[Definition: complex type definition]

The term complex type definition is as defined by [XML Schema Structures], §2.2.1.3.

[Definition: ] [Definition: ]

In this document, the name of the referenced schema component may appear without the suffix schema component (e.g., the term complex type definition may be used instead of complex type definition schema component) to enhance readability of the text.

3.5. XML Namespaces terminology

This document uses XML Namespaces as defined by [XML Namespaces] and [XML Namespaces Errata].

3.6. Conformance Target Attribute Specification terminology

[CTAS] defines several terms used normatively within this specification.

[Definition: conformance target]

The term conformance target is as defined by [CTAS], which states:

A conformance target is a class of artifact, such as an interface, protocol, document, platform, process or service, that is the subject of conformance clauses and normative statements. There may be several conformance targets defined within a specification, and these targets may be diverse so as to reflect different aspects of a specification. For example, a protocol message and a protocol engine may be different conformance targets.

[Definition: conformance target identifier]

The term conformance target identifier is as defined by [CTAS], which states:

A conformance target identifier is an internationalized resource identifier that uniquely identifies a conformance target.

4. Conformance targets

4.1. Conformance targets defined

This section defines and describes conformance targets of this specification. Each conformance target has a formal definition, along with a notional description of its characteristics and intent. These include:

4.1.1. Reference schema document

[Definition: conformant reference schema document]

A conformant reference schema document is a schema document that is intended to provide the authoritative definitions of broadly reusable data components. It is a conformance target of this specification. A reference schema document MUST conform to all rules of this specification that apply to this conformance target. An XML document with a conformance target identifier of http://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/#ReferenceSchemaDocument MUST be a conformant reference schema document.

A conformant reference schema document is a schema document that is intended to be the authoritative definition schema for a namespace. Examples include NIEM Core and NIEM domains.

Some characteristics of a reference schema document:

Any schema that defines components that are intended to be incorporated into NIEM Core or a NIEM domain may be defined as a reference schema.

The rules for reference schema documents are more stringent than are the rules for other classes of NIEM-conformant schemas. Reference schema documents are intended to support the broadest reuse. They are very uniform in their structure. As they are the primary definitions for data components, they do not need to restrict other data definitions, and they are not allowed to use XML Schema’s restriction mechanisms. Reference schema documents are intended to be as regular and simple as possible.

4.1.2. Extension schema document

[Definition: conformant extension schema document]

A conformant extension schema document is a schema document that is intended to provide definitions of data components that are intended for reuse within a narrower scope than reference schema documents. It is a conformance target of this specification. An extension schema document MUST conform to all rules of this specification that apply to this conformance target. An XML document with a conformance target identifier of http://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/#ExtensionSchemaDocument MUST be a conformant extension schema document.

An extension schema is an XML Schema document that meets all of the following criteria:

An extension schema in an information exchange specification serves several functions. First, it defines new content within a new namespace, which may be an exchange-specific namespace or a namespace shared by several exchanges. This content is NIEM-conformant but has fewer restrictions on it than do [conformant reference schema documents]. Second, the extension schema document bases its content on content from reference schema documents, where appropriate. Methods of deriving content include using (by reference) existing components, as well as creating extensions and restrictions of existing components.

For example, an information exchange specification may define a type for an exchange-specific phone number and base that type on a type defined by the NIEM Core reference schema document. This exchange-specific phone number type may restrict the NIEM Core type to limit those possibilities that are permitted of the base type. Exchange extensions and restrictions must include annotations and documentation to be conformant, but they are allowed to use restriction, choice, and some other constructs that are not allowed in reference schema documents.

Note that exchange specifications may define schemas that meet the criteria of reference schemas for those components that their developers wish to nominate for later inclusion in NIEM Core or in domains.

4.1.3. Schema document set

[Definition: conformant schema document set]

A conformant schema document set is a collection of XML Schema documents that together are capable of validating a conformant instance XML document. It is a conformance target of this specification. A conformant schema document set MUST conform to all rules of this specification that apply to this conformance target.

4.1.4. Instance documents and elements

This document has specific rules about how NIEM content should be used in XML documents. As well as containing rules for XML Schema documents, this NDR contains rules for NIEM-conformant XML content at a finer granularity than the XML document.

[Definition: conformant instance XML document]

A conformant instance XML document is an XML document that is an instance of a conformant schema document set. It is a conformance target of this specification. A conformant instance XML document MUST conform to all rules of this specification that apply to this conformance target.

A conformant instance XML document is an XML document that satisfies all of the following criteria:

In this definition and the next definition below, the term XML document is as specified in [XML]. The terms document information item, document element, element information item, namespace name, and local name are as specified in [XML Infoset]. The term valid is as specified in [XML Schema Structures].

Schema-validity may be assessed against a single set of schemas or against multiple sets of schemas. Assessment against schemas is as directed by an IEPD, other instructions, or tools.

Note that the document element (root element) of a NIEM-conformant XML document is not required to be a NIEM-conformant element information item. Other specifications, such as the IEPD specification, may add additional constraints to these to specify IEPD or exchange conformance.

[Definition: conformant element information item]

A conformant element information item is an element information item that satisfies all of the following criteria:

Because each NIEM-conformant element information item must be locally schema-valid, each element must validate against the schema definition of the element, even if the element information item is allowed within the document because of a wildcard with processContents of skip. Within a NIEM-conformant XML document, each element that is from a NIEM namespace conforms to its schema specification.

5. Applicability of rules to conformance targets

Rules within this document are annotated with conformance target codes. Each rule may be annotated with one or more codes for a conformance target. A rule within this document that is annotated with one of the following codes applies to the corresponding conformance target.

Table 5-1: Codes representing conformance targets

Code   Conformance target
REF    conformant reference schema document
EXT    conformant extension schema document
SET    conformant schema document set
INS    conformant instance XML document

6. Conformance target identifiers

The term conformance target identifier is defined by [CTAS].

6.1. Schema is CTAS-conformant

[Rule 6-1] (REF, EXT) (Constraint)

The document MUST be a conformant document as defined by the NIEM Conformance Targets Attribute Specification.

The term conformant document is defined by §3.2 of [CTAS].

6.2. Document element has conformanceTargets

[Rule 6-2] (REF, EXT) (Constraint)

An element MUST own an attribute conformanceTargets if and only if it is a [document element].

The term [document element] is a defined term.
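A reading of the conformance target rules in this section can be sketched in Python. Several details are assumptions rather than statements of this specification: the namespace of the conformanceTargets attribute, the treatment of its value as a whitespace-separated list of identifiers, and the simplification that the effective conformance targets are exactly those listed on the document element. [CTAS] is the authority on all three. The target namespace in the sample schema is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the CTAS conformanceTargets attribute; check [CTAS].
CT_NS = "http://release.niem.gov/niem/conformanceTargets/3.0/"
CT_ATTR = f"{{{CT_NS}}}conformanceTargets"

NDR = "http://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/"
REFERENCE_TARGET = NDR + "#ReferenceSchemaDocument"
EXTENSION_TARGET = NDR + "#ExtensionSchemaDocument"


def conformance_targets(root):
    """Return the identifiers listed on the document element, if any."""
    return root.get(CT_ATTR, "").split()


def only_document_element_has_targets(root):
    """Check that the document element, and no other element, owns the attribute."""
    owners = [el for el in root.iter() if CT_ATTR in el.attrib]
    return owners == [root]


# Hypothetical schema document claiming the reference schema target.
SCHEMA = f"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ct="{CT_NS}"
           targetNamespace="http://example.org/hypothetical/1.0/"
           ct:conformanceTargets="{REFERENCE_TARGET}"/>
"""

root = ET.fromstring(SCHEMA.strip())
print(only_document_element_has_targets(root))
print(REFERENCE_TARGET in conformance_targets(root))
print(EXTENSION_TARGET in conformance_targets(root))
```

The sample document would be assessed against the rules marked REF, because it claims the reference schema document target and not the extension target.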

6.3. Schema claims reference schema conformance target

[Rule 6-3] (REF) (Constraint)

The document MUST have an effective conformance target identifier of http://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/#ReferenceSchemaDocument.

6.4. Schema claims extension conformance target

[Rule 6-4] (EXT) (Constraint)

The document MUST have an effective conformance target identifier of http://reference.niem.gov/niem/specification/naming-and-design-rules/3.0/#ExtensionSchemaDocument.

7. The NIEM conceptual model

NIEM provides a concrete data model, in the form of a set of XML Schema documents. These schemas may be used to build messages and information exchanges. The schemas spell out what kinds of objects exist and how those objects may be related. XML data that follows the rules of NIEM implies specific meaning. The varieties of XML Schema components used within NIEM-conformant schemas are selected to clarify the meaning of XML data. That is, schema components that do not have a clear meaning have been avoided. NIEM provides a framework within which XML data has a specific meaning.

One limitation of XML and XML Schema is that they do not describe the meaning of an XML document. The XML specification defines XML documents and defines their syntax but does not address the meaning of those documents. The XML Schema specification defines the XML Schema definition language, which describes the structure and constrains the contents of XML documents (schemas).

In a schema, the meaning of a schema component (e.g., element, attribute, or type) may be described using the xs:documentation element. Or, additional information may be included via the xs:appinfo element. Although this may enable humans to understand XML data, more information is needed to support the machine-understandable meaning of XML data. In addition, inconsistency among the ways that schema components may be put together may be a source of confusion.

The RDF Core Working Group of the World Wide Web Consortium has developed a simple, consistent conceptual model, the RDF model. The RDF model is described and specified through a set of W3C Recommendations, the Resource Description Framework (RDF) specifications, making it a very well defined standard. The NIEM model and the rules contained in this NDR are based on the RDF model. This provides numerous advantages:

With the exception of Section 2, NIEM rules are explained in this document without reference to RDF or RDF concepts. Understanding RDF is not required to understand NIEM-conformant schemas or data based on NIEM. However, understanding RDF concepts may deepen understanding of NIEM.

The goal of this section is to clarify the meaning of XML data that is NIEM-conformant and to outline the implications of various modeling constructs in NIEM. The rules for NIEM-conformant schemas and instances are in place to ensure that a specific meaning can be derived from data. That is, the data makes specific assertions, which are well understood since they are derived from the rules for NIEM.

The key concepts underpinning the NIEM conceptual model are discussed in the remainder of this section:

7.1. NIEM and the RDF model

NIEM has its foundation in the RDF model. This helps to ensure that NIEM-conformant data has precise meaning. The RDF view of what data means is clarified by [RDF Semantics]:

…asserting a sentence makes a claim about the world…an assertion amounts to stating a constraint on the possible ways the world might be.

The RDF view of the meaning of data carries into NIEM: NIEM elements form statements that make claims about the world: that a person has a name, a residence location, a spouse, etc. The assertion of one set of facts does not necessarily rule out other statements: A person could have multiple names, could have moved, or could be divorced. Each statement is a claim asserted to be true by the originator of the statement.

This NDR discusses NIEM data in terms of objects, a term more accessible than the word used by RDF, resources. RDF defines the world in terms of resources. [RDF Semantics] describes what may constitute a resource:

…no assumptions are made here about the nature of resources; resource is treated here as synonymous with entity, i.e., as a generic term for anything in the universe of discourse.

RDF resources coincide with NIEM objects and associations. That is, both objects and associations in NIEM are RDF resources with the additional constraints:

NIEM associations are defined as n-ary properties as described in [N-ary], use case 3. NIEM object types are defined in Section 7.4.1, Object Types. NIEM associations are defined in Section 7.4.3, Association Types. Assertions are made via NIEM-conformant XML data, described by Section 8, XML Instance Rules.

The XML Schema types that define NIEM objects and associations are related to each other via elements and attributes. That is, a type contains elements and attributes, and an element or attribute has a value that is an instance of an XML Schema type. In NIEM, these elements and attributes are XML Schema representations of RDF properties, which are described by [RDF Primer], 2.1 Basic Concepts:

RDF is based on the idea that the things being described have properties which have values, and that resources can be described by making statements…that specify those properties and values.

This describes how NIEM works: schemas describe things and their properties. NIEM-conformant data specifies objects, the values of their properties, and the relationships between them.
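The idea that XML data asserts objects, properties, and values can be made concrete with a small Python sketch. It is an informal illustration only, not the mapping from NIEM to RDF: the namespace, the element names, and the generated object labels (_:obj0, _:obj1) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative instance; the namespace and element names are hypothetical
# and are not defined by this specification.
INSTANCE = """
<ex:Person xmlns:ex="http://example.org/hypothetical-core/">
  <ex:PersonName>
    <ex:PersonFullName>Pat Doe</ex:PersonFullName>
  </ex:PersonName>
  <ex:PersonBirthDate>1980-01-01</ex:PersonBirthDate>
</ex:Person>
"""


def local_name(tag):
    """Strip the Clark-notation namespace from an element name."""
    return tag.rpartition("}")[2]


def statements(element, subject="_:obj0", counter=None):
    """Yield (subject, property, value) for each child element.

    Children with element content become new objects with generated labels;
    children with only text become literal values.
    """
    counter = counter if counter is not None else [0]
    for child in element:
        if len(child):  # has child elements: treat it as an object
            counter[0] += 1
            obj = f"_:obj{counter[0]}"
            yield (subject, local_name(child.tag), obj)
            yield from statements(child, obj, counter)
        else:
            yield (subject, local_name(child.tag), (child.text or "").strip())


for triple in statements(ET.fromstring(INSTANCE.strip())):
    print(triple)
```

Each printed tuple is one claim made by the data: the person object has a name object, that name object has a full name, and the person has a birth date.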

There are several kinds of assertions that may be made with NIEM-conformant data. Examples include: