Please refer to the errata for this document, which may include some normative corrections.
This document is also available in these non-normative formats: XML and XHTML with visible change markup. See also translations.
Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
XML Schema Part 0: Primer is a non-normative document intended to provide an easily readable description of the XML Schema facilities, and is oriented towards quickly understanding how to create schemas using the XML Schema language. XML Schema Part 1: Structures and XML Schema Part 2: Datatypes provide the complete normative description of the XML Schema language. This primer describes the language features through numerous examples which are complemented by extensive references to the normative texts.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a W3C Recommendation, the first part of the Second Edition of XML Schema. This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The goals of the XML Schema language are discussed in the XML Schema Requirements document. The authors of this document are the members of the XML Schema Working Group. Different parts of this specification have different editors.
This document was produced under the 24 January 2002 Current Patent Practice (CPP) as amended by the W3C Patent Policy Transition Procedure. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.
The English version of this specification is the only normative version. Information about translations of this document is available at http://www.w3.org/2001/05/xmlschema-translations.
This second edition is not a new version; as a convenience to readers, it merely incorporates the corrections to errors found in the first edition, as agreed by the XML Schema Working Group. A separate list of all such corrections is available at http://www.w3.org/2001/05/xmlschema-errata.
The errata list for this second edition is available at http://www.w3.org/2004/03/xmlschema-errata.
Please report errors in this document to www-xml-schema-comments@w3.org (archive).
1 Introduction
2 Basic Concepts: The Purchase Order
    2.1 The Purchase Order Schema
    2.2 Complex Type Definitions, Element & Attribute Declarations
    2.3 Simple Types
    2.4 Anonymous Type Definitions
    2.5 Element Content
    2.6 Annotations
    2.7 Building Content Models
    2.8 Attribute Groups
    2.9 Nil Values
3 Advanced Concepts I: Namespaces, Schemas & Qualification
    3.1 Target Namespaces & Unqualified Locals
    3.2 Qualified Locals
    3.3 Global vs. Local Declarations
    3.4 Undeclared Target Namespaces
4 Advanced Concepts II: The International Purchase Order
    4.1 A Schema in Multiple Documents
    4.2 Deriving Types by Extension
    4.3 Using Derived Types in Instance Documents
    4.4 Deriving Complex Types by Restriction
    4.5 Redefining Types & Groups
    4.6 Substitution Groups
    4.7 Abstract Elements and Types
    4.8 Controlling the Creation & Use of Derived Types
5 Advanced Concepts III: The Quarterly Report
    5.1 Specifying Uniqueness
    5.2 Defining Keys & their References
    5.3 XML Schema Constraints vs. XML 1.0 ID Attributes
    5.4 Importing Types
    5.5 Any Element, Any Attribute
    5.6 schemaLocation
    5.7 Conformance
A Acknowledgements
B Simple Types & their Facets
C Using Entities
D Regular Expressions
E Index
    E.1 XML Schema Elements
    E.2 XML Schema Attributes
This document, XML Schema Part 0: Primer, provides an easily approachable description of the XML Schema definition language, and should be used alongside the formal descriptions of the language contained in Parts 1 and 2 of the XML Schema specification. The intended audience of this document includes application developers whose programs read and write schema documents, and schema authors who need to know about the features of the language, especially features that provide functionality above and beyond what is provided by DTDs. The text assumes that you have a basic understanding of XML 1.0 and Namespaces in XML. Each major section of the primer introduces new features of the language, and describes those features in the context of concrete examples.
Basic Concepts: The Purchase Order (§2) covers the basic mechanisms of XML Schema. It describes how to declare the elements and attributes that appear in XML documents, the distinctions between simple and complex types, defining complex types, the use of simple types for element and attribute values, schema annotation, a simple mechanism for re-using element and attribute definitions, and nil values.
Advanced Concepts I: Namespaces, Schemas & Qualification (§3), the first advanced section in the primer, explains the basics of how namespaces are used in XML and schema documents. This section is important for understanding many of the topics that appear in the other advanced sections.
Advanced Concepts II: The International Purchase Order (§4), the second advanced section in the primer, describes mechanisms for deriving types from existing types, and for controlling these derivations. The section also describes mechanisms for merging together fragments of a schema from multiple sources, and for element substitution.
Advanced Concepts III: The Quarterly Report (§5) covers more advanced features, including a mechanism for specifying uniqueness among attributes and elements, a mechanism for using types across namespaces, a mechanism for extending types based on namespaces, and a description of how documents are checked for conformance.
In addition to the sections just described, the primer contains a number of appendices that provide detailed reference information on simple types and a regular expression language.
The primer is a non-normative document, which means that it does not provide a definitive (from the W3C's point of view) specification of the XML Schema language. The examples and other explanatory material in this document are provided to help you understand XML Schema, but they may not always provide definitive answers. In such cases, you will need to refer to the XML Schema specification, and to help you do this, we provide many links pointing to the relevant parts of the specification. More specifically, XML Schema items mentioned in the primer text are linked to an index [Index (§E)] of element names and attributes, and a summary table of datatypes, both in the primer. The table and the index contain links to the relevant sections of XML Schema Parts 1 and 2.
The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items" -- but to simplify the primer, we have chosen to always refer to instances and schemas as if they are documents and files.
Let us start by considering an instance document in a file called po.xml. It describes
a purchase order generated by a home products ordering and billing application:
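The po.xml listing is not reproduced in this copy. As a rough sketch, an instance with the structure described in the surrounding text might look like the following. The element and attribute names come from the primer's prose; the names, addresses, prices and dates are illustrative assumptions rather than data taken from the original file.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: all values below are assumed for the example. -->
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>
```

Note that nothing in this instance points to a schema; the association is assumed to be made by the processor.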
The purchase order consists of a main element, purchaseOrder,
and the subelements shipTo, billTo,
comment, and items. These subelements (except
comment) in turn contain other subelements, and so on, until a
subelement such as USPrice contains a number rather than any
subelements. Elements that contain subelements or carry attributes are said to
have complex types, whereas elements that contain numbers (and strings, and
dates, etc.) but do not contain any subelements are said to have simple types.
Some elements have attributes; attributes always have simple types.
The complex types in the instance document, and some of the simple types, are defined in the schema for purchase orders. The other simple types are defined as part of XML Schema's repertoire of built-in simple types.
Before going on to examine the purchase order schema, we digress briefly to mention the association between the instance document and the purchase order schema. As you can see by inspecting the instance document, the purchase order schema is not mentioned. An instance is not actually required to reference a schema, and although many will, we have chosen to keep this first section simple, and to assume that any processor of the instance document can obtain the purchase order schema without any information from the instance document. In later sections, we will introduce explicit mechanisms for associating instances and schemas.
The purchase order schema is contained in the file po.xsd:
The purchase order schema consists of a schema
element and a variety of subelements, most notably element,
complexType,
and simpleType
which determine the appearance of elements and their content in instance
documents.
Each of the elements in the schema has a prefix xsd:
which is associated with the XML Schema namespace through the declaration,
xmlns:xsd="http://www.w3.org/2001/XMLSchema", that appears in the
schema
element. The prefix xsd: is used by convention to denote the XML
Schema namespace, although any prefix can be used. The same prefix, and hence
the same association, also appears on the names of built-in simple types, e.g.
xsd:string.
The purpose of the association is to identify the elements and simple types as
belonging to the vocabulary of the XML Schema language rather than the
vocabulary of the schema author. For the sake of clarity in the text, we just
mention the names of elements and simple types (e.g. simpleType),
and omit the prefix.
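The overall shape described above can be sketched as a small schema document. This is a reduced illustration and not the full po.xsd: the type bodies are cut down to placeholders (an assumption made for brevity), and only the namespace declaration and the three kinds of subelement are the point.

```xml
<?xml version="1.0"?>
<!-- Sketch: the xsd prefix is bound to the XML Schema namespace on the root. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- element: declares an element that may appear in instance documents -->
  <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  <xsd:element name="comment" type="xsd:string"/>

  <!-- complexType: defines a type with element content
       (body reduced to a single child for this sketch) -->
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- simpleType: defines a type with no element content and no attributes
       (restriction left empty as a placeholder) -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>

</xsd:schema>
```

Replacing every `xsd:` with another prefix such as `xs:`, together with the matching `xmlns:xs` declaration, would yield an equivalent schema, since only the namespace name matters.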
In XML Schema, there is a basic difference between complex types which allow elements in their content and may carry attributes, and simple types which cannot have element content and cannot carry attributes. There is also a major distinction between definitions which create new types (both simple and complex), and declarations which enable elements and attributes with specific names and types (both simple and complex) to appear in document instances. In this section, we focus on defining complex types and declaring the elements and attributes that appear within them.
New complex types are defined using the complexType
element and such definitions typically contain a set of element declarations,
element references, and attribute declarations. The declarations are not
themselves types, but rather an association between a name and the constraints
which govern the appearance of that name in documents governed by the associated
schema. Elements are declared using the element
element, and attributes are declared using the attribute
element. For example, USAddress is defined as a complex type, and
within the definition of USAddress we see five element declarations
and one attribute declaration:
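The definition itself is missing from this copy. A sketch consistent with the description in the surrounding paragraphs (four string elements, a decimal zip, and a country attribute of type NMTOKEN fixed to US) could read as follows; the exact layout of the original listing is an assumption.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Sketch of the USAddress complex type as described in the text. -->
  <xsd:complexType name="USAddress">
    <!-- sequence: children must appear in exactly this order -->
    <xsd:sequence>
      <xsd:element name="name"   type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city"   type="xsd:string"/>
      <xsd:element name="state"  type="xsd:string"/>
      <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <!-- attribute declarations follow the content model -->
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
</xsd:schema>
```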
The consequence of this definition is that any element appearing in
an instance whose type is declared to be USAddress (e.g.
shipTo in po.xml) must consist
of five elements and one attribute. These elements must be called
name, street, city, state
and zip as specified by the values of the declarations'
name attributes, and the elements must appear in the same sequence
(order) in which they are declared. The first four of these elements will each
contain a string, and the fifth will contain a number. The element whose type is
declared to be USAddress may appear with an attribute called
country which must contain the string US.
The USAddress definition contains only declarations
involving the simple types: string, decimal and NMTOKEN. In
contrast, the PurchaseOrderType definition contains element
declarations involving complex types, e.g. USAddress, although note
that both declarations use the same type
attribute to identify the type, regardless of whether the type is simple or
complex.
In defining PurchaseOrderType, two of the element declarations,
for shipTo and billTo, associate different element
names with the same complex type, namely USAddress. The consequence
of this definition is that any element appearing in an instance document (e.g.
po.xml)
whose type is declared to be PurchaseOrderType must consist of
elements named shipTo and billTo, each containing the
five subelements (name, street, city,
state and zip) that were declared as part of
USAddress. The shipTo and billTo elements
may also carry the country attribute that was declared as part of
USAddress.
The PurchaseOrderType definition contains an
orderDate attribute declaration which, like the
country attribute declaration, identifies a simple type. In fact,
all attribute declarations must reference simple types because, unlike element
declarations, attributes cannot contain other elements or other attributes.
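For illustration, a PurchaseOrderType definition matching this description might be sketched as below. It is a fragment rather than a complete schema: USAddress, Items and the global comment element are assumed to be defined elsewhere in the same schema document, and the choice of xsd:date for orderDate is an assumption that the text does not state explicitly.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Fragment: USAddress, Items and the global element comment are
       assumed to be defined elsewhere in this schema. -->
  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <!-- two element names associated with the same complex type -->
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element ref="comment" minOccurs="0"/>
      <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <!-- attributes always have simple types; xsd:date is assumed here -->
    <xsd:attribute name="orderDate" type="xsd:date"/>
  </xsd:complexType>
</xsd:schema>
```

The same type attribute is used whether the referenced type is complex (USAddress) or simple (xsd:date).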
The element declarations we have described so far have each associated a name with an existing type definition. Sometimes it is preferable to use an existing element rather than declare a new element, for example:
<xsd:element ref="comment" minOccurs="0"/>
This declaration references an existing element, comment, that
was declared elsewhere in the purchase order schema. In general, the value of
the ref attribute
must reference a global element, i.e. one that has been declared under schema rather
than as part of a complex type definition. The consequence of this declaration
is that an element called comment may appear in an instance
document, and its content must be consistent with that element's type, in this
case, string.
The comment element is optional within
PurchaseOrderType because the value of the minOccurs
attribute in its declaration is 0. In general, an element is required to appear
when the value of minOccurs
is 1 or more. The maximum number of times an element may appear is determined by
the value of a maxOccurs
attribute in its declaration. This value may be a positive integer such as 41,
or the term unbounded to indicate there is no maximum number of
occurrences. The default value for both the minOccurs
and the maxOccurs
attributes is 1. Thus, when an element such as comment is declared
without a maxOccurs
attribute, the element may not occur more than once. Be sure that if you specify
a value for only the minOccurs
attribute, it is less than or equal to the default value of maxOccurs,
i.e. it is 0 or 1. Similarly, if you specify a value for only the maxOccurs
attribute, it must be greater than or equal to the default value of minOccurs,
i.e. 1 or more. If both attributes are omitted, the element must appear exactly
once.
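The occurrence rules above can be seen side by side in a short sketch. The type name OccurrenceDemo and its element names are hypothetical, chosen only to show each combination of minOccurs and maxOccurs.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type, used only to illustrate minOccurs and maxOccurs. -->
  <xsd:complexType name="OccurrenceDemo">
    <xsd:sequence>
      <!-- neither attribute given: both default to 1, so exactly once -->
      <xsd:element name="title"   type="xsd:string"/>
      <!-- optional: zero or one occurrence (maxOccurs defaults to 1) -->
      <xsd:element name="comment" type="xsd:string" minOccurs="0"/>
      <!-- one or more occurrences (minOccurs defaults to 1) -->
      <xsd:element name="author"  type="xsd:string" maxOccurs="unbounded"/>
      <!-- at least two and at most forty-one occurrences -->
      <xsd:element name="entry"   type="xsd:string" minOccurs="2" maxOccurs="41"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

A declaration such as minOccurs="2" with no maxOccurs would be an error, because the default maxOccurs of 1 is smaller than the stated minimum.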
Attributes may appear once or not at all, but no other number of
times, and so the syntax for specifying occurrences of attributes differs
from the syntax for elements. In particular, attributes can be declared with a
use
attribute to indicate whether the attribute is required (see, for
example, the partNum attribute declaration in po.xsd),
optional, or even prohibited.
Default values of both attributes and elements are declared using the
default attribute, although this attribute has a slightly different
consequence in each case. When an attribute is declared with a default value,
the value of the attribute is whatever value appears as the attribute's value in
an instance document; if the attribute does not appear in the instance document,
the schema processor provides the attribute with a value equal to that of the
default
attribute. Note that default values for attributes only make sense if the
attributes themselves are optional, and so it is an error to specify both a
default value and anything other than a value of optional for
use.
The schema processor treats defaulted elements slightly differently. When an
element is declared with a default value, the value of the element is whatever
value appears as the element's content in the instance document; if the element
appears without any content, the schema processor provides the element with a
value equal to that of the default
attribute. However, if the element does not appear in the instance document, the
schema processor does not provide the element at all. In summary, the
differences between element and attribute defaults can be stated as: Default
attribute values apply when attributes are missing, and default element values
apply when elements are empty.
The fixed attribute is used in both attribute and
element declarations to ensure that the attributes and elements are set to
particular values. For example, po.xsd contains a
declaration for the country attribute, which is declared with a
fixed
value US. This declaration means that the appearance of a
country attribute in an instance document is optional (the default
value of use is
optional), although if the attribute does appear, its value must be
US, and if the attribute does not appear, the schema processor will
provide a country attribute with the value US. Note
that the concepts of a fixed value and a default value are mutually exclusive,
and so it is an error for a declaration to contain both fixed and
default attributes.
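As a hedged illustration of these rules, the sketch below puts an element default, an attribute default and a fixed attribute side by side. Apart from the country attribute, which follows the description of po.xsd, the type name ShippingOptions and its members are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type, used only to illustrate default and fixed. -->
  <xsd:complexType name="ShippingOptions">
    <xsd:sequence>
      <!-- element default: applies only when carrier is present but empty;
           if carrier is absent, no element is supplied -->
      <xsd:element name="carrier" type="xsd:string"
                   default="ground" minOccurs="0"/>
    </xsd:sequence>
    <!-- attribute default: the processor supplies "false" when absent -->
    <xsd:attribute name="giftWrap" type="xsd:boolean" default="false"/>
    <!-- fixed: optional, but if present the value must be US;
         the processor supplies US when the attribute is absent -->
    <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
</xsd:schema>
```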
The values of the attributes used in element and attribute declarations to constrain their occurrences are summarized in Table 1.
| Table 1. Occurrence Constraints for Elements and Attributes | | |
|---|---|---|
| Elements: (minOccurs, maxOccurs) fixed, default | Attributes: use, fixed, default | Notes |
| (1, 1) -, - | required, -, - | element/attribute must appear once, it may have any value |
| (1, 1) 37, - | required, 37, - | element/attribute must appear once, its value must be 37 |
| (2, unbounded) 37, - | n/a | element must appear twice or more, its value must be 37; in general, minOccurs and maxOccurs values may be positive integers, and maxOccurs value may also be "unbounded" |
| (0, 1) -, - | optional, -, - | element/attribute may appear once, it may have any value |
| (0, 1) 37, - | n/a | element may appear once, if it does not appear it is not provided; if it does appear and it is empty, its value is 37; if it does appear and it is not empty, its value must be 37 |
| n/a | optional, 37, - | attribute may appear once, if it does appear its value must be 37, if it does not appear its value is 37 |
| (0, 1) -, 37 | n/a | element may appear once; if it does not appear it is not provided; if it does appear and it is empty, its value is 37; otherwise its value is that given |
| n/a | optional, -, 37 | attribute may appear once; if it does not appear its value is 37, otherwise its value is that given |
| (0, 2) -, 37 | n/a | element may appear once, twice, or not at all; if the element does not appear it is not provided; if it does appear and it is empty, its value is 37; otherwise its value is that given; in general, minOccurs and maxOccurs values may be positive integers, and maxOccurs value may also be "unbounded" |
| (0, 0) -, - | prohibited, -, - | element/attribute must not appear |
| Note that neither minOccurs, maxOccurs, nor use may appear in the declarations of global elements and attributes. | | |
Global elements, and global attributes, are created by declarations that
appear as the children of the schema
element. Once declared, a global element or a global attribute can be referenced
in one or more declarations using the ref attribute
as described above. A declaration that references a global element enables the
referenced element to appear in the instance document in the context of the
referencing declaration. So, for example, the comment element
appears in po.xml at the same
level as the shipTo, billTo and items
elements because the declaration that references comment appears in
the complex type definition at the same level as the declarations of the other
three elements.
The declaration of a global element also enables the element to appear at the
top-level of an instance document. Hence purchaseOrder, which is
declared as a global element in po.xsd, can appear as
the top-level element in po.xml. Note that
this rationale will also allow a comment element to appear as the
top-level element in a document like po.xml.
There are a number of caveats concerning the use of global elements and
attributes. One caveat is that global declarations cannot contain references;
global declarations must identify simple and complex types directly. Put
concretely, global declarations cannot contain the ref attribute,
they must use the type
attribute (or, as we describe shortly, be followed by an anonymous type
definition). A second caveat is that cardinality constraints cannot be
placed on global declarations, although they can be placed on local declarations
that reference global declarations. In other words, global declarations cannot
contain the attributes minOccurs, maxOccurs, or
use.
We have now described how to define new complex types (e.g.
PurchaseOrderType), declare elements (e.g.
purchaseOrder) and declare attributes (e.g.
orderDate). These activities generally involve naming, and so the
question naturally arises: What happens if we give two things the same name? The
answer depends upon the two things in question, although in general the more
similar are the two things, the more likely there will be a conflict.
Here are some examples to illustrate when same names cause problems. If the two things are both types, say we define a complex type called USStates and a simple type called USStates, there is a conflict. If the two things are a type and an element or attribute, say we define a complex type called USAddress and we declare an element called USAddress, there is no conflict. If the two things are elements within different types (i.e. not global elements), say we declare one element called name as part of the USAddress type and a second element called name as part of the Item type, there is no conflict. (Such elements are sometimes called local element declarations.) Finally, if the two things are both types and you define one and XML Schema has defined the other, say you define a simple type called decimal, there is no conflict. The reason for the apparent contradiction in the last example is that the two types belong to different namespaces. We explore the use of namespaces in schema in a later section.
The purchase order schema declares several elements and attributes that have
simple types. Some of these simple types, such as string and decimal, are built
in to XML Schema, while others are derived from the built-in types. For example, the
partNum attribute has a type called SKU (Stock Keeping
Unit) that is derived from string. Both built-in
simple types and their derivations can be used in all element and attribute
declarations. Table 2 lists all the simple types built in to XML Schema, along
with examples of the different types.
| Table 2. Simple Types Built In to XML Schema | | |
|---|---|---|
| Simple Type | Examples (delimited by commas) | Notes |
| string | Confirm this is electric | |
| normalizedString | Confirm this is electric | see (3) |
| token | Confirm this is electric | see (4) |
| base64Binary | GpM7 | |
| hexBinary | 0FB7 | |
| integer | ... -1, 0, 1, ... | see (2) |
| positiveInteger | 1, 2, ... | see (2) |
| negativeInteger | ... -2, -1 | see (2) |
| nonNegativeInteger | 0, 1, 2, ... | see (2) |
| nonPositiveInteger | ... -2, -1, 0 | see (2) |
| long | -9223372036854775808, ... -1, 0, 1, ... 9223372036854775807 | see (2) |
| unsignedLong | 0, 1, ... 18446744073709551615 | see (2) |
| int | -2147483648, ... -1, 0, 1, ... 2147483647 | see (2) |
| unsignedInt | 0, 1, ... 4294967295 | see (2) |
| short | -32768, ... -1, 0, 1, ... 32767 | see (2) |
| unsignedShort | 0, 1, ... 65535 | see (2) |
| byte | -128, ... -1, 0, 1, ... 127 | see (2) |
| unsignedByte | 0, 1, ... 255 | see (2) |
| decimal | -1.23, 0, 123.4, 1000.00 | see (2) |
| float | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | equivalent to single-precision 32-bit floating point, NaN is "not a number", see (2) |
| double | -INF, -1E4, -0, 0, 12.78E-2, 12, INF, NaN | equivalent to double-precision 64-bit floating point, see (2) |
| boolean | true, false, 1, 0 | |
| duration | P1Y2M3DT10H30M12.3S | 1 year, 2 months, 3 days, 10 hours, 30 minutes, and 12.3 seconds |
| dateTime | 1999-05-31T13:20:00.000-05:00 | May 31st 1999 at 1.20pm Eastern Standard Time which is 5 hours behind Co-Ordinated Universal Time, see (2) |
| date | 1999-05-31 | see (2) |
| time | 13:20:00.000, 13:20:00.000-05:00 | see (2) |
| gYear | 1999 | 1999, see (2) (5) |
| gYearMonth | 1999-02 | the month of February 1999, regardless of the number of days, see (2) (5) |
| gMonth | --05 | May, see (2) (5) |
| gMonthDay | --05-31 | every May 31st, see (2) (5) |
| gDay | ---31 | the 31st day, see (2) (5) |
| Name | shipTo | XML 1.0 Name type |
| QName | po:USAddress | XML Namespace QName |
| NCName | USAddress | XML Namespace NCName, i.e. a QName without the prefix and colon |
| anyURI | | |
| language | en-GB, en-US, fr | valid values for xml:lang as defined in XML 1.0 |
| ID | | XML 1.0 ID attribute type, see (1) |
| IDREF | | XML 1.0 IDREF attribute type, see (1) |
| IDREFS | | XML 1.0 IDREFS attribute type, see (1) |
| ENTITY | | XML 1.0 ENTITY attribute type, see (1) |
| ENTITIES | | XML 1.0 ENTITIES attribute type, see (1) |
| NOTATION | | XML 1.0 NOTATION attribute type, see (1) |
| NMTOKEN | | XML 1.0 NMTOKEN attribute type, see (1) |
| NMTOKENS | | XML 1.0 NMTOKENS attribute type, i.e. a whitespace separated list of NMTOKEN values, see (1) |
| Notes: (1) To retain compatibility between XML Schema and XML 1.0 DTDs, the simple types ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS should only be used in attributes. (2) A value of this type can be represented by more than one lexical format, e.g. 100 and 1.0E2 are both valid float formats representing "one hundred". However, rules have been established for this type that define a canonical lexical format, see XML Schema Part 2. (3) Newline, tab and carriage-return characters in a normalizedString type are converted to space characters before schema processing. (4) As normalizedString, and adjacent space characters are collapsed to a single space character, and leading and trailing spaces are removed. (5) The "g" prefix signals time periods in the Gregorian calendar. | | |
New simple types are defined by deriving them from existing simple
types (built-in and derived). In particular, we can derive a new simple type
by restricting an existing simple type; in other words, the legal range of
values for the new type is a subset of the existing type's range of values. We
use the simpleType
element to define and name the new simple type. We use the restriction
element to indicate the existing (base) type, and to identify the "facets" that
constrain the range of values. A complete list of facets is provided in Appendix B.
Suppose we wish to create a new type of integer called
myInteger whose range of values is between 10000 and 99999
(inclusive). We base our definition on the built-in simple type integer, whose range
of values also includes integers less than 10000 and greater than 99999. To
define myInteger, we restrict the range of the integer base type by
employing two facets called minInclusive
and maxInclusive:
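The definition is not reproduced in this copy. A minimal sketch that matches the description (base type integer, lower bound 10000 and upper bound 99999, both inclusive) would be:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Sketch of myInteger: an integer restricted to 10000..99999. -->
  <xsd:simpleType name="myInteger">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="10000"/>
      <xsd:maxInclusive value="99999"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Under this definition 10000 and 99999 are both legal values, while 9999 and 100000 are not.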
The example shows one particular combination of a base type and two facets
used to define myInteger, but a look at the list of built-in simple
types and their facets (Appendix B) should
suggest other viable combinations.
The purchase order schema contains another, more elaborate, example
of a simple type definition. A new simple type called SKU is
derived (by restriction) from the simple type string. Furthermore,
we constrain the values of SKU using a facet called pattern in
conjunction with the regular expression "\d{3}-[A-Z]{2}" that is
read "three digits followed by a hyphen followed by two upper-case ASCII
letters":
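The SKU definition is likewise missing here. A sketch consistent with the description, using the pattern facet with the regular expression quoted above, might be:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Sketch of SKU: a string of three digits, a hyphen,
       and two upper-case ASCII letters. -->
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

A value such as 872-AA matches this pattern, whereas 872-aa or 87-AA does not.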
This regular expression language is described more fully in Appendix D.
XML Schema defines twelve facets which are listed in Appendix B. Among
these, the enumeration
facet is particularly useful and it can be used to constrain the values of
almost every simple type, except the boolean type. The
enumeration
facet limits a simple type to a set of distinct values. For example, we can use
the enumeration
facet to define a new simple type called USState, derived from
string,
whose value must be one of the standard US state abbreviations:
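As an illustration, such a type could be sketched as below. Only a few enumeration values are written out; a complete definition is assumed to continue with one enumeration element per state abbreviation.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Sketch of USState: a string limited to an enumerated set of values. -->
  <xsd:simpleType name="USState">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="AK"/>
      <xsd:enumeration value="AL"/>
      <xsd:enumeration value="AR"/>
      <!-- and so on, one enumeration element for each remaining state -->
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```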
USState would be a good replacement for the string type currently
used in the state element declaration. By making this replacement,
the legal values of a state element, i.e. the state
subelements of billTo and shipTo, would be limited to
one of AK, AL, AR, etc. Note that the
enumeration values specified for a particular type must be unique.
XML Schema has the concept of a list type, in addition to the so-called
atomic types that constitute most of the types listed in Table 2. (Atomic
types, list types, and the union types described in the next section are
collectively called simple types.) The value of an atomic type is indivisible
from XML Schema's perspective. For example, the NMTOKEN value
US is indivisible in the sense that no part of US,
such as the character "S", has any meaning by itself. In contrast, list types
consist of sequences of atomic types, and consequently the parts of a
sequence (the "atoms") are themselves meaningful. For example, NMTOKENS is a list
type, and an element of this type would be a white-space delimited list of
NMTOKEN values,
such as "US UK FR". XML Schema has three built-in list types: NMTOKENS, IDREFS, and ENTITIES.
In addition to using the built-in list types, you can create new
list types by derivation from existing atomic types. (You cannot create list
types from existing list types, nor from complex types.) For example, to create
a list of myInteger values:
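The list type definition does not survive in this copy. A sketch using the list element and its itemType attribute might look like this; myInteger is assumed to be defined elsewhere in the same schema, and the listOfMyInt element declaration is an assumption added so that the type can be used in an instance.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Fragment: myInteger is assumed to be defined elsewhere in this schema. -->
  <xsd:simpleType name="listOfMyIntType">
    <!-- itemType names the atomic type of each whitespace-separated item -->
    <xsd:list itemType="myInteger"/>
  </xsd:simpleType>

  <!-- Assumed declaration binding an element name to the list type. -->
  <xsd:element name="listOfMyInt" type="listOfMyIntType"/>
</xsd:schema>
```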
And an element in an instance document whose content conforms to
listOfMyIntType is:
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
Several facets can be applied to list types: length,
minLength,
maxLength,
pattern, and
enumeration.
For example, to define a list of exactly six US states
(SixUSStates), we first define a new list type called
USStateList from USState, and then we derive
SixUSStates by restricting USStateList to only six
items:
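The two-step derivation can be sketched as follows. USState is assumed to be defined elsewhere in the same schema, and the sixStates element declaration is an assumption added for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Fragment: USState is assumed to be defined elsewhere in this schema. -->

  <!-- Step 1: a list type whose items are USState values -->
  <xsd:simpleType name="USStateList">
    <xsd:list itemType="USState"/>
  </xsd:simpleType>

  <!-- Step 2: restrict the list to exactly six items -->
  <xsd:simpleType name="SixUSStates">
    <xsd:restriction base="USStateList">
      <xsd:length value="6"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Assumed declaration binding an element name to the restricted list. -->
  <xsd:element name="sixStates" type="SixUSStates"/>
</xsd:schema>
```

For a list type, the length facet counts items in the list rather than characters.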
Elements whose type is SixUSStates must have six items, and each
of the six items must be one of the (atomic) values of the enumerated type
USState, for example:
<sixStates>PA NY CA NY LA AK</sixStates>
Note that it is possible to derive a list type from the atomic type string. However, a
string may
contain white space, and white space delimits the items in a list type, so you
should be careful using list types whose base type is string. For example,
suppose we have defined a list type with a length facet
equal to 3, and base type string, then the
following 3 item list is legal:
Asie Europe Afrique
But the following 3 "item" list is illegal:
Asie Europe Amérique Latine
Even though "Amérique Latine" may exist as a single string outside of the list, when it is included in the list, the whitespace between Amérique and Latine effectively creates a fourth item, and so the latter example will not conform to the 3-item list type.
Atomic types and list types enable an element or an attribute value
to be one or more instances of one atomic type. In contrast, a union type
enables an element or attribute value to be one or more instances of one type
drawn from the union of multiple atomic and list types. To illustrate, we create
a union type for representing American states as singleton letter abbreviations
or lists of numeric codes. The zipUnion union type is built from
one atomic type and one list type:
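A minimal sketch of such a union, assuming the atomic type USState and the list type listOfMyIntType are both defined in the same schema:

```xml
<!-- Sketch only: both member types are assumed to exist in this schema -->
<xsd:simpleType name="zipUnion">
  <!-- memberTypes is a white-space separated list of type names -->
  <xsd:union memberTypes="USState listOfMyIntType"/>
</xsd:simpleType>
```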
When we define a union type, the memberTypes attribute value is
a list of all the types in the union.
Now, assuming we have declared an element called zips of type
zipUnion, valid instances of the element are:
<zips>CA</zips>
<zips>95630 95977 95945</zips>
<zips>AK</zips>
Two facets, pattern and
enumeration,
can be applied to a union type.
Schemas can be constructed by defining sets of named types such as
PurchaseOrderType and then declaring elements such as
purchaseOrder that reference the types using the type=
construction. This style of schema construction is straightforward but it can be
unwieldy, especially if you define many types that are referenced only once and
contain very few constraints. In these cases, a type can be more succinctly
defined as an anonymous type which saves the overhead of having to be named and
explicitly referenced.
The definition of the type Items in po.xsd contains two
element declarations that use anonymous types (item and
quantity). In general, you can identify anonymous types by the lack
of a type= in an
element (or attribute) declaration, and by the presence of an un-named (simple
or complex) type definition:
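A sketch of what such a definition might look like follows. It assumes a SKU simple type and a global comment element declared elsewhere in po.xsd; the occurrence constraints shown are illustrative:

```xml
<xsd:complexType name="Items">
  <xsd:sequence>
    <!-- item has no type attribute: its type is the anonymous complexType below -->
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <!-- quantity also uses an anonymous (simple) type -->
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="SKU" use="required"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>
```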
In the case of the item element, it has an anonymous complex
type consisting of the elements productName, quantity,
USPrice, comment, and shipDate, and an
attribute called partNum. In the case of the quantity
element, it has an anonymous simple type derived from positiveInteger
whose value ranges between 1 and 99.
The purchase order schema has many examples of elements containing other
elements (e.g. items), elements having attributes and containing
other elements (e.g. shipTo), and elements containing only a simple
type of value (e.g. USPrice). However, we have not seen an element
having attributes but containing only a simple type of value, nor have we seen
an element that contains other elements mixed with character content, nor have
we seen an element that has no content at all. In this section we'll examine
these variations in the content models of elements.
Let us first consider how to declare an element that has an attribute and contains a simple value. In an instance document, such an element might appear as:
<internationalPrice currency="EUR">423.46</internationalPrice>
The purchase order schema declares a USPrice element that is a
starting point:
<xsd:element name="USPrice" type="xsd:decimal"/>
Now, how do we add an attribute to this element? As we have said
before, simple types cannot have attributes, and decimal is a simple
type. Therefore, we must define a complex type to carry the attribute
declaration. We also want the content to be simple type decimal. So our
original question becomes: How do we define a complex type that is based on the
simple type decimal? The answer
is to derive a new complex type from the simple type decimal:
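A minimal sketch of such a derivation follows. The choice of xsd:string for the currency attribute is an assumption made for illustration:

```xml
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <!-- simpleContent: character data only, no child elements -->
    <xsd:simpleContent>
      <!-- extend the simple type decimal with one attribute -->
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
```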
We use the complexType
element to start the definition of a new (anonymous) type. To indicate that the
content model of the new type contains only character data and no elements, we
use a simpleContent
element. Finally, we derive the new type by extending the simple decimal type. The
extension consists of adding a currency attribute using a standard
attribute declaration. (We cover type derivation in detail in Advanced Concepts II: The
International Purchase Order (§4).) The internationalPrice
element declared in this way will appear in an instance as shown in the example
at the beginning of this section.
The construction of the purchase order schema may be characterized as elements containing subelements, and the deepest subelements contain character data. XML Schema also provides for the construction of schemas where character data can appear alongside subelements, and character data is not confined to the deepest subelements.
To illustrate, consider the following snippet from a customer letter that uses some of the same elements as the purchase order:
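A letter fragment of this kind might look like the sketch below. The names, quantity and date are illustrative values, not data taken from the purchase order:

```xml
<letterBody>
  <!-- text appears around name, which is a child of salutation -->
  <salutation>Dear Mr. <name>Robert Smith</name>.</salutation>
  Your order of <quantity>1</quantity> <productName>Baby Monitor</productName>
  shipped from our warehouse on <shipDate>1999-05-21</shipDate>.
</letterBody>
```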
Notice the text appearing between elements and their child elements.
Specifically, text appears between the elements salutation,
quantity, productName and shipDate which
are all children of letterBody, and text appears around the element
name which is the child of a child of letterBody. The
following snippet of a schema declares letterBody:
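A sketch of such a declaration, assuming the child element types shown here and that xsd: is bound to the XML Schema namespace:

```xml
<xsd:element name="letterBody">
  <!-- mixed="true" lets character data appear between the child elements -->
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="salutation">
        <!-- salutation is itself mixed, so text may surround name -->
        <xsd:complexType mixed="true">
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
      <!-- etc. -->
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```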
The elements appearing in the customer letter are declared, and their types
are defined using the element and
complexType
element constructions we have seen before. To enable character data to appear
between the child-elements of letterBody, the mixed
attribute on the type definition is set to true.
Note that the mixed model in XML Schema differs fundamentally
from the mixed model
in XML 1.0. Under the XML Schema mixed model, the order and number of child
elements appearing in an instance must agree with the order and number of child
elements specified in the model. In contrast, under the XML 1.0 mixed model, the
order and number of child elements appearing in an instance cannot be
constrained. In summary, XML Schema provides full validation of mixed models, in
contrast to the partial validation provided by XML 1.0.
Now suppose that we want the internationalPrice element to
convey both the unit of currency and the price as attribute values rather than
as separate attribute and content values. For example:
<internationalPrice currency="EUR" value="423.46"/>
Such an element has no content at all; its content model is empty. To define a type whose content is empty, we essentially define a type that allows only elements in its content, but we do not actually declare any elements and so the type's content model is empty:
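A minimal sketch of such a definition follows. The attribute types xsd:string and xsd:decimal are assumptions chosen to match the instance shown above:

```xml
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:complexContent>
      <!-- restricting anyType without declaring elements leaves the content empty -->
      <xsd:restriction base="xsd:anyType">
        <xsd:attribute name="currency" type="xsd:string"/>
        <xsd:attribute name="value" type="xsd:decimal"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
```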
In this example, we define an (anonymous) type having
complexContent, i.e. only elements. The complexContent
element signals that we intend to restrict or extend the content model of a
complex type, and the restriction of anyType declares
two attributes but does not introduce any element content (see Deriving Complex Types
by Restriction (§4.4) for more details on restriction). The
internationalPrice element declared in this way may legitimately
appear in an instance as shown in the example above.
The preceding syntax for an empty-content element is relatively verbose, and
it is possible to declare the internationalPrice element more
compactly:
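A sketch of the compact form, assuming the same currency and value attributes with illustrative types:

```xml
<xsd:element name="internationalPrice">
  <!-- no simpleContent or complexContent child: empty content, attributes only -->
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>
```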
This compact syntax works because a complex type defined without any
simpleContent or complexContent is interpreted as
shorthand for complex content that restricts anyType.
The anyType represents an abstraction called the ur-type
which is the base type from which all simple and complex types are derived. An
anyType type does not constrain its content in any way. It is
possible to use anyType like other types, for example:
<xsd:element name="anything" type="xsd:anyType"/>
The content of the element declared in this way is unconstrained, so the
element value may be 423.46, but it may be any other sequence of characters as
well, or indeed a mixture of characters and elements. In fact,
anyType is the default type when none is specified, so the above
could also be written as follows:
<xsd:element name="anything"/>
If unconstrained element content is needed, for example in the case of
elements containing prose which requires embedded markup to support
internationalization, then the default declaration or a slightly restricted form
of it may be suitable. The text type described in Any Element, Any Attribute
(§5.5) is an example of a type suitable for such purposes.
XML Schema provides three elements for annotating schemas for the
benefit of both human readers and applications. In the purchase order schema, we
put a basic schema description and copyright information inside the documentation
element, which is the recommended location for human readable material. We
recommend you use the xml:lang attribute with any documentation
elements to indicate the language of the information. Alternatively, you may
indicate the language of all information in a schema by placing an
xml:lang attribute on the schema element.
The appinfo
element, which we did not use in the purchase order schema, can be used to
provide information for tools, stylesheets and other applications. An
interesting example using appinfo is a
schema
that describes the simple types in XML Schema Part 2: Datatypes. Information
describing this schema, e.g. which facets are applicable to particular simple
types, is represented inside appinfo
elements, and this information was used by an application to automatically
generate text for the XML Schema Part 2 document.
Both documentation
and appinfo
appear as subelements of annotation,
which may itself appear at the beginning of most schema constructions. To
illustrate, the following example shows annotation
elements appearing at the beginning of an element declaration and a complex type
definition:
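A sketch of such annotations, using an internationalPrice declaration with two attributes as the carrier. The documentation text itself is illustrative:

```xml
<xsd:element name="internationalPrice">
  <!-- annotation at the beginning of the element declaration -->
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      element declared with anonymous type
    </xsd:documentation>
  </xsd:annotation>
  <xsd:complexType>
    <!-- annotation at the beginning of the complex type definition -->
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        empty anonymous type with 2 attributes
      </xsd:documentation>
    </xsd:annotation>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>
```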
The annotation
element may also appear at the beginning of other schema constructions such as
those indicated by the elements schema,
simpleType,
and attribute.
The definitions of complex types in the purchase order schema all declare
sequences of elements that must appear in the instance document. The occurrence
of individual elements declared in the so-called content models of these types
may be optional, as indicated by a 0 value for the attribute minOccurs
(e.g. in comment), or be otherwise constrained depending upon the
values of minOccurs
and maxOccurs.
XML Schema also provides constraints that apply to groups of elements appearing
in a content model. These constraints mirror those available in XML 1.0 plus
some additional constraints. Note that the constraints do not apply to
attributes.
XML Schema enables groups of elements to be defined and named, so that the elements can be used to build up the content models of complex types (thus mimicking common usage of parameter entities in XML 1.0). Un-named groups of elements can also be defined, and along with elements in named groups, they can be constrained to appear in the same order (sequence) as they are declared. Alternatively, they can be constrained so that only one of the elements may appear in an instance.
To illustrate, we introduce two groups into the
PurchaseOrderType definition from the purchase order schema so that
purchase orders may contain either separate shipping and billing addresses, or a
single address for those cases in which the shippee and billee are co-located:
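A sketch of the modified definition follows, assuming the USAddress and Items types and the global comment element from the purchase order schema. The orderDate attribute is carried over on the assumption that the original type declares it:

```xml
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <!-- exactly one of the two children of choice may appear -->
    <xsd:choice>
      <xsd:group ref="shipAndBill"/>
      <xsd:element name="singleUSAddress" type="USAddress"/>
    </xsd:choice>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>

<!-- named group: shipTo followed by billTo -->
<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
  </xsd:sequence>
</xsd:group>
```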
The choice group
element allows only one of its children to appear in an instance. One child is
an inner group element
that references the named group shipAndBill consisting of the
element sequence shipTo, billTo, and the second child
is a singleUSAddress. Hence, in an instance document, the
purchaseOrder element must contain either a shipTo
element followed by a billTo element or a
singleUSAddress element. The choice group
is followed by the comment and items element
declarations, and both the choice group
and the element declarations are children of a sequence
group. The effect of these various groups is that the address element(s) must be
followed by comment and items elements in that order.
There exists a third option for constraining elements in a group:
All the elements in the group may appear once or not at all, and they may appear
in any order. The all group (which
provides a simplified version of the SGML &-Connector) is limited to the
top-level of any content model. Moreover, the group's children must all be
individual elements (no groups), and no element in the content model may appear
more than once, i.e. the permissible values of minOccurs
and maxOccurs
are 0 and 1. For example, to allow the child elements of
purchaseOrder to appear in any order, we could redefine
PurchaseOrderType as:
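A sketch of such a redefinition, under the same assumptions about USAddress, Items and the global comment element:

```xml
<xsd:complexType name="PurchaseOrderType">
  <!-- all: each child appears at most once, in any order -->
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
```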
By this definition, a comment element may optionally appear
within purchaseOrder, and it may appear before or after any
shipTo, billTo and items elements, but it
can appear only once. Moreover, the stipulations of an all group do not
allow us to declare an element such as comment outside the group as
a means of enabling it to appear more than once. XML Schema stipulates that an
all
group must appear as the sole child at the top of a content model. In other
words, the following is illegal:
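A sketch of such an illegal definition, under the same assumptions as before, nests an all group inside a sequence in an attempt to let comment repeat:

```xml
<!-- NOT schema-valid: all must be the sole child at the top of the content model -->
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:all>
      <xsd:element name="shipTo" type="USAddress"/>
      <xsd:element name="billTo" type="USAddress"/>
      <xsd:element name="items" type="Items"/>
    </xsd:all>
    <xsd:sequence>
      <xsd:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
```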
Finally, named and un-named groups that appear in content models (represented
by group, and by choice, sequence and all, respectively) may carry minOccurs
and maxOccurs attributes. By combining and nesting the various groups provided by XML Schema,
and by setting the values of minOccurs and maxOccurs,
it is possible to represent any content model expressible with an XML 1.0 DTD.
Furthermore, the all group provides additional expressive power.
Suppose we want to provide more information about each item in a purchase
order, for example, each item's weight and preferred shipping method. We can
accomplish this by adding weightKg and shipBy
attribute declarations to the item element's (anonymous) type
definition:
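A sketch of the extended declaration follows. It is a fragment that would sit inside the sequence of the Items type; the decimal type for weightKg and the enumerated values for shipBy are assumptions made for illustration:

```xml
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <!-- remaining child elements unchanged -->
    </xsd:sequence>
    <xsd:attribute name="partNum" type="SKU" use="required"/>
    <!-- new attributes; declarations follow the content model -->
    <xsd:attribute name="weightKg" type="xsd:decimal"/>
    <xsd:attribute name="shipBy">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:enumeration value="air"/>
          <xsd:enumeration value="land"/>
          <xsd:enumeration value="any"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:attribute>
  </xsd:complexType>
</xsd:element>
```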
Alternatively, we can create a named attribute group containing all
the desired attributes of an item element, and reference this group
by name in the item element declaration:
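A sketch of this approach follows. The group name ItemDelivery and the attribute types are hypothetical choices for illustration:

```xml
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <!-- remaining child elements unchanged -->
    </xsd:sequence>
    <!-- the attribute group reference replaces the individual declarations -->
    <xsd:attributeGroup ref="ItemDelivery"/>
  </xsd:complexType>
</xsd:element>

<!-- defined once at the top level, reusable from several types -->
<xsd:attributeGroup name="ItemDelivery">
  <xsd:attribute name="partNum" type="SKU" use="required"/>
  <xsd:attribute name="weightKg" type="xsd:decimal"/>
  <xsd:attribute name="shipBy">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="air"/>
        <xsd:enumeration value="land"/>
        <xsd:enumeration value="any"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
</xsd:attributeGroup>
```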
Using an attribute group in this way can improve the readability of schemas, and facilitates updating schemas because an attribute group can be defined and edited in one place and referenced in multiple definitions and declarations. These characteristics of attribute groups make them similar to parameter entities in XML 1.0. Note that an attribute group may contain other attribute groups. Note also that both attribute declarations and attribute group references must appear at the end of complex type definitions.
One of the purchase order items listed in po.xml, the
Lawnmower, does not have a shipDate element. Within
the context of our scenario, the schema author may have intended such absences
to indicate items not yet shipped. But in general, the absence of
an element does not have any particular meaning: It may indicate that the
information is unknown, or not applicable, or the element may be absent for some
other reason. Sometimes it is desirable to represent an unshipped
item, unknown information, or inapplicable information
explicitly with an element, rather than by an absent element. For
example, it may be desirable to represent a "null" value being sent to or from a
relational database with an element that is present. Such cases can be
represented using XML Schema's nil mechanism which enables an element to appear
with or without a non-nil value.
XML Schema's nil mechanism involves an "out of band" nil signal. In
other words, there is no actual nil value that appears as element content,
instead there is an attribute to indicate that the element content is nil. To
illustrate, we modify the shipDate element declaration so that nils
can be signalled:
<xsd:element name="shipDate" type="xsd:date" nillable="true"/>
And to explicitly represent that shipDate has a nil
value in the instance document, we set the nil
attribute (from the XML Schema namespace for instances) to true:
<shipDate xsi:nil="true"></shipDate>
The nil
attribute is defined as part of the XML Schema namespace for instances,
http://www.w3.org/2001/XMLSchema-instance, and so it must appear in
the instance document with a prefix (such as xsi:) associated with
that namespace. (As with the xsd: prefix, the xsi:
prefix is used by convention only.) Note that the nil mechanism applies only to
element values, and not to attribute values. An element with xsi:nil="true"
must not have any content, but it may still carry attributes.
A schema can be viewed as a collection (vocabulary) of type
definitions and element declarations whose names belong to a particular
namespace called a target namespace. Target namespaces enable us to distinguish
between definitions and declarations from different vocabularies. For example,
target namespaces would enable us to distinguish between the declaration for
element in
the XML Schema language vocabulary, and a declaration for element
in a hypothetical chemistry language vocabulary. The former is part of the
http://www.w3.org/2001/XMLSchema target namespace, and the latter
is part of another target namespace.
When we want to check that an instance document conforms to one or more schemas (through a process called schema validation), we need to identify which element and attribute declarations and type definitions in the schemas should be used to check which elements and attributes in the instance document. The target namespace plays an important role in the identification process. We examine the role of the target namespace in the next section.
The schema author also has several options that affect how the identities of elements and attributes are represented in instance documents. More specifically, the author can decide whether or not the appearance of locally declared elements and attributes in an instance must be qualified by a namespace, using either an explicit prefix or implicitly by default. The schema author's choice regarding qualification of local elements and attributes has a number of implications regarding the structures of schemas and instance documents, and we examine some of these implications in the following sections.
In a new version of the purchase order schema, po1.xsd, we
explicitly declare a target namespace, and specify that both locally defined
elements and locally defined attributes must be unqualified. The target
namespace in po1.xsd is
http://www.example.com/PO1, as indicated by the value of the
targetNamespace
attribute.
Qualification of local elements and attributes can be globally
specified by a pair of attributes, elementFormDefault
and attributeFormDefault,
on the schema
element, or can be specified separately for each local declaration using the
form
attribute. All such attributes' values may each be set to
unqualified or qualified, to indicate whether or not
locally declared elements and attributes must be unqualified.
In po1.xsd we globally
specify the qualification of elements and attributes by setting the values of
both elementFormDefault
and attributeFormDefault
to unqualified. Strictly speaking, these settings are unnecessary
because the values are the defaults for the two attributes; we make them here to
highlight the contrast between this case and other cases we describe later.
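An abridged sketch of po1.xsd, reconstructed from the description in this section, might look as follows. The po: prefix and the elided parts marked "etc." are assumptions:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:po="http://www.example.com/PO1"
        targetNamespace="http://www.example.com/PO1"
        elementFormDefault="unqualified"
        attributeFormDefault="unqualified">

  <!-- global declarations: these names enter the target namespace -->
  <element name="purchaseOrder" type="po:PurchaseOrderType"/>
  <element name="comment" type="string"/>

  <complexType name="PurchaseOrderType">
    <sequence>
      <!-- local declarations: unqualified in instances -->
      <element name="shipTo" type="po:USAddress"/>
      <element name="billTo" type="po:USAddress"/>
      <element ref="po:comment" minOccurs="0"/>
      <!-- etc. -->
    </sequence>
    <!-- etc. -->
  </complexType>

  <complexType name="USAddress">
    <sequence>
      <element name="name" type="string"/>
      <element name="street" type="string"/>
      <!-- etc. -->
    </sequence>
  </complexType>

  <!-- etc. -->
</schema>
```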
To see how the target namespace of this schema is populated, we examine in
turn each of the type definitions and element declarations. Starting from the
end of the schema, we first define a type called USAddress that
consists of the elements name, street, etc. One
consequence of this type definition is that the USAddress type is
included in the schema's target namespace. We next define a type called
PurchaseOrderType that consists of the elements
shipTo, billTo, comment, etc.
PurchaseOrderType is also included in the schema's target
namespace. Notice that the type and element references in the three element declarations are
prefixed, i.e. po:USAddress, po:USAddress and
po:comment, and the prefix is associated with the namespace
http://www.example.com/PO1. This is the same namespace as the
schema's target namespace, and so a processor of this schema will know to look
within this schema for the definition of the type USAddress and the
declaration of the element comment. It is also possible to refer to
types in another schema with a different target namespace, hence enabling re-use
of definitions and declarations between schemas.
At the beginning of the schema po1.xsd, we declare
the elements purchaseOrder and comment. They are
included in the schema's target namespace. The purchaseOrder
element's type is prefixed, for the same reason that USAddress is
prefixed. In contrast, the comment element's type, string, is not
prefixed. The po1.xsd schema
contains a default namespace declaration, and so unprefixed types such as
string and
unprefixed elements such as element and
complexType
are associated with the default namespace
http://www.w3.org/2001/XMLSchema. In fact, this is the target
namespace of XML Schema itself, and so a processor of po1.xsd will know to
look within the schema of XML Schema -- otherwise known as the "schema for
schemas" -- for the definition of the type string and the
declaration of the element called element.
Let us now examine how the target namespace of the schema affects a conforming instance document:
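A sketch of such an instance, with unqualified local elements and attributes, is shown below. The attribute names and all data values are illustrative assumptions:

```xml
<apo:purchaseOrder xmlns:apo="http://www.example.com/PO1"
                   orderDate="1999-10-20">
  <!-- locally declared elements and attributes carry no prefix -->
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <!-- etc. -->
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <!-- etc. -->
  </billTo>
  <!-- comment is declared globally, so it is qualified -->
  <apo:comment>Hurry, my lawn is going wild!</apo:comment>
  <!-- etc. -->
</apo:purchaseOrder>
```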
The instance document declares one namespace,
http://www.example.com/PO1, and associates it with the prefix
apo:. This prefix is used to qualify two elements in the document,
namely purchaseOrder and comment. The namespace is the
same as the target namespace of the schema in po1.xsd, and so a
processor of the instance document will know to look in that schema for the
declarations of purchaseOrder and comment. In fact,
target namespaces are so named because of the sense in which there exists a
target namespace for the elements purchaseOrder and
comment. Target namespaces in the schema therefore control the
validation of corresponding namespaces in the instance.
The prefix apo: is applied to the global elements
purchaseOrder and comment. Furthermore,
elementFormDefault
and attributeFormDefault
require that the prefix is not applied to any of the locally declared
elements such as shipTo, billTo, name and
street, and it is not applied to any of the attributes
(which were all declared locally). The purchaseOrder and
comment are global elements because they are declared in the
context of the schema as a whole rather than within the context of a particular
type. For example, the declaration of purchaseOrder appears as a
child of the schema
element in po1.xsd, whereas the
declaration of shipTo appears as a child of the complexType
element that defines PurchaseOrderType.
When local elements and attributes are not required to be qualified, an
instance author may require more or less knowledge about the details of the
schema to create schema valid instance documents. More specifically, if the
author can be sure that only the root element (such as
purchaseOrder) is global, then it is a simple matter to qualify
only the root element. Alternatively, the author may know that all the elements
are declared globally, and so all the elements in the instance document can be
prefixed, perhaps taking advantage of a default namespace declaration. (We
examine this approach in Global vs. Local
Declarations (§3.3).) On the other hand, if there is no uniform pattern of
global and local declarations, the author will need detailed knowledge of the
schema to correctly prefix global elements and attributes.
Elements and attributes can be independently required to be qualified,
although we start by describing the qualification of local elements. To specify
that all locally declared elements in a schema must be qualified, we set the
value of elementFormDefault
to qualified:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:po="http://www.example.com/PO1"
        targetNamespace="http://www.example.com/PO1"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <element name="purchaseOrder" type="po:PurchaseOrderType"/>
  <element name="comment" type="string"/>
  <complexType name="PurchaseOrderType">
    <!-- etc. -->
  </complexType>
  <!-- etc. -->
</schema>
And in this conforming instance document, we qualify all the elements explicitly:
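A sketch of such an instance, with illustrative data values and attribute names, might be:

```xml
<apo:purchaseOrder xmlns:apo="http://www.example.com/PO1"
                   orderDate="1999-10-20">
  <!-- every element is prefixed; attributes remain unqualified -->
  <apo:shipTo country="US">
    <apo:name>Alice Smith</apo:name>
    <apo:street>123 Maple Street</apo:street>
    <!-- etc. -->
  </apo:shipTo>
  <apo:billTo country="US">
    <apo:name>Robert Smith</apo:name>
    <apo:street>8 Oak Avenue</apo:street>
    <!-- etc. -->
  </apo:billTo>
  <apo:comment>Hurry, my lawn is going wild!</apo:comment>
  <!-- etc. -->
</apo:purchaseOrder>
```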
Alternatively, we can replace the explicit qualification of every element
with implicit qualification provided by a default namespace, as shown here in
po2.xml:
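A sketch of po2.xml along these lines, again with illustrative data values and attribute names:

```xml
<purchaseOrder xmlns="http://www.example.com/PO1"
               orderDate="1999-10-20">
  <!-- the default namespace qualifies every unprefixed element -->
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <!-- etc. -->
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <!-- etc. -->
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <!-- etc. -->
</purchaseOrder>
```

The default namespace does not apply to attributes, so orderDate and country remain unqualified.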
In po2.xml, all the
elements in the instance belong to the same namespace, and the namespace
statement declares a default namespace that applies to all the elements in the
instance. Hence, it is unnecessary to explicitly prefix any of the elements. As
another illustration of using qualified elements, the schemas in Advanced Concepts III:
The Quarterly Report (§5) all require qualified elements.
Qualification of attributes is very similar to the qualification of elements.
Attributes that must be qualified, either because they are declared globally or
because the attributeFormDefault
attribute is set to qualified, appear prefixed in instance
documents. One example of a qualified attribute is the xsi:nil
attribute that was introduced in Nil Values (§2.9). In fact,
attributes that are required to be qualified must be explicitly prefixed because
the Namespaces in
XML specification does not provide a mechanism for defaulting the namespaces
of attributes. Attributes that are not required to be qualified appear in
instance documents without prefixes, which is the typical case.
The qualification mechanism we have described so far has controlled
all local element and attribute declarations within a particular target
namespace. It is also possible to control qualification on a declaration by
declaration basis using the form
attribute. For example, to require that the locally declared attribute
publicKey is qualified in instances, we declare it in the following
way:
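A sketch of such a declaration follows. The enclosing element name secure and the base64Binary type are hypothetical choices for illustration:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:po="http://www.example.com/PO1"
        targetNamespace="http://www.example.com/PO1"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <!-- etc. -->
  <element name="secure">
    <complexType>
      <sequence>
        <!-- element declarations -->
      </sequence>
      <!-- form="qualified" applies to this attribute only -->
      <attribute name="publicKey" type="base64Binary" form="qualified"/>
    </complexType>
  </element>
</schema>
```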
Notice that the value of the form
attribute overrides the value of the attributeFormDefault
attribute for the publicKey attribute only. Also, the form
attribute can be applied to an element declaration in the same manner. An
instance document that conforms to the schema is:
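A sketch of such an instance follows. It assumes a hypothetical secure element that carries the publicKey attribute and is permitted within purchaseOrder; the key value is a placeholder:

```xml
<purchaseOrder xmlns="http://www.example.com/PO1"
               xmlns:po="http://www.example.com/PO1"
               orderDate="1999-10-20">
  <!-- etc. -->
  <!-- a qualified attribute needs an explicit prefix: default namespaces do not apply to attributes -->
  <secure po:publicKey="GpM7">
    <!-- etc. -->
  </secure>
</purchaseOrder>
```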
Another authoring style, applicable when all element names are unique within
a namespace, is to create schemas in which all elements are global. This is
similar in effect to the use of <!ELEMENT> in a DTD. In the example below,
we have modified the original po1.xsd such that
all the elements are declared globally. Notice that we have omitted the elementFormDefault
and attributeFormDefault
attributes in this example to emphasize that their values are irrelevant when
there are only global element and attribute declarations.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:po="http://www.example.com/PO1"
        targetNamespace="http://www.example.com/PO1">
  <element name="purchaseOrder" type="po:PurchaseOrderType"/>
  <element name="shipTo" type="po:USAddress"/>
  <element name="billTo" type="po:USAddress"/>
  <element name="comment" type="string"/>
  <element name="name" type="string"/>
  <element name="street" type="string"/>
  <complexType name="PurchaseOrderType">
    <sequence>
      <element ref="po:shipTo"/>
      <element ref="po:billTo"/>
      <element ref="po:comment" minOccurs="0"/>
      <!-- etc. -->
    </sequence>
  </complexType>
  <complexType name="USAddress">
    <sequence>
      <element ref="po:name"/>
      <element ref="po:street"/>
      <!-- etc. -->
    </sequence>
  </complexType>
  <!-- etc. -->
</schema>
This "global" version of po1.xsd will
validate the instance document po2.xml which, as we
described previously, is also schema valid against the "qualified" version of
po1.xsd. In
other words, both schema approaches can validate the same, namespace defaulted,
document. Thus, in one respect the two schema approaches are similar, although
in another important respect the two schema approaches are very different.
Specifically, when all elements are declared globally, it is not possible to
take advantage of local names. For example, you can only declare one global
element called "title". However, you can locally declare one element called
"title" that has a string type, and is a subelement of "book". Within the same
schema (target namespace) you can declare a second element also called "title"
that is an enumeration of the values "Mr Mrs Ms".
In Basic Concepts: The Purchase Order (§2) we explained the basics of XML Schema using a schema that did not declare a target namespace and an instance document that did not declare a namespace. So the question naturally arises: What is the target namespace in these examples and how is it referenced?
In the purchase order schema, po.xsd, we did not
declare a target namesp