Basic Content Document Specification, Version 1.0

OpenReader Consortium Preliminary Working Draft (13 May 2006)

This version:
http://openreader.org/spec/bcd10.html
Latest version:
http://openreader.org/spec/bcd10.html
Previous version:
(archived)
Contributors:
Jon Noring (editor)

1. Introduction

This specification details the structure, conformance requirements and recommendations of a Basic Content Document. It is a module specification within the suite of specifications used to define the OpenReader Publication Framework Specification.

By intentional design, a Basic Content Document is essentially an XHTML 1.1 document. Thus, many of the tools and methodologies used to author high-quality, structurally-oriented XHTML 1.1 documents may be used to author Basic Content Documents.

In addition, with only very minor changes, a Basic Content Document may be easily converted to a conforming Basic OEBPS 1.2 Document. OEBPS 1.2 is the latest version (as of the time of release of this specification) in the family of ebook exchange formats published by the International Digital Publishing Forum.

The Basic Content Document vocabulary (the supported elements, attributes and values, and content model) is a carefully-crafted subset of both XHTML 1.1 and Basic OEBPS 1.2. The selected vocabulary is efficient and streamlined yet powerful, structurally-oriented yet flexible, and forward-looking with regard to the upcoming XHTML 2.0 specification.

1.1 Normative Edition

The normative edition of this specification is the XHTML 1.1 document located at http://openreader.org/spec/bcd10.html .

Other formatted editions may be offered besides the normative edition, but they will not be considered normative.

The XHTML 1.1 normative edition is authored so that the markup in the document body (that contained in the body element) conforms to this specification.

1.2 Definitions

Several important words and terms used in this specification are defined in the Common Definitions Document, Version 1.0.

1.3 Requirement Levels

The following key words (“imperatives”) are used in this specification to denote requirement level consistent with RFC 2119:

  • must
  • must not
  • required
  • should
  • should not
  • recommended
  • may
  • not required
  • optional

1.4 Highlighting Conventions

To aid in readability and understandability, special text highlighting conventions are used in this specification (in addition to ordinary text emphasis) to emphasize important items.

1.4.1 Imperative Level

The requirement level imperatives described in Section 1.3 are highlighted based on three basic imperative levels: required, recommended, and optional.

1.4.2 Elements, Attributes and Attribute Values

The normative XHTML 1.1 edition of this specification includes special markup for every mention of elements, attributes, attribute values, and other related code. (For details, refer to the comment in the source document header.) This allows these markup constructs to be specially highlighted, using CSS, during presentation (including their status and requirement level) so they may be more easily recognized.

Since the normative edition of this specification may be rendered with different CSS style sheets, converted into other formats, rendered on visually limited hardware, or presented with text-to-speech engines, some or all of this highlighting may be lost. Care has been taken to assure that, in the absence of highlighting, every mention of these markup constructs will be clear and unambiguous.

Element Highlighting
Status Requirement Level
Required Cond. Req. Optional
Normal body li div
Deprecated br
Removed noscript
Attribute Highlighting
Status Requirement Level
Required Cond. Req. Optional Fixed
Normal xmlns href title xml:space
Deprecated
Removed style

In the above tables, there are four requirement levels:

  1. “Required” means the element/attribute must appear, in some capacity, in all Basic Content Documents.

  2. “Conditionally Required” means the element/attribute must appear under certain element usage situations, and is optional in other situations.

  3. “Optional” means the element/attribute is optional under all situations.

  4. “Fixed” (applicable only to attributes) means the attribute is fixed to a certain value in the DTD and there is no separate requirement the attribute must appear in the associated element.

Similarly, there are three status levels:

  1. “Normal” means the element/attribute has normal status in this specification.

  2. “Deprecated” means the element/attribute has been deprecated, and support for it may be removed in a future version of this specification.

  3. “Removed” means the element/attribute is no longer supported in this specification, but is nevertheless mentioned.

An empty cell in the tables means there is no mention in this specification of an element/attribute having the associated status and requirement level.

Attribute values are highlighted as en-US.

Other types of “code” are highlighted as PCDATA.

1.5 Referenced Specifications and Standards

This specification is built upon a wide and stable base of compatible open specifications and standards. Following are the various specifications and standards referenced in some manner by this specification.

OpenReader Specifications:

W3C Specifications and Notes:

Internet Engineering Task Force (IETF):

International Organization for Standardization (ISO):

Others:

2. Basic Content Document: MIME Media Type “application/x-orp-bcd1+xml”

The MIME Media Type of a conforming Basic Content Document is “application/x-orp-bcd1+xml”. This MIME media type is not IANA registered.

Other specifications and applications using or referencing Basic Content Documents by MIME media type should use “application/x-orp-bcd1+xml”, rather than one based on the “text/” media type name. The reason is that Basic Content Documents may be encoded in UTF-16 (see Section 3.1) and “application/” media type names are more appropriate when both UTF-8 and UTF-16 encodings are allowed (RFC 3023).

3. Basic Content Document: General Requirements

By careful design, a Basic Content Document is a conforming XHTML 1.1 document with only a simple change in the DOCTYPE declaration. However, the full XHTML 1.1 Specification is not supported — only a selected subset of elements, attributes and attribute values is supported, along with various constraints, requirements and recommendations unique to this specification.

3.1 General Conformance Requirements

A conformant Basic Content Document must meet all of the following general and top-level requirements:

  1. Fully conforms to XML 1.0 (e.g., it is well-formed)

  2. Text encoding is UTF-8 or UTF-16 as specified in the latest Unicode standard.

  3. Includes an XML declaration with a text encoding declaration:

    <?xml version="1.0" encoding="UTF-8" ?>
    

    or

    <?xml version="1.0" encoding="UTF-16" ?>
    
  4. Valid to the Basic Content Document DTD, Version 1.0, which is externally referenced by public identifier as follows:

    <!DOCTYPE html PUBLIC
         "-//OpenReader//DTD Basic Content Document 1.0//EN"
         "http://openreader.org/dtd/bcd10.dtd">
    
  5. Does not include a DTD internal subset.

  6. For the document root element html, the default namespace is explicitly declared to be the XHTML namespace, and the required attribute xml:lang (see Section 4.1.1.4) specifies the primary language of the content document.

    Example where the primary language is U.S. English:

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
    
  7. Does not declare any other namespaces, whether default or prefixed.

  8. Conforms with all the specific requirements and constraints described elsewhere in this specification.
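
As a non-normative illustration, several of the general requirements above can be checked mechanically. The following Python sketch covers only the XML declaration, the DOCTYPE public identifier, the absence of an internal subset, well-formedness, and the root element's namespace and xml:lang attribute. It does not validate against the DTD, and the function name check_general_requirements and the regular expressions are assumptions made for this example only.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
BCD_PUBLIC_ID = "-//OpenReader//DTD Basic Content Document 1.0//EN"

# Hypothetical patterns for this sketch; they are not defined by the specification.
XML_DECL = re.compile(
    r"""<\?xml\s+version=["']1\.0["']\s+encoding=["']UTF-(?:8|16)["']\s*\?>""")
DOCTYPE = re.compile(
    r"""<!DOCTYPE\s+html\s+PUBLIC\s+["']([^"']*)["']\s+["'][^"']*["']\s*(\[)?""")


def check_general_requirements(source: str) -> list:
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    # Requirement 3: XML declaration with a UTF-8 or UTF-16 encoding declaration.
    if not XML_DECL.match(source):
        problems.append("XML declaration with UTF-8/UTF-16 encoding is missing")
    # Requirements 4 and 5: public identifier present, no internal subset.
    doctype = DOCTYPE.search(source)
    if doctype is None or doctype.group(1) != BCD_PUBLIC_ID:
        problems.append("DOCTYPE does not use the BCD 1.0 public identifier")
    elif doctype.group(2):
        problems.append("DOCTYPE contains an internal subset")
    # Requirement 1 (in part): the document must at least be well-formed.
    try:
        root = ET.fromstring(source)
    except ET.ParseError as err:
        problems.append(f"not well-formed: {err}")
        return problems
    # Requirement 6: html root in the XHTML namespace, carrying xml:lang.
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root element is not html in the XHTML namespace")
    if not root.get(XML_LANG):
        problems.append("html element lacks xml:lang")
    return problems


SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC
     "-//OpenReader//DTD Basic Content Document 1.0//EN"
     "http://openreader.org/dtd/bcd10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head><title>Example</title></head>
<body><p>Hello.</p></body>
</html>"""

print(check_general_requirements(SAMPLE))
```

A complete conformance checker would additionally need a validating parser and the Basic Content Document DTD.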

3.2 Structure of a Basic Content Document

Based on the general requirements in Section 3.1, the following markup example is constructed, showing the general structure of a conforming Basic Content Document. It includes the required XML and DOCTYPE declarations, and top-level elements, attributes, and attribute values.

Document authors will find it a useful template (or “boilerplate”) to use as a starting point to build conforming Basic Content Documents. As noted in the general requirements, the text encoding declaration in the XML declaration may either be UTF-8 or UTF-16. Also, the value of the xml:lang attribute in html will vary depending upon the primary language of the Basic Content Document (see Section 4.1.1.4).

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC
     "-//OpenReader//DTD Basic Content Document 1.0//EN"
     "http://openreader.org/dtd/bcd10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
<head>
   <title> ... required document title goes here ... </title>
   <!-- optional <meta> elements go here -->
</head>
<body>
   <!-- body content goes here -->
</body>
</html>

3.3 Some XML Requirements and Vocabulary-Independent Constructs

This specification is not intended to be a tutorial on how to author XML-conforming Basic Content Documents. Nevertheless, to aid content document authors in creating Basic Content Documents, which must be well-formed XML, and which may include the useful vocabulary-independent constructs that XML allows, this section presents a sampling of the most important markup-related XML requirements and vocabulary-independent constructs.

3.3.1 Important XML Markup Requirements

As specified in Section 3.1, all Basic Content Documents must be conforming XML 1.0 documents, which means, for example, they are well-formed. Following is a list of several specific XML markup requirements, but this list is by no means exhaustive. These requirements are mentioned because violations of them account for a large fraction of the XML well-formedness and validation errors encountered in practice.

  • Element and attribute names are case-sensitive. For example, <div> and <DIV> are different elements.

  • Attribute values must be enclosed in either single or double straight quotes. For example, class="abcd" and class='abcd' are conforming, but class=abcd, class="abcd' and class='abcd" are not.

  • All non-empty elements must have properly formed starting and closing tags.

  • All declared empty elements must be properly formed (see Section 3.3.5).

  • All elements must properly nest.

  • Depending upon the circumstance, certain markup characters, when used literally (e.g. & and <), must be escaped. Refer to Section 3.3.2 for details.
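
As a non-normative illustration, the requirements listed above can be exercised with any conforming XML parser. The sketch below uses the parser in the Python standard library; the helper name is_well_formed is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Quoting of attribute values.
print(is_well_formed('<div class="abcd">text</div>'))   # matching double quotes
print(is_well_formed("<div class='abcd'>text</div>"))   # matching single quotes
print(is_well_formed("<div class=abcd>text</div>"))     # unquoted value
# Case sensitivity, nesting, and escaping.
print(is_well_formed("<div>text</DIV>"))                # tag names differ in case
print(is_well_formed("<p><i>text</p></i>"))             # improper nesting
print(is_well_formed("<p>AT&T</p>"))                    # unescaped ampersand
print(is_well_formed("<p>AT&amp;T</p>"))                # escaped ampersand
```

Only the first, second, and last fragments are well-formed.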

3.3.2 Character Entity and Numeric Character References

For XML 1.0 documents (and this includes Basic Content Documents), each individual character within character data is represented in one of two ways:

  1. directly in the document’s text encoding, or

  2. indirectly using a numeric character reference, or by a predefined or declared character entity reference which points to a numeric character reference.

For example, it is convenient to use numeric character references and allowed character entity references when the tool used to create an XML document is limited to ASCII encoding (UTF-8 conformant but limited to the Basic Latin script), and some characters fall outside of the ASCII range.

In certain circumstances, five of the characters used to define XML markup constructs (specifically & < > " and '), when used literally, must be represented (or “escaped”) by their numeric character references, or by their declared character entity reference equivalents. For this purpose, XML predefines entity references for these five characters which all XML processors must recognize:

Predefined XML Character Entity References
Character | Predefined Entity | Numeric Reference (hex) | Numeric Reference (dec)
& | &amp; | &#x0026; | &#38;
< | &lt; | &#x003C; | &#60;
> | &gt; | &#x003E; | &#62;
" | &quot; | &#x0022; | &#34;
' | &apos; | &#x0027; | &#39;

Listed below are the circumstances when the five markup characters, used literally and not as part of markup, must be escaped:

  1. The & and < characters, except when used within CDATA sections or comments.

  2. The " and ' characters when they appear within an attribute value and match the attribute value delimiting quote mark. It is recommended that both always be escaped in attribute values.

  3. The > character in the very rare instance it appears in the string “]]>” when that string is not marking the end of a CDATA section. It is considered good practice to escape the > character wherever it is used literally.

Example of content document markup with both required and optional numeric and character entity references:

<h2 title='Jane&apos;s AT&amp;T R&#x00E9;sum&#x00E9;'>Jane's AT&amp;T R&#x00E9;sum&#x00E9;</h2>

A user agent will render the above markup as:

Jane's AT&T Résumé
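
As a non-normative illustration, the escaping rules above might be applied by an authoring tool as follows. The sketch uses the escape function from the Python standard library; the helper names escape_text, escape_attribute, and to_ascii are assumptions of this example. The to_ascii helper emits decimal numeric character references for characters outside the ASCII range.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def escape_text(text: str) -> str:
    """Escape & < > for use in character data."""
    return escape(text)


def escape_attribute(value: str) -> str:
    """Escape & < > and both quote marks for use in an attribute value."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})


def to_ascii(markup: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return markup.encode("ascii", "xmlcharrefreplace").decode("ascii")


title = "Jane's AT&T R\u00e9sum\u00e9"
markup = to_ascii(
    f"<h2 title='{escape_attribute(title)}'>{escape_text(title)}</h2>")
print(markup)

# Parsing the result recovers the original text in both places.
heading = ET.fromstring(markup)
print(heading.get("title") == title, heading.text == title)
```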

3.3.3 CDATA Sections

CDATA sections may be used in XML documents (which includes Basic Content Documents) to escape blocks of text containing markup characters (e.g. “<” and “&”) when used literally. This is an alternative to individually escaping each markup character (see Section 3.3.2).

A CDATA section starts with “<![CDATA[” and terminates with “]]>”.

CDATA sections may be used anywhere character data may occur, except that they must not appear within an attribute value. They must not nest; the text content within a CDATA section must not contain the literal character sequence “]]>”.

Example:

<p>Insert the following in your document: &lt;h1&gt;Greetings!&lt;/h1&gt;</p>

is equivalent to

<p>Insert the following in your document: <![CDATA[<h1>Greetings!</h1>]]></p>

A user agent will render both of the above as:

Insert the following in your document: <h1>Greetings!</h1>
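
As a non-normative illustration, the equivalence of the two forms can be confirmed with an XML parser, which reports the same character data for both. The helper name character_data is an assumption of this example.

```python
import xml.etree.ElementTree as ET

escaped = ("<p>Insert the following in your document: "
           "&lt;h1&gt;Greetings!&lt;/h1&gt;</p>")
cdata = ("<p>Insert the following in your document: "
         "<![CDATA[<h1>Greetings!</h1>]]></p>")


def character_data(markup: str) -> str:
    """Return the character data of an element, with all markup resolved."""
    return "".join(ET.fromstring(markup).itertext())


print(character_data(escaped))
print(character_data(escaped) == character_data(cdata))
```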

3.3.4 Comments

Comments may appear anywhere in an XML document (including a Basic Content Document) except before the XML declaration and within other markup. Comments are not part of the character data; they are primarily intended for content document authors to insert private commentary (notes) within the document.

A comment starts with “<!--” and terminates with “-->”; the comment text is between these two delimiters. The comment text must not contain the string “--” (two hyphens), but otherwise may include, without escaping, all the Unicode characters recognized in XML 1.0, including the XML markup characters. A comment must not terminate with the literal string “--->”.

Examples:

<!-- This is a comment -->
<p>XML markup characters</p> <!-- & < > " ' -->

To conform to this specification, user agents must not:

  1. render the contents of comments,
  2. execute any script or code (such as JavaScript) contained in comments,
  3. take any action based on the content in comments.
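
As a non-normative illustration, the sketch below checks the two constraints on comment text given above and shows a parser dropping a comment from the character data. The helper names is_valid_comment_text and visible_text are assumptions of this example.

```python
import xml.etree.ElementTree as ET


def is_valid_comment_text(text: str) -> bool:
    """Comment text must not contain "--" and must not end with a hyphen
    (a trailing hyphen would make the comment terminate with "--->")."""
    return "--" not in text and not text.endswith("-")


def visible_text(markup: str) -> str:
    """Return the character data; the default parser discards comments."""
    return "".join(ET.fromstring(markup).itertext())


print(is_valid_comment_text(" & < > \" ' "))   # markup characters are allowed
print(is_valid_comment_text(" a -- b "))       # double hyphen is not
print(visible_text("<p>Visible<!-- hidden --> text</p>"))
```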

3.3.5 Empty Elements and Empty Content

Some elements in a DTD may be declared EMPTY. When used in an XML document, these elements must not contain any content and must use the empty-element syntax (also known as “minimized form”) as specified in XML 1.0.

Example of correct usage of declared empty element syntax (the element img is declared EMPTY in this specification):

<img src="myimage.png" alt="A cool image"/>

Note: a sequence of one or more white space characters may appear before the closing “/” in empty-element syntax.

In this specification, the empty-element syntax must only be used for declared empty elements; it must not be used for declared non-empty elements when they contain no content.

Example of correct and incorrect usage when a declared non-empty element contains no content (the element p is declared non-empty in this specification):

<p/>      <!-- not allowed! -->
<p></p>   <!-- correct usage -->

When declared non-empty elements contain no content, or only a sequence of one or more white space characters, this occurrence is referred to in this specification as “empty content.”
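
As a non-normative illustration, a source-level check for the empty-element rules above might look like the following. An XML parser reports <p/> and <p></p> identically, so this sketch scans the markup text with regular expressions instead. The set EMPTY_ELEMENTS is taken from the elements marked [Empty] in the Section 4.1.2 table; the patterns are simplifications assumed for this example (they would be confused by a literal > inside an attribute value), and the function name is hypothetical.

```python
import re

# Elements marked [Empty] in the Section 4.1.2 table.
EMPTY_ELEMENTS = {"area", "br", "col", "hr", "img", "meta"}

# Simplified patterns assumed for this sketch.
_SKIP = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>", re.DOTALL)
_MINIMIZED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")
_START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*(?<!/)>")


def empty_syntax_errors(markup: str) -> list:
    """Report misuse of the empty-element syntax in the given markup."""
    markup = _SKIP.sub("", markup)  # ignore comments and CDATA sections
    errors = []
    for match in _MINIMIZED.finditer(markup):
        if match.group(1) not in EMPTY_ELEMENTS:
            errors.append(f"{match.group(1)}: empty-element syntax not allowed")
    for match in _START_TAG.finditer(markup):
        if match.group(1) in EMPTY_ELEMENTS:
            errors.append(f"{match.group(1)}: must use empty-element syntax")
    return errors


print(empty_syntax_errors('<img src="myimage.png" alt="A cool image"/>'))
print(empty_syntax_errors("<p/>"))
print(empty_syntax_errors("<p></p>"))
```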

3.3.6 White Space Handling

White space characters and their handling by user agents are an important consideration for both Basic Content Document authors and user agent developers.

In XML, the white space characters are:

  • space (&#x0020;)

  • tab (&#x0009;)

  • carriage return (&#x000D;)

  • line feed (&#x000A;)

The rules for white space handling of both character data and attribute values by XML processors are addressed in Sections 2.10, 3.2.1 and 3.3.3 of the XML 1.0 Specification.

User agent requirements:

  1. For character data, XML processors are required to pass to the user agent all characters in a document that are not markup. This includes white space characters.

    Except where the XML attribute xml:space is specifically set to the value of preserve, or a similar override mechanism is applied (e.g., the CSS white-space property), in this specification user agents must normalize the character data of an element as follows:

    • Replace all sequences of two or more white space characters with a single space character (&#x0020;), and

    • Remove all leading and trailing spaces.

    Example:

    <p>
       <i> This
       </i>    is a
               paragraph.
    </p>
    

    and

    <p> <i> This    </i>   is   a paragraph. </p>
    

    are both equivalent to:

    <p><i>This</i> is a paragraph.</p>
    
  2. For white space in attribute values, XML requires that all XML processors normalize attribute values before sending the attribute value data to the user agent. Note that this normalization process treats attribute values not of type CDATA differently from those of type CDATA.

    To conform to this specification, user agents must normalize CDATA attribute values as if they were not of type CDATA. That is, for attribute values of type CDATA, user agents must replace a sequence of space (&#x0020;) characters with a single space (&#x0020;) character, and remove any leading and trailing space (&#x0020;) characters.

    Example (the title attribute is of datatype CDATA):

    <p title=" This is   a
    
        paragraph ">This is a paragraph.</p>
    

    is equivalent to:

    <p title="This is a paragraph">This is a paragraph.</p>
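
As a non-normative illustration, the two normalization rules above can be sketched in Python. For character data, the sketch normalizes the flattened text content of the element, which is sufficient to show that the example forms are equivalent. It treats a run of one or more white space characters as a single space, which the first example requires (its final line feed stands alone); this reading is an assumption of the sketch. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XML_WHITE_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_character_data(markup: str) -> str:
    """Collapse white space runs in an element's text and trim the ends."""
    text = "".join(ET.fromstring(markup).itertext())
    return XML_WHITE_SPACE.sub(" ", text).strip(" ")


def normalize_attribute_value(value: str) -> str:
    """Collapse runs of spaces and trim; the XML processor has already
    turned tabs, carriage returns and line feeds into spaces."""
    return re.sub(r" +", " ", value).strip(" ")


first = "<p>\n   <i> This\n   </i>    is a\n           paragraph.\n</p>"
second = "<p> <i> This    </i>   is   a paragraph. </p>"
print(normalize_character_data(first) == normalize_character_data(second))

element = ET.fromstring(
    '<p title=" This is   a\n\n    paragraph ">This is a paragraph.</p>')
print(normalize_attribute_value(element.get("title")))
```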
    

3.3.7 Unicode Space Characters and Related Topics

Basic Content Document authors are free to use all the Unicode characters in character data, except those disallowed by XML and this specification (refer to "Unicode in XML and other Markup Languages" for recommendations on the Unicode characters not suitable for use in XML, and related topics). This flexibility allows for the richest content in Basic Content Documents, meeting nearly all international needs, but in certain situations will create a few complexities for Basic Content Document authors and user agent developers.

One of the more complex topics concerns the spacing characters used for inter-word separation. Because the concept of a “word” in most languages plays a fundamental role in various word-related operations, such as text searching, line breaking, etc., Basic Content Document authors and user agent developers need to understand how the Unicode space characters are used to enable inter-word separation, plus the related topics of line breaking (primarily for the purpose of visual presentation), and soft hyphens.

3.3.7.1 Unicode Space Characters

The Unicode Space Characters set (see Section 6.2 in the Unicode 4.1.0 specification) includes:

(For more details on these spacing characters, and other space-like characters, refer to Section 6.2 in the Unicode 4.1.0 standard. This specification does not specify how user agents are to exactly render these different space characters.)

3.3.7.2 Inter-word Separation

In this specification, user agents must treat any sequence of Unicode Space Characters and/or XML white space characters within character data as an inter-word separator.
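
As a non-normative illustration, word splitting under this rule might be sketched as follows. The sketch assumes that the Unicode Space Characters correspond to the Unicode general category Zs (space separator) as reported by the Python standard library, whose Unicode version is newer than 4.1.0; that mapping and the function names are assumptions of this example.

```python
import unicodedata

XML_WHITE_SPACE = " \t\r\n"


def is_word_separator(ch: str) -> bool:
    """True for XML white space and (by assumption) category Zs characters."""
    return ch in XML_WHITE_SPACE or unicodedata.category(ch) == "Zs"


def split_words(text: str) -> list:
    """Split text into words, treating any separator sequence as one break."""
    words, current = [], []
    for ch in text:
        if is_word_separator(ch):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# EM SPACE (U+2003) and NO-BREAK SPACE (U+00A0) both separate words.
print(split_words("soft\u2003hyphen \u00a0 word\n"))
```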

3.3.7.3 Line Breaking

The related topic of line breaking is important for the purpose of visual rendering. This topic is covered in detail in the Unicode Standard Annex #14 Technical Report: Line Breaking Properties, which provides a comprehensive set of guidelines. User agents should follow, as closely as possible, the line break recommendations in this Unicode technical report.

In general, line breaking is allowed between words except where one or more no-break space characters are used between the words. The no-break space characters include:

  • No-break space (&#x00A0;, or &nbsp;, as defined in XHTML and in the OpenReader Character Entity References Common Set — this is the preferred character to use for no line breaking between words)

  • Figure space (&#x2007;)

  • Narrow no-break space (&#x202F;)

  • Zero width no-break space (&#xFEFF;)

User agents should not line break between two words separated by a sequence of one or more no-break space characters.

Basic Content Document authors should not use a no-break space character for any purpose other than indicating that no line break should occur between words.

[Informative Commentary] Like the deprecated br element, some content document authors inappropriately use a no-break space (primarily &nbsp;) to “pad” spacing in order to force a desired visual presentation, thereby working against the reflowability and adaptability of the content to various hardware, applications and end-user presentation settings.

Regarding line breaking within a word, user agents may do so per the allowance and the conventions of the language as detailed in the above referenced Unicode technical document on line breaking.

3.3.7.4 Soft Hyphen

Content document authors may insert within a word the “soft hyphen” character (&#x00AD; or &shy; as defined in XHTML and the OpenReader Character Entity References Common Set) to signal that the user agent may line break the word at that point.

In this specification, user agents must not render the soft hyphen character but may add the appropriate end-of-line character(s) (and other necessary text adjustments, depending upon language and conventions) for a line break placed after a soft hyphen. For all other purposes, such as word searching, user agents must ignore the soft hyphen character since it is technically not part of the word.

Note: The soft hyphen is not the same character as the plain hyphen (&#x002D;). The plain hyphen character is considered a part of the word, and user agents must process it like any other character in the word.

Example of the use of a soft hyphen:

<p>A content document author may insert a soft hy&shy;phen within a word.</p>

Should a user agent, in presenting the contents to the end-user, line break before the word “hyphen”, it will render the above example as follows:

A content document author may insert a soft
hyphen within a word.

If the user agent line breaks at the soft hyphen, it will render the above example as follows (using the common English language convention for hyphenation):

A content document author may insert a soft hy-
phen within a word.
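
As a non-normative illustration, the soft hyphen handling described in this section can be sketched as follows. The hyphen-plus-line-feed rendering follows the common English convention used in the example above; the function names are assumptions of this example.

```python
SOFT_HYPHEN = "\u00ad"


def searchable(text: str) -> str:
    """Drop soft hyphens so that word searching ignores them."""
    return text.replace(SOFT_HYPHEN, "")


def render_word(word: str, break_here: bool) -> str:
    """Render a word, optionally breaking the line at its first soft hyphen."""
    if break_here and SOFT_HYPHEN in word:
        head, tail = word.split(SOFT_HYPHEN, 1)
        return head + "-\n" + searchable(tail)
    return searchable(word)


print(render_word("hy\u00adphen", break_here=False))
print(render_word("hy\u00adphen", break_here=True))
print("hyphen" in searchable("insert a soft hy\u00adphen within a word"))
```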

4. Basic Content Document: Vocabulary Description

4.1 Vocabulary Components

This section describes the Basic Content Document vocabulary components. As noted in the Section 3 introduction, a Basic Content Document is a conforming XHTML 1.1 document with only a simple change in the DOCTYPE declaration. However, the full XHTML 1.1 Specification is not supported — only a selected subset of elements, attributes and attribute values is supported, along with various constraints, requirements and recommendations unique to this specification.

A conforming Basic Content Document must be valid to the Basic Content Document DTD.

4.1.1 Common Attribute Set

The Basic Content Document vocabulary, following XHTML 1.1, defines four [Common] attributes that may be applied to most elements. They are class, id, title, and xml:lang.

Because of their general importance, they are described in detail in this section. Two of the attribute descriptions, for class and id, specify constraints beyond what XHTML 1.1 allows.

4.1.1.1 class

The class attribute assigns one or more class names to an element; the element may be said to belong to these classes. A class name may be shared by several element instances in the same document. The class attribute is useful for finer description of document structure and content semantics, and allows for applying selector-based styles.

The value of class (of datatype NMTOKENS) must be a white space-separated list of class names (when more than one), and each class name (of datatype NMTOKEN) must:

  1. be an XML Name,

  2. not contain the “:” character,

  3. not start with the string “xml” (and all its case variants), since this is reserved in XML 1.0 for possible future standardization, and

  4. not start with the string “orp” (and all its case variants), since this is reserved for possible use in future versions of this specification.

Refer to the Note in Section 4.1.1.2.

4.1.1.2 id

The id attribute is used to give a unique identifier to an element. Its value must:

  1. be unique across all elements in a Basic Content Document,

  2. be an XML Name,

  3. not contain the “:” character,

  4. start with a Letter as defined in Appendix B of the XML 1.0 Specification — it cannot start with an underscore (“_”),

  5. not start with the string “xml” (and all its case variants), since this is reserved in XML 1.0 for possible future standardization, and

  6. not start with the string “orp” (and all its case variants), since this is reserved for possible use in future versions of this specification.

When compatibility with HTML 4 is desired, content document authors should restrict all characters in id to the Unicode Basic Latin script, with the further constraint that the first character must be a letter ([A-Za-z]), and the remaining characters, if any, may be any combination of letters, digits ([0-9]), hyphens (“-”), underscores (“_”), and periods (“.”).

Note: It is recommended that the id attribute only be used to identify individual elements for the purpose of linking/addressing/etc., and the class attribute only be used to identify specific document structure and content semantics (primarily for the purposes of presentation and document transformation). Future versions of this and related content document specifications may elevate this recommendation to a requirement.

As explained in Section 4.3, it is recommended that content authors add unique identifiers, using the id attribute, to all Block level and the important Inline level elements in a Basic Content Document.
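
As a non-normative illustration, the constraints on class names (Section 4.1.1.1) and id values (this section) can be sketched as follows. The XML Name check is only approximated with Python's Unicode character tests rather than the exact character classes of XML 1.0 Appendix B; that approximation and all function names are assumptions of this example. The HTML 4 compatibility pattern follows the description above directly.

```python
import re

RESERVED_PREFIXES = ("xml", "orp")
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def _is_name_char(ch: str) -> bool:
    # Approximation of the XML 1.0 NameChar production (colon excluded).
    return ch.isalnum() or ch in "._-"


def _common_checks(name: str) -> bool:
    """Checks shared by class names and id values."""
    if not name or ":" in name:
        return False
    if name.lower().startswith(RESERVED_PREFIXES):
        return False
    return all(_is_name_char(ch) for ch in name[1:])


def is_valid_class_name(name: str) -> bool:
    """A class name may start with a letter or an underscore."""
    return _common_checks(name) and (name[0].isalpha() or name[0] == "_")


def is_valid_id(value: str) -> bool:
    """An id value must start with a letter; underscore is not allowed."""
    return _common_checks(value) and value[0].isalpha()


def is_html4_compatible_id(value: str) -> bool:
    return HTML4_ID.fullmatch(value) is not None


def duplicate_ids(ids: list) -> set:
    """Return the id values that occur more than once in a document."""
    seen, duplicates = set(), set()
    for value in ids:
        (duplicates if value in seen else seen).add(value)
    return duplicates


print(is_valid_id("chapter-1"), is_valid_id("_note"), is_valid_id("xmlThing"))
print(duplicate_ids(["ch1", "ch2", "ch1"]))
```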

4.1.1.3 title

The title attribute may be used to provide an “advisory title/amplification” for the element. The attribute value of title is datatype text (CDATA); for the allowed Unicode character range, refer to Section 4.2.10.

It is recommended that user agents render, on demand, the value of the title attribute; for further information refer to the commentary for this attribute in the HTML 4.01 specification.

4.1.1.4 xml:lang

The xml:lang attribute, specially defined in XML 1.0, may be used to specify the language of the contained content and other attribute values.

The value of xml:lang must comply with RFC 3066, or its successor on the IETF Standards Track. Thus, the value will also conform to the separate requirement that xml:lang be an XML Name. (Language Codes) (Country Codes)

While xml:lang is optional for most elements, and is to be used consistent with XML 1.0, it is required for the root element html, and in this specification serves the additional purpose of setting the primary language of the Basic Content Document.

Note: It is recommended that content document authors apply the xml:lang attribute to any content which is of a different language or country code from that specified in html.

Example of this recommendation:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
...
<p>On June 26, 1963, in West Berlin, President John F. Kennedy
uttered the now-famous phrase: <span xml:lang="de-DE">Ich bin
ein Berliner.</span></p>
...
</html>
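
As a non-normative illustration, a user agent or authoring tool might resolve the primary language and locate language changes as sketched below. The function name language_overrides is an assumption of this example, and the sample document is abbreviated (it omits the head element) purely to keep the sketch short.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_overrides(root):
    """Return the primary language and a list of (element, language) pairs
    for every element whose xml:lang differs from its inherited language."""
    primary = root.get(XML_LANG)
    overrides = []

    def walk(element, inherited):
        language = element.get(XML_LANG, inherited)
        if language != inherited:
            local_name = element.tag.split("}")[-1]
            overrides.append((local_name, language))
        for child in element:
            walk(child, language)

    walk(root, primary)
    return primary, overrides


document = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"><body>'
    '<p>President John F. Kennedy uttered the now-famous phrase: '
    '<span xml:lang="de-DE">Ich bin ein Berliner.</span></p>'
    '</body></html>')
print(language_overrides(document))
```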

4.1.2 Supported Elements and Attributes

The table in this section lists (in alphabetical order by element) all the supported XHTML 1.1 elements and attributes in the Basic Content Document vocabulary. This table is non-normative — the Basic Content Document DTD, Version 1.0, is the normative vocabulary reference with respect to:

  • allowed elements, attributes and attribute values, and

  • element content model.

Each element in the “Element” column is linked to the corresponding general description in the HTML 4.01 specification, which provides a good overview of the purpose and use of the element and associated attributes. (As of the date of this specification, the XHTML specifications still refer to HTML 4.01 for descriptive details on elements and attributes.) The “May Contain” column summarizes the children elements and/or PCDATA (“parsed character data”) each element may, and in a few cases must, contain (i.e., a content model summary). Within the “Supported Attributes” column, [Common] refers to the four common attributes described in Section 4.1.1. Other links in the table address particular constraints and requirements.

Document authors should find this table, and the many links it provides, a useful resource.

Element | Short Description | Supported Attributes | Document Structure Level | May Contain
a | Anchor (deprecated) | [Common], href | Inline | PCDATA; [Inline] (except a)
abbr | Abbreviation | [Common] | Inline | PCDATA; [Inline]
address | Address | [Common] | Block | PCDATA; [Inline]
area | Client-Side Image Map Area | [Common], alt, coords, href, nohref, shape | Miscellaneous | [Empty]
blockquote | Long Quotation | [Common], cite | Block | [Block]
body | Document Body | [Common] | Top | [Block]
br | Forced Line Break (deprecated) | [Common] | Inline | [Empty]
caption | Table Caption | [Common] | Table | PCDATA; [Inline]
cite | Citation | [Common] | Inline | PCDATA; [Inline]
code | Computer Code Fragment | [Common] | Inline | PCDATA; [Inline]
col | Table Column | [Common], span | Table | [Empty]
colgroup | Table Column Group | [Common], span | Table | col
dd | Definition Description | [Common] | List | [Block] or PCDATA; [Inline]
del | Deleted Text | [Common] | Inline | PCDATA; [Inline]
dfn | Instance Definition | [Common] | Inline | PCDATA; [Inline]
div | Generic Block Level Container | [Common] | Block | [Block] or PCDATA; [Inline]
dl | Definition List | [Common] | Block (List) | dt, dd pairs
dt | Definition Term | [Common] | List | PCDATA; [Inline]
em | Emphasis | [Common] | Inline | PCDATA; [Inline]
h1 to h6 | Heading | [Common] | Block | PCDATA; [Inline]
head | Document Head | xml:lang | Top | title, meta
hr | Horizontal Rule | [Common] | Block | [Empty]
html | Document Root Element | xmlns, xml:lang | Top (Document Root) | head, body
img | Embedded Image (deprecated) | [Common], alt, src, usemap | Inline | [Empty]
ins | Inserted Text | [Common] | Inline | PCDATA; [Inline]
kbd | Text Entered by the User | [Common] | Inline | PCDATA; [Inline]
li | List Item (empty content rule) | [Common] | List | [Block] or PCDATA; [Inline]
map | Client-Side Image Map | [Common] (id is required) | Inline | area
meta | Generic Metadata Information | content, name, scheme, xml:lang | Head | [Empty]
ol | Ordered List | [Common] | Block (List) | li
p | Paragraph | [Common] | Block | PCDATA; [Inline]
pre | Preformatted Text | [Common], xml:space | Block | PCDATA; [Inline] (except img, sub, sup)
q | Inline Quotation | [Common], cite | Inline | PCDATA; [Inline]
samp | Program, Script, and Similar Output | [Common] | Inline | PCDATA; [Inline]
span | Generic Inline Level Container | [Common] | Inline | PCDATA; [Inline]
strong | Strong Emphasis | [Common] | Inline | PCDATA; [Inline]
sub | Subscript | [Common] | Inline | PCDATA; [Inline]
sup | Superscript | [Common] | Inline | PCDATA; [Inline]
table | Table | [Common], summary | Block (Table) | caption; col; colgroup; tbody; thead; tfoot; tr
tbody | Table Body | [Common] | Table | tr
td | Table Data Cell (empty content rule) | [Common], abbr, colspan, rowspan | Table | [Block] or PCDATA; [Inline]
tfoot | Table Footer | [Common] | Table | tr
th | Table Header Cell (empty content rule) | [Common], abbr, colspan, rowspan | Table | [Block] or PCDATA; [Inline]
thead | Table Header | [Common] | Table | tr
title | Document Title | xml:lang | Head | PCDATA
tr | Table Row | [Common] | Table | td; th
ul | Unordered List | [Common] | Block (List) | li
var | Instance of a Variable or Program Argument | [Common] | Inline | PCDATA; [Inline]

4.1.3 “Mnemonic” Character Entity References

The Basic Content Document DTD declares the 253 character entity references specified in the Character Entity References Common Set Specification, Version 1.0. These character entity references are identical to those supported in XHTML 1.1 (which, in turn, are inherited from HTML 4.01.) They include the five XML predefined character entity references (see Section 3.3.2.)

Basic Content Document authors may use these “mnemonic” character entities instead of the equivalent numeric character references, as explained in Section 3.3.2. User agents must recognize these character entities.

Example using numeric character references:

<h2>Jane&#x2019;s AT&#x0026;T R&#x00E9;sum&#x00E9;</h2>

The same example using “mnemonic” character entity references:

<h2>Jane&rsquo;s AT&amp;T R&eacute;sum&eacute;</h2>

Both the above examples will render as:

Jane’s AT&T Résumé
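The equivalence of the two forms can be checked mechanically. The following is a minimal sketch in Python, assuming the standard library function html.unescape as a stand-in for a user agent's reference resolution. That function uses the HTML5 entity table, which is a superset of the set declared by this specification, and the helper name resolve is hypothetical.

```python
from html import unescape

# The two example headings from this section, without the surrounding markup.
NUMERIC = "Jane&#x2019;s AT&#x0026;T R&#x00E9;sum&#x00E9;"
MNEMONIC = "Jane&rsquo;s AT&amp;T R&eacute;sum&eacute;"


def resolve(text: str) -> str:
    """Replace numeric and mnemonic character references with characters."""
    return unescape(text)
```

Both inputs resolve to the same character sequence, which is why authors may choose either form freely.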

Future versions of this and related content document specifications may support an expanded common set of “mnemonic” character entity references derived from other document markup vocabularies such as TEI and DocBook.

4.2 Vocabulary Constraints, Requirements, and Recommendations

Besides supporting a carefully crafted subset of XHTML 1.1 elements, attributes, and attribute values (detailed in Sections 4.1.1 and 4.1.2), this specification places several constraints, beyond what XHTML 1.1 allows, on the usage of these vocabulary components. This section details the constraints (and requirements) not given elsewhere in this specification.

This “catch-all” section also includes several recommendations and important comments directed to content document authors and user agent developers.

4.2.1 a Element (deprecated)

The deprecated a element must not contain another a to any depth of nesting. This restriction is inherited from XHTML 1.0 Element Prohibitions, and cannot be enforced by a DTD.

Note: The a element is deprecated since support for it might be removed in a future version of this specification. Should support for a be removed in the future, its functionality will be replaced by another mechanism, such as one based on XLink, and/or based on XHTML 2.0 when it becomes a W3C Recommendation. Content document authors should certainly use the a element when needed — this is simply an advisory message to content document authors and developers as to possible future developments.
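Because a DTD cannot express this prohibition, a conformance checker has to test for it separately. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes a well-formed fragment without a namespace declaration, and the function name has_nested_anchor is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_nested_anchor(fragment: str) -> bool:
    """Return True if any a element has an a descendant at any depth."""
    root = ET.fromstring(fragment)
    for anchor in root.iter("a"):
        # Element.iter() yields the element itself first, so skip it.
        if any(desc is not anchor for desc in anchor.iter("a")):
            return True
    return False
```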

4.2.2 br Element (deprecated)

The br element is deprecated in this specification, and will likely be removed in a future version. XHTML 2.0 is planning to remove this element from its vocabulary, to be replaced by the Inline (i.e., text level) l element, which represents a semantic line of text within a block of text, such as a paragraph.

Content document authors should avoid using the br element whenever possible.

[Informative Commentary] In many situations content document authors can avoid using the deprecated br element by properly marking up the underlying document structure; the class attribute may be used for such “fine tuning.” A well-known problem from web page authoring with the br element is that many authors use it to “force” a desired visual presentation, thereby working against the reflowability and adaptability of the content to various hardware, applications and end-user presentation settings.

(Also refer to Section 4.2.8 on empty element syntax.)

4.2.3 “Mixed Flow” Elements (dd, div, li, td, th)

In XHTML 1.1, the DTD content model for each of the five elements dd, div, li, td, and th is “mixed flow”, meaning that an instance may simultaneously contain text (PCDATA), Inline elements and Block elements in any arbitrary order. Unfortunately, the XML 1.0 specification provides no mechanism to allow DTDs to further constrain such “mixed flow” content models.

It is generally considered very poor practice to randomly mix Block elements with PCDATA/Inline elements, and doing so sometimes leads to ambiguities as to how user agents are to render such “mixed flow” markup. Thus, for conformance to this specification, every instance of these five elements must contain (as children) either only Block level elements or only PCDATA/Inline elements.

Furthermore, it is recommended that these elements (excluding div) contain only Block level elements (thus having a content model similar to blockquote.) A future version of this specification may elevate this recommendation to a requirement.

Examples of permitted usage:

<!-- PCDATA/Inline usage -->
<ul>
   <li>Here is a <em>list</em> item.</li>
</ul>
<!-- Block level usage (recommended for <dd>, <li>, <td> and <th>) -->
<ul>
   <li>
      <p>Here is the <em>first paragraph</em> in a list item.</p>
      <p>Here is the <em>second paragraph</em> in a list item.</p>
   </li>
   ...
   <li><div>Here is a <em>different</em> list item.</div></li>
</ul>

Example of not permitted usage containing mixed Block, Inline and PCDATA:

<!-- Mixed usage not permitted in this specification -->
<ul>
   <li>
      <p>Here is the <em>first paragraph</em> in a list item.</p>
         Here is some <em>random</em> text placed in-between.
      <p>Here is the <em>second paragraph</em> in a list item.</p>
   </li>
</ul>
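Since the DTD cannot enforce this constraint, it has to be checked by other means. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes a well-formed fragment without a namespace declaration, treats every non-Block child as Inline, and uses the hypothetical function name mixed_flow_violations.

```python
import xml.etree.ElementTree as ET

# Block level elements, as listed in the Block.class entity of the DTD.
BLOCK = {"address", "blockquote", "div", "dl", "h1", "h2", "h3", "h4",
         "h5", "h6", "hr", "ol", "p", "pre", "table", "ul"}
MIXED_FLOW = {"dd", "div", "li", "td", "th"}

PERMITTED = (
    "<ul><li>Here is a <em>list</em> item.</li>"
    "<li><p>First paragraph.</p>\n<p>Second paragraph.</p></li>"
    "<li><div>Here is a <em>different</em> list item.</div></li></ul>"
)
NOT_PERMITTED = (
    "<ul><li><p>First paragraph.</p> Some <em>random</em> text "
    "<p>Second paragraph.</p></li></ul>"
)


def mixed_flow_violations(fragment: str) -> list:
    """Return the tag names of elements that mix Block children with
    PCDATA or Inline children."""
    root = ET.fromstring(fragment)
    bad = []
    for elem in root.iter():
        if elem.tag not in MIXED_FLOW:
            continue
        children = list(elem)
        has_block = any(c.tag in BLOCK for c in children)
        has_inline = any(c.tag not in BLOCK for c in children)
        # Non-white-space text directly inside the element counts as PCDATA.
        has_text = bool((elem.text or "").strip()) or any(
            (c.tail or "").strip() for c in children)
        if has_block and (has_inline or has_text):
            bad.append(elem.tag)
    return bad
```

White space between Block children is ignored, so indented markup such as the permitted examples above is not reported.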

4.2.4 html Element

As noted in Section 3.1, for the required root element html, the attributes xmlns and xml:lang (for specifying the primary language of the Basic Content Document) are both required. In addition, the attribute xmlns must have the value “http://www.w3.org/1999/xhtml”.

Example where the primary language of the document is U.S. English:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">

For further information on xml:lang, refer to Section 4.1.1.4.
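A processor can verify both requirements after parsing. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module, which folds the xmlns declaration into the element name and reports xml:lang under the XML namespace. The function name root_problems and the message strings are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def root_problems(document: str) -> list:
    """Return a list of problems found on the root element."""
    root = ET.fromstring(document)
    problems = []
    # ElementTree represents <html xmlns="..."> as the tag "{namespace}html".
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root must be html in the XHTML namespace")
    if not root.get(XML_LANG):
        problems.append("xml:lang is required on html")
    return problems
```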

4.2.5 img Element (deprecated)

The deprecated inline element img is used to embed graphics images in Basic Content Documents. It must not be used to embed multimedia such as audio and video.

The required IRI attribute, src, is used to reference the image resource.

For improved accessibility, the required alt attribute should contain a brief and informative textual description of the image (for datatype information see Section 4.2.10.) This text is an acceptable fallback when the user agent cannot, for whatever reason, properly display the image — or for non-visual presentation such as text-to-speech.

Content document authors should include the title attribute in img, and its value may be the same as that used for the required alt attribute. If the title attribute is not present, user agents should assign it, giving it the value found in the alt attribute. (For more information on the title attribute, and recommended user agent processing of this attribute, refer to Section 4.1.1.3.)

Note: The img element is deprecated since support for it might be removed in a future version of this specification. Should support for img be removed in the future, its functionality will be replaced by another mechanism, such as one based on XLink, and/or based on XHTML 2.0 when it becomes a W3C Recommendation. Content document authors should certainly use the img element when needed — this is simply an advisory message to content document authors and developers as to possible future developments.

(Also refer to Section 4.2.8 on empty element syntax.)

4.2.6 pre Element

The pre element must not contain the elements img, sub, or sup to any depth of nesting. This restriction is inherited from XHTML 1.0 Element Prohibitions, and cannot be enforced by a DTD.

The attribute xml:space is FIXED. If this attribute is used, it must have the value “preserve”.
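The DTD in Appendix A restricts only the direct children of pre, so a prohibited element nested inside a permitted one (for example, sub inside em) passes DTD validation. The following is a minimal sketch in Python of a check at any depth, using the standard xml.etree.ElementTree module. It assumes a well-formed fragment without a namespace declaration, and the function name pre_violations is hypothetical.

```python
import xml.etree.ElementTree as ET

PROHIBITED_IN_PRE = ("img", "sub", "sup")


def pre_violations(fragment: str) -> list:
    """Return the tag names of prohibited elements found inside pre."""
    root = ET.fromstring(fragment)
    found = []
    for pre in root.iter("pre"):
        for name in PROHIBITED_IN_PRE:
            # iter() walks all descendants, not only direct children.
            found.extend(elem.tag for elem in pre.iter(name))
    return found
```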

4.2.7 Header Elements: head, title, and meta

The Basic Content Document header section, defined by the required head element, is used to include metadata about the Basic Content Document. Despite the requirement for a header section in a Basic Content Document (necessary for XHTML 1.1 and OEBPS 1.2 conformance), user agents are not required to use any of the header metadata information.

The head element must contain exactly one title element, and may include any number of meta elements. The meta elements, if present, must follow the title element.

Like head, the title element is required for conformance with XHTML 1.1 and OEBPS 1.2. Its purpose is to contain a descriptive title of the Basic Content Document itself. It should not be used to specify the title for the publication of which the content document is a component.

The optional meta empty element may be used to specify additional metadata about the Basic Content Document. Like title, the meta element should not be used to specify metadata for the publication of which the content document is a component.

Each use of the meta element must conform to one of the following:

  1. Dublin Core recommendations for embedding metadata in XHTML 1.1. Example:

    <meta name="DC.title" content="Chapter One" />
    
  2. The value of the required name attribute starts with “x-” for any metadata system other than the aforementioned Dublin Core system, such as a “homegrown” system devised by the content document author. Example:

    <meta name="x-markuptechnician" content="John Doe"/>
    

For the meta element, the content attribute is required. Its value is of datatype CDATA (see Section 4.2.10 for allowed characters.) The required name attribute must follow the same character restrictions given for the value of the class common attribute (see Section 4.1.1.1.)

User agents are not required to use the metadata information in the meta empty element.

(Refer to Section 4.2.8 on empty element syntax for the meta element.)
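The header rules above can be sketched as a simple check. The following Python example uses the standard xml.etree.ElementTree module on a head element without a namespace declaration. It assumes that Dublin Core names carry the conventional “DC.” prefix, which this specification does not itself state, and the function name head_problems and the message strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def head_problems(head_markup: str) -> list:
    """Return a list of problems found in a head element."""
    head = ET.fromstring(head_markup)
    children = [child.tag for child in head]
    problems = []
    if children[:1] != ["title"] or children.count("title") != 1:
        problems.append("head must start with exactly one title")
    if any(tag != "meta" for tag in children[1:]):
        problems.append("only meta elements may follow title")
    for meta in head.iter("meta"):
        name = meta.get("name", "")
        if "content" not in meta.attrib:
            problems.append("meta %r lacks content" % name)
        # Assumption: Dublin Core names use the conventional "DC." prefix.
        if not (name.startswith("DC.") or name.startswith("x-")):
            problems.append("meta name %r is neither Dublin Core nor x-" % name)
    return problems
```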

4.2.8 Empty Elements: area, br, col, hr, img, and meta

The elements area, br, col, hr, img, and meta are declared empty in the Basic Content Document DTD. These elements must not contain any content and must use the empty-element syntax (also known as “minimized form”) for declared empty elements as specified in XML 1.0. (Also refer to Section 3.3.5 in this specification.)

Note: In using the empty-element syntax, when compatibility with HTML 4 is desired, content document authors should include at least one space before the terminal “/”.

Examples of correct usage of declared empty element syntax:

<hr/>
<hr />
<img src="myimage.png" alt="A cool image" />
<meta name="x-markuptechnician" content="John Doe"/>

4.2.9 Empty Content

When non-empty elements (elements which may contain content) contain no content, or only a sequence of one or more white space characters, this occurrence is referred to as “empty content.” (For more information, refer to Section 3.3.5 in this specification.)

Examples of empty content:


<p></p>
<p>   </p>
<li></li>
<td> </td>
<span class="pagemarker" id="page121" title="Page 121"></span>

In this specification, the empty-element syntax (see Sections 3.3.5 and 4.2.8) must not be used for empty content. For example:

<p/>       <!-- not allowed! -->
<p></p>    <!-- correct usage -->
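An XML parser reports the same result for the empty-element form and for a start tag followed immediately by an end tag, so the syntax rules of this section and of Section 4.2.8 have to be checked on the source text. The following is a rough sketch in Python using a regular expression. It assumes that attribute values contain no > character and that comments contain no tags, and the function name syntax_problems is hypothetical.

```python
import re

DECLARED_EMPTY = {"area", "br", "col", "hr", "img", "meta"}

# Matches a start tag or an empty-element tag. Group 1 is the element name
# and group 2 is "/" when the empty-element syntax is used.
TAG = re.compile(r"<([A-Za-z][\w.:-]*)(?:\s[^<>]*?)?\s*(/?)>")


def syntax_problems(markup: str) -> list:
    """Report misuse of the empty-element syntax in the source text."""
    problems = []
    for match in TAG.finditer(markup):
        name, slash = match.group(1), match.group(2)
        if name in DECLARED_EMPTY and not slash:
            problems.append("<%s> must use the empty-element syntax" % name)
        elif name not in DECLARED_EMPTY and slash:
            problems.append("<%s/> must not use the empty-element syntax" % name)
    return problems
```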

How empty content is visually presented by user agents largely depends upon the style language rules and settings.

For example, in CSS 2.1, empty content table cells (td and th) and list items (li) automatically generate the appropriate placeholder boxes — they are significant content-wise even when empty. (It is not uncommon for a table to intentionally have one or more empty cells.)

With typical default CSS 2.1 style sheet settings, empty content Block and Inline elements (such as p and span) will visually collapse to nothing. Unlike table cells and list items, they are not considered significant content-wise, but may serve non-content purposes such as for linking and the placing of special “mile markers” within the markup of the content document (see the span example above.)

Content authors should not use empty content Block and Inline elements solely for the purpose of forcing a particular visual presentation.

4.2.10 Attributes of Datatype CDATA (Text): abbr, alt, content, scheme, summary, title

Six attributes (abbr, alt, content, scheme, summary and title) are of datatype CDATA.

The allowed Unicode character range for attribute values of datatype CDATA is given in XML 1.0, with the following constraints:

  1. The literal < character must not appear. If this character is to be used literally, it must be escaped.

  2. The literal & character must not appear except as part of a predefined or declared character entity reference. If this character is to be used literally (not part of an entity reference), it must be escaped.

  3. The literal " and ' characters must not appear when they match the attribute delimiter quote marks; they must be escaped. It is recommended that both of these characters always be escaped when they appear in attribute values.
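The three constraints above can be met by escaping plain text before it is written into an attribute value. The following is a minimal sketch in Python using xml.sax.saxutils.escape from the standard library. It assumes the input is plain text that contains no entity references yet (every & is escaped), and the function name escape_attribute_value is hypothetical.

```python
from xml.sax.saxutils import escape


def escape_attribute_value(text: str) -> str:
    """Escape <, & and both quote characters for use in a CDATA attribute."""
    # escape() replaces & first, then > and <; the mapping adds the quotes.
    # Escaping > is not required by this specification but is harmless.
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```

Escaping both quote characters follows the recommendation in item 3, so the result is safe with either attribute delimiter.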

Three of these attributes are further described elsewhere:

4.2.11 Attributes With IRI References (cite, href, src)

Three attributes (cite, href and src) reference resources using IRI values (Internationalized Resource Identifiers, RFC 3987). IRI and URI (RFC 3986) are similar (a URI is an IRI), but IRI is designed to support nearly all of the Unicode character set beyond Basic Latin without the need for (essentially unreadable and wasteful) percent-encoded octets.
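The readability difference can be illustrated with a short Python sketch that uses urllib.parse from the standard library to percent-encode the path of an IRI reference as UTF-8 octets. The path shown is hypothetical, the function name iri_path_to_uri_path is an assumption, and the complete IRI to URI mapping in RFC 3987 covers more than the path component handled here.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path: str) -> str:
    """Percent-encode the non-ASCII characters of a path as UTF-8 octets."""
    return quote(path, safe="/")


# Hypothetical reference as an author might write it in a src or href value.
IRI_PATH = "chapitre/résumé.html"
URI_PATH = iri_path_to_uri_path(IRI_PATH)
```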

Note: This specification does not define the allowed IRI schemes. The allowed schemes, and usage details, are to be defined by publication frameworks which reference this specification.

The src attribute for the img element is further described in Section 4.2.5.

4.3 Authoring Recommendations

The authoring of Basic Content Documents is similar in some ways to web page authoring since both are based on HTML. The various requirements in this specification are intended to significantly tighten up content document markup to achieve several aims which mutually benefit publishers, user agent developers, end-users (readers, librarians and archivists, etc.), and others. Some of these aims (relative to general web page authoring) include:

  • Improved accessibility,

  • Increased cross-platform uniformity of presentation, and

  • Enhanced document structure for presentation and non-presentation purposes.

Nevertheless, the general flexibility of the XHTML-based Basic Content Document vocabulary unavoidably allows certain poor markup practices that work against these and other aims.

To create high-quality Basic Content Documents that better meet the aims of this specification, content document authors should follow:

Probably the most abused HTML web authoring practice is using tables for document layout. Table markup should only be used for representing tabular data. Using tables for layout produces documents that are presentationally inflexible across user agent platforms (poorly adaptable), more difficult to author, maintain and style, and, in general, quite inaccessible. (For more information on this topic, refer to WCAG Guidelines 3 and 5.)

Authors should also properly structure documents, as outlined in the WCAG companion document Core Techniques for Web Content Accessibility Guidelines, Section 1. For example, use the header elements, h1 to h6, in appropriate fashion to indicate the hierarchical level and the title of a section in a document. (h2 is usually used for the top-level division in a document, such as a book chapter, while h1 is reserved for the title of a publication.)

To aid third-party linking to content documents, content document authors should apply unique identifiers (using the id attribute, see Section 4.1.1.2) to all Block level elements and to, at least, the more important Inline level elements.
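Authors can audit a document for this recommendation with a small script. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. It assumes a well-formed fragment without a namespace declaration and checks Block level elements only; the function name blocks_without_id is hypothetical.

```python
import xml.etree.ElementTree as ET

# Block level elements, as listed in the Block.class entity of the DTD.
BLOCK = {"address", "blockquote", "div", "dl", "h1", "h2", "h3", "h4",
         "h5", "h6", "hr", "ol", "p", "pre", "table", "ul"}


def blocks_without_id(fragment: str) -> list:
    """Return the tag names of Block level elements that lack an id."""
    root = ET.fromstring(fragment)
    return [elem.tag for elem in root.iter()
            if elem.tag in BLOCK and "id" not in elem.attrib]
```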

4.4 Why Certain XHTML 1.1 Elements and Attributes Are Not Supported (Non-Normative)

The Basic Content Document vocabulary supports only a subset of all available XHTML 1.1 elements and attributes, and deprecates certain elements. This section briefly explains the reasons, with several examples.

In general, the reasons can be distilled into the following:

  • Publication Framework. Most non-content related functions are moved to the overarching publication framework (such as the OpenReader Publication Framework Specification) which supports Basic Content Documents. That is, Basic Content Documents are not intended as standalone publications, but rather are a component of a publication framework.

  • XHTML 2.0 Conformance. It is planned to migrate conformance of the Basic Content Document vocabulary to the XHTML 2.0 specification once it becomes a W3C Recommendation. Elements and attributes in XHTML 1.1 that will likely not be supported in XHTML 2.0 are either not supported or deprecated in this specification.

  • Controlled Expansion. It is important in any commercial publication framework open standard to have careful control over user agent conformance to achieve the goal of cross-application (and cross-platform) uniformity in processing and presentation — to avoid “proprietization” of the framework. This includes a careful, thoughtful, and controlled expansion over time of vocabulary support for certain features and functionality.

  • Structural Markup. It is important, for various reasons benefiting content document authors, that the markup in Basic Content Documents define only the document structure and content semantics (refer to Section 4.3.) Presentationally-oriented markup should not be supported. Styling for presentation is to be applied at the framework level.

Following are the more well-known XHTML 1.1 vocabulary components not supported in this specification, including the general reasons for exclusion:

  • Style Related Components (<style>, <link>, and the style attribute): Structural Markup; Publication Framework

  • <base>: Publication Framework (resource pathnames defined at the framework level)

  • <i>, <b>: XHTML 2.0 Conformance; Structural Markup (use instead <em> and <strong>)

  • <object>: Controlled Expansion (it has not yet been decided whether to use some other mechanism, such as the vocabulary-independent XLink, to embed multimedia resources.)

  • <script>, <noscript>: Controlled Expansion

  • Forms-related: Controlled Expansion; XHTML 2.0 Conformance (XHTML 2.0 is currently planning to support XForms)

5. Default CSS Style Sheets

Although this specification does not mandate any particular styling language for the visual presentation of Basic Content Documents, many frameworks using this specification, such as the OpenReader Publication Framework Specification, will specify CSS 2.1 (or its successor) as the styling language.

For visual presentation using CSS 2.1, user agents first apply a default style sheet to the Basic Content Document. Next, publisher-supplied style sheets (if any) are applied. Finally, any end-user-supplied style sheets (or the equivalent) are applied. (This is a simplified, non-normative summary of style sheet cascading; for normative details on CSS 2.1 cascading, refer to Section 6.4 in the CSS 2.1 Specification.)

User agents which render Basic Content Documents using CSS 2.1 should use one of the following default CSS style sheets:

For more book-like typographical presentation, the Enhanced default style sheet is preferred.

As necessary, user agents may deviate from the default CSS 2.1 style sheet to optimally adapt presentation to the limitations and peculiarities of particular platforms.

Nevertheless, to assure predictable and reasonably uniform rendering across platforms, user agents should strive, in good faith, to conform their default CSS 2.1 style sheet as closely as possible to one of the two recommended in this specification, preferably the Enhanced version.

6. Example Basic Content Documents

Examples of conforming Basic Content Documents are located in the examples directory at the OpenReader Consortium web site.

7. Tentative Future Plans (Non-Normative)

Like any specification, the Basic Content Document specification will evolve to meet ever-changing needs, unforeseen developments, and new opportunities. This non-normative section details the tentative current plans (as of the date of release of this specification) of the OpenReader Publication Working Group regarding future versions of this and sibling content document specifications. Although these plans are tentative, and subject to change, content document authors and user agent developers may find this section useful for future planning.

Additions and changes to this specification will be implemented in a careful, thoughtful and controlled fashion to maintain stability, compatibility, and conformant usage. Thus, an implementation timetable for additions and changes cannot be given.

Appendix A: The Basic Content Document DTD, Version 1.0

The URL to the normative Basic Content Document DTD, Version 1.0, is http://openreader.org/dtd/bcd10.dtd .

This DTD includes the Character Entity References Common Set, Version 1.0, by parameter-entity reference to http://openreader.org/dtd/ent10.ent . (Refer to Section 4.1.3.)

The Basic Content Document DTD, Version 1.0, is reproduced below (non-normative):

<!--

Title:      Basic Content Document DTD
Version:    1.0
Date:       27 February 2006
DTD-URL:    http://openreader.org/dtd/bcd10.dtd
Reference:  Basic Content Document Specification, Version 1.0
RefURL:     http://openreader.org/spec/bcd10.html
Rights:     Copyright 2006 OpenReader Consortium. All rights reserved.

Contributors:

     Jon Noring (editor)

Summary:

     This DTD is a pure subset of XHTML 1.1. Any document validating
     to this DTD will also validate to the XHTML 1.1 DTD (with the
     appropriate changes made in the DOCTYPE declaration.)

     The XHTML 1.1 DTD is located at

        http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd

Usage:

     <?xml version="1.0" encoding="UTF-8" ?>
     <!DOCTYPE html PUBLIC
          "-//OpenReader//DTD Basic Content Document 1.0//EN"
          "http://openreader.org/dtd/bcd10.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">
     <head>
        <title> ... required document title goes here ... </title>
        <!== optional <meta> elements go here ==>
     </head>
     <body>
        <!== body content goes here ==>
     </body>
     </html>


     Refer to the Basic Content Document Specification, Version 1.0
     for further information on usage.

-->


<!-- *************************************************** -->

<!-- GENERAL NOTATIONS ................................. -->

<!-- W3C XML 1.0 Recommendation -->

<!NOTATION w3c-xml
     PUBLIC "ISO 8879//NOTATION Extensible Markup Language (XML) 1.0//EN">

<!-- XML 1.0 CDATA -->

<!NOTATION cdata
     PUBLIC "-//W3C//NOTATION XML 1.0: CDATA//EN">

<!-- *************************************************** -->

<!-- ENTITIES WITH DATATYPE NOTATIONS .................. -->

<!-- Language code, as per [RFC 3066] -->

<!NOTATION languageCode
    PUBLIC "-//W3C//NOTATION XHTML Datatype: LanguageCode//EN">
<!ENTITY % LanguageCode.datatype "NMTOKEN">

<!-- nn for pixels or nn% for percentage length -->

<!NOTATION length
    PUBLIC "-//W3C//NOTATION XHTML Datatype: Length//EN">
<!ENTITY % Length.datatype "CDATA">

<!-- One or more digits (NUMBER) -->

<!NOTATION number
    PUBLIC "-//W3C//NOTATION XHTML Datatype: Number//EN">
<!ENTITY % Number.datatype "CDATA">

<!-- Textual content -->

<!NOTATION text
    PUBLIC "-//W3C//NOTATION XHTML Datatype: Text//EN">
<!ENTITY % Text.datatype "CDATA">

<!-- Internationalized Resource Identifiers (see RFC 3987) -->

<!NOTATION uri
    PUBLIC "-//W3C//NOTATION XHTML Datatype: URI//EN">
<!ENTITY % URI.datatype "CDATA">

<!-- *************************************************** -->

<!-- ELEMENT ENTITIES .................................. -->

<!ENTITY % Block.class
     "address | blockquote | div | dl | h1 | h2 | h3 | h4 |
      h5 | h6 | hr | ol | p | pre | table | ul">

<!ENTITY % Inline.class
     "a | abbr | br | cite | code | del | dfn | em | img |
      ins | kbd | map | q | samp | span | strong | sub |
      sup | var">

<!-- *************************************************** -->

<!-- COMMON ATTRIBUTE ENTITIES ......................... -->

<!ENTITY % Common.attrib
     "class        NMTOKENS                 #IMPLIED
      id           ID                       #IMPLIED
      title        %Text.datatype;          #IMPLIED
      xml:lang     %LanguageCode.datatype;  #IMPLIED">

<!-- *************************************************** -->

<!-- COMMA SEPARATED LIST OF COORDINATE LENGTHS ........ -->

<!ENTITY % Coords.datatype "CDATA" >

<!-- *************************************************** -->

<!-- CHARACTER ENTITY REFERENCES COMMON SET 1.0 ........ -->

<!ENTITY % ORPCharEnt
     PUBLIC "-//OpenReader//DTD Character Entities 1.0//EN"
     "http://openreader.org/dtd/ent10.ent">

%ORPCharEnt;

<!-- *************************************************** -->

<!-- ELEMENTS AND ATTRIBUTES ........................... -->

<!-- TOP LEVEL STRUCTURE ............................... -->

<!ELEMENT html (head, body)>
<!ATTLIST html
      xml:lang     %LanguageCode.datatype;  #REQUIRED
      xmlns        %URI.datatype;           #REQUIRED>

<!ELEMENT head (title, meta*)>
<!ATTLIST head
      xml:lang     %LanguageCode.datatype;  #IMPLIED>

<!ELEMENT body (%Block.class;)+>
<!ATTLIST body %Common.attrib;>

<!-- HEAD LEVEL ........................................ -->

<!ELEMENT title (#PCDATA)>
<!ATTLIST title
      xml:lang     %LanguageCode.datatype;  #IMPLIED>

<!ELEMENT meta EMPTY>
<!ATTLIST meta
      content      CDATA                    #REQUIRED
      name         NMTOKEN                  #REQUIRED
      scheme       CDATA                    #IMPLIED
      xml:lang     %LanguageCode.datatype;  #IMPLIED>

<!-- BLOCK LEVEL ....................................... -->

<!ELEMENT address (#PCDATA | %Inline.class;)*>
<!ATTLIST address %Common.attrib;>

<!ELEMENT blockquote (%Block.class;)+>
<!ATTLIST blockquote
      %Common.attrib;
      cite         %URI.datatype;           #IMPLIED>

<!ELEMENT div (#PCDATA | %Inline.class; | %Block.class;)*>
<!ATTLIST div %Common.attrib;>

<!ELEMENT dl (dt, dd)+>
<!ATTLIST dl %Common.attrib;>

<!ELEMENT h1 (#PCDATA | %Inline.class;)*>
<!ATTLIST h1 %Common.attrib;>

<!ELEMENT h2 (#PCDATA | %Inline.class;)*>
<!ATTLIST h2 %Common.attrib;>

<!ELEMENT h3 (#PCDATA | %Inline.class;)*>
<!ATTLIST h3 %Common.attrib;>

<!ELEMENT h4 (#PCDATA | %Inline.class;)*>
<!ATTLIST h4 %Common.attrib;>

<!ELEMENT h5 (#PCDATA | %Inline.class;)*>
<!ATTLIST h5 %Common.attrib;>

<!ELEMENT h6 (#PCDATA | %Inline.class;)*>
<!ATTLIST h6 %Common.attrib;>

<!ELEMENT hr EMPTY>
<!ATTLIST hr %Common.attrib;>

<!ELEMENT ol (li)+>
<!ATTLIST ol %Common.attrib;>

<!ELEMENT p (#PCDATA | %Inline.class;)*>
<!ATTLIST p %Common.attrib;>

<!ELEMENT pre
      (#PCDATA | a | abbr | br | cite | code | dfn | em |
       kbd | map | q | samp | span | strong | var)*>
<!ATTLIST pre
      %Common.attrib;
      xml:space    (preserve)               #FIXED "preserve">

<!ELEMENT table
      ( caption?, (col* | colgroup*),
      ( (thead?, tfoot?, tbody+) | (tr+) ) )>
<!ATTLIST table
      %Common.attrib;
      summary      %Text.datatype;          #IMPLIED>

<!ELEMENT ul (li)+>
<!ATTLIST ul %Common.attrib;>

<!-- INLINE LEVEL ...................................... -->

<!ELEMENT a (#PCDATA |
      abbr | br | cite | code | del | dfn | em | img | ins |
      kbd | map | q | samp | span | strong | sub | sup |
      var)*>
<!ATTLIST a
      %Common.attrib;
      href         %URI.datatype;           #REQUIRED>

<!ELEMENT abbr (#PCDATA | %Inline.class;)*>
<!ATTLIST abbr %Common.attrib;>

<!ELEMENT br EMPTY>
<!ATTLIST br %Common.attrib;>

<!ELEMENT cite (#PCDATA | %Inline.class;)*>
<!ATTLIST cite %Common.attrib;>

<!ELEMENT code (#PCDATA | %Inline.class;)*>
<!ATTLIST code %Common.attrib;>

<!ELEMENT del (#PCDATA | %Inline.class;)*>
<!ATTLIST del %Common.attrib;>

<!ELEMENT dfn (#PCDATA | %Inline.class;)*>
<!ATTLIST dfn %Common.attrib;>

<!ELEMENT em (#PCDATA | %Inline.class;)*>
<!ATTLIST em %Common.attrib;>

<!ELEMENT img EMPTY>
<!ATTLIST img
      %Common.attrib;
      alt          %Text.datatype;          #REQUIRED
      src          %URI.datatype;           #REQUIRED
      usemap       IDREF                    #IMPLIED>

<!ELEMENT ins (#PCDATA | %Inline.class;)*>
<!ATTLIST ins %Common.attrib;>

<!ELEMENT kbd (#PCDATA | %Inline.class;)*>
<!ATTLIST kbd %Common.attrib;>

<!ELEMENT map (area)+>
<!ATTLIST map
      class        NMTOKENS                 #IMPLIED
      id           ID                       #REQUIRED
      title        %Text.datatype;          #IMPLIED
      xml:lang     %LanguageCode.datatype;  #IMPLIED>

<!ELEMENT q (#PCDATA | %Inline.class;)*>
<!ATTLIST q
      %Common.attrib;
      cite         %URI.datatype;           #IMPLIED>

<!ELEMENT samp (#PCDATA | %Inline.class;)*>
<!ATTLIST samp %Common.attrib;>

<!ELEMENT span (#PCDATA | %Inline.class;)*>
<!ATTLIST span %Common.attrib;>

<!ELEMENT strong (#PCDATA | %Inline.class;)*>
<!ATTLIST strong %Common.attrib;>

<!ELEMENT sub (#PCDATA | %Inline.class;)*>
<!ATTLIST sub %Common.attrib;>

<!ELEMENT sup (#PCDATA | %Inline.class;)*>
<!ATTLIST sup %Common.attrib;>

<!ELEMENT var (#PCDATA | %Inline.class;)*>
<!ATTLIST var %Common.attrib;>

<!-- TABLE LEVEL ....................................... -->

<!ELEMENT caption (#PCDATA | %Inline.class;)*>
<!ATTLIST caption %Common.attrib;>

<!ELEMENT col EMPTY>
<!ATTLIST col
      %Common.attrib;
      span         %Number.datatype;        "1">

<!ELEMENT colgroup (col)*>
<!ATTLIST colgroup
      %Common.attrib;
      span         %Number.datatype;        "1">

<!ELEMENT tbody (tr)+>
<!ATTLIST tbody %Common.attrib;>

<!ELEMENT td (#PCDATA | %Inline.class; | %Block.class;)*>
<!ATTLIST td
      %Common.attrib;
      abbr         %Text.datatype;          #IMPLIED
      colspan      %Number.datatype;        "1"
      rowspan      %Number.datatype;        "1">

<!ELEMENT tfoot (tr)+>
<!ATTLIST tfoot %Common.attrib;>

<!ELEMENT th (#PCDATA | %Inline.class; | %Block.class;)*>
<!ATTLIST th
      %Common.attrib;
      abbr         %Text.datatype;          #IMPLIED
      colspan      %Number.datatype;        "1"
      rowspan      %Number.datatype;        "1">

<!ELEMENT thead (tr)+>
<!ATTLIST thead %Common.attrib;>

<!ELEMENT tr (th | td)+>
<!ATTLIST tr %Common.attrib;>

<!-- LIST LEVEL ........................................ -->

<!ELEMENT dd (#PCDATA | %Inline.class; | %Block.class;)*>
<!ATTLIST dd %Common.attrib;>

<!ELEMENT dt (#PCDATA | %Inline.class;)*>
<!ATTLIST dt %Common.attrib;>

<!ELEMENT li (#PCDATA | %Inline.class; | %Block.class;)*>
<!ATTLIST li %Common.attrib;>

<!-- MISCELLANEOUS ..................................... -->

<!ELEMENT area EMPTY>
<!ATTLIST area
      %Common.attrib;
      alt          %Text.datatype;          #REQUIRED
      coords       %Coords.datatype;        #IMPLIED
      href         %URI.datatype;           #IMPLIED
      nohref       (nohref)                 #IMPLIED
      shape        (rect | circle |
                    poly | default)         "rect">

Appendix B: Recommended Enhanced Default CSS Style Sheet

For more information, refer to Section 5.

/*

Recommended Enhanced Default CSS Style Sheet for the Basic Content
Document 1.0 Specification


This is the recommended Enhanced Default CSS Style Sheet for the Basic
Content Document 1.0 Specification. It closely conforms with the
default CSS style sheet (for HTML 4) presented in Appendix D of the
CSS 2.1 Specification: http://www.w3.org/TR/CSS21/sample.html

The significant enhancements include:

1. The <p> element has been default styled to more closely follow
   typical typographic practice for books: indentation with zero top
   and bottom margins. The exceptions, where most conventions call
   for no indentation, are a paragraph immediately following any of
   the heading elements, h1 to h6, and a <p> that is the first child
   element of a block-level element such as <li>, <blockquote>,
   <div>, <td>, etc.

   For comparison, web browser default style sheets typically give
   <p> no indentation and top and bottom margins of 1.33em (or so).

2. The <dt> element has been default styled to be bolder, with
   non-zero top and bottom margins (the values are given below).


Note: All 53 elements defined in BCD 1.0 appear below.

*/



/* CSS Display Property Settings */

address, blockquote, body, dd,
div, dl, dt, h1, h2, h3, h4, h5,
h6, hr, html, ol, p, pre, ul        { display:         block }

a, abbr, br, cite, code, del, dfn,
em, img, ins, kbd, q, samp, span,
strong, sub, sup, var               { display:         inline }

/* (Note that "display: inline" is the CSS default, so the above is
    unnecessary. It is included here for completeness.) */

area, head, map, title, meta        { display:         none }

li                                  { display:         list-item }

table                               { display:         table }

tr                                  { display:         table-row }

thead                               { display:         table-header-group }

tbody                               { display:         table-row-group }

tfoot                               { display:         table-footer-group }

col                                 { display:         table-column }

colgroup                            { display:         table-column-group }

td, th                              { display:         table-cell }

caption                             { display:         table-caption }


/* General Block Element Styling */

body                                { padding:         8px;
                                      line-height:     1.2 }

h1                                  { font-size:       2.00em;
                                      margin:          0.67em 0.00em }

h2                                  { font-size:       1.50em;
                                      margin:          0.83em 0.00em }

h3                                  { font-size:       1.17em;
                                      margin:          1.00em 0.00em }

h4                                  { font-size:       1.00em;
                                      margin:          1.17em 0.00em }

h5                                  { font-size:       0.83em;
                                      margin:          1.33em 0.00em }

h6                                  { font-size:       0.75em;
                                      margin:          1.50em 0.00em }

h1, h2, h3, h4, h5, h6              { font-weight:     bolder }

p                                   { text-indent:     1.00em }

/* (Note that <p> keeps the CSS initial margin of zero and is given a
    text indentation of 1.00em. This differs from the typical default
    for web browsers, which give <p> non-zero top and bottom margins
    and zero indentation.) */

address, blockquote, dl, hr, ol,
pre, table, ul                      { margin:          1.20em 0.00em }

blockquote                          { margin-left:     2.00em;
                                      margin-right:    2.00em }

dd, ol, ul                          { margin-left:     2.00em }

address                             { font-style:      italic }

pre                                 { font-family:     monospace;
                                      white-space:     pre }

hr                                  { border:          1px inset }


/* General Inline Element Styling */

cite, em, var                       { font-style:      italic }

strong                              { font-weight:     bolder }

code, kbd, samp                     { font-family:     monospace }

del                                 { text-decoration: line-through }

ins                                 { text-decoration: underline }

sub, sup                            { font-size:       0.75em }

sub                                 { vertical-align:  sub }

sup                                 { vertical-align:  super }

br:before                           { content:         "\A";
                                      white-space:     pre-line }

/* Note that the value "pre-line" for the "white-space" property is
   not defined in CSS 2. It is defined in CSS 2.1. */


/* Special Styling For Table Elements */

table                               { border-spacing:  2px }

caption                             { text-align:      left;
                                      margin-bottom:   0.30em }

thead, tbody, tfoot                 { vertical-align:  middle }

td, th                              { vertical-align:  inherit }

td                                  { text-align:      left }

th                                  { font-weight:     bolder;
                                      text-align:      center }


/* Special Styling For Ordered and Unordered List Elements */

ul                                  { list-style-type: disc }

/* (Note that "list-style-type: disc" is the CSS default, so the above
    is unnecessary. It is included here for completeness.) */

ol                                  { list-style-type: decimal }


/* Special Styling For Definition List Elements */

dt                                  { font-weight:     bolder;
                                      margin-top:      1.20em;
                                      margin-bottom:   0.30em }

/* (Note <dt> has been enhanced from web-based default style sheets by
    being made bolder with non-zero top and bottom margins.) */


/* Special Styling For HTML Anchor/Hypertext Element <a> */

a                                   { text-decoration: underline }
a:link                              { color:           #0000FF }
a:link:hover                        { color:           #FF0000 }
a:visited                           { color:           #000088 }
a:visited:hover                     { color:           #FF0000 }


/* Styling Adjustment of <p> For Special Situations */

h1 + p, h2 + p, h3 + p, h4 + p,
h5 + p, h6 + p, p:first-child       { text-indent:     0.00em }

/* (Note: The above styling removes the indentation when <p> occurs
   right after a heading, or is the first child of an allowed
   higher-level containing block, such as <blockquote>, <div>, <li>,
   <td>, etc. This follows the typical Western convention that the
   first paragraph of a section is not indented, while the remaining
   paragraphs are indented.) */
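
The effect of the paragraph rules can be illustrated with a short, non-normative markup sketch. The heading and paragraph text are invented for the example; each comment states which rule from the style sheet above would apply to the <p> that follows it.

```xml
<div>
  <h2>Chapter One</h2>
  <!-- matches "h2 + p": no indentation -->
  <p>First paragraph after the heading.</p>
  <!-- matches only "p": indented by 1.00em -->
  <p>Second paragraph.</p>
  <blockquote>
    <!-- matches "p:first-child": no indentation -->
    <p>First paragraph of the quotation.</p>
    <!-- matches only "p": indented by 1.00em -->
    <p>Second paragraph of the quotation.</p>
  </blockquote>
  <!-- follows a blockquote rather than a heading, and is not a first
       child: indented by 1.00em -->
  <p>Paragraph resuming after the quotation.</p>
</div>
```

Comments are not elements, so they do not affect the adjacent-sibling (+) or :first-child matching in this sketch.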