The syntax rules of XML are simple and strict. They are easy to
learn and easy to use.
Because of this, it is straightforward to write software that
reads and manipulates XML.
An example XML document
XML documents use a self-describing and simple syntax.
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line in the document - the XML declaration - defines
the XML version of the document. In this case the document
conforms to the 1.0 specification of XML.
The next line describes the root element of the document (as if
it were saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to,
from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
Can you tell from this example that the XML document contains
a note to Tove from Jani? Don't you agree that XML is pretty
self-describing?
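The same structure can also be read by a program. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the tutorial itself); the names NOTE_XML and child_tags are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The note document from the example above, including its XML declaration.
NOTE_XML = b"""<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is <note>; its four children carry the data.
child_tags = [child.tag for child in root]

print(root.tag)                # note
print(child_tags)              # ['to', 'from', 'heading', 'body']
print(root.find("to").text)    # Tove
print(root.find("from").text)  # Jani
```

Because the element names describe their content, the program can ask for "to" and "from" directly without knowing where they sit in the text.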
All XML elements must have a closing tag
With XML, it is illegal to omit the closing tag.
In HTML some elements do not have to have a closing tag. The
following code is legal in HTML:
<p>This is a paragraph <p>This is another paragraph
In XML all elements must have a closing tag like this:
<p>This is a paragraph</p> <p>This is another paragraph</p>
Note: You might have noticed from the
previous example that the XML declaration did not have a closing
tag. This is not an error. The declaration is not an XML
element, so it must not have a closing tag.
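A parser enforces this rule. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this illustration, and the <doc> wrapper element is added only so that each sample has a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# <p> is never closed, so the parser rejects the document.
unclosed = is_well_formed("<doc><p>This is a paragraph</doc>")

# Every opening tag has a matching closing tag.
closed = is_well_formed("<doc><p>This is a paragraph</p></doc>")

print(unclosed, closed)  # False True
```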
XML tags are case sensitive
Unlike HTML, XML tags are case sensitive.
With XML, the tag <Letter> is different from the tag <letter>.
Opening and closing tags must therefore be written with the same
case:
<Message>This is incorrect</message>
<message>This is correct</message>
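Both points can be checked with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the <doc> wrapper and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# <Letter> and <letter> are two different element names.
doc = ET.fromstring("<doc><Letter>A</Letter><letter>b</letter></doc>")
upper = doc.find("Letter").text
lower = doc.find("letter").text
print(upper, lower)  # A b

# A closing tag whose case differs from the opening tag is an error.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```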
All XML elements must be properly nested
Improperly nested tags are not well-formed XML.
In HTML, browsers often tolerate elements that are improperly
nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other
like this:
<b><i>This text is bold and italic</i></b>
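The two samples above can be fed to a parser to see the difference. This is a sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

bad = "<b><i>This text is bold and italic</b></i>"
good = "<b><i>This text is bold and italic</i></b>"

# The overlapping tags are rejected: </b> arrives while <i> is still open.
try:
    ET.fromstring(bad)
    bad_accepted = True
except ET.ParseError:
    bad_accepted = False

# The properly nested version parses into a small tree: <b> contains <i>.
tree = ET.fromstring(good)
inner = tree.find("i")

print(bad_accepted)          # False
print(tree.tag, inner.tag)   # b i
```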
All XML documents must have a root tag
The first element in an XML document is the root element.
All XML documents must contain exactly one tag pair that defines
the root element. All other elements must be nested within the
root element.
All elements can have sub elements (children). Sub elements must
be correctly nested within their parent element:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
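The single-root rule can be observed with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample strings and names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two top-level elements: there is no single root, so parsing fails.
two_roots = "<child>one</child><child>two</child>"
try:
    ET.fromstring(two_roots)
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False

# The same elements nested inside one root element are accepted.
one_root = "<root><child><subchild>one</subchild></child><child>two</child></root>"
root = ET.fromstring(one_root)

# Iterating over an element visits its direct children only.
children = [child.tag for child in root]
subchild = root.find("child/subchild")

print(two_roots_accepted)  # False
print(children)            # ['child', 'child']
print(subchild.text)       # one
```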
Attribute values must always be quoted
With XML, it is illegal to omit quotation marks around attribute
values.
XML elements can have attributes in name/value pairs just like
in HTML. In XML the attribute value must always be quoted. Study
the two XML documents below. The first one is incorrect; the
second is correct:
<?xml version="1.0"?>
<note date=12/11/99>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<?xml version="1.0"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The error in the first document is that the date attribute in
the note element is not quoted.
This is correct: date="12/11/99". This is incorrect:
date=12/11/99.
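A parser reports the unquoted attribute as an error. The sketch below assumes Python's standard xml.etree.ElementTree module and uses shortened versions of the two documents; the variable names are illustrative. It also shows that single quotes are accepted as well as double quotes.

```python
import xml.etree.ElementTree as ET

unquoted = "<note date=12/11/99><to>Tove</to></note>"
quoted = '<note date="12/11/99"><to>Tove</to></note>'
single_quoted = "<note date='12/11/99'><to>Tove</to></note>"

# The unquoted value makes the document not well-formed.
try:
    ET.fromstring(unquoted)
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False

# Quoted values are read back without the quotation marks.
date_double = ET.fromstring(quoted).get("date")
date_single = ET.fromstring(single_quoted).get("date")

print(unquoted_accepted)  # False
print(date_double)        # 12/11/99
print(date_single)        # 12/11/99
```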
With XML, white space is preserved
With XML, the white space in your document is not truncated.
This is unlike HTML. With HTML, a sentence like this: Hello           my
name is Tove, will be displayed like this: Hello my name is Tove,
because HTML collapses consecutive white space characters into a
single space.
With XML, CR / LF is converted to LF
With XML, a new line is always stored as LF.
Do you know what a typewriter is? Well, a typewriter is a type
of mechanical device that was used in the previous century :-)
After you have typed one line of text on a typewriter, you have
to manually return the printing carriage to the left margin
position and manually feed the paper up one line. These two
actions gave the CR (carriage return) and LF (line feed)
characters their names.
In Windows applications, a new line in the text is normally
stored as a pair of CR LF (carriage return, line feed)
characters. In Unix applications, a new line is normally stored
as a LF character. Some applications use only a CR character to
store a new line.
An XML parser normalizes all three forms, so the application
always receives a new line as a single LF character.
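The conversion named in the heading can be observed by parsing the same text with each of the three line-ending styles. This is a sketch, assuming Python's standard xml.etree.ElementTree module; the <text> element and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same two lines, stored with three different line endings.
windows = ET.fromstring(b"<text>line one\r\nline two</text>")  # CR LF
unix = ET.fromstring(b"<text>line one\nline two</text>")       # LF
cr_only = ET.fromstring(b"<text>line one\rline two</text>")    # CR

# After parsing, all three contain a single LF and no CR.
texts = [windows.text, unix.text, cr_only.text]
all_lf = all(t == "line one\nline two" for t in texts)
any_cr = any("\r" in t for t in texts)

print(all_lf)  # True
print(any_cr)  # False
```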
There is nothing special about XML
There is nothing special about XML. It is just plain text with
the addition of some XML tags enclosed in angle brackets.
Software that can handle plain text can also handle XML. In a
simple text editor, the XML tags will be visible and will not be
handled specially.
In an XML-aware application, however, the XML tags can be handled
specially. The tags may or may not be visible, and may or may
not have a functional meaning, depending on the nature of the
application.