XML documents can contain foreign characters, like Norwegian æøå
or French êèé.
To let your XML parser understand these characters, you should
save your XML documents as Unicode.
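To see why the choice of encoding matters, here is a small Python sketch (Python is used only for illustration; the article itself uses no programming language). It prints the bytes that the same characters become under a few encodings. `cp1252` is Python's codec name for windows-1252. In the single-byte encodings each of these characters is one byte. In UTF-8 and UTF-16 each one takes two bytes, so a parser that assumes the wrong encoding reads different characters, or invalid ones.

```python
# Sketch: the same text produces different bytes depending on the
# encoding used when the file is saved.
SAMPLE = "æøå êèé"

ENCODED = {codec: SAMPLE.encode(codec)
           for codec in ("cp1252", "iso-8859-1", "utf-8", "utf-16-le")}

for codec, data in ENCODED.items():
    # bytes.hex(" ") shows the raw bytes separated by spaces (Python 3.8+)
    print(f"{codec:>10}: {data.hex(' ')}")
```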
Windows 95/98 Notepad
Windows 95/98 Notepad cannot save files in Unicode format.
You can use Notepad to edit and save an XML document that contains
foreign characters (like Norwegian æøå and French êèé), such as
this one:
<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>
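But if you save this file and open it
with IE 5.0, you will get an ERROR MESSAGE, because Notepad stores
these characters as single bytes and, without an encoding
attribute, the parser expects UTF-8.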
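The failure can be reproduced outside IE 5.0 with a minimal Python sketch. It uses Python's built-in `xml.etree.ElementTree` (backed by the expat parser) as a stand-in for IE's parser, and it assumes that Windows 95/98 Notepad saves the file as windows-1252, the usual code page on Western European systems. Because the declaration names no encoding, the parser falls back to UTF-8, and the windows-1252 bytes for "æø" (0xE6 0xF8) are not a valid UTF-8 sequence.

```python
import xml.etree.ElementTree as ET

# The note document from above, with no encoding attribute.
NOTE = """<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>"""

# Assumption: Windows 95/98 Notepad writes the file as windows-1252 ("cp1252").
saved_bytes = NOTE.encode("cp1252")

try:
    ET.fromstring(saved_bytes)
    parsed_ok = True
except ET.ParseError as err:
    # No encoding attribute, so the parser reads the bytes as UTF-8
    # and rejects the single-byte "æ".
    parsed_ok = False
    print("Parse error:", err)
```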
Windows 95/98 Notepad with Encoding
Windows 95/98 Notepad files must be saved with an encoding
attribute.
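To avoid this error, add an encoding attribute to your XML
declaration. It must name a single-byte encoding that matches how
Notepad saved the file, such as windows-1252 or ISO-8859-1; a
Unicode encoding (UTF-8 or UTF-16) will not work.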
A file with this declaration will NOT give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="windows-1252"?>
A file with this declaration will NOT give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="ISO-8859-1"?>
A file with this declaration WILL give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="UTF-8"?>
A file with this declaration WILL give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="UTF-16"?>
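The same kind of sketch shows why the declared encoding has to match the bytes on disk. The hypothetical helper `parses()` builds the note document with a given encoding attribute, writes it in a given encoding, and reports whether `ElementTree` accepts it. As before, Python's expat parser stands in for IE 5.0 and Notepad is assumed to save as windows-1252; the UTF-16 declaration is left out of this sketch.

```python
import xml.etree.ElementTree as ET

BODY = """<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>"""


def parses(declared: str, saved_as: str) -> bool:
    """Return True if the document parses when its XML declaration names
    `declared` but the bytes were actually written in `saved_as`."""
    doc = f'<?xml version="1.0" encoding="{declared}"?>\n{BODY}'
    try:
        ET.fromstring(doc.encode(saved_as))
        return True
    except ET.ParseError:
        return False


# Simulate Windows 95/98 Notepad, assumed to save as windows-1252.
for declared in ("windows-1252", "ISO-8859-1", "UTF-8"):
    print(declared, "->", "OK" if parses(declared, "cp1252") else "error")
```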
Windows 2000 Notepad
Windows 2000 Notepad can save files as Unicode.
The Notepad editor in Windows 2000 supports Unicode. If you
choose to save this XML file as Unicode (note that the document
does not contain an encoding attribute):
<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>
you can open it with IE 5.0, WITHOUT getting an error
message.
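Here is a Python sketch of the Windows 2000 case. It assumes that Notepad's "Unicode" option writes UTF-16 little-endian preceded by a byte order mark (the bytes FF FE), and it again uses expat through `ElementTree` in place of IE 5.0. The byte order mark is what lets a parser recognize UTF-16 without any encoding attribute.

```python
import xml.etree.ElementTree as ET

# The note document, still without an encoding attribute.
NOTE = """<?xml version="1.0"?>
<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>"""

# Assumption: Notepad's "Unicode" = byte order mark FF FE + UTF-16LE text.
saved_bytes = b"\xff\xfe" + NOTE.encode("utf-16-le")

root = ET.fromstring(saved_bytes)  # parses without an encoding attribute
print(root.find("message").text)
```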
Windows 2000 Notepad with Encoding
Windows 2000 Notepad files saved as Unicode use "UTF-16"
encoding (little-endian, with a byte order mark at the start of
the file).
If you add an encoding attribute to an XML file saved as
Unicode, any value other than "UTF-16" will generate an error,
including windows-1252, ISO-8859-1 and UTF-8.
A file with this declaration WILL give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="windows-1252"?>
A file with this declaration WILL give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="ISO-8859-1"?>
A file with this declaration WILL give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="UTF-8"?>
A file with this declaration will NOT give an error message when
opened with IE 5.0:
<?xml version="1.0" encoding="UTF-16"?>
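Under the same assumptions (UTF-16 little-endian with a byte order mark, expat standing in for IE 5.0), the hypothetical helper `parses_utf16()` checks which encoding attributes are accepted for a file saved as Unicode. Only the ISO-8859-1, UTF-8 and UTF-16 declarations are tried in this sketch; windows-1252 is omitted.

```python
import xml.etree.ElementTree as ET

BODY = """<note>
<from>Jani</from>
<to>Tove</to>
<message>Norwegian: æøå. French: êèé</message>
</note>"""


def parses_utf16(declared: str) -> bool:
    """Return True if a UTF-16LE file with a byte order mark parses when
    its XML declaration carries encoding=`declared`."""
    doc = f'<?xml version="1.0" encoding="{declared}"?>\n{BODY}'
    # Assumed layout of a Windows 2000 Notepad "Unicode" file.
    data = b"\xff\xfe" + doc.encode("utf-16-le")
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


for declared in ("ISO-8859-1", "UTF-8", "UTF-16"):
    print(declared, "->", "OK" if parses_utf16(declared) else "error")
```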
Error Messages
If you try to load an XML document into Internet Explorer 5, you
can get two different errors indicating encoding problems:
An invalid character was found in text content.
You will get this error message if a character in the XML
document is not valid in the encoding the parser is using.
Typically this happens when your XML document contains "foreign"
characters, the file was saved in a single-byte encoding by an
editor like Notepad, and no encoding attribute was specified, so
the parser assumes UTF-8.
Switch from current encoding to specified encoding not
supported.
You will get this error message if your file was saved as
Unicode (UTF-16) but the encoding attribute specified an 8-bit
encoding like windows-1252, ISO-8859-1 or UTF-8. You can also
get this error message if your document was saved in a
single-byte encoding but the encoding attribute specified a
16-bit encoding like UTF-16.
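The two error messages can be summarized as a rough consistency check. The Python sketch below is hypothetical: `sniff()` guesses the byte layout from the first bytes of the file, in the spirit of the encoding autodetection appendix of the XML 1.0 specification, and `diagnose()` compares that guess with the encoding attribute and returns the IE 5.0 message this article predicts. It mirrors the rules described here, not IE's actual implementation, and it only checks 8-bit files that declare UTF-8 (or nothing) for invalid bytes.

```python
import re

# Error texts as quoted in the article.
INVALID_CHAR = "An invalid character was found in text content."
SWITCH_NOT_SUPPORTED = ("Switch from current encoding to specified "
                        "encoding not supported.")

# Encoding attribute in a leading XML declaration (optional BOM first).
DECL = re.compile(r"""^\ufeff?<\?xml[^>]*?encoding=["']([\w.-]+)["']""")


def sniff(data: bytes) -> str:
    """Guess how the file is laid out from its first bytes."""
    if data.startswith((b"\xff\xfe", b"<\x00?\x00")):
        return "utf-16-le"
    if data.startswith((b"\xfe\xff", b"\x00<\x00?")):
        return "utf-16-be"
    return "8-bit"  # UTF-8, windows-1252, ISO-8859-1, ...


def diagnose(data: bytes) -> str:
    """Return the error message predicted for `data`, or "ok"."""
    layout = sniff(data)
    if layout == "8-bit":
        body = data[3:] if data.startswith(b"\xef\xbb\xbf") else data
        # The declaration itself is ASCII, so latin-1 can read it safely.
        head = body.decode("latin-1")
    else:
        head = data.decode(layout, errors="replace")
    match = DECL.match(head)
    declared = match.group(1).upper() if match else None

    if layout != "8-bit":
        # Saved as Unicode: only no attribute or "UTF-16" is consistent.
        if declared not in (None, "UTF-16"):
            return SWITCH_NOT_SUPPORTED
        return "ok"

    # Saved as 8-bit bytes.
    if declared is not None and declared.startswith("UTF-16"):
        return SWITCH_NOT_SUPPORTED
    if declared in (None, "UTF-8"):
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            return INVALID_CHAR
    return "ok"
```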
Conclusion
The conclusion is that the encoding attribute has to specify the
encoding used when the document was saved. My best advice to
avoid errors is this:
Always save XML files as Unicode, without any encoding
information.
Use an editor that supports Unicode (Windows 2000 Notepad does)
and always skip the encoding attribute.
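Applying this advice to an existing windows-1252 file amounts to decoding it with the encoding it was really saved in, removing the encoding attribute, and writing it back as UTF-16 with a byte order mark. The function `resave_as_unicode()` below is a hypothetical Python sketch of that conversion; it assumes the XML declaration is the first thing in the file and that you know the original encoding.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding attribute inside a leading XML declaration.
ENCODING_ATTR = re.compile(r"""^(<\?xml[^>]*?)\s+encoding=["'][\w.-]+["']""")


def resave_as_unicode(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode an 8-bit XML file as UTF-16LE with a
    byte order mark and drop its encoding attribute.

    `source_encoding` must be the encoding the file was really saved in
    (for example "cp1252", Python's name for windows-1252).
    """
    text = data.decode(source_encoding)
    text = ENCODING_ATTR.sub(r"\1", text, count=1)
    return b"\xff\xfe" + text.encode("utf-16-le")


old = ('<?xml version="1.0" encoding="windows-1252"?>\n'
       "<note><message>Norwegian: æøå. French: êèé</message></note>")
new = resave_as_unicode(old.encode("cp1252"), "cp1252")
print(ET.fromstring(new).find("message").text)
```

Because the output carries a byte order mark and no encoding attribute, a parser has only one consistent way to read it, which is the point of the advice above.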