XML Encoding

 

XML documents can contain non-ASCII characters, such as the Norwegian æøå or the French êèé.

To let your XML parser understand these characters, you should save your XML documents as Unicode.
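
To see why the save format matters, the short Python sketch below prints the bytes that the same three characters turn into under several encodings. The choice of Python and of these four encodings is an assumption made for illustration; the original tutorial only discusses Notepad and IE 5.0.

```python
# Show how the same characters become different bytes under different encodings.
text = "æøå"

encodings = ["windows-1252", "iso-8859-1", "utf-8", "utf-16-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # bytes.hex(" ") needs Python 3.8 or later
    print(f"{name:13} {len(data)} bytes: {data.hex(' ')}")
```

The single-byte encodings use one byte per character, while UTF-8 and UTF-16 use two bytes each for these characters, so a parser that guesses the wrong encoding reads the wrong characters or rejects the file.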

Windows 95/98 Notepad

Windows 95/98 Notepad cannot save files in Unicode format.

You can still use Notepad to edit and save XML documents that contain foreign characters (like the Norwegian æøå and the French êèé), such as this one:

<?xml version="1.0"?>

<note>

<from>Jani</from>

<to>Tove</to>

<message>Norwegian: æøå. French: êèé</message>

</note>

But if you save the file and open it with IE 5.0, you will get an ERROR MESSAGE.
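
The same failure can be reproduced outside IE 5.0. The sketch below is an approximation, not the tutorial's setup: it assumes Python's standard `xml.etree.ElementTree` parser, and it simulates the Notepad save by encoding the document as windows-1252. The helper name `try_parse` is hypothetical. With no encoding attribute, the parser assumes UTF-8 and rejects the single-byte characters.

```python
import xml.etree.ElementTree as ET

doc = '<?xml version="1.0"?><note><message>Norwegian: æøå. French: êèé</message></note>'

# Simulate Windows 95/98 Notepad: single-byte bytes, no encoding attribute.
raw = doc.encode("windows-1252")

def try_parse(data):
    """Return the message text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).findtext("message")
    except ET.ParseError as err:
        print("parse error:", err)
        return None

result = try_parse(raw)
```

The exact error text differs from the IE 5.0 message, but the cause is the same: the bytes do not match the encoding the parser assumes.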

Windows 95/98 Notepad with Encoding

XML files saved with Windows 95/98 Notepad must carry an encoding attribute.

To avoid this error, add an encoding attribute to your XML declaration. The value must name the single-byte encoding Notepad actually used; a Unicode value such as UTF-8 or UTF-16 will not work.

This encoding will NOT give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="windows-1252"?>

This encoding will NOT give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="ISO-8859-1"?>

This encoding WILL give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="UTF-8"?>

This encoding WILL give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="UTF-16"?>
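
The pattern above (a matching single-byte declaration works, a Unicode declaration fails) can be checked with a small sketch. It assumes Python's `xml.etree.ElementTree` rather than IE 5.0, uses ISO-8859-1 to stand in for the single-byte save, and the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET

BODY = "<note><message>Norwegian: æøå. French: êèé</message></note>"

def parses(declared, saved_as):
    """True if bytes saved as `saved_as` parse with `declared` in the XML declaration."""
    doc = f'<?xml version="1.0" encoding="{declared}"?>{BODY}'
    try:
        ET.fromstring(doc.encode(saved_as))
        return True
    except ET.ParseError:
        return False

# The file is always saved single-byte; only the declared encoding changes.
results = {name: parses(name, "iso-8859-1") for name in ["ISO-8859-1", "UTF-8", "UTF-16"]}
print(results)
```

Only the declaration that names the encoding actually used for saving is accepted.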

Windows 2000 Notepad

Windows 2000 Notepad can save files as Unicode.

The Notepad editor in Windows 2000 supports Unicode. If you choose to save this XML file as Unicode (note that the XML declaration does not contain an encoding attribute):

<?xml version="1.0"?>

<note>

<from>Jani</from>

<to>Tove</to>

<message>Norwegian: æøå. French: êèé</message>

</note>

you can open it with IE 5.0, WITHOUT getting an error message.
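
A rough equivalent in Python, assuming the standard `xml.etree.ElementTree` parser instead of IE 5.0: Python's `utf-16` codec writes a byte order mark followed by UTF-16 text, which is used here to simulate a Unicode save. The parser detects the encoding from the byte order mark, so no encoding attribute is needed.

```python
import xml.etree.ElementTree as ET

doc = '<?xml version="1.0"?><note><message>Norwegian: æøå. French: êèé</message></note>'

# The "utf-16" codec emits a byte order mark first, then the UTF-16 text.
raw = doc.encode("utf-16")
has_bom = raw[:2] in (b"\xff\xfe", b"\xfe\xff")

# No encoding attribute is present; the parser relies on the byte order mark.
message = ET.fromstring(raw).findtext("message")
print(has_bom, message)
```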

Windows 2000 Notepad with Encoding

Windows 2000 Notepad files saved as Unicode use "UTF-16" encoding.

If you add an encoding attribute to XML files saved as Unicode, any value other than UTF-16 will generate an error.

This encoding WILL give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="windows-1252"?>

This encoding WILL give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="ISO-8859-1"?>

This encoding WILL give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="UTF-8"?>

This encoding will NOT give an error message when the file is opened with IE 5.0:

<?xml version="1.0" encoding="UTF-16"?>
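
The reverse case can be sketched the same way: the document is saved as UTF-16 and only the declared encoding varies. As before, this assumes Python's `xml.etree.ElementTree` rather than IE 5.0, and `parse_error` is a hypothetical helper. The windows-1252 case is left out of this sketch.

```python
import xml.etree.ElementTree as ET

BODY = "<note><message>Norwegian: æøå. French: êèé</message></note>"

def parse_error(declared):
    """Parse a UTF-16 saved document; return None on success, else the error text."""
    doc = f'<?xml version="1.0" encoding="{declared}"?>{BODY}'
    try:
        ET.fromstring(doc.encode("utf-16"))  # byte order mark + UTF-16 text
        return None
    except ET.ParseError as err:
        return str(err)

errors = {name: parse_error(name) for name in ["ISO-8859-1", "UTF-8", "UTF-16"]}
for name, err in errors.items():
    print(name, "->", err or "OK")
```

The wording of the errors is specific to Python's parser and differs from the IE 5.0 messages, but the accepted and rejected declarations follow the same pattern.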

Error Messages

If you try to load an XML document into Internet Explorer 5, you can get two different errors indicating encoding problems:

An invalid character was found in text content.

You will get this error message if a character in the XML document does not match the encoding the parser expects. Normally this happens when your XML document contains "foreign" characters, the file was saved with an editor that uses a single-byte encoding (like Windows 95/98 Notepad), and no encoding attribute was specified.

Switch from current encoding to specified encoding not supported.

You will get this error message if your file was saved as Unicode/UTF-16 but the encoding attribute specified a byte-oriented encoding like Windows-1252, ISO-8859-1 or UTF-8 (the first two are single-byte; UTF-8 uses one to four bytes per character). You can also get this error message if your document was saved with one of those byte-oriented encodings, but the encoding attribute specified a 16-bit encoding like UTF-16.

Conclusion

The conclusion is that the encoding attribute has to specify the encoding used when the document was saved. My best advice to avoid errors is this:

Always save XML files as Unicode, without any encoding information.

Use an editor that supports Unicode (Windows 2000 Notepad does) and always skip the encoding attribute.
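
If you generate XML from a program rather than an editor, the same advice can be followed in code. The sketch below is one possible approach in Python, not part of the original tutorial: it writes the document as UTF-16 with a byte order mark, leaves out the encoding attribute, and reads the file back. The file name `note.xml` and the temporary directory are placeholders.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

doc = (
    '<?xml version="1.0"?>\n'
    "<note>\n"
    "<message>Norwegian: æøå. French: êèé</message>\n"
    "</note>\n"
)

# Placeholder file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "note.xml")

# encoding="utf-16" makes Python write a byte order mark followed by UTF-16 text.
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write(doc)

# The parser picks up the encoding from the byte order mark.
message = ET.parse(path).getroot().findtext("message")
print(message)
```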
