Tue. Mar 19th, 2024

HTML Encoding (Character Sets)

To display an HTML page correctly, a web browser must know what character set (character encoding) to use.
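As a minimal illustration (written in Python, which is an assumption since the post itself shows no code), the sketch below encodes a word as UTF-8 bytes and then decodes the same bytes twice: once with the right character set and once with Windows-1252 (Python calls this codec `cp1252`). The helper name `decode_as` is hypothetical.

```python
def decode_as(data: bytes, charset: str) -> str:
    """Decode raw page bytes using the given character set name."""
    return data.decode(charset)


# The bytes a browser would receive for a page containing the word "café".
page_bytes = "café".encode("utf-8")

print(decode_as(page_bytes, "utf-8"))   # café   (correct character set)
print(decode_as(page_bytes, "cp1252"))  # cafÃ©  (wrong character set: garbled text)
```

The bytes never change. Only the character set used to interpret them does, which is why the page has to tell the browser which one it uses.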


What is Character Encoding?

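ASCII was the first widely used character encoding standard (also called a character set). It assigned a unique 7-bit binary number to each of 128 different characters (codes 0 to 127): the letters, digits, and punctuation marks that could be used in computers and on the internet, plus a set of non-printing control codes.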

ASCII supported digits (0-9), English letters (A-Z and a-z), and some special characters like ! $ + - ( ) @ < > .
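The 7-bit limit is easy to see in code. This Python sketch (the helper name `to_7bit` is hypothetical) checks that the ASCII table has 128 codes, prints the 7-bit binary form of two characters, and shows that a non-ASCII letter such as é is rejected:

```python
from typing import List


def to_7bit(text: str) -> List[str]:
    """Return the 7-bit binary code of each character in an ASCII-only string."""
    # encode("ascii") raises UnicodeEncodeError for any character above code 127.
    return [format(byte, "07b") for byte in text.encode("ascii")]


# The ASCII table has exactly 128 codes, numbered 0 to 127.
ascii_table = bytes(range(128)).decode("ascii")
print(len(ascii_table))  # 128
print(to_7bit("A!"))     # ['1000001', '0100001']

try:
    to_7bit("é")
except UnicodeEncodeError:
    print("é has no ASCII code")
```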

ANSI (Windows-1252) was the default character set for Windows (up to Windows 95). It supported 256 different codes.

ISO-8859-1, an extension to ASCII, was the default character set for HTML 4. It also supported 256 different codes.
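Both ANSI and ISO-8859-1 are single-byte encodings: each character takes exactly one byte, and a character outside their 256 codes cannot be stored at all. A Python sketch, assuming Python's codec names `iso-8859-1` and `cp1252` (the latter is Python's name for Windows-1252); `encode_or_none` is a hypothetical helper:

```python
from typing import Optional


def encode_or_none(text: str, charset: str) -> Optional[bytes]:
    """Encode text, or return None if some character has no code in this charset."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


for charset in ("iso-8859-1", "cp1252"):
    encoded = encode_or_none("café", charset)
    print(charset, encoded, len(encoded))         # b'caf\xe9' 4: one byte per character
    print(charset, encode_or_none("Ω", charset))  # None: Greek letters are not covered
```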

Because ANSI and ISO-8859-1 were too limited, HTML5 changed the default character encoding to UTF-8, the most common encoding of Unicode.

Unicode covers (almost) all the characters and symbols in the world.

All HTML 4 processors also support UTF-8.
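UTF-8 can cover all of Unicode because it is variable-length: ASCII characters keep their single-byte codes, and other characters use two to four bytes. The Python sketch below (the helper `utf8_bytes` is hypothetical) prints the code point, byte count, and hex bytes for a few sample characters:

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 byte sequence for a single character."""
    return ch.encode("utf-8")


for ch in ["A", "é", "€", "😀"]:
    data = utf8_bytes(ch)
    # Code point, number of bytes, and the bytes in hex.
    print(f"U+{ord(ch):04X}", len(data), data.hex())

# Output:
# U+0041 1 41
# U+00E9 2 c3a9
# U+20AC 3 e282ac
# U+1F600 4 f09f9880

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

Because ASCII text is already valid UTF-8, older ASCII-only pages stay readable under the new default.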

The HTML charset Attribute

To display an HTML page correctly, a web browser must know the character set used in the page.

This is specified in the <meta> tag:

For HTML 4:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

For HTML5:

<meta charset="UTF-8">
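To see how such a declaration can be found, here is a heavily simplified Python sketch of charset detection from a `<meta>` tag. It is not the real HTML5 prescan algorithm, which handles many more cases; it only looks for `charset=` inside a `<meta>` tag within the first 1024 bytes (HTML5 requires the declaration to appear that early). The function `sniff_meta_charset` is hypothetical.

```python
import re
from typing import Optional

# Finds "charset=VALUE" inside a <meta> tag. This covers both
# <meta charset="UTF-8"> and content="text/html; charset=ISO-8859-1".
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(page: bytes, limit: int = 1024) -> Optional[str]:
    """Return the lowercased charset declared in a <meta> tag, or None."""
    match = META_CHARSET.search(page[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


html4 = b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
html5 = b'<meta charset="UTF-8">'
print(sniff_meta_charset(html4))  # iso-8859-1
print(sniff_meta_charset(html5))  # utf-8
print(sniff_meta_charset(b"<p>no declaration</p>"))  # None
```

A real browser also checks for a byte order mark and the HTTP `Content-Type` header, both of which take precedence over `<meta>`.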


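If a browser detects ISO-8859-1 in a web page, it normally decodes the page as ANSI (Windows-1252) instead. The two character sets are identical except for the 32 codes from 128 to 159 (hex 80 to 9F): ISO-8859-1 reserves these for invisible control characters, while ANSI uses most of them for printable characters such as € and curly quotes. Since pages practically never need those control characters, reading ISO-8859-1 as ANSI loses nothing and displays the extra characters correctly (the WHATWG Encoding Standard that modern browsers follow treats the label ISO-8859-1 as an alias for Windows-1252).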
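The difference between the two character sets can be checked directly, assuming Python's `iso-8859-1` and `cp1252` (Windows-1252) codecs. The sketch decodes every byte value from 0 to 255 with both and collects the positions where they disagree; `decode_byte` and `differing_codes` are hypothetical helper names.

```python
from typing import List, Optional


def decode_byte(value: int, charset: str) -> Optional[str]:
    """Decode a single byte, or return None if the charset leaves it undefined."""
    try:
        return bytes([value]).decode(charset)
    except UnicodeDecodeError:
        return None


def differing_codes() -> List[int]:
    """Byte values that ISO-8859-1 and Windows-1252 decode differently."""
    return [
        value
        for value in range(256)
        if decode_byte(value, "iso-8859-1") != decode_byte(value, "cp1252")
    ]


codes = differing_codes()
print(len(codes), hex(codes[0]), hex(codes[-1]))  # 32 0x80 0x9f

# In Windows-1252, 27 of those 32 codes are printable characters; 5 are unassigned.
printable = [v for v in codes if decode_byte(v, "cp1252") is not None]
print(len(printable))  # 27

# Curly quotes stored as Windows-1252 bytes:
raw = b"\x93Hi\x94"
print(repr(raw.decode("cp1252")))      # '“Hi”'
print(repr(raw.decode("iso-8859-1")))  # '\x93Hi\x94' (invisible control codes)
```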
