HTML Encoding (Character Sets)
To display an HTML page correctly, a web browser must know what character set (character encoding) to use.
What is Character Encoding?
ASCII was the first widely used character encoding standard (also called a character set). It assigned a unique 7-bit binary number to each of 128 characters (codes 0 to 127), which covered the basic characters used on the early internet.
ASCII supported digits (0-9), English letters (A-Z and a-z), and some special characters such as ! $ + - ( ) @ < >.
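As a rough illustration of the 7-bit limit, the following Python sketch prints the code and 7-bit pattern of a few ASCII characters and checks whether a string stays inside the ASCII range. The helper name is_ascii and the sample strings are made up for this example.

```python
# Each ASCII character has a code from 0 to 127, which fits in 7 bits.
for ch in ("A", "z", "0", "@"):
    code = ord(ch)
    print(ch, code, format(code, "07b"))  # character, decimal code, 7-bit binary


def is_ascii(text: str) -> bool:
    """Return True if every character in text has a code below 128."""
    return all(ord(ch) < 128 for ch in text)


# Hypothetical sample strings, chosen only for illustration.
plain_sample = "A-Z 0-9 ! $ + - ( ) @ < >"
accented_sample = "caf\u00e9"  # "cafe" with an acute accent on the e

print(is_ascii(plain_sample))     # every character is inside the ASCII range
print(is_ascii(accented_sample))  # the accented letter has a code above 127
```

Any character with a code above 127, such as an accented letter, needs a larger character set than ASCII.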
ANSI (Windows-1252) was the default character set for Windows (up to Windows 95). It supported 256 different codes.
ISO-8859-1, an extension to ASCII, was the default character set for HTML 4. It also supported 256 different codes.
Because ANSI and ISO-8859-1 were too limited, HTML5 changed the default character encoding to UTF-8, an encoding of Unicode.
Unicode covers (almost) all the characters and symbols in the world.
All HTML 4 processors also support UTF-8.
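To make the difference between the 256-code character sets and Unicode more concrete, here is a small Python sketch. It assumes Python's codec names "latin-1" and "cp1252" as stand-ins for ISO-8859-1 and ANSI (Windows-1252); the helper fits_in and the sample characters are illustrative only.

```python
# Sample characters, written as escapes so the source file stays pure ASCII.
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001F600",
}

# UTF-8 uses a variable number of bytes per character.
utf8_lengths = []
for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    utf8_lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} {name}: {len(encoded)} byte(s), hex {encoded.hex()}")


def fits_in(text: str, codec: str) -> bool:
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


greek_alpha = "\u03b1"
for codec in ("latin-1", "cp1252", "utf-8"):
    # Only UTF-8 can represent the Greek letter alpha.
    print(codec, fits_in(greek_alpha, codec))
```

The single-byte character sets can only represent the 256 codes they define, while UTF-8 spends more bytes on characters outside the ASCII range.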
The HTML charset Attribute
To display an HTML page correctly, a web browser must know the character set used in the page.
This is specified in the <meta> tag:
For HTML4: <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
For HTML5: <meta charset="UTF-8">
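The two declaration styles, the HTML 4 http-equiv form and the shorter HTML5 charset form, can also be read programmatically. The sketch below uses Python's standard html.parser module; the class CharsetFinder, the function declared_charset, and the sample markup strings are hypothetical names and data for this example. It is a simplification and not how a browser actually sniffs encodings.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remember the first character set declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if attributes.get("charset"):
            # HTML5 style: <meta charset="...">
            self.charset = attributes["charset"]
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 style: charset is a parameter inside the content attribute
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip()


def declared_charset(markup: str):
    """Return the declared character set, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(markup)
    return finder.charset


# Hypothetical sample documents.
html4_page = '<head><meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"></head>'
html5_page = '<head><meta charset="UTF-8"></head>'

print(declared_charset(html4_page))
print(declared_charset(html5_page))
```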
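If a browser detects ISO-8859-1 in a web page, it normally treats the page as ANSI (Windows-1252), because ANSI is identical to ISO-8859-1 except for 32 code positions (128 to 159), where ANSI has extra printable characters and ISO-8859-1 has only control codes.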
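The difference between the two character sets can be checked byte by byte. The following Python sketch assumes Python's "cp1252" and "latin-1" codecs as stand-ins for ANSI and ISO-8859-1. Note that Python's cp1252 codec leaves a few positions in this range unassigned, which the code treats as a difference as well.

```python
differing = []
for value in range(256):
    raw = bytes([value])
    latin1_char = raw.decode("latin-1")  # latin-1 maps every byte to a character
    try:
        cp1252_char = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252_char = None  # position not assigned in Python's cp1252 codec
    if cp1252_char != latin1_char:
        differing.append(value)

print(len(differing), "positions differ")
print("first:", differing[0], "last:", differing[-1])

# Byte 0x80 is the euro sign in Windows-1252 but a control code in ISO-8859-1.
euro_byte = b"\x80"
print(repr(euro_byte.decode("cp1252")), repr(euro_byte.decode("latin-1")))
```

Because the extra ANSI characters occupy positions that ISO-8859-1 uses only for control codes, decoding an ISO-8859-1 page as ANSI does not change any of its printable characters.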