Charset Converter

View text in different character encodings. Compare hex dumps and byte lengths for UTF-8, UTF-16, ASCII, and Latin-1.

Frequently Asked Questions

A character encoding is a system that maps characters (letters, digits, symbols) to numeric values and then to sequences of bytes. Different encodings use different mappings and different numbers of bytes per character. Common encodings include UTF-8, UTF-16, ASCII, and Latin-1 (ISO 8859-1).
UTF-8 uses 1–4 bytes per character and is backward-compatible with ASCII. UTF-16 uses 2 or 4 bytes per character. UTF-8 is more compact for English/Latin text (1 byte per ASCII character versus 2 in UTF-16), while UTF-16 can be more compact for East Asian scripts, where most characters take 3 bytes in UTF-8 but 2 in UTF-16. UTF-8 is the dominant encoding on the web.
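The size difference can be checked with a short Python sketch. The helper name `byte_lengths` and the sample strings are illustrative assumptions, not part of this tool. It encodes with `utf-16-le` so that the 2-byte byte-order mark that Python's plain `utf-16` codec prepends does not inflate the count.

```python
def byte_lengths(text: str) -> dict:
    """Return the encoded size of `text` in UTF-8 and UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte-order mark that "utf-16" would add
        "utf-16": len(text.encode("utf-16-le")),
    }

english = "Hello, world"      # 12 ASCII characters
japanese = "こんにちは世界"    # 7 characters, each 3 bytes in UTF-8

print(byte_lengths(english))   # {'utf-8': 12, 'utf-16': 24}
print(byte_lengths(japanese))  # {'utf-8': 21, 'utf-16': 14}
```

Characters outside the Basic Multilingual Plane, such as most emoji, take 4 bytes in both encodings.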
Different encodings use different numbers of bytes to represent the same character. For example, the character 'é' is 1 byte in Latin-1, 2 bytes in UTF-8, and 2 bytes in UTF-16. ASCII defines only code points 0–127, so characters outside that range, such as 'é', cannot be encoded in ASCII at all.
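As a minimal sketch of the 'é' example, the following Python snippet encodes the character in each encoding and reports the result. The helper `try_encode` is a hypothetical name introduced here for illustration. It uses `utf-16-le` to leave out the byte-order mark.

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if `text` cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for name in ("latin-1", "utf-8", "utf-16-le", "ascii"):
    data = try_encode("é", name)
    if data is None:
        print(f"{name}: cannot encode")
    else:
        print(f"{name}: {len(data)} byte(s): {data.hex(' ')}")

# latin-1: 1 byte(s): e9
# utf-8: 2 byte(s): c3 a9
# utf-16-le: 2 byte(s): e9 00
# ascii: cannot encode
```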
A hex dump displays the raw bytes of data in hexadecimal format. Each byte is shown as two hex digits (00–FF). Hex dumps are used for debugging, analyzing binary data, and understanding how text is stored in different encodings at the byte level.
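A simple hex dump can be sketched in a few lines of Python. The function name `hex_dump`, the 16-byte row width, and the offset column are assumptions chosen to resemble common command-line dump tools. They do not describe how this tool formats its output.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format `data` as rows of offset, hex bytes, and printable ASCII."""
    pad = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Bytes outside printable ASCII are shown as "."
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)

print(hex_dump("café".encode("utf-8")))
```

For "café" in UTF-8, the hex column reads `63 61 66 C3 A9`: three single-byte ASCII letters followed by the two-byte sequence for 'é'.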
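Use UTF-8 for web content, APIs, and modern applications; it is the de facto default. Use ASCII only when a system or protocol requires 7-bit text: for English-only text, ASCII and UTF-8 produce identical bytes, so ASCII saves no space. Latin-1 is useful for legacy Western European systems. UTF-16 matters mainly because JavaScript and Java define their strings in terms of UTF-16 code units.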