Question 1

What is character encoding?

Accepted Answer

Character encoding is a system that maps characters (letters, digits, symbols) to numeric values, and those values to the bytes used to store or transmit them. Different encodings use different mappings and different numbers of bytes per character. Common encodings include UTF-8, UTF-16, ASCII, and Latin-1 (ISO 8859-1).
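
To make the mapping concrete, here is a minimal Python sketch that takes a string and shows its numeric values and its bytes in several encodings. The helper name encode_all is made up for this illustration, and "utf-16-be" is used so that no byte order mark is added to the output.

```python
# Hypothetical helper for illustration: encode the same text several ways.
ENCODINGS = ["ascii", "latin-1", "utf-8", "utf-16-be"]

def encode_all(text: str) -> dict:
    """Return {encoding name: bytes}, skipping encodings that cannot represent the text."""
    result = {}
    for name in ENCODINGS:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            # This encoding has no mapping for at least one character.
            pass
    return result

if __name__ == "__main__":
    text = "A"
    print([ord(ch) for ch in text])  # numeric value of each character
    for name, raw in encode_all(text).items():
        print(name, raw)
```

For "A", every encoding in the list yields the single byte 0x41 except UTF-16, which yields two bytes (0x00 0x41).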

Question 2

What is the difference between UTF-8 and UTF-16?

Accepted Answer

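UTF-8 uses 1–4 bytes per character and is backward-compatible with ASCII: any ASCII text is already valid UTF-8. UTF-16 uses 2 bytes for characters in the Basic Multilingual Plane and 4 bytes (a surrogate pair) for all others, such as most emoji. UTF-8 is more compact for English/Latin text, while UTF-16 can be more compact for East Asian scripts, where most characters take 3 bytes in UTF-8 but 2 in UTF-16. UTF-8 is the dominant encoding on the web.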
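
The size trade-off can be checked directly. The sketch below compares encoded sizes for a few sample strings; the function name compare_sizes and the sample strings are assumptions chosen for illustration. It uses "utf-16-le" rather than "utf-16" because Python's plain "utf-16" codec prepends a 2-byte byte order mark, which would skew the comparison.

```python
def compare_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))  # no BOM
    return utf8_len, utf16_len

if __name__ == "__main__":
    # Illustrative samples: English, Japanese, and one emoji (U+1F600).
    for sample in ["hello", "\u65e5\u672c\u8a9e", "\U0001F600"]:
        utf8_len, utf16_len = compare_sizes(sample)
        print(f"{sample!r}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

With these samples, "hello" is 5 bytes in UTF-8 and 10 in UTF-16, the three Japanese characters are 9 bytes in UTF-8 and 6 in UTF-16, and the emoji is 4 bytes in both.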

Question 3

Why does the same text have different byte lengths in different encodings?

Accepted Answer

Different encodings use different numbers of bytes to represent the same character. For example, the character 'é' (U+00E9) is 1 byte in Latin-1, 2 bytes in UTF-8, and 2 bytes in UTF-16. ASCII defines only the values 0–127, so non-ASCII characters such as 'é' cannot be encoded in ASCII at all.
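
The 'é' example can be reproduced with a short Python sketch. The helper byte_length is a name invented here; it returns None when the encoding cannot represent the text, which is what happens for 'é' in ASCII. UTF-16 is measured without a byte order mark.

```python
from typing import Optional

def byte_length(text: str, encoding: str) -> Optional[int]:
    """Return the encoded length in bytes, or None if the text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

if __name__ == "__main__":
    e_acute = "\u00e9"  # 'é' as a single precomposed character
    for enc in ["latin-1", "utf-8", "utf-16-le", "ascii"]:
        print(enc, byte_length(e_acute, enc))
```

Note that this assumes the precomposed form of 'é'. If the text stores it as 'e' followed by a combining accent (U+0065 U+0301), the byte counts will differ.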

Question 4

What is a hex dump?

Accepted Answer

A hex dump displays the raw bytes of data in hexadecimal format. Each byte is shown as two hex digits (00–FF). Hex dumps are used for debugging, analyzing binary data, and understanding how text is stored in different encodings at the byte level.
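
As a rough illustration, a basic hex dump can be produced in a few lines of Python. The function hex_dump and its layout (offset, hex bytes, printable ASCII column) are assumptions modeled loosely on common command-line tools, not the output format of any specific one.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of: offset, two-digit hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {ascii_part}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(hex_dump("caf\u00e9".encode("utf-8")))
    print(hex_dump("caf\u00e9".encode("latin-1")))
```

Dumping the same text in two encodings side by side makes the difference visible: in UTF-8 the 'é' appears as c3 a9, in Latin-1 as the single byte e9.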

Question 5

When should I use which encoding?

Accepted Answer

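Use UTF-8 for web content, APIs, and modern applications; it is the universal default. Use ASCII only when a system or protocol requires 7-bit data: ASCII-only text is byte-for-byte identical in UTF-8, so ASCII offers no size advantage. Latin-1 is useful for legacy Western European systems. UTF-16 is how JavaScript and Java expose strings (as sequences of 16-bit code units), so it matters mainly when working with string lengths and indices in those languages.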
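
A few of these points can be checked in Python. The sketch below is illustrative only, and the function names are invented for it: it confirms that ASCII-only text produces the same bytes in ASCII and UTF-8, shows the garbled text that results from reading UTF-8 bytes as Latin-1, and counts UTF-16 code units, which is the unit JavaScript's string length is measured in.

```python
def same_in_ascii_and_utf8(text: str) -> bool:
    """True if the text is ASCII-only, in which case both encodings give identical bytes."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False

def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode with the wrong encoding (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")

def utf16_code_units(text: str) -> int:
    """Count 16-bit code units, as UTF-16-based string APIs do."""
    return len(text.encode("utf-16-le")) // 2

if __name__ == "__main__":
    print(same_in_ascii_and_utf8("hello"))
    print(misread_as_latin1("caf\u00e9"))
    print(utf16_code_units("\U0001F600"))
```

The second function shows why mixing encodings between a writer and a reader is a common source of corrupted text: 'café' written as UTF-8 and read as Latin-1 comes back as 'cafÃ©'.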

Charset Converter
