http://en.wikipedia.org/wiki/Unicode
UTF encodings include:
* UTF-7: a relatively unpopular 7-bit encoding, often considered obsolete (defined in RFC 2152, not in The Unicode Standard)
* UTF-8: an 8-bit, variable-width encoding that maximizes compatibility with ASCII
* UTF-EBCDIC: an 8-bit, variable-width encoding that maximizes compatibility with EBCDIC (not part of The Unicode Standard)
* UTF-16: a 16-bit, variable-width encoding
* UTF-32: a 32-bit, fixed-width encoding
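To make "variable-width" versus "fixed-width" concrete, here is a minimal Python sketch that counts the bytes each encoding needs for a single character. The names `SAMPLES`, `ENCODINGS`, and `encoded_sizes` are made up for this illustration; the `-le` codec variants are used only so that no byte order mark is counted.

```python
# Sample characters written as escapes: U+0041, U+00E9, U+20AC, U+1F600.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# The little-endian codec variants do not prepend a byte order mark,
# so the lengths below count only the character itself.
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]


def encoded_sizes(ch):
    """Return the number of bytes `ch` occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 always reports 4 bytes, while UTF-8 and UTF-16 grow with the code point: UTF-16 needs two 16-bit units (a surrogate pair) for U+1F600.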
UTF-8 uses one to four bytes per code point. Because it is compact for Latin scripts and ASCII-compatible, it is the de facto standard encoding for the interchange of Unicode text. Most recent Linux distributions also use it as a direct replacement for legacy encodings in general text handling.
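The "one to four bytes" rule follows from the code point's numeric range. The sketch below is only an illustration: `utf8_length` is a hypothetical helper, not a library function, and it is cross-checked against Python's built-in UTF-8 encoder.

```python
def utf8_length(code_point):
    """Return how many bytes UTF-8 uses for one code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point <= 0x7F:      # ASCII range
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:    # rest of the Basic Multilingual Plane
        return 3
    return 4                    # supplementary planes


# Cross-check the ranges against the built-in encoder.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))

# ASCII compatibility: pure ASCII text produces identical bytes in both encodings.
assert "Unicode".encode("utf-8") == "Unicode".encode("ascii")
```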
With one bit you can encode only two characters, using the codes 0 and 1, while with 2 bits you can form codes for four characters: 00, 01, 10, and 11.
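The same counting generalizes: n bits give 2^n distinct codes. A small Python sketch that lists them (`bit_codes` is a name chosen for this example):

```python
from itertools import product


def bit_codes(n):
    """List every distinct code that can be written with n bits."""
    return ["".join(bits) for bits in product("01", repeat=n)]


print(bit_codes(1))       # ['0', '1']
print(bit_codes(2))       # ['00', '01', '10', '11']
print(len(bit_codes(8)))  # 256
```

This is why a 7-bit code such as ASCII has room for 128 characters and a single 8-bit byte for 256.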