
Question

UTF-8 is a variable-length encoding: it uses 1, 2, 3, or 4 bytes to represent a character. How does a decoder know whether a character is 1, 2, 3, or 4 bytes long? UTF-8 signals this with a unary-style prefix on the first byte. If the first byte starts with 0, the character is only 1 byte long: 0xxxxxxx represents a 1-byte character (the 128 ASCII characters, code points 0 to 127). If the first byte has the form 110xxxxx, the character is 2 bytes long; 1110xxxx means 3 bytes, and 11110xxx means 4 bytes. Every following byte starts with 10. See this table:

Bytes  1st byte  2nd byte  3rd byte  4th byte
1      0xxxxxxx
2      110xxxxx  10xxxxxx
3      1110xxxx  10xxxxxx  10xxxxxx
4      11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

Hence the capital A, which is 41 in hexadecimal (65 in decimal), is represented as 01000001 and encoded as 'A' in UTF-8. The winky emoji 😉 is 128,521 in decimal, which is 00000001 11110110 00001001 in binary and is encoded in UTF-8 as 11110000 10011111 10011000 10001001 (with only 21 significant bits).

Encoding errors happen when a file encoded in one format is decoded in another format.

Let's say that the string 'Hi' was encoded with ASCII, where H is represented in decimal as 72 and i is represented in decimal as 105. How would it appear if decoded in UTF-8?

Let's say that the string 你好 (Chinese 'ni hao', which translates to 'hello' in English) is encoded with UTF-8, where 你 is represented in decimal as 20320 and 好 is represented in decimal as 22909. How would it appear if decoded with ASCII (show your work)? (Hint: for this you only need the 3-byte version of UTF-8.)
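One way to check the bit patterns in the question is to encode the code points by hand and then decode the bytes with the wrong codec. The Python sketch below is only an illustration under stated assumptions: `utf8_encode` and `bits` are hypothetical helper names, the hand encoder ignores surrogates and out-of-range values, and showing undecodable bytes as the replacement character (or reading them as Latin-1) is an assumption about how a viewer might behave, not something the question specifies.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder for one code point (hypothetical helper).

    Sketch only: it does not reject surrogates or values above 0x10FFFF.
    """
    if code_point < 0x80:        # fits in 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # fits in 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # fits in 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # fits in 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit binary strings."""
    return " ".join(f"{b:08b}" for b in data)


# 'Hi' in ASCII is the bytes 72 and 105. Both start with a 0 bit,
# so a UTF-8 decoder reads each one as a 1-byte character.
hi_bytes = bytes([72, 105])
hi_decoded = hi_bytes.decode("utf-8")

# 20320 and 22909 both need the 3-byte form.
nihao_bytes = b"".join(utf8_encode(cp) for cp in (20320, 22909))
nihao_bits = bits(nihao_bytes)

# Every one of those six bytes has its top bit set, so none is valid ASCII.
try:
    nihao_bytes.decode("ascii")
    strict_ascii_ok = True
except UnicodeDecodeError:
    strict_ascii_ok = False

# Assumption: a lenient viewer substitutes U+FFFD for each bad byte.
ascii_view = nihao_bytes.decode("ascii", errors="replace")

# Assumption: an "extended ASCII" viewer treats the bytes as Latin-1.
latin1_view = nihao_bytes.decode("latin-1")
```

Here `hi_decoded` is the unchanged string 'Hi', because ASCII is a subset of UTF-8, while `nihao_bits` shows six bytes that all have the high bit set, which is why a strict ASCII decoder cannot interpret any of them.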

