<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Unicode and You</title>
	<atom:link href="http://betterexplained.com/articles/unicode/feed/" rel="self" type="application/rss+xml" />
	<link>http://betterexplained.com/articles/unicode/</link>
	<description>Learning shouldn't hurt. Let's share the insights that made difficult ideas click.</description>
	<lastBuildDate>Sat,  7 Nov 2009 23:27:48 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Avi</title>
		<link>http://betterexplained.com/articles/unicode/#comment-246910</link>
		<dc:creator>Avi</dc:creator>
		<pubDate>Tue, 23 Jun 2009 13:31:47 +0000</pubDate>
		<guid isPermaLink="false">http://betterexplained.com/articles/unicode/#comment-246910</guid>
		<description>Cool, thanks for the first answer, Kalid. I&#039;ll look over the article that you mentioned.

Avi</description>
		<content:encoded><![CDATA[<p>Cool, thanks for the first answer, Kalid. I&#8217;ll look over the article that you mentioned.</p>
<p>Avi</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kalid</title>
		<link>http://betterexplained.com/articles/unicode/#comment-246896</link>
		<dc:creator>Kalid</dc:creator>
		<pubDate>Mon, 22 Jun 2009 19:27:43 +0000</pubDate>
		<guid isPermaLink="false">http://betterexplained.com/articles/unicode/#comment-246896</guid>
		<description>@Kai: Thanks, glad you liked it.

@Avi: Thank you! Great questions.

1) Unicode gives a number (called a code point) to every symbol, so &quot;a&quot;, &quot;b&quot; and &quot;c&quot; each have their own number. Sometimes what looks like the same symbol (like a) has two different code points, kept for compatibility with older encodings.

From a purist point of view, it&#039;d be nice for every symbol to have exactly one number, but from a practical standpoint the system needs to be backwards compatible. So Latin characters stay where they are in ASCII (values 0 to 127), and compatibility copies of them are also defined elsewhere in the Unicode standard, such as the fullwidth forms (fullwidth &quot;a&quot; is U+FF41).

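2) Great question. You&#039;re right, in UTF-8, avi should only have 3 bytes (61 76 69). The preceding ones (EF BB BF) are the BOM (byte order mark). In UTF-16 and UTF-32 the BOM tells the reader whether the data is big or little endian. UTF-8 has no byte-order issue, so there the BOM just marks the file as UTF-8, and some editors add it when saving. You can read more about it here: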

http://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/

(Scroll down for the part about Unicode).</description>
		<content:encoded><![CDATA[<p>@Kai: Thanks, glad you liked it.</p>
<p>@Avi: Thank you! Great questions.</p>
<p>1) Unicode gives a number (called a code point) to every symbol, so &#8220;a&#8221;, &#8220;b&#8221; and &#8220;c&#8221; each have their own number. Sometimes what looks like the same symbol (like a) has two different code points, kept for compatibility with older encodings.</p>
<p>From a purist point of view, it&#8217;d be nice for every symbol to have exactly one number, but from a practical standpoint the system needs to be backwards compatible. So Latin characters stay where they are in ASCII (values 0 to 127), and compatibility copies of them are also defined elsewhere in the Unicode standard, such as the fullwidth forms (fullwidth &#8220;a&#8221; is U+FF41).</p>
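<p>To make the &#8220;two code points for one letter&#8221; idea concrete, here is a minimal Python sketch. It assumes the fullwidth Latin forms as the example of a second location; that choice is an illustration picked for this reply, not something the article names.</p>

```python
import unicodedata

# ord() returns the Unicode code point of a single character.
for ch in "abc":
    print(ch, hex(ord(ch)))   # a 0x61, b 0x62, c 0x63

plain_a = "a"            # U+0061, the same value the letter has in ASCII
fullwidth_a = "\uff41"   # U+FF41, FULLWIDTH LATIN SMALL LETTER A

# Visually both are the letter a, but they are different code points.
print(plain_a == fullwidth_a)   # False

# NFKC compatibility normalization folds the duplicate back onto U+0061.
print(unicodedata.normalize("NFKC", fullwidth_a) == plain_a)   # True
```

<p>Comparing strings after NFKC normalization is one common way to treat such compatibility copies as equal, though whether that is appropriate depends on the application.</p>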
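<p>2) Great question. You&#8217;re right, in UTF-8, avi should only have 3 bytes (61 76 69). The preceding ones (EF BB BF) are the BOM (byte order mark). In UTF-16 and UTF-32 the BOM tells the reader whether the data is big or little endian. UTF-8 has no byte-order issue, so there the BOM just marks the file as UTF-8, and some editors add it when saving. You can read more about it here:</p>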
<p><a href="http://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/" rel="nofollow">http://betterexplained.com/articles/understanding-big-and-little-endian-byte-order/</a></p>
<p>(Scroll down for the part about Unicode).</p>
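<p>The six bytes from the hex editor can be reproduced with a short Python sketch. It assumes the editor saved the file the way the standard &#8220;utf-8-sig&#8221; codec does (BOM first, then the text); the variable names are only illustrative.</p>

```python
import codecs

text = "avi"

plain = text.encode("utf-8")        # UTF-8 without a BOM
signed = text.encode("utf-8-sig")   # UTF-8 with the BOM written first

print(plain.hex(" "))    # 61 76 69
print(signed.hex(" "))   # ef bb bf 61 76 69

# The three extra bytes are exactly the UTF-8 BOM constant.
print(signed.startswith(codecs.BOM_UTF8))   # True

# Decoding with "utf-8-sig" drops the BOM; plain "utf-8" keeps it as U+FEFF.
print(signed.decode("utf-8-sig") == text)          # True
print(signed.decode("utf-8") == "\ufeff" + text)   # True

# In UTF-16 the same mark does carry byte-order information.
print(codecs.BOM_UTF16_LE.hex(" "))   # ff fe
print(codecs.BOM_UTF16_BE.hex(" "))   # fe ff
```

<p>So the file holds three bytes of BOM plus three one-byte characters, which is where the count of 6 comes from.</p>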
]]></content:encoded>
	</item>
	<item>
		<title>By: Avi</title>
		<link>http://betterexplained.com/articles/unicode/#comment-246720</link>
		<dc:creator>Avi</dc:creator>
		<pubDate>Sun, 21 Jun 2009 12:08:52 +0000</pubDate>
		<guid isPermaLink="false">http://betterexplained.com/articles/unicode/#comment-246720</guid>
		<description>Very good article - the clearest one I have encountered on the web.
Keep up the good work, Kalid!
Two things I didn&#039;t understand:
1. &quot;Purists probably didn’t like this, because the full Latin character sets were defined elsewhere, and now one letter had 2 codepoints&quot;. &quot;Defined elsewhere&quot;, &quot;2 codepoints&quot; - I didn&#039;t get it.
2. Regarding the leading bits of the first byte, which indicate the number of bytes in the sequence: I wrote &quot;avi&quot; and saved it as UTF-8. I looked at it in a hex editor and saw that the first byte was 1111...
The bytes there were: EF BB BF 61 76 69, so it amounts to 6 bytes. Can somebody explain this to me?</description>
		<content:encoded><![CDATA[<p>Very good article &#8211; the clearest one I have encountered on the web.<br />
Keep up the good work, Kalid!<br />
Two things I didn&#8217;t understand:<br />
1. &#8220;Purists probably didn’t like this, because the full Latin character sets were defined elsewhere, and now one letter had 2 codepoints&#8221;. &#8220;Defined elsewhere&#8221;, &#8220;2 codepoints&#8221; &#8211; I didn&#8217;t get it.<br />
2. Regarding the leading bits of the first byte, which indicate the number of bytes in the sequence: I wrote &#8220;avi&#8221; and saved it as UTF-8. I looked at it in a hex editor and saw that the first byte was 1111&#8230;<br />
The bytes there were: EF BB BF 61 76 69, so it amounts to 6 bytes. Can somebody explain this to me?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
