Wrong email address

Sat, Oct 25, 2003

In an entry below I listed the wrong email address to get ahold of me!  If you want to talk to me about PDC stuff, send me email at pdc@bedafamily.com.  I haven't bothered to figure out how to hook up forwarding for the eightypercent.net domain yet.

scRGB as Unicode

Thu, Oct 23, 2003

Joel's post a couple of days ago reminded me of a conversation I had about a year ago with Michael Stokes, a PM in our group at the time dealing with color.  Specifically, we were talking about how color profiles work and how scRGB makes the world a better place.

I'm going to do my best to explain this, but I'm simplifying some of it and I can't claim to be a color expert.  There are many people out there who know much more about this than I do, but I don't think I've seen anybody present this to the average programmer.  It goes like this -- there is color data (similar to an 8 bit char) and there is the color profile that describes how to interpret that data (similar to a codepage).  The combination of the color and the color profile gives you everything you need, in theory, to get accurate color.
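To make the codepage half of that analogy concrete, here is a minimal Python sketch.  The byte value and the three codepages are chosen purely for illustration; the only point is that the same data turns into different characters depending on the codepage used to interpret it, just as the same color data turns into different colors depending on the profile.

```python
# One raw byte, like an 8 bit char with no codepage attached.
raw = bytes([0xE9])

# The same byte interpreted under three different codepages.
as_latin1 = raw.decode("latin-1")   # Western European
as_cp1251 = raw.decode("cp1251")    # Cyrillic
as_cp437 = raw.decode("cp437")      # original IBM PC

for name, ch in [("latin-1", as_latin1), ("cp1251", as_cp1251), ("cp437", as_cp437)]:
    # Print the resulting code point so the difference is visible
    # even on a terminal that cannot render the characters.
    print(f"{name:8} -> U+{ord(ch):04X}")
```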

There is a simple standard color space called sRGB which is a formalization of the ad hoc "standard" that has developed over the years.  It is a color space that most monitors and graphics cards support by default.  In other words, if you send raw RGB data to today's graphics hardware, the thing you see on the screen will be within spitting distance of the sRGB standard.  This is in some ways a parallel to ASCII with text.  What we end up with is a standard that works well if all you are interested in is showing something on a monitor and you don't care too much about the accuracy of the color.  This is all well and good as long as you aren't too picky about the color of the shirt you are buying from JCrew.com.

Things start to get complicated once you start talking about gamut.  Any particular device that deals with color (camera, scanner, screen, printer, etc.) has a gamut.  The gamut is the range of physical colors that the device can operate with.  There are colors that you can see that your monitor cannot produce.  Similarly, there are colors that your monitor can produce that your printer can't print.  There are probably even colors that your printer can put on paper that your monitor can't display.  In other words, there are physical limitations and characteristics built into any device.  Since profiles describe a color space, they too have a gamut.

The sad part here is that sRGB is defined around how computer display hardware happened to work.  There are colors that fall outside of the definition of sRGB.  In fact, there are lots of colors that fall outside of the sRGB gamut.  My camera (a Canon D60) can capture colors that can't be described by sRGB.  It is like trying to describe a Sanskrit character with 7 bit ASCII.  My printer (an Epson 1270) can print colors that are also outside of sRGB.  Since I'm in photography for artistic purposes, I'm a bit of a perfectionist and not happy with limiting the colors in my prints by what my monitor can and can't display.

The logical next step here is to redefine the color space so that we can represent colors that sRGB can't represent.  Lots of people have done this.  The most popular color space out there (outside of sRGB) would be AdobeRGB 98 that Adobe popularized via Photoshop.  The reality is that if you define the gamut to be too narrow you end up excluding colors, and if you expand the gamut too much you end up with too few gradations in the colors you care about.  This makes your smooth blue sky look like an Aztec pyramid.  Because there is a tradeoff here, everyone and their brother has created a color space to fit a particular need.  The result is that 0xFF0000 (using web syntax) means different things depending on the color space that you are using to interpret it.
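Here is a rough sketch of what "means different things" looks like in numbers.  It pushes the same 0xFF0000 triple through the commonly published linear RGB to CIE XYZ matrices for sRGB and for Adobe RGB (1998), rounded to four places.  The helper name `to_xyz` is made up for this example, and a real color management system does a lot more than one matrix multiply.

```python
# Linear RGB -> CIE XYZ matrices (D65 white point), rounded to 4 places.
SRGB_TO_XYZ = [
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
]
ADOBE_RGB_TO_XYZ = [
    [0.5767, 0.1856, 0.1882],
    [0.2973, 0.6274, 0.0753],
    [0.0270, 0.0707, 0.9911],
]

def to_xyz(matrix, rgb):
    """Multiply a 3x3 matrix by a linear RGB triple (hypothetical helper)."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in matrix)

# 0xFF0000: full red, no green, no blue.  Full scale and zero map to 1.0
# and 0.0 in linear light under either space's transfer curve, so no
# gamma decoding step is needed for this particular value.
red = (1.0, 0.0, 0.0)

xyz_srgb = to_xyz(SRGB_TO_XYZ, red)
xyz_adobe = to_xyz(ADOBE_RGB_TO_XYZ, red)

print("0xFF0000 read as sRGB:     ", xyz_srgb)
print("0xFF0000 read as Adobe RGB:", xyz_adobe)
```

The two results are different physical colors even though the stored bytes are identical.  For pure red the gap is mostly one of brightness; try `(0.0, 1.0, 0.0)` to see green, where the two spaces differ the most.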

Sophisticated software built into both Windows (and Mac OS) and Photoshop does the translation between the data and your screen.  Or from your data to your printer.  Or from your data into a "working color space" and then into the color space of your printer.  This color management software essentially manages mapping from one space to another.  Oh yeah, and don't forget that converting from one space to another is a lossy operation.  All is good and lightness, right?

Well, no.  Just like 8 bit characters and codepages, it is all too easy to ignore this stuff and just hope for the best.  Almost every consumer level device outputs sRGB, most printer drivers assume sRGB incoming and almost every picture on the web is authored to sRGB (unless it is authored on a Mac, which uses a 1.8 gamma space instead of the 2.2 gamma as defined in sRGB).  If most developers just stick their heads in the sand and ignore it, then things will mostly work themselves out.  If you take a photo that is encoded in AdobeRGB and interpret it as sRGB, things will mostly work -- it will just look pretty washed out.
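The gamma difference is easy to see with a little arithmetic.  This sketch assumes a pure power curve for both cases, which is a simplification (the real sRGB curve is piecewise and only approximately 2.2), and the function name `decode` is just for illustration.

```python
def decode(value_8bit, gamma):
    """Convert an 8 bit channel value to linear light using a pure power curve."""
    return (value_8bit / 255.0) ** gamma

mid_gray = 128  # the same stored pixel value in both cases

linear_22 = decode(mid_gray, 2.2)  # interpreted with a 2.2 gamma
linear_18 = decode(mid_gray, 1.8)  # interpreted with a 1.8 gamma

print(f"gamma 2.2: {linear_22:.3f} of full intensity")
print(f"gamma 1.8: {linear_18:.3f} of full intensity")
```

The same stored value comes out noticeably brighter under the 1.8 interpretation, which is why an image authored for one gamma looks too light or too dark when displayed with the other.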

Add to this the fact that right now it isn't clear what piece of software is supposed to translate from what space into whatever other space and at what time, and we have a mess on our hands.  Your camera captures more data than can fit into sRGB but it may throw that data away and store your photo in sRGB.  Photoshop loads this up and throws a dialog that most users have no idea what to do with.  When you print there is an array of choices in both Photoshop and the printer driver.  To be honest, I do some of this stuff for a living and I've wasted a lot of paper trying to figure out what the right settings are.

The state of the color union is that most people have just given up and standardized on sRGB across the board.  This is really a shame because we can do so much more.  We are shortchanging ourselves.  This is like asking the entire world to just learn English.

The way out is Unicode, err, scRGB.  I'm honestly not sure what the 'sc' in scRGB stands for.  scRGB defines a gamut that is much wider than anything you or I can see.  All of the other custom color spaces that have been defined fit inside of scRGB.  However, this comes at a price.  If you were to take an image and encode it in scRGB at 8 bits per channel (bpc), your image would look like crap.  The expanded gamut means that we need much more precision to avoid the color granularity problem described above.  8bpc isn't good enough.  And you thought we would never need anything more than 16.7 million colors!  The cost of unifying all of the various color spaces is 16bpc or even 32bpc.  Sure it costs more, but your pictures are worth it, right?  Jumping up to 16 or 32bpc may seem like a huge leap, but it probably isn't that big a deal in practice.  Memory and disk space are cheap and this stuff can compress pretty well.  Unicode was painful at one point but it is definitely worth it now.  Beyond this, when persisting this stuff out, there are ways to encode scRGB data that recognize that most of the extra precision isn't needed in many common cases.  This is similar to the UTF-8 or UTF-16 encodings for UCS-4 (in Unicode speak).
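To put a rough number on the granularity problem, here is a sketch that counts how many code values are left for the ordinary 0.0 to 1.0 range once the same bits are stretched over a wider range.  The -0.5 to 7.5 range is an assumption for illustration (roughly what a 16 bit scRGB encoding is usually described as covering), the even linear spacing is a simplification, and `codes_in_unit_range` is a made-up helper.

```python
def codes_in_unit_range(bits, lo, hi):
    """Count the code values that land inside [0.0, 1.0] when 2**bits
    evenly spaced codes are spread across the range [lo, hi]."""
    levels = 2 ** bits
    count = 0
    for code in range(levels):
        value = lo + (hi - lo) * code / (levels - 1)
        if 0.0 <= value <= 1.0:
            count += 1
    return count

# 8 bits covering only the conventional 0..1 range.
narrow_8 = codes_in_unit_range(8, 0.0, 1.0)

# 8 bits stretched over the assumed wide range (illustrative values).
wide_8 = codes_in_unit_range(8, -0.5, 7.5)

# 16 bits stretched over the same assumed wide range.
wide_16 = codes_in_unit_range(16, -0.5, 7.5)

print("8 bit, 0..1 range:     ", narrow_8, "steps")
print("8 bit, wide range:     ", wide_8, "steps inside 0..1")
print("16 bit, wide range:    ", wide_16, "steps inside 0..1")
```

Under these assumptions the 8 bit wide-range case keeps only a few dozen steps for the colors a normal monitor can show, which is where the banding comes from, while 16 bits leaves far more steps in that range than plain 8 bit sRGB ever had.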

The benefits to this switch are huge.  All of the color data that you might want to store can be represented in the scRGB model.  Whereas in today's model converting to any particular color space could be a mistake, converting to scRGB is generally never the wrong thing to do.  With the extended gamut you can be sure that you won't have to clamp or compress your colors and with the extra precision you mitigate the lossiness of the conversion.  If the printer driver speaks scRGB it can do the best job it knows how to get that onto the paper.

scRGB is the Unicode of color.

For more detail, here is an (old) article on ExtremeTech on scRGB.

Update on April 30, 2005
This article has been in the top 3 results for "scRGB" on Google for quite a while.  I was chatting with Michael Stokes (Color Architect at Microsoft) and he had some comments to add on the naming of scRGB:

The "sc" in "scRGB" is not defined by IEC, but the annex uses the term "specular component" and that is probably the closest thing to a true definition that IEC has provided.  Historically it was called "sRGB-X" then "XsRGB" (but the marketing folks didn't like "excess" in the pronunciation) and then "sRGB-64" but that was too close to Nintendo 64 and finally "scRGB" but it is all the same thing.

Here's my card

Thu, Oct 23, 2003

This afternoon I went to some meetings on how to present yourself as a Microsoft employee for the PDC.  It was mostly predictable -- use common sense when talking to customers, don't badmouth competitors and be careful around the press.  However, one thing that stuck out in my mind is the fact that we are now encouraged to hand out our business cards.  In the past, it was considered a virtue to keep the development team from getting "distracted" by direct interaction with customers.  I like this new approach a lot more.

PDCBloggers.com

Thu, Oct 23, 2003

I know I'm late to the party here, but I finally signed up to get my site listed on PDCBloggers.com.  I'm super jazzed about the PDC and all the stuff that we are presenting there.

My PDC schedule

Sat, Oct 18, 2003

I made the cut and I'm going to the PDC.  Here is where and when I'm going to be for sure:

  • I will be proctoring some of the client "Hands On Labs."  I'll be working Monday, Tuesday and Wednesday mornings.
  • I'm going to be an Expert at the "Ask the Experts" event on Tuesday night.  I don't remember the exact name of the table/room we'll be in, but look for the closest thing to Avalon client graphics.
  • I'll be cheering Greg Schechter on as he presents an in depth intro to Avalon graphics at the CLI341 breakout on Tuesday afternoon (3:45-5:00).
  • I'm not scheduled to officially man the Avalon booth, but I'll probably be hanging out there some of the time.

Anything else you guys think I should hit?  Some of the "Birds of a Feather" sessions look like fun.  The one titled: "Strategic Decisions and Rich Client War Stories: Designing today for "Whidbey" and "Longhorn"" looks good along with the "VS.NET Secret Tips & Tricks" session.  "What is Windows in an HD World?" looks like fun too!

If you want to get ahold of me during the PDC, post a comment to something up here or send email to "pdc at bedafamily.com" and we'll figure out a way to meet.

Anita is sick

Sat, Oct 18, 2003

Best wishes go out to Anita Rowland as she recovers from surgery.  I worked with Anita back in the day on IE4/5.  I hope she has a fast and full recovery.

Joel on Unicode

Tue, Oct 14, 2003

I'm sure everyone has seen this by now, but Joel is dead on wrt Unicode.  I worked on some of the codepage reloading stuff in IE so I've lived some of this stuff -- both recognizing the BOM for Unicode and implementing the reload of an already started page when we hit the codepage meta.  Most of IE is fully Unicode of course.  The only exception is when we deal with URLs because, back in the day, they were restricted to 7 bit ASCII.
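For anyone who hasn't had to do it, recognizing a BOM looks roughly like the sketch below.  This is not how IE did it, just a minimal illustration using the BOM constants from Python's standard `codecs` module; the function name `sniff_bom` is made up.

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so the longer ones have to be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length) if data starts with a known BOM,
    otherwise (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# Build a small UTF-16 LE sample with a BOM in front, then sniff and decode it.
sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
encoding, skip = sniff_bom(sample)
text = sample[skip:].decode(encoding)
print(encoding, repr(text))
```

When no BOM is found you are back to guessing, or to waiting for something like the codepage meta tag to tell you what you should have been using all along.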

There are some other issues that Joel didn't bring up that devs should be aware of:

  • Sorting Unicode is hard.  Each language has its own sort rules.  Given any particular set of strings, they may be sorted differently depending on the language.
  • Capitalization can be hard also.  There was a bug in IE at one point where we didn't deal with capitalization of a Japanese string right and someone's name was transformed into "dead body on beach."  At the very least that is a rude thing to call someone.
  • Be aware that text that looks the same on the screen can be backed by different Unicode codepoints.  This is a problem with going to Unicode for URLs as it makes domain squatting more difficult to deal with.  Two people might own two domains that look the same when typed in but are actually two different domains from a Unicode codepoint point of view.  As an analogy, I was looking at an ancient typewriter the other day that one of our PMs has outside of her office.  It doesn't have a '1' key!  The solution is to just use the 'l' (L) key instead.  Now imagine if the keys are font glyphs as displayed on the screen and the concept of '1' and 'l' are Unicode codepoints.  'He11o' and 'Hello' would look exactly alike on the screen but would actually be different domains.
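Here is a small Python sketch of all three issues.  The example strings are made up, and the non-ASCII characters are written as escapes so the code survives whatever codepage this page ends up in.

```python
import unicodedata

# Lookalikes: the second string swaps the Latin 'a' for
# CYRILLIC SMALL LETTER A (U+0430).  Most fonts draw them identically.
latin = "example"
mixed = "ex\u0430mple"

same = (latin == mixed)
names = [unicodedata.name(latin[2]), unicodedata.name(mixed[2])]
print("equal:", same)
print("third character:", names)

# Capitalization is not always one character in, one character out:
# the German sharp s (U+00DF) uppercases to two letters.
upper = "stra\u00dfe".upper()
print("upper:", upper)

# Sorting by raw code point is not any language's sort order:
# a word starting with a-umlaut (U+00E4) lands after 'z'.
by_codepoint = sorted(["zebra", "\u00e4rger", "apple"])
print("code point order:", [w.encode("unicode_escape").decode("ascii") for w in by_codepoint])
```

Getting a sort order that a German or Swedish speaker would call correct takes language-aware collation, which is a separate and much bigger topic than this sketch covers.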

New Computer

Tue, Oct 14, 2003

The reason I haven't been posting is that I've been setting up a new computer here at home.  I got rid of my 1.4GHz Athlon and replaced it with a sweet 2.8GHz P4.  The 800MHz memory interface along with a new case and mirrored RAID 120GB SATA drives make this a sweet ride.  The main reason that I upgraded is that I've been taking a ton of photos and I haven't been as diligent at backing them up as I should be.  The mirrored drives are an extra level of insurance.

Plus, blue LEDs just look sweet.  Now all I need is a big tailpipe and "Powered by HKS" stickers!

I haven't picked a new videocard yet.  But when I do, I'm going to get the most powerful thing that I can afford.  Come to the PDC and you'll see what I've been working on and why!

Beda's Law

Sat, Oct 4, 2003

At the party I've come up with a law.  You heard it here first:

"If I can write 80% of a piece of software in a weekend, there is no business model for that software."

I fundamentally believe that there is no money to be made by selling people blogging tools (like Radio).  They are just too easy to write.  Providing a hosting/blogging/etc. service is another thing entirely.

Pics of the new car...

Sat, Oct 4, 2003

I really have to write some software for doing picture galleries so that I can put up a bunch of stuff from my trip.  In the meantime, here are some links to a thread I started over on the roadfly forum about the new ride.  There are some pictures from the trip up there.