Recently I’ve been getting more into the idea of cloud hosting for tiny apps. Google App Engine is a perfectly fitting glass slipper for applications written in Java and other JVM languages of the future. In the world of Scala we have sbt, in the world of sbt we have plugins, and in a world of repeatable tasks we even have templates. Anyway… Even in a world of such luxury, we sometimes can’t afford to make certain types of assumptions.
Living in a Western world, it’s easy for most people to live complacently with applications configured to use Western-only character sets. For applications serving string data outside those character sets, we have a few options. One of the most common encodings for multi-byte string content is UTF-8. To enforce it, we were given weapons. Get into the habit of always putting the following line at the top of the head section of any rendered HTML.
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
Recently I learned a new trick. As much as I despise Java’s apparent fetish with XML configuration, there is actually a handy block you can add to your web.xml that hints to your servlet container which MIME type to use for a given file extension when serving static content. It’s called mime-mapping. I think the original idea was to provide a MIME mapping for custom file extensions, but you can also use it to override the default MIME type for the most common text-based file types of the interwebs. In doing so, you have the opportunity to include the character encoding within the MIME type itself, for example text/html; charset=utf-8.
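A minimal sketch of what such a mapping might look like, placed inside the web-app element of web.xml. The element names come from the standard servlet deployment descriptor; the particular extensions listed here are assumptions chosen for illustration, so adjust them to whatever static files your app actually serves.

```xml
<!-- Sketch only: the extensions below are illustrative assumptions. -->
<!-- Each mapping overrides the default MIME type for one file extension -->
<!-- and appends the charset so it ends up in the Content-Type header. -->
<mime-mapping>
  <extension>html</extension>
  <mime-type>text/html; charset=utf-8</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>css</extension>
  <mime-type>text/css; charset=utf-8</mime-type>
</mime-mapping>
<mime-mapping>
  <extension>js</extension>
  <mime-type>text/javascript; charset=utf-8</mime-type>
</mime-mapping>
```

A quick way to check whether the container picked it up is to inspect the Content-Type response header with curl -I.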
Before adding the mapping, the Content-Type header carries no charset:
> curl -I http://northeastscala.appspot.com
HTTP/1.1 200 OK
ETag: "zqO8fA"
Date: Wed, 15 Dec 2010 03:42:39 GMT
Expires: Wed, 15 Dec 2010 03:52:39 GMT
Cache-Control: public, max-age=600
Content-Type: text/html
Server: Google Frontend
Transfer-Encoding: chunked

After adding it, the charset shows up:
> curl -I http://northeastscala.appspot.com
HTTP/1.1 200 OK
ETag: "MJtj8Q"
Date: Wed, 15 Dec 2010 03:43:23 GMT
Expires: Wed, 15 Dec 2010 03:53:23 GMT
Cache-Control: public, max-age=600
Content-Type: text/html; charset=utf-8
Server: Google Frontend
Transfer-Encoding: chunked
Don’t expect UTF-8 content to be served when your app first jumps off the App Engine plane. I did, and it was kind of embarrassing. It’s best to be safe and strap yourself in next to a mime you can trust.