<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
  <title>Character Encodings</title>
  <link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>Character Encodings</h1>
<p>When writing applications with WebStack, you should try to use
Python's Unicode objects as much as possible. However, there are a
number of places where plain Python strings can be involved:</p>
<ul>
  <li><a href="parameters-headers.html">Inspecting query strings</a></li>
  <li><a href="responses.html">Sending output in a response</a></li>
  <li><a href="parameters.html">Receiving uploaded content</a></li>
  <li><a href="state.html">Accessing cookie information</a></li>
  <li><a href="sessions.html">Accessing session information</a> (see <a href="sessions-usage.html#Limitations">"Session Limitations and Guidelines"</a>)</li>
</ul>
<p>When Web pages (and other types of content) are sent to and from
users of your application, the text will be in some kind of character
encoding. For example, in English-speaking environments, the US-ASCII
encoding is common and contains the basic letters, numbers and symbols
used in English, whereas in Western Europe encodings like
ISO-8859-1 and ISO-8859-15 are typically used, since they contain
additional letters and symbols in order to support other languages.
Often, UTF-8 is used to encode text because it can represent the
characters of virtually all languages simultaneously and is therefore
flexible enough for many applications.</p>
<p>When applications receive URLs, interpreting some of the
request parameters is a bit more awkward.
The URL text itself is encoded in US-ASCII but will contain
special numeric codes that indicate character values in the
original text encoding; see the <a href="parameters.html">description
of query strings</a> for more information.</p>
<h2>Recommendations</h2>
<dl>
  <dt>The following recommendations should help you avoid issues with
incorrect characters in the Web pages (and other content) that you
produce:</dt>
</dl>
<h3>Use Unicode Objects for Textual Content</h3>
<p>Handling text in specific encodings using normal Python strings can
be difficult, and handling text in multiple encodings in the same
application can be highly error-prone. Fortunately, Python has support
for Unicode objects which let you think of letters, numbers, symbols
and all other characters in an abstract way.</p>
<ul>
  <li>Convert textual content to Unicode as soon as possible.</li>
  <li>If you must include hard-coded messages in your application code,
make sure to specify the encoding using the <a href="http://www.python.org/peps/pep-0263.html">standard declaration</a>
at the top of your source file.</li>
  <li>Remember that the standard library <code>codecs</code>
module contains useful functions to access streams as if Unicode
objects were being transmitted; for example:</li>
</ul>
<pre>import codecs<br /><br />class MyResource:<br /><br />    encoding = "utf-8"<br /><br />    def respond(self, trans):<br />        stream = trans.get_request_stream()                      # only reads strings<br />        unicode_stream = codecs.getreader(self.encoding)(stream) # reads Unicode objects<br /><br />        # [Some activity...]<br /><br />        out = trans.get_response_stream()                        # writes strings and Unicode objects<br /></pre>
<h3>Use Strings for Binary Content</h3>
<p>If you are reading and writing binary content, Unicode objects are
inappropriate.
Make sure to open files in binary mode where necessary.</p>
<h3>Use Explicit Encodings and Be Consistent</h3>
<p>Although WebStack has some support for detecting character encodings
used in requests, it is often best for your application to exercise control
over which encoding is used when <a href="parameters.html">inspecting
request parameters</a> and when <a href="responses.html">producing responses</a>.
The best way to do this is to decide which encoding is most suitable for
the data presented and received in your application and then to use it
throughout.</p>
<p>One approach which works acceptably for smaller applications is to define
an attribute (or a global) which is conveniently accessible and which
can be used directly with various transaction methods. Here is an
outline of code which does this:</p>
<pre>from WebStack.Generic import ContentType<br /><br />class MyResource:<br /><br />    encoding = "utf-8"  # We decide on "utf-8" as our chosen encoding.<br /><br />    def respond(self, trans):<br />        # [Do various things.]<br /><br />        fields = trans.get_fields_from_body(encoding=self.encoding)      # Explicitly use the encoding.<br /><br />        # [Do other things with the Unicode values from the fields.]<br /><br />        trans.set_content_type(ContentType("text/html", self.encoding))  # The output Web page uses the encoding.<br /><br />        # [Produce the response, making sure that self.encoding is used to convert Unicode to raw strings.]</pre>
<h3>Use EncodingSelector to Set the Default Encoding</h3>
<p>An arguably better approach is to use selectors (as described in <a href="selectors.html">"Selectors - Components for Dispatching to Resources"</a>), typically in a "site map" arrangement (as described in <a href="deploying.html">"Deploying a WebStack Application"</a>), specifically using the <code>EncodingSelector</code>:</p>
<pre>from WebStack.Generic import ContentType<br />from WebStack.Resources.Selectors import EncodingSelector  # Assumed module location.<br /><br />class MyResource:<br /><br />    def respond(self, trans):<br />        # [Do various things.]<br /><br />        fields = trans.get_fields_from_body()                # Encoding set by EncodingSelector.<br /><br />        # [Do other things with the Unicode values from the fields.]<br /><br />        trans.set_content_type(ContentType("text/html"))     # The output Web page uses the default encoding.<br /><br />        # [Produce the response, making sure that the encoding set by EncodingSelector is used to convert Unicode to raw strings.]<br /><br />def get_site_map():<br /><br />    return EncodingSelector(MyResource(), "utf-8")</pre>
<h3>Tell Other Components Which Encoding to Use</h3>
<p>When using other components to generate content (see <a href="integrating.html">"Integrating with Other Systems"</a>), it may
be the case that such components will just write the generated content
straight to a normal stream (rather than one wrapped by a <code>codecs</code>
module function). In such cases, it is likely that for textual content
such as XML or related formats (XHTML, SVG, HTML) you will need to
instruct the component to use your chosen encoding; for example:</p>
<pre>        # In the respond method, xml_document is an xml.dom.minidom.Document object...<br />        xml_document.toxml(self.encoding)</pre>
<p>This will then generate appropriately encoded characters in the output <span style="font-style: italic;">and</span> specify the correct encoding
in the XML document's declaration.</p>
</body></html>