# HG changeset patch
# User paulb
# Date 1201995193 0
# Node ID c9c8bbed41882ff74c00ddd77d97367b59811216
# Parent 8a2827cf1d9828cbda2c22d326e36467ec955f01
[project @ 2008-02-02 23:33:13 by paulb]
Changed default_charset, added a safe_default_charset attribute.
Made the path/value decoding/encoding methods more modular.

diff -r 8a2827cf1d98 -r c9c8bbed4188 WebStack/Generic.py
--- a/WebStack/Generic.py	Sat Feb 02 23:32:31 2008 +0000
+++ b/WebStack/Generic.py	Sat Feb 02 23:33:13 2008 +0000
@@ -51,8 +51,15 @@
     """
 
     # The default charset ties output together with body field interpretation.
+    # It is also used to interpret URLs and paths.
 
-    default_charset = "iso-8859-1"
+    default_charset = "utf-8"
+
+    # The safe default charset provides some interpretation of incoming data of
+    # an unknown encoding. Generally, one should avoid making "last resort"
+    # interpretations, however.
+
+    safe_default_charset = "iso-8859-1"
 
     # The default path info is provided here, although the manipulated virtual
     # path info is an instance attribute set through instances of subclasses of
@@ -211,12 +218,28 @@
         """
 
         unquoted_path = urllib.unquote(path)
+        return self.decode_value(unquoted_path, encoding)
+
+    def decode_value(self, value, encoding=None):
+
+        """
+        From the given 'value', use the optional 'encoding' (if specified) to decode the
+        information and convert it to Unicode. Upon failure for a specified 'encoding'
+        or where 'encoding' is not specified, use the default character encoding to
+        perform the conversion.
+
+        Returns the 'value' as a Unicode value.
+        """
+
         if encoding is not None:
             try:
-                return unicode(unquoted_path, encoding)
+                return unicode(value, encoding)
             except UnicodeError:
                 pass
-        return unicode(unquoted_path, self.default_charset)
+        try:
+            return unicode(value, self.default_charset)
+        except UnicodeError:
+            return unicode(value, self.safe_default_charset)
 
     def encode_path(self, path, encoding=None):
 
@@ -226,10 +249,22 @@
         encoded" string.
         """
 
+        return urllib.quote(self.encode_value(path, encoding))
+
+    def encode_value(self, value, encoding=None):
+
+        """
+        Encode the given 'value', using the optional 'encoding' (if specified) or the
+        default encoding where 'encoding' is not specified, producing a plain string.
+        """
+
         if encoding is not None:
-            return urllib.quote(path.encode(encoding))
+            return value.encode(encoding)
         else:
-            return urllib.quote(path.encode(self.default_charset))
+            try:
+                return value.encode(self.default_charset)
+            except UnicodeError:
+                return value.encode(self.safe_default_charset)
 
     # Server-related methods.
 