paulb@357 | 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
paulb@436 | 2 | <html xmlns="http://www.w3.org/1999/xhtml"><head> |
paulb@436 | 3 | |
paulb@436 | 4 | <title>URLs and Paths</title><meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" /> |
paulb@436 | 5 | <link href="styles.css" rel="stylesheet" type="text/css" /></head> |
paulb@327 | 6 | <body> |
paulb@327 | 7 | <h1>URLs and Paths</h1> |
paulb@357 | 8 | <p>The URL at which your application shall appear is arguably the first |
paulb@357 | 9 | part |
paulb@357 | 10 | of the application's user interface that any user will see. Remember |
paulb@357 | 11 | that a user of your application does not have to be a real person; in |
paulb@357 | 12 | fact, |
paulb@327 | 13 | a user can be any of the following things:</p> |
paulb@327 | 14 | <ul> |
paulb@327 | 15 | <li>A real person entering the URL into a browser's address bar.</li> |
paulb@327 | 16 | <li>A real person linking to your application by writing the URL in a |
paulb@357 | 17 | separate Web page.</li> |
paulb@357 | 18 | <li>A program which has the URL defined within it and which may |
paulb@357 | 19 | manipulate the URL to perform certain kinds of operations.</li> |
paulb@327 | 20 | </ul> |
paulb@357 | 21 | <p>Some application developers have a fairly rigid view of what kind of |
paulb@357 | 22 | information a URL should contain and how it should be structured. In |
paulb@357 | 23 | this guide, we shall look at a number of different approaches.</p> |
paulb@327 | 24 | <h2>Interpreting Path Information</h2> |
paulb@357 | 25 | <p>A URL states where (on the Internet or |
paulb@357 | 26 | on an |
paulb@357 | 27 | intranet) your application resides and which resource or service is |
paulb@357 | 28 | being |
paulb@327 | 29 | accessed. A typical URL looks like this:</p> |
paulb@327 | 30 | <pre>http://www.boddie.org.uk/python/WebStack.html</pre> |
paulb@357 | 31 | <p>In an application the full URL, containing the address of the |
paulb@357 | 32 | machine on which it is running, is not always interesting. In the |
paulb@357 | 33 | WebStack API (and in other Web programming frameworks), we also talk |
paulb@357 | 34 | about "paths" - a path is just the part of the |
paulb@357 | 35 | URL which refers to the resource or service, ignoring the actual |
paulb@357 | 36 | Internet |
paulb@357 | 37 | address, and so the above example would have a path which looks like |
paulb@357 | 38 | this:</p> |
paulb@327 | 39 | <pre>/python/WebStack.html</pre> |
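<p>As a rough illustration of the split between address and path, the following sketch uses Python's standard <code>urllib.parse</code> module rather than WebStack itself; inside a WebStack resource the path would normally come from the transaction object instead.</p>

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://www.boddie.org.uk/python/WebStack.html"

parts = urlsplit(url)
host = parts.netloc   # the Internet address: "www.boddie.org.uk"
path = parts.path     # the part an application usually cares about
```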
paulb@327 | 40 | <p>When writing a Web application, most of the time you just need to |
paulb@357 | 41 | concentrate on the path because the address doesn't usually tell you |
paulb@357 | 42 | anything |
paulb@327 | 43 | you don't already know. What you need to do is to interpret the path |
paulb@357 | 44 | specified in the request in order to work out which resource or service |
paulb@357 | 45 | the user is trying to access.</p> |
paulb@327 | 46 | <div class="WebStack"> |
paulb@327 | 47 | <h3>WebStack API - Path Methods in Transaction Objects</h3> |
paulb@357 | 48 | <p>WebStack provides the following transaction methods for inspecting |
paulb@357 | 49 | path |
paulb@327 | 50 | information:</p> |
paulb@327 | 51 | <dl> |
paulb@327 | 52 | <dt><code>get_path</code></dt> |
paulb@357 | 53 | <dd>This gets the entire path of a resource including parameter |
paulb@357 | 54 | information (as described in <a href="parameters.html">"Request |
paulb@436 | 55 | Parameters and Uploads"</a>).<br /> |
paulb@453 | 56 | An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd> |
paulb@327 | 57 | <dt><code>get_path_without_query</code></dt> |
paulb@357 | 58 | <dd>This gets the entire path of a resource but without any parameter |
paulb@453 | 59 | information.<br /> |
paulb@453 | 60 | |
paulb@453 | 61 | An optional <code>encoding</code> parameter may be used to assist the process of converting the path to a Unicode object - see below.</dd> |
paulb@327 | 62 | </dl> |
paulb@327 | 63 | </div> |
paulb@453 | 64 | <p>To obtain the above path using the WebStack API, we can write the following code:</p> |
paulb@453 | 65 | <pre>path = trans.get_path()</pre> |
paulb@453 | 66 | <p>Really, however, we should explicitly state the character encoding of the path. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>, |
paulb@453 | 67 | some guesswork is required, but if we have decided to use UTF-8 as the |
paulb@453 | 68 | encoding of our output, it is reasonable to specify UTF-8 here as well:</p> |
paulb@453 | 69 | <pre>path = trans.get_path("utf-8")<br />path = trans.get_path(self.encoding) # assuming a class/instance attribute defining such things centrally</pre> |
paulb@453 | 70 | <p>In many applications such nuances are not particularly important, but consider the following URL:</p> |
paulb@453 | 71 | <pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre> |
paulb@453 | 72 | <p>Here, the URL includes non-ASCII characters which must be |
paulb@453 | 73 | interpreted somehow. In this case, the "URL encoded" character values |
paulb@453 | 74 | refer to ISO-8859-1 values and can be safely inspected as follows:</p> |
paulb@453 | 75 | <pre>path = trans.get_path("iso-8859-1")</pre> |
paulb@453 | 76 | <p>The above usage of UTF-8 will also work in this case, but only |
paulb@453 | 77 | because WebStack will use ISO-8859-1 as a "safe" default for character |
paulb@453 | 78 | values it does not understand.</p> |
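<p>The fallback behaviour described above can be approximated outside WebStack with the standard library. This is only a sketch of the idea, not WebStack's actual implementation, and <code>decode_path</code> is a hypothetical helper name introduced for illustration.</p>

```python
from urllib.parse import unquote_to_bytes

def decode_path(raw_path, encoding="utf-8"):
    """Decode a "URL encoded" path (hypothetical helper, not part of WebStack).

    The preferred encoding is tried first; if the bytes are not valid in
    that encoding, ISO-8859-1 is used as a fallback, since every byte
    value maps to some character in ISO-8859-1.
    """
    raw = unquote_to_bytes(raw_path)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")

# The bytes E6 F8 E5 are not valid UTF-8, so the fallback applies here.
path = decode_path("/python/WebStack-%E6%F8%E5.html")
```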
paulb@357 | 79 | <h2>Query Strings</h2> |
paulb@453 | 80 | |
paulb@335 | 81 | <p>Sometimes, a "query string" will be provided as part of a URL; for |
paulb@335 | 82 | example:</p> |
paulb@335 | 83 | <pre>http://www.boddie.org.uk/application?param1=value1</pre> |
paulb@357 | 84 | <p>The question mark character marks the beginning of the query string |
paulb@357 | 85 | which |
paulb@357 | 86 | contains encoded parameter information; such information and its |
paulb@357 | 87 | inspection |
paulb@335 | 88 | is discussed in <a href="parameters.html">"Request Parameters and |
paulb@335 | 89 | Uploads"</a>.</p> |
paulb@453 | 90 | <div class="WebStack"> |
paulb@453 | 91 | <h3>WebStack API - Getting Query Strings</h3> |
paulb@453 | 92 | <p>WebStack provides a method to get only the query string from the URL:</p> |
paulb@453 | 93 | <dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter |
paulb@453 | 94 | information. Such information will be "URL encoded", meaning that |
paulb@453 | 95 | certain characters will have the form <code>%xx</code> where <code>xx</code> |
paulb@453 | 96 | is a two digit hexadecimal number referring to the byte value of the |
paulb@453 | 97 | unencoded character - see below for discussion of this. </dd></dl> |
paulb@453 | 98 | </div> |
paulb@453 | 99 | <p>Note that unlike the path access methods, <code>get_query_string</code> |
paulb@453 | 100 | does not accept an encoding as a parameter. Moreover, when retrieving a |
paulb@453 | 101 | path including a query string, the encoding is not used to interpret |
paulb@453 | 102 | "URL encoded" character values in the query string itself. Consider |
paulb@453 | 103 | this example URL:</p> |
paulb@453 | 104 | <pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre> |
paulb@453 | 105 | <p>Upon requesting the path and the query string, certain differences should be noticeable:</p> |
paulb@453 | 106 | <pre>trans.get_path("iso-8859-1") # returns /application-æ?var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-æ<br />trans.get_query_string() # returns var%F8=value%E5</pre> |
paulb@453 | 107 | <p>One reason for this seemingly arbitrary distinction in treatment is |
paulb@453 | 108 | the way certain servers present path information to WebStack - often |
paulb@453 | 109 | the "URL encoded" information has been replaced by raw character values |
paulb@453 | 110 | which must then be converted to Unicode characters. In contrast, most |
paulb@453 | 111 | servers do not perform the same automatic conversion on the query |
paulb@453 | 112 | string.</p> |
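<p>The same distinction can be reproduced with the standard <code>urllib.parse</code> module, as a sketch of the behaviour rather than of how WebStack works internally: the path is decoded using the chosen encoding, while the query string is left in its "URL encoded" form.</p>

```python
from urllib.parse import urlsplit, unquote

url = "http://www.boddie.org.uk/application-%E6?var%F8=value%E5"
parts = urlsplit(url)

# Decode only the path, using the encoding we have decided upon.
path = unquote(parts.path, encoding="iso-8859-1")

# Leave the query string untouched so that it can be split into fields later.
query = parts.query
```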
paulb@453 | 113 | <p>In fact, it may become impossible to properly interpret the query |
paulb@453 | 114 | string if it is decoded prematurely; consider this example URL:</p> |
paulb@453 | 115 | <pre>http://www.boddie.org.uk/application?a=%26b</pre> |
paulb@453 | 116 | <p>If we were to just decode the query string and then extract the |
paulb@453 | 117 | parameters/fields, the result would be two empty parameters with the |
paulb@453 | 118 | names <code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter <code>a</code> with the value <code>&amp;b</code>.</p> |
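<p>The effect of decoding too early can be demonstrated with the standard library's query string parser. This sketch does not use WebStack's own parameter methods; it only shows why the order of splitting and decoding matters.</p>

```python
from urllib.parse import parse_qs, unquote

query = "a=%26b"  # the raw, still "URL encoded" query string

# Correct order: split into fields first, then decode each name and value.
correct = parse_qs(query)

# Wrong order: decode first, then split. The decoded "&" is mistaken for a
# field separator, producing two parameters with empty values.
premature = parse_qs(unquote(query), keep_blank_values=True)
```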
paulb@453 | 119 | <h3>Conclusion</h3> |
paulb@453 | 120 | <p>Regardless of these details, request parameters should always be inspected using the appropriate methods (see <a href="parameters.html">"Request Parameters and |
paulb@453 | 121 | Uploads"</a>), |
paulb@453 | 122 | and direct access to the query string should be reserved for specialised |
paulb@453 | 123 | situations such as the building of URLs for output.</p> |
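<p>When building URLs for output, the reverse operation applies: values are encoded before being joined into a query string. A minimal sketch using the standard library, with the parameter name and value taken from the earlier example, might look like this:</p>

```python
from urllib.parse import urlencode

# The value contains "&", which must not be confused with a field separator.
params = {"a": "&b"}

query = urlencode(params)
url = "http://www.boddie.org.uk/application?" + query
```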
paulb@357 | 124 | <h2>More About Paths</h2> |
paulb@357 | 125 | <ul> |
paulb@357 | 126 | <li><a href="path-info.html">Paths To and Within Applications</a></li> |
paulb@357 | 127 | <li><a href="path-design.html">Path Design and Interpretation</a></li> |
paulb@357 | 128 | <li><a href="path-info-support.html">Path Info Support in Server |
paulb@357 | 129 | Environments</a></li> |
paulb@357 | 130 | </ul> |
paulb@436 | 131 | </body></html> |