<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>URLs and Paths</title>
<link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>URLs and Paths</h1>
<p>The URL at which your application shall appear is arguably the first
part of the application's user interface that any user will see. Remember
that a user of your application does not have to be a real person; in
fact, a user can be any of the following:</p>
<ul>
<li>A real person entering the URL into a browser's address bar.</li>
<li>A real person linking to your application by writing the URL in a
separate Web page.</li>
<li>A program which has the URL defined within it and which may
manipulate the URL to perform certain kinds of operations.</li>
</ul>
<p>Some application developers have a fairly rigid view of what kind of
information a URL should contain and how it should be structured. In
this guide, we shall look at a number of different approaches.</p>
<h2>Interpreting Path Information</h2>
<p>A URL states where (on the Internet or on an intranet) your
application resides and which resource or service is being accessed;
for example:</p>
<pre>http://www.boddie.org.uk/python/WebStack.html</pre>
<p>Within an application, the full URL, which also contains the address
of the machine on which the application is running, is not always of
interest.
In the WebStack API (and in other Web programming frameworks), we also
talk about "paths": a path is just the part of the URL which refers to
the resource or service, ignoring the actual Internet address. The above
example would therefore have the following path:</p>
<pre>/python/WebStack.html</pre>
<p>When writing a Web application, most of the time you only need to
concentrate on the path, because the address does not usually tell you
anything you do not already know. What you need to do is interpret the
path specified in the request in order to work out which resource or
service the user is trying to access.</p>
<div class="WebStack">
<h3>WebStack API - Path Methods in Transaction Objects</h3>
<p>WebStack provides the following transaction methods for inspecting
path information:</p>
<dl>
<dt><code>get_path</code></dt>
<dd>This gets the entire path of a resource, including parameter
information (as described in <a href="parameters.html">"Request
Parameters and Uploads"</a>).<br />
An optional <code>encoding</code> parameter may be used to help convert
the path to a Unicode object; see below.</dd>
<dt><code>get_path_without_query</code></dt>
<dd>This gets the entire path of a resource, but without any parameter
information.<br />
An optional <code>encoding</code> parameter may be used to help convert
the path to a Unicode object; see below.</dd>
<dt><code>get_path_without_info</code></dt>
<dd>This gets the entire path of a resource, but without any parameter
information or any special "path info" (as described in <a href="path-info.html">"Paths To and Within Applications"</a>).
The result is roughly equivalent to the location at which an
application has been "published", i.e.
the location of an application in a server environment.<br />
An optional <code>encoding</code> parameter may be used to help convert
the path to a Unicode object; see below.</dd>
</dl>
</div>
<p>To obtain the above path using the WebStack API, we can write the following code:</p>
<pre>path = trans.get_path()</pre>
<p>Strictly speaking, however, we should state the character encoding of the path explicitly. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>,
some guesswork is required, but if we have decided to use UTF-8 as the
encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
<pre>path = trans.get_path("utf-8")<br />path = trans.get_path(self.encoding) # alternatively, assuming a class or instance attribute which defines the encoding centrally</pre>
<p>In many applications, such nuances are not particularly important, but consider the following URL:</p>
<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>
<p>Here, the URL encodes non-ASCII characters which must be
interpreted somehow.
In this case, the "URL encoded" character values
(<code>%E6</code>, <code>%F8</code> and <code>%E5</code>) refer to the ISO-8859-1 characters &#230;, &#248; and &#229;, and can be safely inspected as follows:</p>
<pre>path = trans.get_path("iso-8859-1")</pre>
<p>Specifying UTF-8, as above, will also work in this case, but only
because WebStack will use ISO-8859-1 as a "safe" default for character
values it does not understand.</p>
<h2>Query Strings</h2>

<p>Sometimes, a "query string" will be provided as part of a URL; for
example:</p>
<pre>http://www.boddie.org.uk/application?param1=value1</pre>
<p>The question mark character marks the beginning of the query string,
which contains encoded parameter information; such information and its
inspection are discussed in <a href="parameters.html">"Request Parameters and
Uploads"</a>.</p>
<div class="WebStack">
<h3>WebStack API - Getting Query Strings</h3>
<p>WebStack provides a method to get only the query string from the URL:</p>
<dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter
information. Such information will be "URL encoded", meaning that
certain characters will have the form <code>%xx</code>, where <code>xx</code>
is a two-digit hexadecimal number giving the byte value of the
unencoded character (see below for further discussion).</dd></dl>
</div>
<p>Note that, unlike the path access methods, <code>get_query_string</code>
does not accept an encoding as a parameter. Moreover, when a path
including a query string is retrieved, the encoding is not used to interpret
"URL encoded" character values in the query string itself. Consider
Consider 109 this example URL:</p> 110 <pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre> 111 <p>Upon requesting the path and the query string, certain differences should be noticeable:</p> 112 <pre>trans.get_path("iso-8859-1") # returns /application-??var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-?<br />trans.get_query_string() # returns var%F8=value%E5</pre> 113 <p>One reason for this seemingly arbitrary distinction in treatment is 114 the way certain servers present path information to WebStack - often 115 the "URL encoded" information has been replaced by raw character values 116 which must then be converted to Unicode characters. In contrast, most 117 servers do not perform the same automatic conversion on the query 118 string.</p> 119 <p>In fact, it may become impossible to properly interpret the query 120 string if it is decoded prematurely; consider this example URL:</p> 121 <pre>http://www.boddie.org.uk/application?a=%26b</pre> 122 <p>If we were to just decode the query string and then extract the 123 parameters/fields, the result would be two empty parameters with the 124 names <code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter <code>a</code> with the value <code>&b</code>.</p> 125 <h3>Final Note</h3> 126 <p>Regardless of all this, all inspection of path parameters should be done using the appropriate methods (see <a href="parameters.html">"Request Parameters and 127 Uploads"</a>), 128 and direct access to the query string should only occur in situations 129 of a specialised nature such as the building of URLs for output.</p> 130 <h2>More About Paths</h2> 131 <ul> 132 <li><a href="path-info.html">Paths To and Within Applications</a></li> 133 <li><a href="path-design.html">Path Design and Interpretation</a></li><li><a href="path-value-encoding.html">Encoding and Decoding Path Values</a></li><li><a 
href="path-manipulation.html">Manipulating Paths</a></li>
<li><a href="path-info-support.html">Path Info Support in Server
Environments</a></li>
</ul>
</body></html>