<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>URLs and Paths</title><meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
<link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>URLs and Paths</h1>
<p>The URL at which your application appears is arguably the first part of the application's user interface that any user will see. Remember that a user of your application does not have to be a real person; in fact, a user can be any of the following:</p>
<ul>
<li>A real person entering the URL into a browser's address bar.</li>
<li>A real person linking to your application by writing the URL in a separate Web page.</li>
<li>A program which has the URL defined within it and which may manipulate the URL to perform certain kinds of operations.</li>
</ul>
<p>Some application developers have a fairly rigid view of what kind of information a URL should contain and how it should be structured. In this guide, we shall look at a number of different approaches.</p>
<h2>Interpreting Path Information</h2>
<p>A URL states where (on the Internet or on an intranet) your application resides and which resource or service is being accessed. For example:</p>
<pre>http://www.boddie.org.uk/python/WebStack.html</pre>
<p>In an application, the full URL, which contains the address of the machine on which the application is running, is not always interesting.
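</p>
<p>As a rough illustration outside of WebStack, the <code>urllib.parse.urlsplit</code> function from Python 3's standard library can separate the address of the machine (the "network location") from the rest of such a URL. This sketch uses only the standard library and is not part of the WebStack API:</p>
<pre>from urllib.parse import urlsplit

# Standard-library illustration only; WebStack applications use the
# transaction methods described in this guide instead.
url = "http://www.boddie.org.uk/python/WebStack.html"
parts = urlsplit(url)

# The network location identifies the machine on which the application runs...
print(parts.netloc)  # www.boddie.org.uk
# ...whereas the path identifies the resource or service being accessed.
print(parts.path)    # /python/WebStack.html</pre>
<p>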
In the WebStack API (and in other Web programming frameworks), we also talk about "paths": a path is just the part of the URL which refers to the resource or service, ignoring the actual Internet address. The above example would therefore have the following path:</p>
<pre>/python/WebStack.html</pre>
<p>When writing a Web application, you will usually only need to concentrate on the path, because the address rarely tells you anything you don't already know. What you need to do is to interpret the path specified in the request in order to work out which resource or service the user is trying to access.</p>
<div class="WebStack">
<h3>WebStack API - Path Methods in Transaction Objects</h3>
<p>WebStack provides the following transaction methods for inspecting path information:</p>
<dl>
<dt><code>get_path</code></dt>
<dd>This gets the entire path of a resource including parameter information (as described in <a href="parameters.html">"Request Parameters and Uploads"</a>).<br />
An optional <code>encoding</code> parameter may be used when converting the path to a Unicode object (see below).</dd>
<dt><code>get_path_without_query</code></dt>
<dd>This gets the entire path of a resource but without any parameter information.<br />
An optional <code>encoding</code> parameter may be used when converting the path to a Unicode object (see below).</dd>
</dl>
</div>
<p>To obtain the above path using the WebStack API, we can write the following code:</p>
<pre>path = trans.get_path()</pre>
<p>Really, however, we should explicitly state the character encoding of the path.
Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>, some guesswork is required, but if we have decided to use UTF-8 as the encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
<pre>path = trans.get_path("utf-8")
path = trans.get_path(self.encoding) # assuming a class or instance attribute that defines the encoding centrally</pre>
<p>In many applications such nuances are not particularly important, but consider the following URL:</p>
<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>
<p>Here, the URL encodes non-ASCII characters which must be interpreted somehow. In this case, the "URL encoded" character values refer to ISO-8859-1 characters (æ, ø and å) and can be safely inspected as follows:</p>
<pre>path = trans.get_path("iso-8859-1")</pre>
<p>The above usage of UTF-8 will also work in this case, but only because WebStack will use ISO-8859-1 as a "safe" default for character values it does not understand.</p>
<h2>Query Strings</h2>
<p>Sometimes, a "query string" will be provided as part of a URL; for example:</p>
<pre>http://www.boddie.org.uk/application?param1=value1</pre>
<p>The question mark character marks the beginning of the query string, which contains encoded parameter information; such information and its inspection are discussed in <a href="parameters.html">"Request Parameters and Uploads"</a>.</p>
<div class="WebStack">
<h3>WebStack API - Getting Query Strings</h3>
<p>WebStack provides a method to get only the query string from the URL:</p>
<dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter information. Such information will be "URL encoded", meaning that certain characters will have the form <code>%xx</code>, where <code>xx</code> is a two-digit hexadecimal number giving the byte value of the unencoded character (see below for further discussion).
</dd></dl>
</div>
<p>Note that, unlike the path access methods, <code>get_query_string</code> does not accept an encoding as a parameter. Moreover, when retrieving a path that includes a query string, the encoding is not used to interpret "URL encoded" character values in the query string itself. Consider this example URL:</p>
<pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre>
<p>When the path and the query string are requested, the difference in treatment becomes apparent:</p>
<pre>trans.get_path("iso-8859-1")                # returns /application-æ?var%F8=value%E5
trans.get_path_without_query("iso-8859-1")  # returns /application-æ
trans.get_query_string()                    # returns var%F8=value%E5</pre>
<p>One reason for this seemingly arbitrary distinction is the way certain servers present path information to WebStack: often the "URL encoded" information has already been replaced by raw character values, which must then be converted to Unicode characters.
In contrast, most servers do not perform the same automatic conversion on the query string.</p>
<p>In fact, it may become impossible to interpret the query string properly if it is decoded prematurely; consider this example URL:</p>
<pre>http://www.boddie.org.uk/application?a=%26b</pre>
<p>Here, <code>%26</code> encodes the <code>&amp;</code> character. If we were to decode the whole query string first (giving <code>a=&amp;b</code>) and only then extract the parameters, the result would be two empty parameters named <code>a</code> and <code>b</code>, rather than the correct interpretation: a single parameter <code>a</code> with the value <code>&amp;b</code>.</p>
<h3>Conclusion</h3>
<p>Regardless of all this, request parameters should always be inspected using the appropriate methods (see <a href="parameters.html">"Request Parameters and Uploads"</a>), and direct access to the query string should be reserved for specialised situations such as building URLs for output.</p>
<h2>More About Paths</h2>
<ul>
<li><a href="path-info.html">Paths To and Within Applications</a></li>
<li><a href="path-design.html">Path Design and Interpretation</a></li>
<li><a href="path-info-support.html">Path Info Support in Server Environments</a></li>
</ul>
</body></html>