WebStack

Annotated docs/paths.html

503:5e29854fe10d
2005-11-15 paulb [project @ 2005-11-15 15:46:01 by paulb] Added has_key method.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head>
  <title>URLs and Paths</title><meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
  <link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>URLs and Paths</h1>
<p>The URL at which your application appears is arguably the first
part of the application's user interface that any user will see. Remember
that a user of your application does not have to be a real person; in
fact, a user can be any of the following:</p>
<ul>
  <li>A real person entering the URL into a browser's address bar.</li>
  <li>A real person linking to your application by writing the URL in a
separate Web page.</li>
  <li>A program which has the URL defined within it and which may
manipulate the URL to perform certain kinds of operations.</li>
</ul>
<p>Some application developers have a fairly rigid view of what kind of
information a URL should contain and how it should be structured. In
this guide, we shall look at a number of different approaches.</p>
<h2>Interpreting Path Information</h2>
<p>A URL says where (on the Internet or on an intranet) your application
resides and which resource or service is being accessed. A typical URL
looks like this:</p>
<pre>http://www.boddie.org.uk/python/WebStack.html</pre>
<p>In an application, the full URL, containing the address of the
machine on which it is running, is not always interesting. In the
WebStack API (and in other Web programming frameworks), we also talk
about "paths": a path is just the part of the URL which refers to the
resource or service, ignoring the actual Internet address. The above
example would therefore have a path which looks like this:</p>
<pre>/python/WebStack.html</pre>
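<p>As an illustration outside WebStack itself, the relationship between a
URL and its path can be sketched with Python's standard
<code>urllib.parse</code> module. The helper name <code>path_of</code> is
hypothetical and is not part of the WebStack API:</p>

```python
from urllib.parse import urlsplit


def path_of(url):
    """Return only the path component of a URL, dropping scheme and host.

    Hypothetical helper for illustration; not a WebStack function.
    """
    return urlsplit(url).path


example_path = path_of("http://www.boddie.org.uk/python/WebStack.html")
print(example_path)
```

<p>Inside a WebStack application the transaction methods described below
should be used instead, since the server environment has usually already
separated the path from the rest of the URL.</p>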
<p>When writing a Web application, most of the time you just need to
concentrate on the path, because the address doesn't usually tell you
anything you don't already know. What you need to do is interpret the
path specified in the request in order to work out which resource or
service the user is trying to access.</p>
<div class="WebStack">
<h3>WebStack API - Path Methods in Transaction Objects</h3>
<p>WebStack provides the following transaction methods for inspecting
path information:</p>
<dl>
  <dt><code>get_path</code></dt>
  <dd>This gets the entire path of a resource including parameter
information (as described in <a href="parameters.html">"Request
Parameters and Uploads"</a>).<br />
An optional <code>encoding</code> parameter may be used to help convert the path to a Unicode object - see below.</dd>
  <dt><code>get_path_without_query</code></dt>
  <dd>This gets the entire path of a resource but without any parameter
information.<br />
An optional <code>encoding</code> parameter may be used to help convert the path to a Unicode object - see below.</dd>
</dl>
</div>
<p>To obtain a path such as <code>/python/WebStack.html</code> using the WebStack API, we can write the following code, where <code>trans</code> is the transaction object:</p>
<pre>path = trans.get_path()</pre>
<p>Really, however, we should explicitly state the character encoding of the path. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>,
some guesswork is required, but if we have decided to use UTF-8 as the
encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
<pre>path = trans.get_path("utf-8")<br /># or, assuming a class/instance attribute defining such things centrally:<br />path = trans.get_path(self.encoding)</pre>
<p>In many applications such nuances are not particularly important, but consider the following URL:</p>
<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>
<p>Here, the URL includes non-ASCII characters which must be
interpreted somehow. In this case, the "URL encoded" character values
are ISO-8859-1 byte values (for the characters &aelig;, &oslash; and &aring;) and can be safely inspected as follows:</p>
<pre>path = trans.get_path("iso-8859-1")</pre>
<p>Specifying UTF-8 as shown earlier will also work in this case, but only
because WebStack will use ISO-8859-1 as a "safe" default for character
values it does not understand; the byte values in this URL do not form a
valid UTF-8 sequence.</p>
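<p>The fallback behaviour described above can be imitated with the
standard library alone. This is only a sketch of the idea, not WebStack's
actual implementation; the function name <code>decode_path</code> and its
<code>fallback</code> parameter are assumptions made for the example:</p>

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, encoding="utf-8", fallback="iso-8859-1"):
    """Decode a "URL encoded" path to text.

    Hypothetical helper: tries the requested encoding first and falls
    back to ISO-8859-1, which can represent any byte value, if the bytes
    are not valid in that encoding.
    """
    raw = unquote_to_bytes(raw_path)  # turn %xx escapes into raw bytes
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# The bytes E6 F8 E5 are not valid UTF-8, so both calls yield the same text.
decoded_latin1 = decode_path("/python/WebStack-%E6%F8%E5.html", "iso-8859-1")
decoded_utf8 = decode_path("/python/WebStack-%E6%F8%E5.html", "utf-8")
print(decoded_latin1 == decoded_utf8)
```

<p>A fallback of this kind never fails, but it can silently produce the
wrong characters when the sender really used some other encoding, which
is why stating the encoding explicitly is preferable.</p>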
<h2>Query Strings</h2>
<p>Sometimes, a "query string" will be provided as part of a URL; for
example:</p>
<pre>http://www.boddie.org.uk/application?param1=value1</pre>
<p>The question mark character marks the beginning of the query string,
which contains encoded parameter information; such information and its
inspection are discussed in <a href="parameters.html">"Request Parameters and
Uploads"</a>.</p>
<div class="WebStack">
<h3>WebStack API - Getting Query Strings</h3>
<p>WebStack provides a method to get only the query string from the URL:</p>
<dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter
information. Such information will be "URL encoded", meaning that
certain characters will have the form <code>%xx</code> where <code>xx</code>
is a two-digit hexadecimal number giving the byte value of the
unencoded character - see below for discussion of this.</dd></dl>
</div>
<p>Note that unlike the path access methods, <code>get_query_string</code>
does not accept an encoding as a parameter. Moreover, when retrieving a
path including a query string, the encoding is not used to interpret
"URL encoded" character values in the query string itself. Consider
this example URL:</p>
<pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre>
<p>Requesting the path and the query string separately shows the difference in treatment:</p>
<pre>trans.get_path("iso-8859-1")               # returns /application-&aelig;?var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-&aelig;<br />trans.get_query_string()                   # returns var%F8=value%E5</pre>
<p>One reason for this seemingly arbitrary distinction in treatment is
the way certain servers present path information to WebStack: often
the "URL encoded" information has already been replaced by raw character
values, which must then be converted to Unicode characters. In contrast,
most servers do not perform the same automatic conversion on the query
string.</p>
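<p>The same distinction can be reproduced outside WebStack as a rough
sketch using <code>urllib.parse</code>: the path is decoded with the
chosen encoding while the query string is left exactly as received. The
function name <code>split_request</code> is hypothetical:</p>

```python
from urllib.parse import unquote, urlsplit


def split_request(url, encoding):
    """Return (decoded path, still-encoded query string) for a URL.

    Hypothetical helper for illustration; not part of the WebStack API.
    """
    parts = urlsplit(url)
    # Only the path is converted to text; the query keeps its %xx escapes.
    return unquote(parts.path, encoding=encoding), parts.query


path, query = split_request(
    "http://www.boddie.org.uk/application-%E6?var%F8=value%E5",
    "iso-8859-1",
)
print(path)
print(query)
```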
<p>In fact, it may become impossible to properly interpret the query
string if it is decoded prematurely; consider this example URL:</p>
<pre>http://www.boddie.org.uk/application?a=%26b</pre>
<p>If we were to just decode the query string and then extract the
parameters/fields, the result would be two empty parameters with the
names <code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter <code>a</code> with the value <code>&amp;b</code>.</p>
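<p>The effect of premature decoding can be demonstrated with the standard
library's <code>parse_qs</code> function. This is an illustration only and
does not show how WebStack itself parses parameters; the variable names
are chosen for the example:</p>

```python
from urllib.parse import parse_qs, unquote

raw_query = "a=%26b"

# Correct order: split into fields first, then decode each name and value.
correct = parse_qs(raw_query, keep_blank_values=True)

# Wrong order: decoding first turns %26 into a literal "&" separator,
# so the single field is split into two empty ones.
premature = parse_qs(unquote(raw_query), keep_blank_values=True)

print(correct)
print(premature)
```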
<h3>Conclusion</h3>
<p>Regardless of all this, all inspection of parameters supplied in the URL should be done using the appropriate methods (see <a href="parameters.html">"Request Parameters and
Uploads"</a>),
and direct access to the query string should only occur in situations
of a specialised nature such as the building of URLs for output.</p>
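<p>When a URL does have to be built for output, the parameter names and
values should be encoded individually before being joined. A minimal
sketch using the standard library's <code>urlencode</code> function
follows; it assumes ISO-8859-1 as the chosen encoding, and the parameter
names are invented for the example:</p>

```python
from urllib.parse import urlencode

# A value containing "&" is escaped so it cannot be mistaken for a separator.
simple_query = urlencode({"a": "&b"})

# Non-ASCII names and values are converted to bytes using the stated
# encoding and then written as %xx escapes.
latin1_query = urlencode({"var\u00f8": "value\u00e5"}, encoding="iso-8859-1")

url = "http://www.boddie.org.uk/application?" + simple_query
print(url)
print(latin1_query)
```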
<h2>More About Paths</h2>
<ul>
  <li><a href="path-info.html">Paths To and Within Applications</a></li>
  <li><a href="path-design.html">Path Design and Interpretation</a></li>
  <li><a href="path-info-support.html">Path Info Support in Server
Environments</a></li>
</ul>
</body></html>