WebStack (annotate docs/paths.html in 5e29854fe10d)

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml"><head>
  <title>URLs and Paths</title><meta name="generator" content="amaya 8.1a, see http://www.w3.org/Amaya/" />
  <link href="styles.css" rel="stylesheet" type="text/css" /></head>
<body>
<h1>URLs and Paths</h1>

<p>The URL at which your application appears is arguably the first
part of the application's user interface that any user will see. Remember
that a user of your application does not have to be a real person; in
fact, a user can be any of the following:</p>

<ul>
  <li>A real person entering the URL into a browser's address bar.</li>
  <li>A real person linking to your application by writing the URL in a
separate Web page.</li>
  <li>A program which has the URL defined within it and which may
manipulate the URL to perform certain kinds of operations.</li>
</ul>

<p>Some application developers have a fairly rigid view of what kind of
information a URL should contain and how it should be structured. In
this guide, we shall look at a number of different approaches.</p>
<h2>Interpreting Path Information</h2>

<p>The purpose of a URL is to say where (on the Internet or on an
intranet) your application resides and which resource or service is
being accessed. A typical URL looks like this:</p>
<pre>http://www.boddie.org.uk/python/WebStack.html</pre>

<p>In an application the full URL, containing the address of the
machine on which it is running, is not always interesting. In the
WebStack API (and in other Web programming frameworks), we also talk
about "paths" - a path is&nbsp;just the part of the
URL which refers to the resource or service, ignoring the actual
Internet address. The above example would therefore have the following
path:</p>
<pre>/python/WebStack.html</pre>
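<p>As a rough illustration of the split between address and path, the following sketch uses Python's standard <code>urllib.parse</code> module rather than the WebStack API; the variable names are chosen only for this example.</p>

```python
from urllib.parse import urlsplit

url = "http://www.boddie.org.uk/python/WebStack.html"
parts = urlsplit(url)

address = parts.netloc  # the Internet address: "www.boddie.org.uk"
path = parts.path       # the path: "/python/WebStack.html"

print(address)
print(path)
```

<p>Within a WebStack application the path is obtained from the transaction object instead, as described below.</p>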

<p>When writing a Web application, most of the time you just need to
concentrate on the path because the address doesn't usually tell you
anything you don't already know. Your task is to interpret the path
specified in the request in order to work out which resource or service
the user is trying to access.</p>

<div class="WebStack">
<h3>WebStack API - Path Methods in Transaction Objects</h3>
<p>WebStack provides the following transaction methods for inspecting
path information:</p>
<dl>
  <dt><code>get_path</code></dt>
  <dd>This gets the entire path of a resource including parameter
information (as described in <a href="parameters.html">"Request
Parameters and Uploads"</a>).<br />
An optional&nbsp;<code>encoding</code> parameter may be used to help convert the path to a Unicode object - see below.</dd>
  <dt><code>get_path_without_query</code></dt>
  <dd>This gets the entire path of a resource but without any parameter
information.<br />
An optional&nbsp;<code>encoding</code> parameter may be used to help convert the path to a Unicode object - see below.</dd>
</dl>
</div>

<p>To obtain the above path using the WebStack API, we can write the following code:</p>
<pre>path = trans.get_path()</pre>

<p>Ideally, however, we should state the character encoding of the path explicitly. Unfortunately, as noted in <a href="encodings.html">"Character Encodings"</a>,
some guesswork is required, but if we have decided to use UTF-8 as the
encoding of our output, it is reasonable to specify UTF-8 here as well:</p>
<pre>path = trans.get_path("utf-8")<br />path = trans.get_path(self.encoding) # assuming a class/instance attribute defining such things centrally</pre>

<p>In many applications such nuances are not particularly important, but consider the following URL:</p>
<pre>http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html</pre>

<p>Here, the URL includes non-ASCII characters which must be
interpreted somehow. In this case, the "URL encoded" character values
refer to ISO-8859-1 values and can be safely inspected as follows:</p>
<pre>path = trans.get_path("iso-8859-1")</pre>

<p>Specifying UTF-8 as shown earlier will also work in this case, but only
because WebStack will use ISO-8859-1 as a "safe" default for character
values it does not understand.</p>
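<p>The idea of a "safe" default can be sketched with Python's standard <code>urllib.parse</code> module. This is not WebStack's own implementation: the <code>decode_path</code> function and its parameters are hypothetical and merely show one way in which a preferred encoding might be tried before falling back to ISO-8859-1.</p>

```python
from urllib.parse import unquote, unquote_to_bytes

raw_path = "/python/WebStack-%E6%F8%E5.html"

# Interpreting the byte values as ISO-8859-1 always succeeds, because
# every byte value corresponds to a character in that encoding.
latin1_path = unquote(raw_path, encoding="iso-8859-1")

def decode_path(raw, encoding="utf-8", fallback="iso-8859-1"):
    """Decode a URL-encoded path (hypothetical helper, not part of WebStack).

    The preferred encoding is tried first; if the bytes are not valid in
    that encoding, the fallback encoding is used instead.
    """
    data = unquote_to_bytes(raw)
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(fallback)

# The bytes E6 F8 E5 are not valid UTF-8, so the fallback is used here.
fallback_path = decode_path(raw_path)
```

<p>In this sketch both <code>latin1_path</code> and <code>fallback_path</code> contain the same three non-ASCII characters, which is the behaviour described above for the example URL.</p>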

<h2>Query Strings</h2>
<p>Sometimes, a "query string" will be provided as part of a URL; for
example:</p>
<pre>http://www.boddie.org.uk/application?param1=value1</pre>

<p>The question mark character marks the beginning of the query string,
which contains encoded parameter information; such information and its
inspection are discussed in <a href="parameters.html">"Request Parameters and
Uploads"</a>.</p>

<div class="WebStack">
<h3>WebStack API - Getting Query Strings</h3>
<p>WebStack provides a&nbsp;method to get only the query string from the URL:</p>
<dl><dt><code>get_query_string</code></dt><dd>This method returns the part of the URL which contains parameter
information. Such information will be "URL encoded", meaning that
certain characters will have the form&nbsp;<code>%xx</code> where&nbsp;<code>xx</code>
is a two-digit hexadecimal number giving the byte value of the
unencoded character - see below for discussion of this.</dd></dl>
</div>

<p>Note that unlike the path access methods,&nbsp;<code>get_query_string</code>
does not accept an encoding as a parameter. Moreover, when retrieving a
path including a query string, the encoding is not used to interpret
"URL encoded" character values in the query string itself. Consider
this example URL:</p>
<pre>http://www.boddie.org.uk/application-%E6?var%F8=value%E5</pre>

<p>Requesting the path and the query string reveals the difference in treatment:</p>
<pre>trans.get_path("iso-8859-1")               # returns /application-&aelig;?var%F8=value%E5<br />trans.get_path_without_query("iso-8859-1") # returns /application-&aelig;<br />trans.get_query_string()                   # returns var%F8=value%E5</pre>

<p>One reason for this seemingly arbitrary distinction in treatment is
the way certain servers present path information to WebStack - often
the "URL encoded" information has been replaced by raw character values
which must then be converted to Unicode characters. In contrast, most
servers do not perform the same automatic conversion on the query
string.</p>
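<p>The same distinction can be reproduced outside WebStack. The following sketch uses Python's standard <code>urllib.parse</code> module to decode only the path part of the example URL and to leave the query string in its "URL encoded" form; the variable names are illustrative and nothing here is part of the WebStack API.</p>

```python
from urllib.parse import urlsplit, unquote

url = "http://www.boddie.org.uk/application-%E6?var%F8=value%E5"
parts = urlsplit(url)

# Decode the path using the chosen encoding: the byte E6 becomes the
# single character U+00E6.
decoded_path = unquote(parts.path, encoding="iso-8859-1")

# Leave the query string untouched.
raw_query = parts.query

# A combined form comparable to the path including the query string.
combined = decoded_path + "?" + raw_query
```

<p>Here <code>raw_query</code> still contains <code>%F8</code> and <code>%E5</code>, whereas <code>decoded_path</code> no longer contains any <code>%xx</code> sequences.</p>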

<p>In fact, it may become impossible to properly interpret the query
string if it is decoded prematurely; consider this example URL:</p>
<pre>http://www.boddie.org.uk/application?a=%26b</pre>

<p>If we were to decode the whole query string first and only then extract the
parameters/fields, the result would be two empty parameters with the
names&nbsp;<code>a</code> and <code>b</code>, as opposed to the correct interpretation of the query string as describing a single parameter&nbsp;<code>a</code> with the value <code>&amp;b</code>.</p>
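<p>This effect is easy to demonstrate with the standard <code>urllib.parse</code> module, used here purely for illustration and not as a substitute for WebStack's parameter methods. The sketch parses the example query string twice: once in the correct order, and once after decoding it prematurely.</p>

```python
from urllib.parse import parse_qs, unquote

query = "a=%26b"

# Correct order: split the string into fields first, then decode each
# name and value. The encoded ampersand stays inside the value of "a".
correct = parse_qs(query, keep_blank_values=True)

# Premature decoding: the decoded ampersand is mistaken for a field
# separator, producing two parameters with empty values.
premature = parse_qs(unquote(query), keep_blank_values=True)

print(sorted(correct))    # prints ['a']
print(sorted(premature))  # prints ['a', 'b']
```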

<h3>Conclusion</h3>
<p>Regardless of all this, parameters supplied in the path should always be inspected using the appropriate methods (see <a href="parameters.html">"Request Parameters and
Uploads"</a>),
and direct access to the query string should only occur in specialised
situations such as the building of URLs for output.</p>
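<p>As an example of such a specialised situation, a URL for output might be assembled as in the following sketch. The <code>build_url</code> function is hypothetical and uses only Python's standard <code>urllib.parse</code> module; it encodes the path and each parameter separately so that special characters in values cannot be confused with separators.</p>

```python
from urllib.parse import quote, urlencode

def build_url(path, params, encoding="utf-8"):
    """Return an encoded path with an encoded query string appended.

    Hypothetical helper for illustration only; not part of WebStack.
    """
    encoded_path = quote(path, encoding=encoding)
    query = urlencode(params, encoding=encoding)
    if query:
        return encoded_path + "?" + query
    return encoded_path

link = build_url("/application", {"a": "value with spaces", "b": "50%"})
print(link)
```

<p>The encoding passed to such a function should match the encoding which the application expects when it later interprets the resulting path.</p>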

<h2>More About Paths</h2>
<ul>
  <li><a href="path-info.html">Paths To and Within Applications</a></li>
  <li><a href="path-design.html">Path Design and Interpretation</a></li>
  <li><a href="path-info-support.html">Path Info Support in Server
Environments</a></li>
</ul>


</body></html>

