Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin. First, the activities that can be performed from any location:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Confluence page identifier extraction and mapping to MoinMoin identifiers
* Acquisition of Confluence user profile details

Then, the activities that are performed on the server:

* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Filesystem permission adjustments

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed
by applying a patch to MoinMoin as follows, while at the top level of the
MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch solving such problems can be
applied to MoinMoin as follows:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://moinmo.in/ActionMarket/SubpageComments
http://moinmo.in/MacroMarket/Color2

A common dependency of various extensions is provided by MoinSupport:

http://hgweb.boddie.org.uk/MoinSupport

Additional Software
-------------------

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

DocBook output also requires the DocBook resources to be installed, as
described in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

(!) The acquisition of Confluence wiki content and its conversion can be
performed from any location, not necessarily on the server.

To obtain XML export archives from a Confluence wiki instance, visit the
exportspacexml.action resource and select the "Export" button. For example,
for the Mailman Wiki, the appropriate resource (with the COM namespace
selected) is as follows:

http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, then choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive of that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

(!) The following step is performed on the server.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

paul@20 | 162 |
|
paul@136 | 163 | Importing Many Workspaces/Namespaces
|
paul@136 | 164 | ------------------------------------
|
paul@123 | 165 |
|
paul@123 | 166 | Where more than one namespace is to be imported, the page packages should be
|
paul@123 | 167 | merged so that the resulting history information is ordered correctly.
|
paul@123 | 168 |
|
paul@138 | 169 | (!) This process can be performed from any location and the result uploaded to
|
paul@138 | 170 | the server for eventual import.
|
paul@138 | 171 |
|
paul@123 | 172 | To merge packages, use a command of the following form:
|
paul@123 | 173 |
|
paul@123 | 174 | python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip
|
paul@123 | 175 |
|
paul@123 | 176 | A directory called OUT and a page package called OUT.zip will be produced. The
|
paul@123 | 177 | latter can then be imported into MoinMoin as described above.
|
paul@123 | 178 |
|
Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs", which are an alphanumeric encoding of the numeric
identifiers.

(!) This process can be performed with the converted content from any
location, with the generated files uploaded to the server for eventual
deployment.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be used to generate a complete mapping for a site. For example:

tools/mappings.sh COM DEV DOC SEC

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The most useful of these is the first, as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
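
To illustrate why the tiny-URL conversion needs no external knowledge of the
wiki structure, here is a sketch of the commonly described scheme: the numeric
identifier is packed as a little-endian 32-bit integer and encoded using
URL-safe base64, with trailing padding stripped. The function names are
illustrative, and the exact details should be checked against the mapping
tools in this distribution:

```python
import base64
import struct

def id_to_tiny(page_id):
    # Pack the numeric identifier as a little-endian 32-bit integer and
    # apply URL-safe base64 encoding, dropping the base64 padding and any
    # trailing "A" characters that encode zero-valued high-order bytes.
    encoded = base64.urlsafe_b64encode(struct.pack("<I", page_id))
    return encoded.decode("ascii").rstrip("=").rstrip("A") or "A"

def tiny_to_id(tiny):
    # Reverse the process: restore the stripped "A" characters and the
    # base64 padding, then unpack the little-endian integer.
    padded = tiny.ljust(6, "A") + "=="
    return struct.unpack("<I", base64.urlsafe_b64decode(padded))[0]
```

The two functions are inverses of each other, so a tiny-to-identifier mapping
can always be regenerated from the identifiers alone.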


Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
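
For example, a RewriteMap-based configuration might resemble the following
sketch, in which the map path and the URL pattern are illustrative assumptions
rather than details taken from this distribution:

```apache
RewriteEngine on
RewriteMap idtopage "txt:/var/www/wiki/mapping-id-to-page.txt"

# Redirect Confluence "viewpage" requests carrying a pageId parameter to
# the corresponding MoinMoin page, where the identifier is in the map.
RewriteCond %{QUERY_STRING} ^pageId=([0-9]+)$
RewriteRule ^/pages/viewpage\.action$ /${idtopage:%1}? [R=301,L]
```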

Otherwise, the first file can be used by a program that performs a redirect
to the appropriate wiki page, with this program also decoding "tiny URLs"
when deployed in a suitable location to receive such requests. To support
this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location with the mapping-id-to-page.txt file alongside it; alternatively,
the mapping file can be placed in a different location, with the
MAPPING_ID_TO_PAGE variable in the script changed to refer to that location.

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The latter configuration file is also involved in identifier-to-page mapping,
but in this case it causes requests to the "dashboard", "doexportpage" and
"dosearchsite" actions to be directed to the dashboard.py, redirect.py and
search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports, since the "doexportpage" action uses
identifiers to indicate which page is to be exported. In an environment that
uses .htaccess and mod_rewrite, the redirect.py script should also be deployed
under separate names (such as export.py and exportpdf.py) so that it can
discover whether it should be exporting a page instead of just showing it.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
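
As an illustration of the kind of translation search.py performs (a sketch
rather than the actual script: the queryString parameter name, the
wiki.example.com base URL and the FrontPage target are assumptions), a
Confluence search request might be turned into a MoinMoin "fullsearch" URL as
follows:

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical wiki base URL: adjust for the actual deployment.
WIKI_URL = "http://wiki.example.com/"

def search_location(query_string):
    # Extract the search term from the Confluence-style query string and
    # build a URL invoking MoinMoin's "fullsearch" action on a page.
    terms = parse_qs(query_string).get("queryString", [""])
    return WIKI_URL + "FrontPage?action=fullsearch&value=" + quote_plus(terms[0])
```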

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script that extracts these
identifiers. The following command writes to standard output the users
involved with editing the wiki in four different spaces (exported to four
directories):

tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details, as follows:

tools/users.sh COM DEV DOC SEC > users.txt

After editing...

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
> profiles.txt

If no users are to be removed in migration, the following command could be
issued instead:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
> profiles.txt

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.

(!) The above steps can be performed from any location, but the command
pipelines below need to be run on the server due to the use of a program that
updates the deployed wiki.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following command can be used:

cat profiles.txt \
| tools/addusers.py wiki

Alternatively, the users can be converted to profiles and immediately added
without creating a profiles file:

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

Or, using a single command without inspecting the users or profiles at all:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing
the following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML exported
  from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML exported
  from Confluence)

Each page directory contains the following things:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user, given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.
|