Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin:

  * Export of Confluence content
  * Conversion of Confluence content to MoinMoin content
  * Confluence page identifier extraction and mapping to MoinMoin identifiers
  * Acquisition of Confluence user profile details
  * Installation of MoinMoin
  * Initialisation of a MoinMoin wiki instance
  * Import of MoinMoin content into the new wiki instance
  * Installation of MoinMoin extensions
  * Initialisation of user profiles in MoinMoin
  * Installation of scripts and identifier mappings
  * Filesystem permission adjustments

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at
the following location:

  http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module
found in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

  http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

  http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

  http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed
by applying a patch to MoinMoin, using the following command at the top level
of the MoinMoin source distribution:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

  http://moinmo.in/ParserMarket/ImprovedTableParser
  http://hgweb.boddie.org.uk/MoinSupport
  http://moinmo.in/MacroMarket/Color2

In addition, extensions are provided in this distribution to support various
Confluence features, notably comments on pages. These extensions are
installed as follows:

  python moinsetup.py -m install_actions $CCDIR/actions
  python moinsetup.py -m install_macros $CCDIR/macros
  python moinsetup.py -m install_theme_resources $CCDIR
  python moinsetup.py -m edit_theme_stylesheet screen.css includecomments.css
  python moinsetup.py -m edit_theme_stylesheet print.css includecomments.css

Additional Software
-------------------

PDF export support requires the ExportPDF action:

  http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

  http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

  http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

Finally, DocBook output requires the DocBook resources to be installed, as
described in the following guide:

  http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

  python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If
attachments were included in the export from Confluence, these will be
imported into the page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result, use moinsetup as follows:

  python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Importing Many Workspaces
-------------------------

Where more than one workspace is to be imported, the page packages should be
merged so that the resulting history information is ordered correctly.

To merge packages, use a command of the following form:

  python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip

A directory called OUT and a page package called OUT.zip will be produced.
The latter can then be imported into MoinMoin as described above.
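
The ordering requirement can be pictured with a small sketch. This is a model
of the problem, not the algorithm merge.py actually implements: assuming each
workspace's history is a list of (timestamp, description) pairs already
sorted by time, a correctly ordered combined history is a sorted merge of the
individual histories.

```python
import heapq

# Hypothetical per-workspace histories, each already sorted by timestamp.
com = [(1000, "COM: FrontPage rev 1"), (3000, "COM: FrontPage rev 2")]
dev = [(2000, "DEV: Notes rev 1")]

# Merge while preserving the global time ordering of all entries.
merged = list(heapq.merge(com, dev))
```

Simply concatenating the packages would instead interleave the histories out
of order, which is why merging is done before import.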

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier
scheme. Consequently, it is necessary to produce a mapping from Confluence
identifiers to MoinMoin page names. In addition to numeric identifiers,
Confluence also provides "tiny URLs" which are an alphanumeric encoding of
the numeric identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

  tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence
content, corresponding to a space name in the original Confluence wiki. More
than one space name can be given to generate a complete mapping for a site.

The following files are generated:

  * mapping-id-to-page.txt
  * mapping-tiny-to-id.txt
  * mapping-tiny-to-page.txt

The most useful of these is the first, since it provides the arbitrary
mapping from identifiers to page names that cannot be computed from the
identifiers alone. The second merely converts the "tiny URLs" to identifiers,
which can be done by applying an algorithm without any external knowledge of
the wiki structure. The third is provided as a convenience, combining the
"tiny URL" conversion and the arbitrary mapping to page names.

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

  http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, the first file is more likely to be used by a program that
performs a redirect to the appropriate wiki page, with the "tiny URL"
decoding also done by this program, deployed in a suitable location to
receive such requests. To support this, the following resources are
provided:

  * scripts/redirect.py
  * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration such that the appropriate aliases capture requests and invoke
the redirect.py script before the main wiki aliases are consulted. The
script itself should be placed in a suitable filesystem location with the
mapping-id-to-page.txt file alongside it; alternatively, the mapping file
can be placed elsewhere and the MAPPING_ID_TO_PAGE variable in the script
changed to refer to that location.

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

  * scripts/dashboard.py
  * scripts/redirect.py
  * scripts/search.py
  * config/mailmanwiki-redirect

The latter configuration file is also involved in identifier-to-page
mapping, but in this case it causes requests to the "dashboard",
"doexportpage" and "dosearchsite" actions to be directed to the
dashboard.py, redirect.py and search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports, since the "doexportpage" action uses
identifiers to indicate which page is to be exported.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
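
The shape of such a redirect target can be sketched as follows; the
parameter names ("action" and "value") are assumed from MoinMoin's standard
search form, and this is not the code of the distributed search.py:

```python
from urllib.parse import urlencode

def fullsearch_url(page_url, terms):
    # Build a redirect target of the kind search.py might produce,
    # invoking MoinMoin's "fullsearch" action with the search terms
    # passed in its "value" parameter.
    return "%s?%s" % (page_url, urlencode({"action": "fullsearch",
                                           "value": terms}))
```

A "dosearchsite" request for "mailing lists" would thus be turned into a
fullsearch request against some page of the MoinMoin wiki.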

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. A list of user identifiers can
therefore be obtained by running a script that extracts these identifiers.
The following command writes to standard output the users involved in
editing the wiki in four different spaces (exported to four directories):

  tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details, as follows:

  tools/users.sh COM DEV DOC SEC > users.txt   # for editing
  cat users.txt | tools/get_profiles.py http://wiki.list.org/

If no users are to be removed in migration, the following command could be
issued instead:

  tools/users.sh COM DEV DOC SEC | tools/get_profiles.py http://wiki.list.org/

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

  cat users.txt \
    | tools/get_profiles.py http://wiki.list.org/ \
    | tools/addusers.py wiki

And as a single command:

  tools/users.sh COM DEV DOC SEC \
    | tools/get_profiles.py http://wiki.list.org/ \
    | tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.
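
The pacing can be pictured with a small sketch; this is only a model of the
behaviour described above, not the mechanism get_profiles.py actually uses:

```python
import time

def throttled(items, delay=1.0):
    # Yield items with a pause between consecutive items, illustrating
    # the roughly one-request-per-second default pacing described above.
    # A larger or smaller delay corresponds to the program's optional
    # delay argument.
    for i, item in enumerate(items):
        if i:
            time.sleep(delay)
        yield item
```

Iterating over user identifiers through such a wrapper keeps the load on the
live Confluence site modest while profiles are fetched.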

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing
the following directories:

  * pages (a collection of directories defining each page or content item,
    corresponding to Page, Comment and BlogPost elements in the XML
    exported from Confluence)

  * versions (a collection of files, each defining a revision or version of
    some content, corresponding to BodyContent elements in the XML
    exported from Confluence)

Each page directory contains the following things:

  * pagetype (either "Page", "Comment" or "BlogPost")

  * manifest (a list of version entries in a format similar to the MoinMoin
    page package manifest format)

  * attachments (a list of attachment version entries in a format similar
    to the MoinMoin page package manifest format)

  * pagetitle (an optional page title imposed on the page by another
    content item)

  * children (a list of child page names defined for the page)

  * comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each such page references a content version. Since comments will
ultimately be represented as subpages of some parent page, they will have a
pagetitle file in their directory with an appropriate subpage name written
according to the parent page's name and comment details.

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of
a wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must
be set (to www-data or nobody), the import step can be run as that user,
given sufficient privileges. However, the easiest solution is to apply ACLs,
thus allowing the user who created the wiki to retain write access to it.

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

  http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.
|