ConfluenceConverter (annotate README.txt in e425db7daf23)

Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin
as follows, while at the top level of the MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result, use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.
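
The conversion and import steps above can be chained from a small driver
script. The following is a minimal sketch and not part of the distribution: the
function names conversion_plan and run_plan are hypothetical, and it assumes
that convert.py and moinsetup.py are invoked exactly as shown above from the
current working directory.

```python
import subprocess


def conversion_plan(export_archive, workspace):
    """Describe the commands and outputs for converting one workspace.

    Hypothetical helper: it only restates the conventions described above,
    namely that the output directory is named after the workspace and that
    the page package is that name with ".zip" appended.
    """
    package = workspace + ".zip"
    return {
        "convert": ["python", "convert.py", export_archive, workspace],
        "directory": workspace,
        "package": package,
        "install": ["python", "moinsetup.py", "-m", "install_page_package", package],
    }


def run_plan(plan):
    """Run the conversion and then the import, stopping at the first failure."""
    subprocess.check_call(plan["convert"])
    subprocess.check_call(plan["install"])


if __name__ == "__main__":
    # Show the commands without running them.
    plan = conversion_plan("COM-123456-789012.zip", "COM")
    print(" ".join(plan["convert"]))
    print(" ".join(plan["install"]))
```

Keeping the plan separate from its execution makes it possible to inspect the
commands first, which is useful when several workspaces are to be converted
from the same export archive.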

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given in order to generate a complete mapping for a site.

The following files are generated:

  * mapping-id-to-page.txt
  * mapping-tiny-to-id.txt
  * mapping-tiny-to-page.txt

The most useful of these is the first as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
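
The relationship between the three files can be sketched as a composition of
two lookups. This is an illustration only: it assumes that each mapping file
holds one whitespace-separated key and value per line (the plain-text layout
that Apache's RewriteMap accepts), which may not match the exact output of
tools/mappings.sh. The function names and the sample data are invented.

```python
def parse_mapping(text):
    """Parse "key value" lines into a dictionary.

    Assumed format: one pair per line, separated by whitespace, with the
    value running to the end of the line. Blank lines and lines starting
    with "#" are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def compose(tiny_to_id, id_to_page):
    """Combine tiny-to-id and id-to-page into tiny-to-page.

    Tiny URLs whose identifier has no known page are left out.
    """
    return {
        tiny: id_to_page[identifier]
        for tiny, identifier in tiny_to_id.items()
        if identifier in id_to_page
    }


if __name__ == "__main__":
    # Invented sample data, purely for illustration.
    id_to_page = parse_mapping("1001 COM/FrontPage\n1002 COM/Installation\n")
    tiny_to_id = parse_mapping("abc 1001\nxyz 1002\nqqq 9999\n")
    for tiny, page in sorted(compose(tiny_to_id, id_to_page).items()):
        print(tiny, page)
```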

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

  * scripts/redirect.py
  * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location, and the mapping-id-to-page.txt file should be placed alongside it,
or it should be placed in a different location and the MAPPING_ID_TO_PAGE
variable changed in the script to refer to this different location.
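
The kind of lookup that such a redirect program performs can be sketched as
follows. This is not the code of scripts/redirect.py. It reuses the
MAPPING_ID_TO_PAGE name from the text above, assumes whitespace-separated
"identifier pagename" lines in the mapping file, and introduces a hypothetical
WIKI_BASE prefix and hypothetical function names. The "tiny URL" decoding is
left out.

```python
from urllib.parse import quote

# Name taken from the text above; the path and the base URL are placeholders.
MAPPING_ID_TO_PAGE = "mapping-id-to-page.txt"
WIKI_BASE = "/wiki/"  # hypothetical location of the MoinMoin instance


def load_mapping(filename):
    """Read whitespace-separated "identifier pagename" lines (assumed format)."""
    mapping = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping


def redirect_response(identifier, mapping, base=WIKI_BASE):
    """Return a (status, headers) pair for a numeric Confluence identifier."""
    page = mapping.get(identifier)
    if page is None:
        return "404 Not Found", [("Content-Type", "text/plain")]
    # Percent-encode the page name but keep "/" so that subpages survive.
    return "301 Moved Permanently", [("Location", base + quote(page, safe="/"))]
```

A permanent redirect is used here on the assumption that the old Confluence
addresses will not come back into use; a temporary redirect would work in the
same way.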

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

  * pages     (a collection of directories defining each page or content item,
               corresponding to Page, Comment and BlogPost elements in the XML
               exported from Confluence)

  * versions  (a collection of files, each defining a revision or version of
               some content, corresponding to BodyContent elements in the XML
               exported from Confluence)

Each page directory contains the following things:

  * pagetype    (either "Page", "Comment" or "BlogPost")

  * manifest    (a list of version entries in a format similar to the MoinMoin
                 page package manifest format)

  * attachments (a list of attachment version entries in a format similar to
                 the MoinMoin page package manifest format)

  * pagetitle   (an optional page title imposed on the page by another content
                 item)

  * children    (a list of child page names defined for the page)

  * comments    (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately be
represented as subpages of some parent page, they will have a pagetitle file
in their directory with an appropriate subpage name written according to the
parent page's name and comment details.
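
To inspect a converted workspace, the per-page files described above can be
read with a few lines of Python. The sketch below rests on assumptions: that
pagetype and pagetitle each hold a single piece of text, and that children
lists one page name per line. The manifest, attachments and comments files are
not parsed because their exact layout is not given here. All function names
are hypothetical.

```python
import os


def read_optional(path):
    """Return the stripped contents of a file, or None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def describe_page(page_dir):
    """Summarise one page directory from a converted workspace."""
    children = read_optional(os.path.join(page_dir, "children")) or ""
    return {
        "pagetype": read_optional(os.path.join(page_dir, "pagetype")),
        "pagetitle": read_optional(os.path.join(page_dir, "pagetitle")),
        "children": [name for name in children.splitlines() if name.strip()],
    }


def describe_workspace(workspace_dir):
    """Map each directory under <workspace>/pages to its summary."""
    pages_dir = os.path.join(workspace_dir, "pages")
    return {
        name: describe_page(os.path.join(pages_dir, name))
        for name in sorted(os.listdir(pages_dir))
        if os.path.isdir(os.path.join(pages_dir, name))
    }
```

For the Quick Start example, describe_workspace("COM") would return one entry
per directory found under COM/pages.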

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.
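
A quick check for this situation can be made before attempting an import. The
following sketch assumes that the pages directory and the edit-log file sit
directly inside a single wiki data directory. The data_dir parameter and the
function name are hypothetical, and the actual location depends on how the
wiki was set up.

```python
import os


def unwritable_paths(data_dir):
    """Return the paths under data_dir that the current user cannot write to.

    Only the two locations mentioned above are checked: the pages directory
    and the edit-log file. Missing paths are reported as well.
    """
    problems = []
    for name in ("pages", "edit-log"):
        path = os.path.join(data_dir, name)
        if not os.path.exists(path) or not os.access(path, os.W_OK):
            problems.append(path)
    return problems
```

An empty list suggests that the import is not being blocked by permissions on
these two paths, although individual page directories beneath pages may still
have different ownership.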

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.
