Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Confluence page identifier extraction and mapping to MoinMoin identifiers
* Acquisition of Confluence user profile details
* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Filesystem permission adjustments

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed by
applying a patch to MoinMoin as follows, while at the top level of the
MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch to solve such problems can be
applied to MoinMoin in the same way:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://moinmo.in/ActionMarket/SubpageComments
http://moinmo.in/MacroMarket/Color2

A common dependency of various extensions is provided by MoinSupport:

http://hgweb.boddie.org.uk/MoinSupport

Additional Software
-------------------

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

And DocBook output requires the DocBook resources to be installed, described
in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

To obtain XML export archives from a Confluence wiki instance, the
exportspacexml.action resource is visited and the "Export" button selected.
For example, for the Mailman Wiki, the appropriate resource (with the COM
namespace selected) is as follows:

http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, then choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive for that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Importing Many Workspaces/Namespaces
------------------------------------

Where more than one namespace is to be imported, the page packages should be
merged so that the resulting history information is ordered correctly.

To merge packages, use a command of the following form:

python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip

A directory called OUT and a page package called OUT.zip will be produced. The
latter can then be imported into MoinMoin as described above.
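
The idea of ordering history across several spaces can be illustrated with a
small Python sketch. It is not how merge.py works internally, which this
document does not describe; the (timestamp, page name, revision) tuples are a
hypothetical layout chosen only for the illustration, and each input list is
assumed to be sorted by timestamp already.

```python
import heapq


def merge_histories(*histories):
    """Merge per-space revision lists into one list ordered by timestamp.

    Each history is assumed to be a list of (timestamp, page_name, revision)
    tuples already sorted by timestamp. This layout is hypothetical and is
    not the manifest format handled by merge.py.
    """
    return list(heapq.merge(*histories))


# Hypothetical revisions from two spaces.
com = [(100, "COM/FrontPage", 1), (300, "COM/FrontPage", 2)]
dev = [(200, "DEV/Guide", 1)]

merged = merge_histories(com, dev)
for timestamp, page_name, revision in merged:
    print(timestamp, page_name, revision)
```

Importing the spaces one after another would instead record all COM revisions
before any DEV revision, regardless of when the edits were made.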

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given to generate a complete mapping for a site.

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The most useful of these is the first as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
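
As an illustration of such an algorithm, the following Python sketch follows
the scheme commonly described for Confluence "tiny URLs": the identifier is
packed as a little-endian integer, base64-encoded, stripped of trailing "A"
and "=" characters, and made URL-safe by writing "/" as "-" and "+" as "_".
This scheme is an assumption and is not confirmed by this distribution; the
function names are hypothetical, and the mappings script may work
differently.

```python
import base64
import struct


def id_to_tiny(identifier):
    """Encode a numeric identifier as a "tiny URL" code (assumed scheme)."""
    raw = struct.pack("<Q", identifier)          # 8 bytes, little-endian
    text = base64.b64encode(raw).decode("ascii")
    # Trailing "=" is base64 padding and trailing "A" encodes zero bits, so
    # both can be dropped without losing information.
    text = text.rstrip("=").rstrip("A")
    return text.replace("/", "-").replace("+", "_")


def tiny_to_id(tiny):
    """Decode a "tiny URL" code into a numeric identifier (assumed scheme)."""
    text = tiny.replace("-", "/").replace("_", "+")
    if len(text) > 11:
        raise ValueError("tiny code too long: %r" % tiny)
    # Restore the zero bits removed by the encoder, then add the single "="
    # that completes a base64 string holding 8 bytes.
    padded = text.ljust(11, "A") + "="
    return struct.unpack("<Q", base64.b64decode(padded))[0]
```

Under this scheme, the identifier 98319 would be written as D4AB, and
decoding D4AB gives 98319 again.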

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, it is more likely that the first file is used by a program that can
perform a redirect to the appropriate wiki page, and the "tiny URL" decoding
is also done by this program when deployed in a suitable location to receive
such requests. To support this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location. The mapping-id-to-page.txt file should either be placed alongside
the script, or be placed elsewhere with the MAPPING_ID_TO_PAGE variable in the
script changed to refer to that location.
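
The kind of lookup such a program performs can be sketched in Python as
follows. This is not the redirect.py script itself. It assumes that the
mapping file uses the plain-text RewriteMap layout (one key and one value per
line, separated by whitespace, with "#" starting a comment), and the
function names and the /wiki/ prefix are hypothetical.

```python
def load_mapping(lines):
    """Read "key value" lines into a dictionary, skipping blanks and comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed lines having no value
        key, value = parts
        mapping[key] = value
    return mapping


def redirect_target(mapping, identifier, wiki_root="/wiki/"):
    """Return the URL path to redirect to, or None for an unknown identifier.

    Page names are used exactly as stored in the mapping file; wiki_root is
    a hypothetical prefix for the wiki's location on the site.
    """
    page_name = mapping.get(identifier)
    if page_name is None:
        return None
    return wiki_root + page_name


# Hypothetical mapping content standing in for mapping-id-to-page.txt.
sample = ["# identifier page", "98319 COM/FrontPage"]
mapping = load_mapping(sample)
print(redirect_target(mapping, "98319"))
```

A real deployment would read the lines from the mapping file and send the
result in a Location header with a redirect status.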

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The last of these, the configuration file, is also involved in
identifier-to-page mapping, but in this case it causes requests to the
"dashboard", "doexportpage" and "dosearchsite" actions to be directed to the
dashboard.py, redirect.py and search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports since the "doexportpage" action uses
identifiers to indicate which page is to be exported. In an environment that
uses .htaccess and mod_rewrite, the redirect.py script should also be deployed
under separate names (such as export.py and exportpdf.py) so that it can
discover whether it should be exporting a page instead of just showing it.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
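
The translation of a search request might look like the following Python
sketch. It is not the search.py script itself. It assumes that Confluence
passes the search terms in a "queryString" parameter and that the MoinMoin
"fullsearch" action reads them from a "value" parameter; both parameter names
should be checked against the sites involved, and the function name is
hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def search_redirect(query_string, wiki_root="/"):
    """Build a MoinMoin search URL from a Confluence search query string.

    The "queryString" and "value" parameter names are assumptions about
    Confluence and MoinMoin respectively.
    """
    params = parse_qs(query_string)
    terms = params.get("queryString", [""])[0]
    query = urlencode({"action": "fullsearch", "value": terms})
    return wiki_root + "?" + query


print(search_redirect("queryString=mailing+list"))
```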

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script extracting these identifiers.
The following command writes to standard output the users involved with
editing the wiki in four different spaces (exported to four directories):

tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details as follows:

tools/users.sh COM DEV DOC SEC > users.txt # for editing
cat users.txt | tools/get_profiles.py http://wiki.list.org/

If no users are to be removed in migration, the following command could be
issued:

tools/users.sh COM DEV DOC SEC | tools/get_profiles.py http://wiki.list.org/

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
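
The effect of such a delay can be illustrated with a short Python sketch.
How get_profiles.py implements its delay is not described here, so the
generator below is only one plausible approach, and its name and parameters
are hypothetical.

```python
import time


def throttled(items, delay=1.0, sleep=time.sleep):
    """Yield each item in turn, pausing for delay seconds between items.

    No pause is made before the first item. The sleep parameter exists so
    that the pause can be replaced when testing.
    """
    first = True
    for item in items:
        if not first:
            sleep(delay)
        first = False
        yield item


# Hypothetical user identifiers; a real program would fetch each profile
# from the site inside the loop.
for user in throttled(["alice", "bob"], delay=0.1):
    print(user)
```

With the default delay of one second, fetching profiles for several hundred
users would take several minutes, which is worth bearing in mind when
choosing a different value.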

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following commands can be used:

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

Or, as a single command without the intermediate file:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML
  exported from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML
  exported from Confluence)

Each page directory contains the following things:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.
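
To check a converted workspace before packaging it, the structure above can
be inspected with a short Python sketch such as the following. It assumes
that each pagetype file is a text file containing only the type name; the
function name is hypothetical.

```python
import os
from collections import Counter


def count_page_types(workspace_dir):
    """Count the content items of each type in a converted workspace.

    Every directory under "pages" is expected to hold a "pagetype" file
    whose content is assumed to be just "Page", "Comment" or "BlogPost".
    Directories without such a file are skipped.
    """
    counts = Counter()
    pages_dir = os.path.join(workspace_dir, "pages")
    for name in sorted(os.listdir(pages_dir)):
        pagetype_file = os.path.join(pages_dir, name, "pagetype")
        if os.path.isfile(pagetype_file):
            with open(pagetype_file, encoding="utf-8") as f:
                counts[f.read().strip()] += 1
    return counts
```

For example, count_page_types("COM") would summarise the COM directory
produced by the conversion.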

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.
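
Before running an import, write access can be checked with a short Python
sketch like the one below. It assumes that the pages directory and the
edit-log file sit directly inside the wiki's data directory; the function
name is hypothetical, and the check reflects the privileges of the user
running it.

```python
import os


def unwritable_paths(data_dir):
    """Return the paths needed by an import that are missing or not writable.

    The locations of "pages" and "edit-log" directly under data_dir are an
    assumption about the wiki's layout.
    """
    problems = []
    for name in ("pages", "edit-log"):
        path = os.path.join(data_dir, name)
        if not os.path.exists(path) or not os.access(path, os.W_OK):
            problems.append(path)
    return problems
```

An empty list suggests that the import should not fail for lack of
privileges; any path returned should have its ownership or ACLs adjusted
first.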

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.