Introduction
------------

ConfluenceConverter is a software distribution that converts data exported
from Confluence wiki instances in XML form into a collection of wiki pages and
resources that can be imported into a MoinMoin instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin. First, the activities that can be performed from any location:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Extraction of Confluence page identifiers and their mapping to MoinMoin
  page names
* Acquisition of Confluence user profile details

Then, the activities that are performed on the server:

* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Adjustment of filesystem permissions

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

  http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

  http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

  http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

  http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed by
applying a patch to MoinMoin as follows while at the top level of the MoinMoin
source distribution:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution, where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch that solves such problems can
be applied to MoinMoin in the same way:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter to be displayed correctly, the following
MoinMoin extensions are required:

  http://moinmo.in/ParserMarket/ImprovedTableParser
  http://moinmo.in/ActionMarket/SubpageComments
  http://moinmo.in/MacroMarket/Color2

MoinSupport provides functionality on which various extensions depend:

  http://hgweb.boddie.org.uk/MoinSupport

Additional Software
-------------------

PDF export support requires the ExportPDF action:

  http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

  http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

  http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

DocBook output also requires the DocBook XSL stylesheets to be installed, as
described in the following guide:

  http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

(!) The acquisition of Confluence wiki content and its conversion can be
performed from any location, not necessarily on the server.

To obtain an XML export archive from a Confluence wiki instance, visit the
exportspacexml.action resource and select the "Export" button. For example,
for the Mailman Wiki, the appropriate resource (with the COM namespace
selected) is as follows:

  http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

  python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, and this also
permits us to import parts of a wiki into MoinMoin selectively. If attachments
were included in the export from Confluence, they will be included in the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive of that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

(!) The following step is performed on the server.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

  python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Importing Many Workspaces/Namespaces
------------------------------------

Where more than one namespace is to be imported, the page packages should be
merged so that the resulting history information is ordered correctly.

(!) This process can be performed from any location and the result uploaded to
the server for eventual import.

To merge packages, use a command of the following form:

  python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip

A directory called OUT and a page package called OUT.zip will be produced. The
latter can then be imported into MoinMoin as described above.
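
Conceptually, merging interleaves the revision histories of the individual
packages in time order. The following Python 3 sketch illustrates the idea
with heapq.merge, assuming that each package's revisions are already sorted by
timestamp. The Revision record and the merge_histories function are
hypothetical and do not reflect the data structures or package format actually
used by merge.py.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Revision(NamedTuple):
    # Hypothetical revision record: a timestamp (seconds since the epoch),
    # the page name and the namespace/package the revision came from.
    timestamp: int
    page: str
    package: str


def merge_histories(*packages: Iterable[Revision]) -> List[Revision]:
    """Interleave per-package revision lists, each already in time order,
    into a single list ordered by timestamp."""
    return list(heapq.merge(*packages, key=lambda rev: rev.timestamp))
```

Since heapq.merge consumes its inputs lazily and never re-sorts them, it suits
long histories; if the individual lists were not already in time order,
sorting the combined list would be needed instead.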

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs", which are an alphanumeric encoding of the numeric
identifiers.

(!) This process can be performed with the converted content from any
location, with the generated files uploaded to the server for eventual
deployment.

To generate mappings for the Confluence content, use the mappings script as
follows:

  tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
directory name can be given to generate a complete mapping for a site. For
example:

  tools/mappings.sh COM DEV DOC SEC

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The first of these is the most useful, since it records the arbitrary mapping
from identifiers to page names, which cannot be derived in any other way. The
second mapping merely converts "tiny URLs" to identifiers, which can be done
by applying an algorithm without any external knowledge of the wiki structure.
The third mapping is provided as a convenience, combining the "tiny URL"
conversion with the arbitrary mapping to page names.
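
The relationship between the three mappings can be illustrated with a short
Python 3 sketch. It assumes the scheme commonly described for Confluence tiny
URLs: the identifier is written as 8 little-endian bytes and Base64-encoded,
then the trailing "=" padding and trailing "A" characters (which encode zero
bits) are dropped, and "/" and "+" are replaced by "-" and "_" respectively.
These details, particularly the character substitutions, are assumptions and
should be checked against tiny URLs from the actual site and against the files
produced by mappings.sh. The function names are hypothetical.

```python
import base64
from typing import Dict, Iterable

# Assumed substitutions making the Base64 alphabet URL-friendly.
_TO_TINY = str.maketrans({"/": "-", "+": "_"})
_FROM_TINY = str.maketrans({"-": "/", "_": "+"})


def id_to_tiny(page_id: int) -> str:
    """Encode a numeric identifier (0 <= page_id < 2**64) as a tiny URL key."""
    encoded = base64.b64encode(page_id.to_bytes(8, "little")).decode("ascii")
    # 8 bytes give 11 significant characters plus one "=". Trailing "A"
    # characters stand for zero bits in the high-order bytes.
    return encoded.rstrip("=").rstrip("A").translate(_TO_TINY)


def tiny_to_id(tiny: str) -> int:
    """Decode a tiny URL key back to its numeric identifier."""
    # Restore the stripped zero bits and the padding before decoding.
    encoded = tiny.translate(_FROM_TINY).ljust(11, "A") + "="
    return int.from_bytes(base64.b64decode(encoded), "little")


def tiny_to_page(tiny_keys: Iterable[str],
                 id_to_page: Dict[int, str]) -> Dict[str, str]:
    """Combine tiny URL decoding with the identifier-to-page mapping,
    skipping keys whose identifiers have no known page."""
    result = {}
    for tiny in tiny_keys:
        page = id_to_page.get(tiny_to_id(tiny))
        if page is not None:
            result[tiny] = page
    return result
```

Under these assumptions, id_to_tiny(1) gives "AQ". The decoding needs no
knowledge of the wiki, whereas tiny_to_page depends entirely on the
identifier-to-page mapping, which is why the first file is the essential one.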

Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available, the first and
third mapping files can be used directly. See the Apache documentation for
details of RewriteMap:

  http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, the first file is more likely to be used by a program that
redirects requests to the appropriate wiki page. When deployed in a suitable
location to receive such requests, this program also performs the "tiny URL"
decoding. To support this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration so that the appropriate aliases capture requests and invoke the
redirect.py script before the main wiki aliases are consulted. The script
itself should be placed in a suitable filesystem location. The
mapping-id-to-page.txt file should either be placed alongside it, or be placed
elsewhere with the MAPPING_ID_TO_PAGE variable in the script changed to refer
to that location.
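
The lookup such a redirect program performs can be sketched in Python 3 as
follows. This is not the code of scripts/redirect.py: it assumes that each
line of mapping-id-to-page.txt holds an identifier followed by whitespace and
a page name (the layout RewriteMap text maps use), that requests carry the
identifier in a pageId query parameter, and that the wiki base URL is known.
The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qs, quote


def load_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Read "identifier page-name" lines, ignoring blanks and # comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split only once so that page names containing spaces stay intact.
        parts = line.split(None, 1)
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def redirect_target(query: str, mapping: Dict[str, str],
                    wiki_base: str) -> Optional[str]:
    """Return the Location for a viewpage-style request, or None if the
    identifier is missing or unknown."""
    page_ids = parse_qs(query).get("pageId")
    if not page_ids:
        return None
    page = mapping.get(page_ids[0])
    if page is None:
        return None
    # Quote the page name for the URL, keeping "/" for subpages.
    return wiki_base.rstrip("/") + "/" + quote(page, safe="/")
```

A CGI wrapper around these functions would read the QUERY_STRING environment
variable and, when a target is found, emit a redirect status together with a
Location header; otherwise it would report that the page is unknown.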

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, redirects are probably the better
strategy. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The configuration file is also involved in identifier-to-page mapping, but in
this case it causes requests to the "dashboard", "doexportpage" and
"dosearchsite" actions to be directed to the dashboard.py, redirect.py and
search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports, since the "doexportpage" action uses
identifiers to indicate which page is to be exported. In an environment that
uses .htaccess and mod_rewrite, the redirect.py script should also be deployed
under separate names (such as export.py and exportpdf.py) so that it can
discover whether it should be exporting a page instead of just showing it.
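
Choosing behaviour from the script's own name can be sketched in Python 3 as
below. The sketch assumes the script inspects the final component of its path
(for a CGI program, typically the SCRIPT_NAME environment variable or
sys.argv[0]) and that the ExportPDF action is invoked with an action=ExportPDF
query string; both the query string and the location_for function are
assumptions, not taken from redirect.py.

```python
import posixpath

# Script names under which the program is deployed when it should export a
# page rather than show it (names taken from the text above).
EXPORT_NAMES = {"export.py", "exportpdf.py"}


def location_for(script_path: str, page_location: str) -> str:
    """Return the redirect target for a page, requesting a PDF export if
    the script was invoked under one of the export names."""
    name = posixpath.basename(script_path)
    if name in EXPORT_NAMES:
        # Assumed query string for invoking the ExportPDF action.
        return page_location + "?action=ExportPDF"
    return page_location
```

Deploying one script under several names keeps the identifier lookup in a
single place, while the name alone selects between viewing and exporting.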

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
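
Such a translation can be sketched in Python 3 as follows. The parameter names
are assumptions: the Confluence search terms are taken from a queryString
parameter, and the MoinMoin search is requested with action=fullsearch, the
terms in value and fullsearch=Text. They should be checked against real
requests and against the MoinMoin version in use. The search_redirect function
is hypothetical and is not the code of search.py.

```python
from urllib.parse import parse_qs, urlencode


def search_redirect(query: str, wiki_base: str) -> str:
    """Translate a Confluence dosearchsite query string into a MoinMoin
    full-text search URL (parameter names are assumptions)."""
    terms = parse_qs(query).get("queryString", [""])[0]
    params = urlencode({"action": "fullsearch", "value": terms,
                        "fullsearch": "Text"})
    return wiki_base.rstrip("/") + "/?" + params
```

Decoding the incoming query with parse_qs and re-encoding with urlencode
avoids passing raw, possibly malformed input straight into the new URL.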

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script that extracts these
identifiers. The following command writes to standard output the users
involved in editing the wiki in four different spaces (exported to four
directories):

  tools/users.sh COM DEV DOC SEC

This output can be saved, edited, and then passed to a program which fetches
other profile details:

  tools/users.sh COM DEV DOC SEC > users.txt

After editing users.txt:

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

If no users are to be removed in migration, the following command could be
issued instead:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  > profiles.txt

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.
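
The throttling described above amounts to pausing between successive
requests. A minimal Python 3 sketch, assuming a caller-supplied fetch function
and a delay defaulting to one second, is shown below; fetch_throttled and its
parameters are hypothetical and are not the internals or options of
get_profiles.py.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fetch_throttled(users: Iterable[str], fetch: Callable[[str], str],
                    delay: float = 1.0) -> Iterator[Tuple[str, str]]:
    """Yield (user, profile) pairs, sleeping for delay seconds between
    successive requests and skipping blank input lines."""
    first = True
    for user in users:
        user = user.strip()
        if not user:
            continue
        if not first:
            time.sleep(delay)
        first = False
        yield user, fetch(user)
```

Sleeping only between requests, rather than before each one, means that a
single user is fetched without any delay.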

(!) The above steps can be performed from any location, but the command
pipelines below need to be run on the server because they use a program that
updates the deployed wiki.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following command can be used:

  cat profiles.txt \
  | tools/addusers.py wiki

Alternatively, the users can be converted to profiles and immediately added
without creating a profiles file:

  cat users.txt \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

Or, using a single command without inspecting the users or profiles at all:

  tools/users.sh COM DEV DOC SEC \
  | tools/get_profiles.py http://wiki.list.org/ \
  | tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration (wiki in the examples above).

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

* pages (a collection of directories, each defining a page or other content
  item, corresponding to Page, Comment and BlogPost elements in the XML
  exported from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML exported
  from Confluence)

Each page directory contains the following items:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of pairs, each giving a creation date and a comment page
  identifier)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, each comment has a pagetitle
file in its directory containing an appropriate subpage name, derived from the
parent page's name and the comment details.
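
When inspecting a converted workspace, a page directory can be summarised with
a Python 3 sketch such as the one below. It assumes that the files are UTF-8
plain text, that pagetype holds a single word, that children lists one page
name per line, and that each comments line ends with the comment page
identifier after whitespace (the date may itself contain spaces). These
layouts are assumptions to be checked against real converter output; manifest
and attachments are not parsed, and read_page is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def _read_lines(path: Path) -> List[str]:
    """Return the stripped non-empty lines of a file, or [] if absent."""
    if not path.exists():
        return []
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_page(page_dir: Path) -> Dict[str, object]:
    """Summarise one directory under pages/ of a converted workspace."""
    pagetype = (page_dir / "pagetype").read_text(encoding="utf-8").strip()
    title_lines = _read_lines(page_dir / "pagetitle")  # optional file
    comments: List[Tuple[str, str]] = []
    for line in _read_lines(page_dir / "comments"):
        # Split from the right so that a date containing spaces survives.
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            comments.append((parts[0], parts[1]))
    return {
        "pagetype": pagetype,
        "pagetitle": title_lines[0] if title_lines else None,
        "children": _read_lines(page_dir / "children"),
        "comments": comments,
    }
```

Treating missing files as empty lists reflects the fact that not every page
has children, comments or an imposed title.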

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody, for example), the import step can be run as that
user, given sufficient privileges. However, the easiest solution is to apply
ACLs, thus allowing the user who created the wiki to retain write access to it.
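
Before attempting an import, a quick check of these privileges can save time.
The Python 3 sketch below assumes the usual MoinMoin 1.9 layout, with pages
and edit-log inside the wiki's data directory, and reports the paths the
current user cannot write to. Note that os.access checks the real rather than
the effective user, so results may differ under setuid programs; the
unwritable_paths function is hypothetical.

```python
import os
from typing import List


def unwritable_paths(data_dir: str) -> List[str]:
    """List wiki data paths that the current user cannot write to,
    assuming data_dir/pages and data_dir/edit-log."""
    problems = []
    pages = os.path.join(data_dir, "pages")
    edit_log = os.path.join(data_dir, "edit-log")
    # Creating page directories needs write and search access to pages/.
    if not os.access(pages, os.W_OK | os.X_OK):
        problems.append(pages)
    # If edit-log does not exist yet, data_dir must be writable to create it.
    target = edit_log if os.path.exists(edit_log) else data_dir
    if not os.access(target, os.W_OK):
        problems.append(target)
    return problems
```

Running it as the user who will perform the import, with the wiki's data
directory as the argument, shows whether ownership or ACL changes are still
needed.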

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

  http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory; see
docs/COPYING.txt and docs/LICENCE.txt for more information.