Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Migration Activities
--------------------

The following activities are involved in a migration from Confluence to
MoinMoin. First, the activities that can be performed from any location:

* Export of Confluence content
* Conversion of Confluence content to MoinMoin content
* Confluence page identifier extraction and mapping to MoinMoin identifiers
* Acquisition of Confluence user profile details

Then, the activities that are performed on the server:

* Installation of MoinMoin
* Initialisation of a MoinMoin wiki instance
* Import of MoinMoin content into the new wiki instance
* Installation of MoinMoin extensions
* Initialisation of user profiles in MoinMoin
* Installation of scripts and identifier mappings
* Filesystem permission adjustments

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

To read Confluence user profiles on live Confluence sites using the
get_profiles.py program, the libxml2dom library is required:

http://hgweb.boddie.org.uk/libxml2dom

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information or the last
modified time when installing page revisions. This behaviour can be changed
by applying a patch to MoinMoin as follows, while at the top level of the
MoinMoin source distribution:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

When importing users, MoinMoin may be unable to handle user information
containing non-ASCII characters. Another patch solving such problems can be
applied to MoinMoin as follows:

patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-user.diff

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://moinmo.in/ActionMarket/SubpageComments
http://moinmo.in/MacroMarket/Color2

A common dependency of various extensions is provided by MoinSupport:

http://hgweb.boddie.org.uk/MoinSupport

Additional Software
-------------------

PDF export support requires the ExportPDF action:

http://moinmo.in/ActionMarket/ExportPDF

This in turn requires Apache FOP for PDF production using XSL-FO:

http://xmlgraphics.apache.org/fop/

(On Debian systems, the fop package provides this tool.)

To produce XSL-FO from DocBook output, xsltproc is required from the libxslt
distribution:

http://xmlsoft.org/XSLT/

(On Debian systems, the xsltproc package provides this tool.)

DocBook output also requires the DocBook resources to be installed, as
described in the following guide:

http://www.sagehill.net/docbookxsl/ToolsSetup.html

(On Debian systems, the docbook-xsl package provides these resources.)

Quick Start
-----------

(!) The acquisition of Confluence wiki content and its conversion can be
performed from any location, not necessarily on the server.

To obtain XML export archives from a Confluence wiki instance, visit the
exportspacexml.action resource and select the "Export" button. For example,
for the Mailman Wiki, the appropriate resource (with the COM namespace
selected) is as follows:

http://wiki.list.org/spaces/exportspacexml.action?key=COM

For your own instance, adjust the above URL accordingly. Alternatively, you
can find your way to the export page by selecting a namespace, then choosing
"Advanced" from the "Browse" menu, and then choosing "XML Export" from the
"Export" sidebar.

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be imported into the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive of that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

(!) The following step is performed on the server.

To import the result (although you may wish to process other namespaces
first), use moinsetup as follows:

python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

paul@20 | 162 |
|
paul@136 | 163 | Importing Many Workspaces/Namespaces
|
paul@136 | 164 | ------------------------------------
|
paul@123 | 165 |
|
paul@123 | 166 | Where more than one namespace is to be imported, the page packages should be
|
paul@123 | 167 | merged so that the resulting history information is ordered correctly.
|
paul@123 | 168 |
|
paul@138 | 169 | (!) This process can be performed from any location and the result uploaded to
|
paul@138 | 170 | the server for eventual import.
|
paul@138 | 171 |
|
paul@123 | 172 | To merge packages, use a command of the following form:
|
paul@123 | 173 |
|
paul@123 | 174 | python merge.py OUT COM.zip DEV.zip DOC.zip SEC.zip
|
paul@123 | 175 |
|
paul@123 | 176 | A directory called OUT and a page package called OUT.zip will be produced. The
|
paul@123 | 177 | latter can then be imported into MoinMoin as described above.
|
paul@123 | 178 |
|
Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs", which are an alphanumeric encoding of the numeric
identifiers.

(!) This process can be performed with the converted content from any
location, with the generated files uploaded to the server for eventual
deployment.

To generate mappings for the Confluence content, use the mappings script as
follows:

tools/mappings.sh COM

Here, COM is a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be used to generate a complete mapping for a site. For example:

tools/mappings.sh COM DEV DOC SEC

The following files are generated:

* mapping-id-to-page.txt
* mapping-tiny-to-id.txt
* mapping-tiny-to-page.txt

The most useful of these is the first, as it includes all the necessary
information provided by the arbitrary mapping from identifiers to page names.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.
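
To illustrate why the tiny-URL conversion needs no external knowledge of the
wiki structure, here is a sketch of the commonly described scheme: the numeric
identifier is packed as a little-endian 32-bit integer and encoded using
URL-safe base64, with trailing padding stripped. The function names are
illustrative, and the exact details should be checked against the mapping
tools in this distribution:

```python
import base64
import struct

def id_to_tiny(page_id):
    # Pack the numeric identifier as a little-endian 32-bit integer and
    # apply URL-safe base64 encoding, dropping the base64 padding and any
    # trailing "A" characters that encode zero-valued high-order bytes.
    encoded = base64.urlsafe_b64encode(struct.pack("<I", page_id))
    return encoded.decode("ascii").rstrip("=").rstrip("A") or "A"

def tiny_to_id(tiny):
    # Reverse the process: restore the stripped "A" characters and the
    # base64 padding, then unpack the little-endian integer.
    padded = tiny.ljust(6, "A") + "=="
    return struct.unpack("<I", base64.urlsafe_b64decode(padded))[0]
```

The two functions are inverses of each other, so a tiny-to-identifier mapping
can always be regenerated from the identifiers alone.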


Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html
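
For example, a RewriteMap-based configuration might resemble the following
sketch, in which the map path and the URL pattern are illustrative assumptions
rather than details taken from this distribution:

```apache
RewriteEngine on
RewriteMap idtopage "txt:/var/www/wiki/mapping-id-to-page.txt"

# Redirect Confluence "viewpage" requests carrying a pageId parameter to
# the corresponding MoinMoin page, where the identifier is in the map.
RewriteCond %{QUERY_STRING} ^pageId=([0-9]+)$
RewriteRule ^/pages/viewpage\.action$ /${idtopage:%1}? [R=301,L]
```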

Otherwise, the first file can be used by a program that performs a redirect
to the appropriate wiki page, with this program also decoding "tiny URLs"
when deployed in a suitable location to receive such requests. To support
this, the following resources are provided:

* scripts/redirect.py
* config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location with the mapping-id-to-page.txt file alongside it; alternatively,
the mapping file can be placed in a different location, with the
MAPPING_ID_TO_PAGE variable in the script changed to refer to that location.

Supporting Confluence Action URLs
---------------------------------

Besides the "viewpage" action mapping identifiers to pages (covered by the
mapping described above), some other action URLs may be used in wiki content
and must either be translated or supported using redirects. Since external
sites may also employ such actions, a redirect strategy perhaps makes more
sense. To support this, the following resources are involved:

* scripts/dashboard.py
* scripts/redirect.py
* scripts/search.py
* config/mailmanwiki-redirect

The latter configuration file is also involved in identifier-to-page mapping,
but in this case it causes requests to the "dashboard", "doexportpage" and
"dosearchsite" actions to be directed to the dashboard.py, redirect.py and
search.py scripts respectively.

The dashboard.py script merely redirects requests to the root of the site,
thus assuming that the front page is configured to show dashboard-like
information.

The redirect.py script, apart from supporting identifier-to-page redirects,
also supports PDF page exports, since the "doexportpage" action uses
identifiers to indicate which page is to be exported. In an environment that
uses .htaccess and mod_rewrite, the redirect.py script should also be deployed
under separate names (such as export.py and exportpdf.py) so that it can
discover whether it should be exporting a page instead of just showing it.

The search.py script redirects search requests in a suitable form to the
MoinMoin "fullsearch" action.
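
As an illustration of the kind of translation search.py performs (a sketch
rather than the actual script: the queryString parameter name, the
wiki.example.com base URL and the FrontPage target are assumptions), a
Confluence search request might be turned into a MoinMoin "fullsearch" URL as
follows:

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical wiki base URL: adjust for the actual deployment.
WIKI_URL = "http://wiki.example.com/"

def search_location(query_string):
    # Extract the search term from the Confluence-style query string and
    # build a URL invoking MoinMoin's "fullsearch" action on a page.
    terms = parse_qs(query_string).get("queryString", [""])
    return WIKI_URL + "FrontPage?action=fullsearch&value=" + quote_plus(terms[0])
```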

Identifying and Migrating Users
-------------------------------

Confluence export archives do not contain user profile information, but page
versions are marked with user identifiers. Therefore, a list of user
identifiers can be obtained by running a script that extracts these
identifiers. The following command writes to standard output the users
involved with editing the wiki in four different spaces (exported to four
directories):

tools/users.sh COM DEV DOC SEC

This output can be edited and then passed to a program which fetches other
profile details, as follows:

tools/users.sh COM DEV DOC SEC > users.txt

After editing...

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
> profiles.txt

If no users are to be removed in migration, the following command could be
issued instead:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
> profiles.txt

The get_profiles.py program needs to be told the URL of the original
Confluence site. Note that it accesses the site at a default rate of around
one request per second; a different delay between requests can be specified
using an additional argument.

(!) The above steps can be performed from any location, but the command
pipelines below need to be run on the server due to the use of a program that
updates the deployed wiki.

The output of the get_profiles.py program can be passed to another program
which adds users to MoinMoin, and so the following command can be used:

cat profiles.txt \
| tools/addusers.py wiki

Alternatively, the users can be converted to profiles and immediately added
without creating a profiles file:

cat users.txt \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

Or, using a single command without inspecting the users or profiles at all:

tools/users.sh COM DEV DOC SEC \
| tools/get_profiles.py http://wiki.list.org/ \
| tools/addusers.py wiki

The addusers.py program needs to be told the directory containing the wiki
configuration.

Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing
the following directories:

* pages (a collection of directories defining each page or content item,
  corresponding to Page, Comment and BlogPost elements in the XML exported
  from Confluence)

* versions (a collection of files, each defining a revision or version of
  some content, corresponding to BodyContent elements in the XML exported
  from Confluence)

Each page directory contains the following things:

* pagetype (either "Page", "Comment" or "BlogPost")

* manifest (a list of version entries in a format similar to the MoinMoin
  page package manifest format)

* attachments (a list of attachment version entries in a format similar to
  the MoinMoin page package manifest format)

* pagetitle (an optional page title imposed on the page by another content
  item)

* children (a list of child page names defined for the page)

* comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.

Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user, given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.

Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.
|