Introduction
------------

ConfluenceConverter is a distribution of software that converts exported data
from Confluence wiki instances, provided in the form of an XML file, to a
collection of wiki pages and resources that can be imported into a MoinMoin
instance as a page package.

Prerequisites
-------------

ConfluenceConverter requires a library called xmlread that can be found at the
following location:

http://hgweb.boddie.org.uk/xmlread

The xmlread.py file from the xmlread distribution can be copied into the
ConfluenceConverter directory.

ConfluenceConverter also requires access to the MoinMoin.wikiutil module found
in the MoinMoin distribution.

The moinsetup program is highly recommended for the installation of page
packages and the management of MoinMoin wiki instances:

http://moinmo.in/ScriptMarket/moinsetup

If moinsetup is not being used, the page package installer documentation
should be consulted:

http://moinmo.in/HelpOnPackageInstaller

MoinMoin Prerequisites
----------------------

The page package installer does not preserve user information when installing
page revisions. This behaviour can be changed by applying a patch to MoinMoin.
Run the following command from the top level of the MoinMoin source
distribution:

  patch -p1 < $CCDIR/patches/patch-moin-1.9-MoinMoin-packages.diff

Here, CCDIR is the path to the top level of this source distribution where
this README.txt file is found.

Wiki Content Prerequisites
--------------------------

For the output of the converter, the following MoinMoin extensions are
required:

http://moinmo.in/ParserMarket/ImprovedTableParser
http://hgweb.boddie.org.uk/MoinSupport
http://moinmo.in/MacroMarket/Color2

Quick Start
-----------

Given an XML export archive file for a Confluence wiki instance (in the
example below, the file is called COM-123456-789012.zip), the following
command can be used to prepare a page package for MoinMoin:

  python convert.py COM-123456-789012.zip COM

In addition to the filename, a workspace name is required. Confluence appears
to require a workspace as a container for collections of pages, but this also
permits us to selectively import parts of a wiki into MoinMoin. If attachments
were included in the export from Confluence, these will be included in the
page package.

The result of the above command will be a directory having the same name as
the chosen workspace, together with a zip archive of that directory's
contents. Thus, the above command would produce a directory called COM and an
archive called COM.zip.

To import the result, use moinsetup as follows:

  python moinsetup.py -m install_page_package COM.zip

This requires a suitable moinsetup.cfg file in the working directory.

Mappings from Identifiers to Pages
----------------------------------

Confluence uses numbers to label content revisions, and links to Confluence
sites sometimes use these numbers instead of a readable page name. MoinMoin,
meanwhile, only uses page names and has no external numeric identifier scheme.
Consequently, it is necessary to produce a mapping from Confluence identifiers
to MoinMoin page names. In addition to numeric identifiers, Confluence also
provides "tiny URLs" which are an alphanumeric encoding of the numeric
identifiers.

To generate mappings for the Confluence content, use the mappings script as
follows:

  tools/mappings.sh COM

Here, COM is the name of a directory containing converted Confluence content,
corresponding to a space name in the original Confluence wiki. More than one
space name can be given in order to generate a complete mapping for a site.

The following files are generated:

  * mapping-id-to-page.txt
  * mapping-tiny-to-id.txt
  * mapping-tiny-to-page.txt

The most useful of these is the first, as it records the arbitrary mapping
from identifiers to page names, which cannot be derived by any other means.
The second mapping merely converts the "tiny URLs" to identifiers, which can
be done by applying an algorithm without any external knowledge of the wiki
structure. The third mapping is provided as a convenience, combining the "tiny
URL" conversion and the arbitrary mapping to page names.

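As an illustration of why the "tiny URL" mapping needs no knowledge of the
wiki, the following is a minimal sketch of the conversion. It assumes the
commonly described Confluence scheme (the identifier packed as little-endian
bytes, base64-encoded, with "+" and "/" swapped for "_" and "-" and trailing
padding removed). This is not taken from this distribution, the function names
are hypothetical, and the output should be checked against
mapping-tiny-to-id.txt before being relied upon.

```python
import base64
import struct

# ASSUMPTION: the commonly described Confluence "tiny URL" scheme, not code
# from this distribution. "+" is written as "_" and "/" as "-".
_ALTCHARS = b"_-"


def encode_tiny(page_id):
    """Return the assumed tiny URL code for a numeric identifier."""
    raw = struct.pack("<Q", page_id)  # 8 bytes, little-endian
    text = base64.b64encode(raw, altchars=_ALTCHARS).decode("ascii")
    # Drop the "=" padding and the trailing "A" characters (zero bits).
    return text.rstrip("=").rstrip("A")


def decode_tiny(code):
    """Return the numeric identifier for an assumed tiny URL code."""
    # Restore the stripped zero bits and padding: 8 bytes encode to
    # 11 characters plus one "=".
    padded = code.ljust(11, "A") + "="
    raw = base64.b64decode(padded, altchars=_ALTCHARS)
    return struct.unpack("<Q", raw)[0]


if __name__ == "__main__":
    print(encode_tiny(98319))
    print(decode_tiny("D4AB"))
```

Under the stated assumption, the identifier 98319 and the code D4AB convert to
each other.
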
Translating Requests Using the Mappings
---------------------------------------

Where Web server facilities such as RewriteMap are available for use, the
first and third mapping files can be used directly. See the Apache
documentation for details of RewriteMap:

http://httpd.apache.org/docs/2.4/rewrite/rewritemap.html

Otherwise, the first file is more likely to be used by a program that
redirects each request to the appropriate wiki page. Such a program, deployed
in a suitable location to receive these requests, also performs the "tiny URL"
decoding itself. To support this, the following resources are provided:

  * scripts/redirect.py
  * config/mailmanwiki-redirect

The latter configuration file should be combined with the Web server
configuration file such that the appropriate aliases are able to capture
requests and invoke the redirect.py script before the main wiki aliases are
consulted. The script itself should be placed in a suitable filesystem
location. The mapping-id-to-page.txt file should either be placed alongside
the script, or be placed in a different location with the MAPPING_ID_TO_PAGE
variable in the script changed to refer to that location.

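The lookup that a redirect program performs can be sketched as follows. This
is not the code in scripts/redirect.py. It assumes that
mapping-id-to-page.txt holds one whitespace-separated "identifier page-name"
pair per line (the plain text form accepted by RewriteMap), and the function
names and the base URL are hypothetical.

```python
from urllib.parse import quote


def load_mapping(path):
    """Read 'identifier page-name' pairs into a dictionary.

    ASSUMPTION: one pair per line, separated by whitespace, with blank
    lines and lines starting with "#" ignored.
    """
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                mapping[parts[0]] = parts[1]
    return mapping


def redirect_target(mapping, identifier, base_url):
    """Return the wiki URL for an identifier, or None if it is unknown."""
    page = mapping.get(identifier)
    if page is None:
        return None
    # quote() leaves "/" alone, so subpage names keep their structure.
    return base_url.rstrip("/") + "/" + quote(page)
```

A caller would load the mapping once and answer each request with a redirect
to the returned URL, or with a "not found" response when None is returned.
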
Output Structure
----------------

The structure of a converted workspace is a directory hierarchy containing the
following directories:

  * pages (a collection of directories defining each page or content item,
    corresponding to Page, Comment and BlogPost elements in the XML
    exported from Confluence)

  * versions (a collection of files, each defining a revision or version of
    some content, corresponding to BodyContent elements in the XML
    exported from Confluence)

Each page directory contains the following things:

  * pagetype (either "Page", "Comment" or "BlogPost")

  * manifest (a list of version entries in a format similar to the MoinMoin
    page package manifest format)

  * attachments (a list of attachment version entries in a format similar to
    the MoinMoin page package manifest format)

  * pagetitle (an optional page title imposed on the page by another content
    item)

  * children (a list of child page names defined for the page)

  * comments (a list of creation date plus comment page identifier pairs)

In the output structure, content items such as comments are represented as
pages, and each references a content version. Since comments will ultimately
be represented as subpages of some parent page, they will have a pagetitle
file in their directory with an appropriate subpage name written according to
the parent page's name and comment details.

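To inspect a converted workspace, the layout described above can be walked
with a short script. The following sketch counts the content items of each
type. It assumes that each pagetype file contains only the type name, and the
function name is hypothetical.

```python
import os
from collections import Counter


def count_page_types(workspace_dir):
    """Count content items by type in a converted workspace directory.

    ASSUMPTION: each directory under "pages" has a "pagetype" file whose
    only content is "Page", "Comment" or "BlogPost".
    """
    pages_dir = os.path.join(workspace_dir, "pages")
    counts = Counter()
    for name in sorted(os.listdir(pages_dir)):
        pagetype_file = os.path.join(pages_dir, name, "pagetype")
        if os.path.isfile(pagetype_file):
            with open(pagetype_file) as f:
                counts[f.read().strip()] += 1
    return counts
```

For example, count_page_types("COM") would summarise the workspace produced by
the Quick Start command.
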
Troubleshooting
---------------

The page package import activity in particular can be a source of problems.
Generally, any error occurring when attempting to import a package is likely
to be due to insufficient privileges when writing to the pages directory of a
wiki or to its edit-log file.

The moinsetup software can generate scripts that set the ownership of wiki
files or apply ACLs (access control lists) to those files in order to make
access to wiki data more convenient. Where the ownership of the files must be
set (to www-data or nobody), the import step can be run as that user given
sufficient privileges. However, the easiest solution is to apply ACLs, thus
allowing the user who created the wiki to retain write access to it.

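Before importing, write access can be checked as the user who will run the
import. The following sketch assumes that the pages directory and the edit-log
file both sit directly inside the wiki's data directory, and the function name
is hypothetical.

```python
import os


def unwritable_paths(data_dir):
    """Return the import-critical paths that the current user cannot write.

    ASSUMPTION: "pages" and "edit-log" are located directly inside the
    given wiki data directory. Missing paths are reported as unwritable.
    """
    candidates = [
        os.path.join(data_dir, "pages"),
        os.path.join(data_dir, "edit-log"),
    ]
    return [path for path in candidates if not os.access(path, os.W_OK)]
```

An empty list suggests that privileges are not the cause of an import failure.
Otherwise, the listed paths are the ones needing a change of ownership or
ACLs.
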
Contact, Copyright and Licence Information
------------------------------------------

The current Web page for ConfluenceConverter at the time of release is:

http://hgweb.boddie.org.uk/ConfluenceConverter

Copyright and licence information can be found in the docs directory - see
docs/COPYING.txt and docs/LICENCE.txt for more information.