Immutable Page History Attachments

HelpOnXapian

Page name: HelpOnXapian

Xapian

Using Xapian you can dramatically improve the performance of searching in moin and furthermore unlock some more features (see the search prefixes above) not possible with the legacy search engine.

Requirements

You must have Xapian itself and its Python bindings (xapian-core and xapian-bindings) from http://www.xapian.org/ at least in version 1.0.6 installed. In addition, Windows users need to install pywin32 from http://sourceforge.net/projects/pywin32/.

To process attachment files, moin uses filter plugins - here is the list of filter plugins included:

File type

Dependency

Notes

Text files (.txt)

-

tries utf-8 and iso-8859-15 encodings (or forces to ASCII if those do not work)

JPEG images (.jpg)

-

EXIF data is extracted

Open Office files (.sx?)

-

e.g. from older OpenOffice.org/StarOffice versions

Open Document files (.od?)

-

e.g. from recent OpenOffice.org/StarOffice versions

Binary files

-

moin uses a strings like filter to process those, as well as a blacklist with stuff you don't want to search

MS Word files (.doc)

antiword

filter calls antiword

MS Excel files (.xls)

catdoc

filter calls xls2csv

MS Powerpoint files (.ppt)

catdoc

filter calls catppt

PDF files (.pdf)

xpdf-utils or poppler-utils

filter calls pdftotext

After installing additional filters (or dependencies) you should (re)build your index. Xapian will find the new filters / support packages automagically. The next time your search results may contain results linking directly to your attachments.

Configuration

In your wikiconfig, you have several options on how to configure Xapian:

Variable name Default Description
xapian_index_dir None Directory where the Xapian search index is stored (None = auto-configure wiki local storage)
xapian_index_history False True to enable indexing of non-current page revisions.
xapian_search False True to enable the fast, indexed search (based on the Xapian search library)
xapian_stemming False True to enable Xapian word stemmer usage for indexing / searching.
  • xapian_search (default: False)

    • Setting this to True, enables Xapian search for your MoinMoin wiki.

      (!) Moin will auto disable xapian_search (and fall back to slow search) if it doesn't find a usable index. You can see whether it uses Xapian on SystemInfo.

  • xapian_index_history (default: False)

    • If this option is enabled, all revisions of all pages (except underlay, of which only one revision is available) are indexed. This allows users to search in older revisions of pages if enabled in the search dialogue on FindPage.

    • /!\ You need to rebuild your index if you change this option. Also check the size of your index after build if you're running a big wiki as this feature can eat up a lot of disk space. Creating the index might take rather long, if indexing history is enabled.

  • xapian_index_dir (default: None)

    • This option lets you specify a non-default directory to save your index to. Initially, it gets saved to data_dir/cache/xapian/.

    • /!\ Don't forget to (re-)build an/the index/indices after enabling this!

  • xapian_stemming (default: False)

    • If enabled, words will be indexed in their raw and stemmed forms and terms in your search query are stemmed in your language. This means that searching for "testing" will also yield pages containing the words "tested", "tester" etc.
    • /!\ Enabling/disabling this option needs a complete rebuild of your index!

(Re-)Building an index

You can use the supplied command line tool moin to initially build, completely rebuild or update an existing index.

To (re-)build your index please execute:

moin --config-dir=/where/your/configdir/is --wiki-url=wiki-url/ index build --mode=rebuild

For more information about the moin index command, see HelpOnMoinCommand.

/!\ Please note that you must rebuild your index if you change any of the xapian_index_history, xapian_index_dir or xapian_stemming configuration options!

If you have a large site, you may not wish for searching to be unavailable while your index rebuilds. In this case, you can start with moin index build --mode=buildnewindex. It is slow, but won't interfere with the running wiki. Then, stop the wiki, run moin index build --mode=usenewindex to switch to the new index, and restart the wiki.

Testing

You can test if Xapian is enabled and if an index is available by checking SystemInfo. To check if searches are performed using Xapian, enable show_timings in your wikiconfig, perform a search and look for _xapianSearch on the bottom of the page.

Usage

Xapian is basically used the same way as all other search engines. Due to Xapian's advanced features some new search term prefixed were introduced which are not already available in the legacy search engine (commonly referred to as moin search). See HelpOnSearching for more information and/or use the new advanced search dialogue available on FindPage to see what's available and possible.