87:b50ba4291c5c
|
2011-01-28 |
Paul Boddie |
|
Made the read_sequence method simpler to follow and perhaps slightly more
efficient.
Fixed the PhraseFilter to handle out-of-sequence tokens properly as well as
iterators for different tokens contributing identical positions. |
|
|
iixr/files.py iixr/phrases.py
|
|
86:34f535fe8cb0
|
2011-01-25 |
Paul Boddie |
|
Introduced various optimisation attempts. |
|
|
iixr/data.py iixr/files.py iixr/terms.py
|
|
85:c4da9505f73e
|
2011-01-25 |
Paul Boddie |
|
Added a threshold or interval which causes the term dictionary to be flushed
when a certain number of document positions have been recorded.
Updated the copyright information. |
|
|
docs/COPYING.txt iixr/index.py
|
|
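The flush threshold described in changeset 85 can be sketched as follows. This is a minimal illustration only, not the code in iixr/index.py: the class name `BufferingIndexWriter`, the `flush_interval` parameter and the `flush_callback` hook are all hypothetical names chosen for this example.

```python
from collections import defaultdict

class BufferingIndexWriter:
    """Hypothetical writer that buffers term positions in memory and
    flushes the term dictionary once a position count is reached."""

    def __init__(self, flush_interval, flush_callback):
        # flush_interval: number of recorded positions that triggers a flush
        # flush_callback: stand-in for writing a partition to disk (assumption)
        self.flush_interval = flush_interval
        self.flush_callback = flush_callback
        self.terms = defaultdict(lambda: defaultdict(list))
        self.position_count = 0

    def add_position(self, term, docnum, position):
        # Record one document position for a term.
        self.terms[term][docnum].append(position)
        self.position_count += 1
        if self.position_count >= self.flush_interval:
            self.flush()

    def flush(self):
        # Hand the buffered dictionary over as plain dicts, then reset.
        if self.terms:
            snapshot = {term: dict(docs) for term, docs in self.terms.items()}
            self.flush_callback(snapshot)
            self.terms.clear()
        self.position_count = 0
```

Counting positions rather than documents bounds memory use more predictably in this sketch, since a single large document can contribute many positions.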
84:80df3e7605a4
|
2011-01-21 |
Paul Boddie |
|
For large numbers of positions, sorting afterwards is likely to be much quicker. |
|
|
iixr/phrases.py
|
|
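The trade-off mentioned in changeset 84 can be illustrated with a small sketch. It assumes the alternative to sorting afterwards was keeping the list ordered on every insertion; the changeset message does not say so, and both function names are hypothetical rather than taken from iixr/phrases.py.

```python
import bisect

def collect_with_insort(positions):
    # Keep the list sorted on every insertion. Each insort call may shift
    # many elements, so the total cost grows quadratically in the worst case.
    result = []
    for p in positions:
        bisect.insort(result, p)
    return result

def collect_then_sort(positions):
    # Gather everything first, then sort once (O(n log n) overall).
    result = list(positions)
    result.sort()
    return result
```

Both functions return the same sorted list; only the cost differs as the number of positions grows.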
83:3ddb93334c95
|
2011-01-11 |
Paul Boddie |
|
Permit fields for documents to be spread across partitions, potentially because
documents have been added more than once to an index. |
|
|
iixr/merging.py
|
|
82:9867931a9269
|
2010-12-17 |
Paul Boddie |
|
Avoid identical adjacent tokens being matched to the same document token. |
|
|
iixr/phrases.py
|
|
81:ea2944f51430
|
2010-11-26 |
Paul Boddie |
|
Introduced support for higher-level sequential access to indexes. |
|
|
iixr/index.py iixr/terms.py
|
|
80:e0bd00412dbc
|
2010-11-26 |
Paul Boddie |
|
Introduced parameterisation of phrase discovery, allowing phrase filters other
than the one provided to be used. |
|
|
iixr/phrases.py
|
|
79:2f94fb23bcff
|
2010-11-26 |
Paul Boddie |
changeset
files
shortlog
graph
|
Updated the copyright and licensing information. |
|
|
iixr/__init__.py
|
|
78:489129c7f225
|
2010-11-26 |
Paul Boddie |
changeset
files
shortlog
graph
|
Changed the from_document method to remember the current document and positions,
although the positions iterator will not be reset upon repeated invocations
involving the same document number. |
|
|
iixr/positions.py
|
|