77:7e79dd580a62
|
2010-11-23 |
Paul Boddie |
|
Added support for phrase searching where document positions are specified using
sequences of values, with the first value in each sequence being the token
index/position.
Added more tests of document numbers and position values being specified using
sequences. |
|
|
iixr/phrases.py test.py
|
|
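The phrase searching described in changeset 77 can be illustrated with a minimal sketch. It assumes each position record is a sequence such as (token index, character offset) and that only the first value decides adjacency. The function name `find_phrase` and the data layout are hypothetical and are not taken from iixr/phrases.py.

```python
def find_phrase(term_positions):
    """Return the token indexes at which the whole phrase starts.

    term_positions holds one list per phrase term, in phrase order. Each
    entry is a sequence whose first value is the token index; any further
    values (assumed here to be character offsets) are ignored.
    """
    if not term_positions:
        return []
    # Token indexes of every term after the first, for fast membership tests.
    later = [set(p[0] for p in positions) for positions in term_positions[1:]]
    matches = []
    for position in term_positions[0]:
        start = position[0]
        # The n-th following term must occur exactly n tokens after the start.
        if all(start + offset in indexes
               for offset, indexes in enumerate(later, 1)):
            matches.append(start)
    return matches


hello = [(0, 0), (5, 31)]   # (token index, hypothetical character offset)
world = [(1, 6), (9, 60)]
print(find_phrase([hello, world]))
```

Here only the occurrence of the first term at token 0 is followed directly by the second term, so the phrase is reported once.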
76:f1cbbf5ef885
|
2010-11-22 |
Paul Boddie |
|
Made partition discovery more widely available, adding code to find the next
partition number to use, thus avoiding overwriting index data when opening a
writer on an existing index.
Made sure that term and field dictionaries are always written out: this might
not occur if the underlying writers have been obtained from an index writer and
then used to write data directly. |
|
|
iixr/filesystem.py iixr/index.py
|
|
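One way to picture the partition discovery from changeset 76 is the sketch below. It assumes partition files are named with a trailing number, such as "terms-3". That naming scheme, the pattern and both function names are hypothetical and do not reflect the actual layout used by iixr/filesystem.py.

```python
import re

# Hypothetical naming scheme: "<kind>-<partition number>", e.g. "terms-3".
PARTITION_PATTERN = re.compile(r"^[a-z_]+-(\d+)$")


def get_partitions(filenames):
    """Return the set of partition numbers found among the filenames."""
    found = set()
    for name in filenames:
        match = PARTITION_PATTERN.match(name)
        if match:
            found.add(int(match.group(1)))
    return found


def next_partition(filenames):
    """Return a partition number not used by any existing file."""
    partitions = get_partitions(filenames)
    return max(partitions) + 1 if partitions else 0


existing = ["terms-0", "fields-0", "terms-1", "fields-1", "README"]
print(next_partition(existing))
```

In practice the filenames would come from something like `os.listdir` on the index directory. A writer that starts at the returned number leaves the existing partitions untouched.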
75:8d35240236b2
|
2010-11-22 |
Paul Boddie |
|
Added integrity checks for appropriate term and position ordering. |
|
|
iixr/terms.py
|
|
74:d308dc25f5a2
|
2010-11-21 |
Paul Boddie |
|
Introduced support for specifying sequences for document numbers and positions,
with the latter being "monotonic" sequences whose elements contain items that
are always greater than or equal to the items in the same position in each
preceding element of the sequence.
Fixed the get_terms method of the term dictionary reader to refer to the
iterator over term information (and not the list of terms provided by the term
index).
Expanded the tests to cover sequences as document numbers and positions. |
|
|
iixr/fields.py iixr/files.py iixr/positions.py iixr/terms.py test.py
|
|
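The "monotonic" property described in changeset 74 can be checked with a short sketch. The function name `is_monotonic` is hypothetical, and the requirement that all elements have the same length is an assumption made here for simplicity.

```python
def is_monotonic(sequences):
    """Check that every item is >= the item in the same place in all
    preceding elements.

    Comparing each element with its immediate predecessor is enough,
    because >= is transitive.
    """
    previous = None
    for current in sequences:
        if previous is not None:
            # Assumption: elements of differing lengths are rejected.
            if len(current) != len(previous):
                return False
            if any(c < p for c, p in zip(current, previous)):
                return False
        previous = current
    return True


print(is_monotonic([(0, 0), (1, 0), (1, 5)]))
print(is_monotonic([(0, 10), (1, 5)]))
```

The second call fails the check even though its elements are in ordinary tuple sort order: the second item drops from 10 to 5, which shows that the property is stricter than lexicographic ordering.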
73:6dd92daca068
|
2010-11-20 |
Paul Boddie |
|
Introduced code to handle index merging where a large number of partitions
exist, combining the term and field dictionary merging into a common method
which is then parameterised for each kind of data. |
|
|
iixr/index.py
|
|
72:1cccc03f183e (parent: 70:4614ef99dbe1)
|
2010-11-20 |
Paul Boddie |
|
Added get_terms convenience methods to the index and term dictionary readers.
Introduced safer closure of mergers. |
|
|
iixr/index.py iixr/merging.py iixr/terms.py
|
|
71:00995a70f535
|
2010-11-20 |
Paul Boddie |
|
An experiment adding preceding text to position records. |
|
|
iixr/phrases.py iixr/positions.py
|
|
70:4614ef99dbe1 (children: 71:00995a70f535, 72:1cccc03f183e)
|
2010-11-20 |
Paul Boddie |
|
Added a string serialisation function.
Fixed a parameter/argument name. |
|
|
iixr/data.py iixr/index.py
|
|
69:1077b05c9b76
|
2010-01-10 |
Paul Boddie |
|
Introduced position dictionary, file and index iterators which capture the
relevant result data in caches for particular terms, wrapping the underlying
shared file readers.
Added section output to the test program in order to make troubleshooting
easier.
Added a seek method to the File class. |
|
|
docs/COPYING.txt iixr/fields.py iixr/files.py iixr/filesystem.py iixr/positions.py iixr/terms.py test.py
|
|
68:9d836f8a4075
|
2010-01-08 |
Paul Boddie |
|
Removed iterators and openers. The intention is for synchronised reading (such
as that done by phrase queries) to be performed by reading batches of
positions, seeking explicitly for each batch, using a wrapper around the
readers for each term. |
|
|
iixr/files.py iixr/filesystem.py iixr/positions.py iixr/terms.py test.py
|
|