Xapian_tutorial_1
- Note: some of the content and code is adapted from http://www.coder4.com/archives/2218
basic index build and search
- First of all, Xapian is an open source search engine library written in C++.
- Also note that Xapian is pronounced “Zap-in”.
basic data structure
- used for search
  - Xapian::Database: used to read the index
  - Xapian::Enquire: used together with a Database to run a search
  - Xapian::QueryParser: parses a query string
  - Xapian::Query: the query itself
  - Xapian::MSet: the result set returned by a search
- used for building the index
  - Xapian::WritableDatabase: used to build (write) the index
  - Xapian::TermGenerator: splits text into terms and adds them to a document's index
- used for both
  - Xapian::Document: the abstraction of a document
  - Xapian::SimpleStopper: holds the stop words (common words that should be skipped)
  - Xapian::Error: the exception base class; use get_description() to get detailed info
how to build index
- open a Xapian::WritableDatabase
- then prepare the document (Xapian::Document)
  - use set_data(string) to set the data (only one per document)
  - use add_value(slot, string) to set a field (a document can have several); slot cannot be -1
  - these two methods are used only for storage, not for parsing or indexing
- build the index terms, in one of two ways
  - use Document.add_posting(word, pos) to add a term with a position (Document.add_term(word) adds a term without one; its optional second argument is a frequency increment, not a position)
  - or use Xapian::TermGenerator: call set_document(doc), then pass the text (words separated by spaces) to index_text
  - the doc will then hold the index terms for this text
- after building the document, add it to the database with DB.add_document(doc)
- use DB.commit()
how to query
- open a Xapian::Database; the path is the same as the one used for the WritableDatabase
- use the DB to construct a Xapian::Enquire
- use Xapian::QueryParser to parse the query string and generate a Xapian::Query
- use enquire.set_query() to set the query
- get the result set with enquire.get_mset(start, len)
- use Xapian::MSetIterator to traverse the MSet
  - use get_rank() to get the rank
  - use get_document() to get the document
query grammar
Term | Term | Term
Term -> Term ~ Term
~ is used to match similar words
About field
- When building the index
  - use Xapian::TermGenerator, for example
  - set the database with TermGenerator.set_database(db)
  - when building the index terms for a field, use index_text(text, wdf_inc=1, prefix)
    - the second and third parameters have default values
    - the second is the increment applied to the term frequency (wdf, the within-document frequency)
    - the third is the prefix
- When querying
  - add a mapping using Xapian::QueryParser.add_prefix("title", "T")
  - then the string passed to qp.parse_query can contain fields
  - for example: ‘title:news AND content:basketball’
  - this query uses two fields
sample code
- create_index.cpp
- search.cpp