Xapian_tutorial_1
- Note: some of the content and code is adapted from http://www.coder4.com/archives/2218
basic index build and search
- First of all, Xapian is an open-source search engine library written in C++.
- Also note that “Xapian” is pronounced “Zap-in”.
basic data structures
- used for searching
  - Xapian::Database - used to read an index
  - Xapian::Enquire - constructed from a Database; used to run a search
  - Xapian::QueryParser - parses a query string
  - Xapian::Query - a query
  - Xapian::MSet - the result set returned by a search
- used for building an index
  - Xapian::WritableDatabase - used to build (write) an index
  - Xapian::TermGenerator - splits text into terms and adds them to a document for indexing
- used for both
  - Xapian::Document - the abstraction of a document
  - Xapian::SimpleStopper - a stop-word list: common words such as "the" that the QueryParser (and optionally the TermGenerator) can ignore
  - Xapian::Error - the base class of all Xapian exceptions
    - use get_description() to get detailed information
how to build an index
- open a Xapian::WritableDatabase
- then prepare the Xapian::Document
  - use set_data(string) to set the document data (only one per document)
  - use add_value(slot, string) to set a field (there can be several); the slot cannot be -1 (Xapian::BAD_VALUENO)
  - these two methods are only used for storage
    - the stored content is not parsed or indexed
- build the index terms
  - use Document.add_posting(word, pos) to add a term at a given position (Document.add_term(word) adds a term without position information)
  - or use Xapian::TermGenerator and .set_document(doc), then pass the text, with words separated by spaces, to index_text()
    - the doc will then contain the index terms of this document
- after building the document, add it to the database
  - use DB.add_document(doc), then DB.commit() to write the changes
how to query
- open a Xapian::Database; the path is the same as for the WritableDatabase
- use the Database to construct a Xapian::Enquire
- use Xapian::QueryParser to parse the query string and generate a Xapian::Query
- use enquire.set_query() to set the query
- get the result set with enquire.get_mset(start, len)
- use Xapian::MSetIterator to traverse the MSet
  - use get_rank() to get the rank
  - use get_document() to get the document
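query grammar
- terms can be combined with boolean operators: Term AND Term, Term OR Term, Term NOT Term
- Term Term: terms separated only by spaces are joined with the default operator (OR, unless changed with QueryParser::set_default_op())
- ~Term: the ~ asks for similar words (synonyms) of Term; this needs QueryParser::FLAG_SYNONYM and synonym data in the database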
About fields
- when building the index
  - use Xapian::TermGenerator, for example
  - you may also call TermGenerator.set_database(db); it is only required when the generator should also write spelling data (FLAG_SPELLING)
  - when building an index field, use index_text(text, wdf_inc=1, prefix=""); the second and third parameters have default values
    - the second is the within-document frequency (TF) increment for each term
    - the third is the prefix prepended to each term
- when querying
  - add a mapping with Xapian::QueryParser.add_prefix("title", "T")
  - then qp.parse_query() accepts field names in the query string, for example
    - 'title:news AND content:basketball'
    - this query uses two fields, so each of them needs its own add_prefix() mapping
sample code
- create_index.cpp
- search.cpp