Xapian_tutorial_1
- Note: some of the content and code are adapted from http://www.coder4.com/archives/2218
basic index building and search
- First of all, Xapian is an open-source search engine library written in C++.
- Also note that Xapian is pronounced “Zap-in”.
basic data structures
- used for searching
  - Xapian::Database: opens an index for reading
  - Xapian::Enquire: constructed from a Database and used to run the search
  - Xapian::QueryParser: parses a query string
  - Xapian::Query: the parsed query
  - Xapian::MSet: the result set returned by a search
- used for building the index
  - Xapian::WritableDatabase: opens an index for writing
  - Xapian::TermGenerator: splits text into terms and adds them to a document
- used for both
  - Xapian::Document: represents one document
  - Xapian::SimpleStopper: a simple stop-word list (common words that are skipped)
  - Xapian::Error: base class of Xapian's exceptions; call get_description() for detailed information
 
 
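how to build an index
- Open a Xapian::WritableDatabase.
- Then prepare the Xapian::Document.
  - Use set_data(string) to store the document data (only one per document).
  - Use add_value(slot, string) to store fields (a document can have several); slot must not be -1, which is reserved as Xapian::BAD_VALUENO.
  - These two methods only store data; they are not used for parsing or indexing.
- Build the index terms of the document, in one of two ways.
  - Use Document::add_term(term) to add a term by hand, or Document::add_posting(term, pos) when the position should be recorded as well.
  - Use Xapian::TermGenerator: call set_document(doc), then pass the text to index_text(), which splits it into words (for example at spaces).
  - After that the document carries its index terms.
- After building the document, add it to the database with db.add_document(doc).
- Use db.commit() to write the changes.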
how to query
- Open a Xapian::Database with the same path that was used for the WritableDatabase.
- Use the database to construct a Xapian::Enquire.
- Use Xapian::QueryParser to parse the query string and generate a Xapian::Query.
- Pass the query to enquire.set_query().
- Get the result set with enquire.get_mset(start, len).
- Use Xapian::MSetIterator to traverse the MSet.
  - Use get_rank() to get the rank.
  - Use get_document() to get the document.
query grammar
- Term | Term | Term
- Term -> Term ~ Term
  - ~ is used for similar words (synonyms)
About fields
- When building the index
  - With Xapian::TermGenerator, for example:
    - set_database(db) can be called on the TermGenerator; Xapian uses it to store spelling data.
    - When building the index terms, call index_text(text, wdf_inc = 1, prefix).
      - The second and third parameters have default values.
      - The second is the increment of the term frequency within the document (wdf).
      - The third is the term prefix, which marks the field.
- When querying
  - Add the mapping with Xapian::QueryParser:
    - qp.add_prefix("title", "T")
  - Then the query string passed to qp.parse_query() can name fields.
    - for example: ‘title:news AND content:basketball’
    - this query uses two fields
sample code
- create_index.cpp
- search.cpp