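Lucene is an open-source full-text search library: it builds an index over text and retrieves the documents that match a query keyword.
1. Required JAR files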
- lucene-analyzers-3.0.2.jar
- lucene-core-3.0.2.jar
- lucene-highlighter-3.0.2.jar
- lucene-memory-3.0.2.jar
2. Coding steps
2.1 Prepare the Article class
public class Article {
    private Integer id;
    private String title;
    private String content;

    public Article() {}

    public Article(Integer id, String title, String content) {
        super();
        this.id = id;
        this.title = title;
        this.content = content;
    }

    @Override
    public String toString() {
        return "Article [id=" + id + ", title=" + title + ", content="
                + content + "]";
    }
    // ...... (getters and setters)
}
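2.2 Create the index
2.2.1 Steps
- Create an Article object
- Create a Document object
- Copy the three property values of the Article into the Document
- Create an IndexWriter
- Write the Document into the Lucene index
- Close the IndexWriter
2.2.2 Example method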
/**
 * Create the index.
 * Stores the Article in the index as an original record and builds the term table from it.
 * @throws IOException
 */
@Test
public void createIndexDB() throws IOException {
    Article article = new Article(1, "处理器", "处理器是电脑的核心部件");
    // Copy the three Article properties into a Lucene Document.
    Document document = new Document();
    document.add(new Field("xid", article.getId().toString(), Store.YES, Index.ANALYZED));
    document.add(new Field("xtitle", article.getTitle(), Store.YES, Index.ANALYZED));
    document.add(new Field("xcontent", article.getContent(), Store.YES, Index.ANALYZED));
    // Open (or create) the index directory on disk.
    Directory directory = FSDirectory.open(new File("D:/IndextDB"));
    Version version = Version.LUCENE_30;
    Analyzer analyzer = new StandardAnalyzer(version);
    // LIMITED indexes at most the first 10,000 terms of each field.
    MaxFieldLength maxFieldLength = MaxFieldLength.LIMITED;
    IndexWriter indexWriter = new IndexWriter(directory, analyzer, maxFieldLength);
    indexWriter.addDocument(document);
    indexWriter.close();
}
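After this method runs, the index files are created in the corresponding directory on disk.
In this example the directory contains three .cfs files, because several articles are added further down; this is explained below.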
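Conceptually, the index keeps two things: the original records, and a term table that maps each term to the records containing it. The following plain-Java sketch illustrates that idea only. It does not use Lucene, and the class InvertedIndexSketch, its methods, and the character-by-character splitting of Chinese text are simplifying assumptions, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; these are not Lucene's real data structures.
class InvertedIndexSketch {
    // Original records: the position in the list is the document number.
    private final List<String> stored = new ArrayList<String>();
    // Term table: term -> sorted document numbers that contain the term.
    private final Map<String, TreeSet<Integer>> terms =
            new LinkedHashMap<String, TreeSet<Integer>>();

    // Assumption: every letter or digit becomes its own term; punctuation is dropped.
    static List<String> tokenize(String text) {
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                result.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return result;
    }

    // Store the record and register each of its terms; returns the document number.
    int add(String text) {
        int docNo = stored.size();
        stored.add(text);
        for (String term : tokenize(text)) {
            TreeSet<Integer> postings = terms.get(term);
            if (postings == null) {
                postings = new TreeSet<Integer>();
                terms.put(term, postings);
            }
            postings.add(docNo);
        }
        return docNo;
    }

    // Look the term up in the term table, then load the matching records.
    List<String> search(String term) {
        List<String> hits = new ArrayList<String>();
        TreeSet<Integer> postings = terms.get(term);
        if (postings != null) {
            for (int docNo : postings) {
                hits.add(stored.get(docNo));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("处理器是电脑的核心部件");
        index.add("显卡是电脑的显示输出部件");
        System.out.println(index.search("处")); // only the first record
        System.out.println(index.search("电")); // both records
    }
}
```

A lookup never scans the record texts: it reads one entry of the term table and then fetches the records by number, which is why a single character can serve as a keyword in this sketch.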
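2.3 Keyword search
2.3.1 Steps
- Prepare the keyword to search for (a String)
- Create a List<Article> to hold the results
- Create an IndexSearcher
- Create a QueryParser
- Create a Query object that wraps the keyword
- Search the index with the Query; the matches come back as a TopDocs object holding document numbers
- Iterate over the matching document numbers and load each document
2.3.2 Example method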
/**
 * Retrieve the articles that match a keyword from the index.
 * @throws IOException
 * @throws ParseException
 */
@Test
public void findIndexDB() throws IOException, ParseException {
    String keyword = "处";
    List<Article> articleList = new ArrayList<Article>();
    Directory directory = FSDirectory.open(new File("D:/IndextDB"));
    Version version = Version.LUCENE_30;
    Analyzer analyzer = new StandardAnalyzer(version);
    IndexSearcher indexSearcher = new IndexSearcher(directory);
    // Parse the keyword into a Query against the "xcontent" field.
    QueryParser queryParser = new QueryParser(version, "xcontent", analyzer);
    Query query = queryParser.parse(keyword);
    // Return at most MAX_RECORD hits.
    int MAX_RECORD = 100;
    TopDocs topDocs = indexSearcher.search(query, MAX_RECORD);
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        // Document number of the hit inside the index.
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        String xid = document.get("xid");
        String xtitle = document.get("xtitle");
        String xcontent = document.get("xcontent");
        Article article = new Article(Integer.parseInt(xid), xtitle, xcontent);
        articleList.add(article);
    }
    indexSearcher.close();
    for (Article article : articleList) {
        System.out.println(article);
    }
}
Query output:
Article [id=1, title=处理器, content=处理器是电脑的核心部件]
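The search step returns hits, not articles: each ScoreDoc carries only a document number (doc) and a relevance score (score), and search(query, n) returns at most n of them, best score first. The stored fields are loaded afterwards by document number. The plain-Java sketch below imitates that two-step flow without Lucene; the Hit class, the top method, and the sample scores and records are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the search step; it does not use Lucene.
class TopHitsSketch {
    // A hit holds a document number and a score, nothing else.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Keep at most n hits (n >= 0), highest score first.
    static List<Hit> top(List<Hit> candidates, int n) {
        List<Hit> sorted = new ArrayList<Hit>(candidates);
        Collections.sort(sorted, new Comparator<Hit>() {
            public int compare(Hit a, Hit b) {
                return Float.compare(b.score, a.score);
            }
        });
        return new ArrayList<Hit>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Stored records addressed by document number (made-up sample data).
        String[] stored = { "record 0", "record 1", "record 2" };
        List<Hit> candidates = new ArrayList<Hit>();
        candidates.add(new Hit(0, 0.2f));
        candidates.add(new Hit(2, 0.9f));
        candidates.add(new Hit(1, 0.5f));
        for (Hit hit : top(candidates, 2)) {
            // Second step: load the stored record by its document number.
            System.out.println(hit.doc + " " + hit.score + " " + stored[hit.doc]);
        }
    }
}
```

In the example method, MAX_RECORD plays the role of n, and indexSearcher.doc(no) is the second step that turns a document number back into stored field values.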
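3. Optimizing the Lucene code
3.1 Create a utility class
The code above involves many steps and looks cumbersome. Extracting the common, repeated code into a utility class makes it more concise.
The utility class uses the java.lang.reflect reflection package.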
package com.bart.lucene.util;

import java.io.File;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

import org.apache.commons.beanutils.BeanUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import com.bart.lucene.entity.Article;

/**
 * Wraps commonly used Lucene operations.
 * @author bart
 */
public class LuceneUtils {
    private static Directory directory;
    private static Version version;
    private static Analyzer analyzer;
    private static MaxFieldLength maxFieldLength;

    static {
        try {
            directory = FSDirectory.open(new File("E:/IndexDBDBDB"));
            version = Version.LUCENE_30;
            analyzer = new StandardAnalyzer(version);
            maxFieldLength = MaxFieldLength.LIMITED;
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    public static Directory getDirectory() {
        return directory;
    }

    public static Version getVersion() {
        return version;
    }

    public static Analyzer getAnalyzer() {
        return analyzer;
    }

    public static MaxFieldLength getMaxFieldLength() {
        return maxFieldLength;
    }

    private LuceneUtils() {}
    /**
     * Convert a JavaBean into a Document.
     * @param obj the bean to convert; every declared field needs a public getter
     * @return Document
     * @throws Exception
     */
    public static Document javaBean2Document(Object obj) throws Exception {
        Document document = new Document();
        Class<?> clazz = obj.getClass();
        Field[] fields = clazz.getDeclaredFields();
        for (Field field : fields) {
            String name = field.getName();
            // Derive the getter name, e.g. "title" -> "getTitle".
            String methodName = "get" + name.substring(0, 1).toUpperCase() + name.substring(1);
            Method method = clazz.getMethod(methodName);
            Object value = method.invoke(obj);
            if (value == null) {
                // Skip unset properties instead of failing on toString().
                continue;
            }
            document.add(new org.apache.lucene.document.Field(name, value.toString(), Store.YES, Index.ANALYZED));
        }
        return document;
    }
    /**
     * Convert a Document into a JavaBean.
     * @param document the document to read from
     * @param clazz the bean class to instantiate
     * @return the populated bean
     * @throws Exception
     */
    public static <T> T document2JavaBean(Document document, Class<T> clazz) throws Exception {
        T t = clazz.newInstance();
        Field[] fields = clazz.getDeclaredFields();
        for (Field field : fields) {
            String name = field.getName();
            String value = document.get(name);
            // BeanUtils converts the String to the property type (e.g. Integer for id).
            BeanUtils.setProperty(t, name, value);
        }
        return t;
    }
    public static void main(String[] args) throws Exception {
        Article article = new Article(1, "处理器", "处理器是一台电子计算机的重要部件");
        Document document = LuceneUtils.javaBean2Document(article);
        System.out.println(document.toString());
        System.out.println("---------");
        Article article2 = LuceneUtils.document2JavaBean(document, Article.class);
        System.out.println(article2);
    }
}
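The two conversion methods rely on a naming convention: a field called title is read through getTitle(), and the value is carried as a String in between. One pitfall is that calling toString() on the result of a getter throws a NullPointerException when the property is unset. The sketch below shows the same round trip with only the standard library, using a Map in place of Lucene's Document; BeanMapSketch, Item, toMap and fromMap are hypothetical names, and only Integer and String properties are handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a Map<String, String> stands in for Lucene's Document.
class BeanMapSketch {
    // Sample bean used only for this sketch.
    public static class Item {
        private Integer id;
        private String title;

        public Item() {}

        public Item(Integer id, String title) {
            this.id = id;
            this.title = title;
        }

        public Integer getId() { return id; }
        public String getTitle() { return title; }
    }

    // "title" -> "getTitle"
    static String getterName(String fieldName) {
        return "get" + fieldName.substring(0, 1).toUpperCase() + fieldName.substring(1);
    }

    // Read every declared field through its public getter; skip null values.
    static Map<String, String> toMap(Object bean) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        Class<?> clazz = bean.getClass();
        try {
            for (Field field : clazz.getDeclaredFields()) {
                if (field.isSynthetic()) continue;
                Method getter = clazz.getMethod(getterName(field.getName()));
                Object value = getter.invoke(bean);
                if (value != null) {
                    map.put(field.getName(), value.toString());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return map;
    }

    // Rebuild a bean, converting each String back to the field's declared type.
    static <T> T fromMap(Map<String, String> map, Class<T> clazz) {
        try {
            T bean = clazz.getDeclaredConstructor().newInstance();
            for (Field field : clazz.getDeclaredFields()) {
                if (field.isSynthetic()) continue;
                String value = map.get(field.getName());
                if (value == null) continue;
                field.setAccessible(true);
                if (field.getType() == Integer.class) {
                    field.set(bean, Integer.valueOf(value));
                } else if (field.getType() == String.class) {
                    field.set(bean, value);
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = toMap(new Item(1, "处理器"));
        System.out.println(map);
        Item copy = fromMap(map, Item.class);
        System.out.println(copy.getId() + " " + copy.getTitle());
    }
}
```

In LuceneUtils the String-to-Integer conversion is delegated to BeanUtils.setProperty, so the utility does not need the explicit type checks that this sketch performs by hand.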
3.2 Refactor the FirstApp class
Refactor FirstApp.java using the utility class.
3.2.1 Create the index
@Test
public void createIndexDB() throws Exception {
    Article article = new Article(3, "内存条", "内存条是电脑的核心部件之一");
    Document document = LuceneUtils.javaBean2Document(article);
    IndexWriter indexWriter = new IndexWriter(LuceneUtils.getDirectory(), LuceneUtils.getAnalyzer(), LuceneUtils.getMaxFieldLength());
    indexWriter.addDocument(document);
    indexWriter.close();
}
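Three Article objects in total were created here and added to the index, which is why the index directory described in section 2.2.2 contains three .cfs files.
3.2.2 Search by keyword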
@Test
public void findIndexDB() throws Exception {
    String keyword = "显卡";
    List<Article> articleList = new ArrayList<Article>();
    IndexSearcher indexSearcher = new IndexSearcher(LuceneUtils.getDirectory());
    // The utility class names each index field after the bean property, hence "content".
    QueryParser queryParser = new QueryParser(LuceneUtils.getVersion(), "content", LuceneUtils.getAnalyzer());
    Query query = queryParser.parse(keyword);
    int MAX_RECORD = 100;
    TopDocs topDocs = indexSearcher.search(query, MAX_RECORD);
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
        ScoreDoc scoreDoc = topDocs.scoreDocs[i];
        int no = scoreDoc.doc;
        Document document = indexSearcher.doc(no);
        Article article = LuceneUtils.document2JavaBean(document, Article.class);
        articleList.add(article);
    }
    indexSearcher.close();
    for (Article article : articleList) {
        System.out.println(article);
    }
}
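Output:
Article [id=2, title=显卡, content=显卡是电脑的显示输出部件]
Haha, as a regular of the graphics-card forums, of course the first thing I searched for was 显卡 (graphics card).
Summary
After the refactoring the code is much more concise, and the utility class is more reusable. In later projects we can import it and call it directly.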