Thursday, December 8, 2011

Lucene On Android

As an experiment, I tried running Lucene on Android. The results were not ideal, but I am sharing what I learned to save others the time.

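The first problem with running Lucene on Android is getting the index files onto the phone. Lucene reads an index one directory at a time, so the index files cannot be read directly from inside the apk; they have to be stored on the device or on external storage before they can be used. The alternative is to build your own virtual directory, but that uses too much memory.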

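I chose to put the index under 'src/res/raw' so that it becomes part of the apk. This avoids having to find somewhere on the network to host the index. When the index needs updating, I just rebuild the apk and ask users to update.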

If you want to ship the Lucene index in compound format, you can use the commands below to pack the multiple index files into a single .cfs file.

// run REPL with 'scala -cp luke-3.4.0_1-all.jar'

import org.apache.lucene.store.FSDirectory
import org.apache.lucene.index._
import org.apache.lucene.analysis.standard._

val source = FSDirectory.open(new java.io.File("source"))
val dest = FSDirectory.open(new java.io.File("dest"))

// open source index
val reader = IndexReader.open(source)

// create writer for compound index.
val analyzer = new StandardAnalyzer(org.apache.lucene.util.Version.LUCENE_34)
val writer = new IndexWriter(dest, analyzer, IndexWriter.MaxFieldLength.UNLIMITED)

// force the writer to always use the compound index format.
writer.getMergePolicy.asInstanceOf[LogByteSizeMergePolicy].setNoCFSRatio(1.0)


// add source index to dest index.
writer.addIndexes(reader)
writer.optimize()
writer.close

reader.close
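
Before copying anything into the apk, it can be worth checking that the dest directory really ended up in compound format. The snippet below is only a sketch of such a check using plain java.io; the helper name nonCompoundFiles is hypothetical, and the assumption is that a compound index contains only .cfs files plus files whose names start with "segments".

```scala
import java.io.File

// Hypothetical helper: returns the names of files in `dir` that are neither
// a compound file (.cfs) nor a segments file (name starts with "segments").
// File.list() returns null for a missing or unreadable directory, so it is
// wrapped in Option and treated as empty in that case.
def nonCompoundFiles(dir: File): Seq[String] = {
  val names = Option(dir.list()).map(_.toSeq).getOrElse(Seq.empty)
  names.filterNot(n => n.endsWith(".cfs") || n.startsWith("segments"))
}

// An empty result suggests the index was written in compound format.
println(nonCompoundFiles(new File("dest")))
```

If the result is not empty, the listed files would also have to be copied into the apk, or the writer settings revisited.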

Next, copy the generated segments, segments_1, and _0.cfs files into src/res/raw so that they become part of the .apk.


Then, on first run, the index has to be copied out of the apk onto the SD card or the device's internal storage. I wrote a small helper for this:

import android.content.Context
import android.os.Environment
import android.util.Log

import com.bluetangstudio.android.disastermap.TaipeiDisasterApp.LogTag

import org.apache.commons.io.FileUtils
import org.apache.lucene.store.{FSDirectory, Directory}

import scala.collection.JavaConversions._
import java.io.File

/**
 *  Helper class that searches for lucene index directories on the device. The search order is
 *  external storage first then local storage. If lucene index does not exist on device, this
 *  class will copy the index from the apk to the device storage.
 *
 * @param context  the application context
 * @param path     the root folder name of the index directory to use and to look for.
 * @param source   the source of index resource to copy if index does not exist on the device.
 *                 format: Seq[(filename, resourceId)]
 */
case class LuceneOpenHelper(context: Context, path: String, source: Seq[Tuple2[String, Int]]) {

    /**
     * Creates or opens a Directory.
     */
    def open(): Option[Directory] = {
        val candidates = Seq(externalFolder, internalFolder).flatten

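        // find the first folder that already has index files in it.
        // File.list() returns null if the folder cannot be read, so wrap it in Option.
        val folder = candidates.find(f => f.exists() && Option(f.list()).exists(_.nonEmpty))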
        val withIndex = folder.orElse(
            candidates.find(
                f => {
                    // ensure folder is available.
                    (f.exists() || f.mkdirs()) match {
                        // folder is not accessible
                        case false => false

                        case _ => {
                            Log.d(LogTag, "Duplicating index from apk to %s...".format(f))
                            source.foreach(
                                s => {
                                    val is = context.getResources.openRawResource(s._2)
                                    try {
                                        FileUtils.copyInputStreamToFile(is, new File(f, s._1))
                                    } finally {
                                        is.close()
                                    }
                                }
                            )
                            true
                        }
                    }
                }
            )
        )

        withIndex.map(FSDirectory.open(_))
    }

    private def externalFolder: Option[File] = {
        Environment.getExternalStorageState match {
            case Environment.MEDIA_MOUNTED => Option(context.getExternalFilesDir(path))
            case _ => None
        }
    }

    private def internalFolder: Option[File] = {
        Option(new File(context.getFilesDir, path))
    }

}
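
The helper relies on FileUtils.copyInputStreamToFile from Apache Commons IO for the actual copy. If pulling that library into the apk is not desirable, the copy step could be sketched with plain java.io as below. The name copyStreamToFile is hypothetical, and the sketch assumes the caller closes the input stream, as the helper above already does in its finally block.

```scala
import java.io.{File, FileOutputStream, InputStream}

// Hypothetical stand-in for FileUtils.copyInputStreamToFile:
// copies all bytes from `in` to `target`, overwriting any existing file.
// The output stream is closed here; `in` is left for the caller to close.
def copyStreamToFile(in: InputStream, target: File): Unit = {
  val out = new FileOutputStream(target)
  try {
    val buffer = new Array[Byte](8192)
    var read = in.read(buffer)
    while (read != -1) {
      out.write(buffer, 0, read)
      read = in.read(buffer)
    }
  } finally {
    out.close()
  }
}
```

Unlike the Commons IO call, this sketch does not create missing parent directories, which is fine here because the helper has already ensured the target folder exists.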

Finally, add the following to your *App (Application) class:

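import android.util.Log
import org.apache.lucene.search.IndexSearcher
// LogTag is the app's own log tag constant, imported the same way as in the helper above.

object MyApp {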
    private val INDEX_DIRECTORY = "idx"

    private val INDEX_FILES = Seq(
        ("_0.cfs", R.raw.idx_0), 
        ("segments", R.raw.segments), 
        ("segments_1", R.raw.segments_1)
    )
}
class MyApp extends android.app.Application {

    import MyApp._

    private var _luceneSearcher: Option[IndexSearcher] = None

    def luceneSearcher: Option[IndexSearcher] = {
        if (_luceneSearcher.isEmpty) {
            Log.d(LogTag, "Initializing new IndexSearcher...")
            _luceneSearcher = LuceneOpenHelper(this, INDEX_DIRECTORY, INDEX_FILES).open().map(new IndexSearcher(_))
        }
        _luceneSearcher
    }
   
    override def onLowMemory(): Unit = {
        super.onLowMemory()
        // release the searcher; it is reopened lazily on the next access.
        _luceneSearcher.foreach(_.close())
        _luceneSearcher = None
    }
}

With that in place, lucene-core can run on Android.