Table of Contents
Dhamma Full Text Search Tool
Info
License
Acknowledgements
Download for off-line use
Build Your Own Full Text Search Database
Here, for example, we will build a SQLite3 FTS5 powered full text search app:
Step 1: gather documents into one place
If you do not remove blank files and empty folders before indexing:
To remove empty directories and blank files (0 byte)
Step 2: convert them into plain .txt files, perform "data clean" etc
Step 3: create an indexed database
Step 4: search UI or CLI for the indexed database
Appendix: Data file list
Feedback
Some materials used in this app, such as the PTS Pali-English Dictionary and the Roman pāḷi tipiṭaka text (VRI Roman version), are for free distribution and non-commercial use only.
Therefore, this project should be released under the following license:
NonCommercial-ShareAlike 4.0 International (CC NC-SA 4.0)
*********************************
This tipitaka digital text version copy right Vipassana Research Institute ("VRI"), Mumbai India.
Used by permission of VRI gratefully acknowledged.
*********************************
/**
* Copyright Path Nirvana 2018
* The code and character mapping defined in this file can not be used for any commercial purposes.
* Permission from the auther is required for all other purposes.
*/
PTS Pali-English Dictionary buddhadust_pts_ped.utf8.txt is obtained from Buddhadust
Corrected reprint © The Pali Text Society
Commercial Rights Reserved
Creative Commons Licence by-nc/3.0/
See the full file list in README.html
Name | Source |
---|---|
Pāḷi tipiṭaka text | Divided into 2662 files by https://tipitaka.app (used digital pāḷi tipiṭaka text VRI version) |
Pāḷi Dictionary | PTS PED buddhadust_pts_ped.utf8.txt from Buddhadust; from the Siongui/data GitHub repository: vi-su-Pali_Viet_Abhi_Terms.tsv, vi-su-Pali_Viet_Dictionary.tsv, vi-su-Pali_Viet_Vinaya_Terms.tsv |
Pa-Auk Meditation Manual | Some Pa-Auk Forest Monastery meditation manual e-books (see the file list) |
Tam Tạng Pāḷi Việt | A Vietnamese translation of the tipiṭaka (the project is not yet complete) from Tam Tang Pali Viet; most files were retrieved from: https://tamtangpaliviet.net/TTPV/TTPV_BanDich.htm |
Other databases | Miscellaneous other databases, such as our personal e-books and Webster's Revised Unabridged Dictionary (1913, now in the public domain), which are large in file size and may not be available in this online version. |
Extract the downloaded archives with unzip -qq './data/*.zip'; do not create additional folders.
data
├── paaukmed.sqlite3
├── palidict.sqlite3
├── tptk.sqlite3
├── ttpv_budsas.net.sqlite3
...other files...
In general, to build a full text search app, you need to do these steps:
Step 1: gather documents into one place
Step 2: convert them into plain .txt files and clean the data
Step 3: create an indexed database
Step 4: search UI or CLI for the indexed database
# Find empty files
find . -type f -size 0b -print
find . -type f -size 0b -delete
# Find empty dirs
find . -empty -type d -print
find . -empty -type d -delete
The "." stands for the current directory.
The commands with -print only list the matched items (a dry run). If you are happy with the list, run the corresponding commands with the -delete option, which deletes the matched items.
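For readers who prefer to script this cleanup, the same dry-run-then-delete idea can be sketched in Python. This is only an illustration, not part of the project's code: the function names `find_empty` and `remove_empty` are hypothetical, and unlike find's -delete, a directory that becomes empty only after its blank files are removed is picked up on the next run rather than in the same pass.

```python
import os
from pathlib import Path


def find_empty(root):
    """Return (empty_files, empty_dirs) found under root."""
    empty_files, empty_dirs = [], []
    # topdown=False visits the deepest directories first.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and path.stat().st_size == 0:
                empty_files.append(path)
        # Never report the root itself, only directories below it.
        if Path(dirpath) != Path(root) and not os.listdir(dirpath):
            empty_dirs.append(Path(dirpath))
    return empty_files, empty_dirs


def remove_empty(root, dry_run=True):
    """List empty files and directories; delete them only when dry_run is False."""
    empty_files, empty_dirs = find_empty(root)
    for path in empty_files + empty_dirs:
        print(("would delete: " if dry_run else "deleting: ") + str(path))
    if not dry_run:
        for path in empty_files:
            path.unlink()
        for path in empty_dirs:
            path.rmdir()
    return empty_files, empty_dirs
```

Calling `remove_empty(".")` lists the candidates, and `remove_empty(".", dry_run=False)` deletes them.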
Use tika-app.jar (download from https://tika.apache.org) to convert the documents to .txt files in batch mode.
# Read Getting Started with Apache Tika
# from https://tika.apache.org for more info
java -jar tika-app.jar -t -i <inputDirectory> -o <outputDirectory>
Run prepare-textdata.py to clean these text files first. See https://github.com/vpnry/dhammafts-dev-code for the source code.
# This will help to fix broken lines
python3 prepare-textdata.py
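To give an idea of what "fixing broken lines" can mean, here is a minimal sketch. It is not the project's prepare-textdata.py; the function name `join_broken_lines` and the heuristic (a line that does not end with closing punctuation is assumed to continue on the next line) are assumptions made for illustration, and real PDF output will likely need more rules.

```python
import re

# Assumed heuristic: a line ending with one of these characters closes a sentence.
SENTENCE_END = re.compile(r"[.!?:;)\]”’]\s*$")


def join_broken_lines(text):
    """Merge lines that were split mid-sentence; a blank line always ends a paragraph."""
    paragraphs, current = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # Blank line: close the paragraph being built, if any.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current and not SENTENCE_END.search(current):
            # The previous line was cut mid-sentence, so glue this one on.
            current += " " + line
        elif current:
            paragraphs.append(current)
            current = line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)
```

The function works on a whole file's text, so it can be applied to each converted .txt file before indexing.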
After you have converted all of your documents into plain text files, you can build an index database with a tool such as Apache Lucene; here we simply use SQLite3 FTS5:
The .txt files converted in Step 2 may contain broken lines. If you have not already done so, run prepare-textdata.py to fix them:
python3 prepare-textdata.py
After that you can index them:
python3 index-all-others.py
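As a rough picture of what an indexing step like this involves, the sketch below loads a folder of .txt files into an FTS5 table using Python's built-in sqlite3 module. It is not the project's index-all-others.py: the table name `docs`, its columns `path` and `body`, and the function name `build_index` are hypothetical, and it assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3
from pathlib import Path


def build_index(txt_dir, db_path):
    """Index every .txt file under txt_dir into an FTS5 table; return the file count."""
    conn = sqlite3.connect(str(db_path))
    try:
        # Hypothetical schema: one row per file, searchable by path and body.
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path, body)"
        )
        count = 0
        with conn:  # commit all inserts in a single transaction
            for txt in sorted(Path(txt_dir).rglob("*.txt")):
                body = txt.read_text(encoding="utf-8", errors="replace")
                conn.execute(
                    "INSERT INTO docs (path, body) VALUES (?, ?)",
                    (str(txt), body),
                )
                count += 1
        return count
    finally:
        conn.close()
```

Note that running it twice against the same database adds every file again, so delete the old database file before re-indexing.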
Congrats! Nearly done! :)
Now place your indexed databases in the data directory and update their paths in the index.php file. Find the following line and adjust it to your setup:
$dbConnection = new SQLite3("data/tptk.sqlite3");
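If you would rather query a database from a script than through index.php, a search function might look like the sketch below. It assumes a hypothetical FTS5 table named `docs` with columns `path` and `body`; the tables inside the provided databases such as tptk.sqlite3 may be named and structured differently, so inspect their schema first. It uses the FTS5 `snippet()` function to mark matches and `ORDER BY rank` to put the best matches first.

```python
import sqlite3


def search(db_path, query, limit=10):
    """Return (path, snippet) pairs for rows matching an FTS5 query, best match first."""
    conn = sqlite3.connect(db_path)
    try:
        # snippet(table, column index, open mark, close mark, ellipsis, max tokens)
        sql = (
            "SELECT path, snippet(docs, 1, '[', ']', '...', 12) "
            "FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?"
        )
        return conn.execute(sql, (query, limit)).fetchall()
    finally:
        conn.close()
```

For example, `search("data/mydocs.sqlite3", "anicca")` returns up to ten hits with the matched word wrapped in square brackets; the file name here is a placeholder.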
paaukmed/
├── 01 Samatha and Rupa(A5).pdf
├── 02 Nama (newFont14.5.11)(A4).pdf
├── 03 Patticca(5thMethod)(newFont14.5.11).pdf
├── 04 Paticca (1st Method)(new font14.5.11).pdf
├── 05 PATHANA (new font14.5.11) 3.pdf
├── 06 CFMP(LakkhanaRasa)(2011).pdf
├── 07 Vipassana(all) (newFont14.5.11)3.pdf
├── 14 Ways En-Ch.pdf
├── 14 Ways Singhalese.pdf
├── NUTRIMENT- BORN MATERIALITY.pdf
├── Nutriment-born(Revised19.12.2012)5(Lg+A4).pdf
├── Recollection of Past Lives by Abhinna Etc.pdf
└── Rupa+Nama Tables (all) 10.pdf
0 directories, 14 files
xttpv/
├── 28_Khp-Dh-Ud-It.pdf
├── 29_Sn.pdf
├── 30_Vv_Pv.pdf
├── 31_Thag_Thig.pdf
├── 32_Ja_I.pdf
├── 33_Ja_II.pdf
├── 34_Ja_III.pdf
├── 35_Nidd_I.pdf
├── 36_Nidd_II.pdf
├── 45_Mil.pdf
├── Indacanda - Kinh Tung Pali Le Bai Tam Bao.pdf
├── ttpv_01_Pr.pdf
├── ttpv_02_Pc_I.pdf
├── ttpv_03_Pc_II.pdf
├── ttpv_04_Mv_I.pdf
├── ttpv_05_Mv_II.pdf
├── ttpv_06_Cv_I.pdf
├── ttpv_07_Cv_II.pdf
├── ttpv_08_Par_I.pdf
├── ttpv_09_Par_II.pdf
├── ttpv_37_Pts_I.pdf
├── ttpv_38_Pts_II.pdf
├── ttpv_39_Ap_I.pdf
├── ttpv_40_Ap_II.pdf
├── ttpv_41_Ap_III.pdf
├── ttpv_42_Bv&Cp.pdf
└── ttpv_bkn_ptm Gioi Bon Tkn.pdf
0 directories, 27 files
May we all be able to understand and practise the Dhamma correctly, quickly. May you all be well and happy!