Download the PHP package venveo/craft-documentsearch without Composer

On this page you can find all versions of the PHP package venveo/craft-documentsearch. You can download and install these versions without Composer; dependencies are resolved automatically.

FAQ

After the download, add a single include, require_once('vendor/autoload.php');, and then import the classes you need with use statements.

If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Accessing such packages requires credentials. If a package comes from a private repository, enter the credentials in the auth.json textarea.

  • Some hosting environments cannot be accessed by terminal or SSH, so Composer cannot be used there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are not always available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package craft-documentsearch

Document Search plugin for Craft CMS 4

Extracts keywords and phrases from PDF documents and adds them to Craft CMS' native search index.

NOTE: Please try before you buy and make sure this plugin suits your needs. You may not get the results you're expecting! Certain document types do not necessarily lend themselves well to this process.

NOTE: If you're looking for a full-text document search solution, this may not be it. The purpose of this plugin is to boil down large documents to consumable sizes for your database. If the full-text will fit in the search index, it will be inserted; otherwise, we will parse for common n-grams.

How it works

Document Search exists to augment the existing Craft CMS search index. To avoid polluting the index with large amounts of data, we extract only the most important parts. However, if the content of the PDF fits in the search index, it is stored unmodified.

When a PDF is saved as an asset and its Volume is configured to be searched, the textual content is extracted by the pdftotext executable. This content is first sanitized and normalized, with "stop words" removed. Stop words are common words that carry little meaning on their own, such as "and". They are selected based on the Asset's locale language (with a fallback to English). The content is then processed into the top 30 each of 1-grams, 2-grams, and 3-grams. In this scenario, the 1-grams are simply the most commonly occurring words, while the 2-grams and 3-grams help prioritize exact phrase matches. For example, processing the Wikipedia page for "cats" yields a list of search keywords.

Notice that the first 30 keywords are single words such as cats, cat, species, and domestic; they have no contextual relation to their adjacent keywords. Further down the list are short phrases that someone might search for to yield a more exact match, such as "central point", "eye movement", "widely dispersed individuals", and "great pomp esplanade".

These 2-grams and 3-grams are not ranked simply by their number of occurrences. They are derived by a process known as Rapid Automatic Keyword Extraction (RAKE), which infers the importance of a phrase from the words in it.
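
The counting step described above can be sketched in a few lines of shell. This is only a rough illustration, not the plugin's code: the plugin does this work in PHP, selects stop words by locale, and ranks 2-grams and 3-grams with RAKE, whereas this sketch uses a tiny hard-coded stop-word list and ranks purely by frequency. The function names normalize, top_unigrams, and top_bigrams are hypothetical.

```shell
# Hypothetical illustration only; none of these functions exist in the plugin.

# Normalize text from stdin: lowercase it, split it into one word per line,
# and drop a tiny hard-coded stop-word list (an assumption for this sketch).
normalize() {
  tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alpha:]' '\n' \
    | grep -v -E '^(and|the|a|of|to|in|is|are)$' \
    | grep -v '^$'
}

# Print the N most frequent 1-grams, most frequent first (ties: alphabetical).
top_unigrams() {
  normalize \
    | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 \
    | head -n "$1" \
    | awk '{ print $2 }'
}

# Print the N most frequent 2-grams. Pairs are built after stop-word removal,
# so words separated only by a stop word count as adjacent.
top_bigrams() {
  normalize \
    | awk 'NR > 1 { print prev " " $0 } { prev = $0 }' \
    | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2 \
    | head -n "$1" \
    | awk '{ print $2 " " $3 }'
}
```

For instance, pdftotext document.pdf - | top_bigrams 30 would print the 30 most frequent word pairs of a PDF, assuming pdftotext is installed; document.pdf is a placeholder file name.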

Document Search Usage

Once the plugin is installed and configured (see the configuration section), PDF assets that contain text will be indexed automatically. Image-only PDFs, such as scans, are not supported.

As with other fields in Craft, you may tweak the search query to your liking by targeting the field named contentKeywords.

Requirements

Installation

Plugin

To install the plugin, follow these instructions.

  1. Open your terminal and go to your Craft project:

    cd /path/to/project
  2. Then tell Composer to load the plugin:

    composer require venveo/craft-documentsearch
  3. In the Control Panel, go to Settings → Plugins and click the “Install” button for Document Search.

pdftotext Executable

To install on Ubuntu or Debian, the precompiled binaries can be installed with apt-get:

apt-get install poppler-utils

To install on RedHat or CentOS, the precompiled binaries can be installed with yum:

yum install poppler-utils

Note: If you're looking for a full-text document search solution, this isn't it. The purpose of this plugin is to boil down large documents to consumable sizes for a PHP-based web server.

Configuring Document Search

Document Search requires a runnable binary of pdftotext. The default file location for the binary is set to /usr/local/bin/pdftotext but can be changed through config or settings options.

To check if you have pdftotext installed on your server, you can run:

which pdftotext

See the installation section for notes on installing pdftotext.
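
A deployment script could perform the same check before the plugin is enabled. The sketch below looks for an executable at a preferred path first and falls back to the PATH. The function name find_binary is hypothetical and not part of the plugin.

```shell
# Hypothetical helper: print the location of a binary, or return 1 if missing.
#   $1 = binary name, $2 = preferred absolute path to try first
find_binary() {
  name=$1
  preferred=$2
  if [ -x "$preferred" ]; then
    # The preferred location exists and is executable.
    printf '%s\n' "$preferred"
  elif command -v "$name" >/dev/null 2>&1; then
    # Fall back to whatever the PATH resolves to.
    command -v "$name"
  else
    return 1
  fi
}

# Usage (the default path is the one named in this section):
#   find_binary pdftotext /usr/local/bin/pdftotext || echo "pdftotext not found"
```

If the printed path differs from the default location, that is the value to enter in the plugin's config or settings.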

Using Document Search

Keywords extracted from assets are added to the search index when the assets are saved. Keywords for existing assets are not generated automatically, but you can generate them with the ./craft resave/assets command included with Craft.
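
When only some volumes are configured for searching, the resave can be limited to those volumes with the --volume option of resave/assets. The following is a minimal sketch; the function name resave_volumes, the CRAFT variable, and the volume handles in the usage line are assumptions for illustration.

```shell
# Hypothetical helper: resave assets one volume at a time.
# CRAFT may point at the Craft executable; it defaults to ./craft.
resave_volumes() {
  craft=${CRAFT:-./craft}
  for handle in "$@"; do
    # Stop at the first volume that fails to resave.
    "$craft" resave/assets --volume="$handle" || return 1
  done
}

# Usage, run from the Craft project root ("documents" and "manuals" are
# placeholder volume handles):
#   resave_volumes documents manuals
```

Running the volumes one at a time keeps each run short and makes it clear which volume failed if the command stops.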

Brought to you by Venveo


All versions of craft-documentsearch with dependencies

Requires:
  • craftcms/cms ^3.1.0
  • spatie/pdf-to-text ^1.1
  • yooper/php-text-analysis ^1.4
  • voku/stop-words ^2.0
