Rose
Rose (Composer package s2/rose) is a search engine for content sites with simplified yet functional English and Russian morphology support. It indexes your content and provides full-text search.
Requirements
- PHP 7.4 or later.
- A relational database in case of significant content size. Supported databases are:
| Database |
|---|
| MySQL 5.6 or later and MariaDB 10.2 or later |
| PostgreSQL (tested on versions 10 to 16) |
| SQLite (tested on 3.37.2) |
Installation
Install the package with Composer: `composer require s2/rose`.
Usage
Preparing Storage
The index can be stored in a database or in a file. A storage serves as an abstraction layer that conceals implementation details.
In most cases you need the database storage, `PdoStorage`.
The storage is required for both indexing and searching.
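A minimal sketch of creating the storage follows. The namespace `S2\Rose\Storage\Database\PdoStorage`, the table prefix `rose_`, and the autoloader path are assumptions to check against your installed version. An in-memory SQLite database is used only to keep the example self-contained.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Storage\Database\PdoStorage;

// In-memory SQLite for illustration; use a MySQL or PostgreSQL DSN for real content.
$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

// The second argument is a prefix for the index tables ('rose_' is an arbitrary choice).
$storage = new PdoStorage($pdo, 'rose_');
```

The same `$storage` object can then be passed to both the indexer and the finder.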
To rebuild the index, call the `PdoStorage::erase()` method.
It drops index tables (if they exist) and creates new ones from scratch. This method is sufficient for upgrading to a new version of Rose that may not be backward compatible with the existing index.
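Rebuilding could look like the sketch below. Only `erase()` comes from the text above; the namespace, the `rose_` prefix, and the SQLite DSN are illustrative assumptions.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');

// Drops the index tables if they exist and creates empty ones.
$storage->erase();

// The index is empty now, so every page has to be indexed again afterwards.
```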
Morphology
For natural language processing, Rose uses stemmers. The stemmer truncates the inflected part of words, and Rose processes the resulting stems. Rose does not have built-in dictionaries, but it includes heuristic stemmers developed by Porter. You can integrate any other algorithm by implementing the StemmerInterface.
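As a rough illustration, the bundled Porter stemmers might be combined as follows. The class names `PorterStemmerEnglish` and `PorterStemmerRussian`, their namespace, the optional constructor argument for chaining, and the `stemWord()` method are assumptions based on the package's public API, so verify them against the installed version.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Stemmer\PorterStemmerRussian;

// Chain the stemmers: the outer one handles its own language and is assumed
// to pass other words to the inner one. Put the primary language first.
$stemmer = new PorterStemmerRussian(new PorterStemmerEnglish());

// Print the stem Rose would work with for an inflected English word.
echo $stemmer->stemWord('searching'), "\n";
```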
Indexing
`Indexer` builds the search index. It depends on a stemmer and a storage.
`Indexer` accepts your data in a special format. The data must be wrapped in the `Indexable` class.
The constructor of `Indexable` has four arguments:
- external ID - an arbitrary string that is sufficient for your code to identify the page;
- page title;
- page content;
- instance ID - an optional integer ID of the page source (e.g., for multi-site services), as explained below.
Optional parameters that you can provide include keywords, description, date, relevance ratio, and URL. Keywords are indexed and searched with higher relevance. The description can be used for building a snippet (see below). If available, the contents of the "keywords" and "description" meta tags are a good source for these two fields. The URL can be an arbitrary string.
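The constructor arguments and optional parameters described above can be sketched like this. The setter names (`setKeywords`, `setDescription`, `setDate`, `setUrl`, `setRelevanceRatio`), the namespaces, and the sample values are assumptions for illustration, not a definitive reference.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables

$indexer = new Indexer($storage, new PorterStemmerEnglish());

// Required arguments only: external ID, title, content. The instance ID is omitted.
$page = new Indexable('id_1', 'Test page title', '<p>This is the first page to be indexed.</p>');
$indexer->index($page);

// All four constructor arguments plus the optional fields.
$post = new Indexable('id_2', 'Second page title', '<p>Another page with new content.</p>', 1);
$post->setKeywords('search, full-text index');
$post->setDescription('A short description that can be used in snippets');
$post->setDate(new \DateTime('2016-08-24 00:00:00'));
$post->setUrl('/pages/second');
$post->setRelevanceRatio(2.0);
$indexer->index($post);
```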
The `Indexer::index()` method both adds new content to the index and updates existing content.
If the content is unchanged, the method skips the operation. Otherwise, the old content is removed and indexed again.
When you remove a page from the site, remove it from the index as well.
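Removal from the index might look like the following sketch. The method name `removeById()` and its two arguments (external ID and instance ID) are assumptions about the `Indexer` API, so check them in the source before relying on them.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables

$indexer = new Indexer($storage, new PorterStemmerEnglish());
$indexer->index(new Indexable('id_1', 'Test page title', '<p>Page that will be deleted.</p>', 1));

// Assumed signature: external ID first, then the instance ID used when indexing.
$indexer->removeById('id_1', 1);
```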
Searching
Full-text search results are obtained via the `Finder` class.
`$resultSet->getItems()` returns all the information about content items and their relevance.
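A basic search could be sketched as follows. `getItems()` comes from the text above. The `Finder` constructor arguments, the `find()` method, and the item getters (`getId`, `getTitle`, `getRelevance`) are assumptions about the API.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Entity\Query;
use S2\Rose\Finder;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables
$stemmer = new PorterStemmerEnglish();

// Index one page so that the search has something to find.
$indexer = new Indexer($storage, $stemmer);
$indexer->index(new Indexable('id_1', 'Test page title', '<p>Some content to search through.</p>'));

// The finder uses the same storage and stemmer as the indexer.
$finder    = new Finder($storage, $stemmer);
$resultSet = $finder->find(new Query('content'));

foreach ($resultSet->getItems() as $item) {
    printf("%s | %s | %.3f\n", $item->getId(), $item->getTitle(), $item->getRelevance());
}
```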
Modify the `Query` object to use pagination.
Provide an instance ID to limit the scope of the search to a subsystem.
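Both options can be combined on one query, as in this sketch. `setInstanceId()` appears elsewhere in this document. `setLimit()`, `setOffset()`, and `getTotalCount()` are assumed method names, and the page size of 10 is arbitrary.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Query;
use S2\Rose\Finder;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables

$finder = new Finder($storage, new PorterStemmerEnglish());

$query = (new Query('content'))
    ->setLimit(10)     // 10 results per page
    ->setOffset(20)    // skip two pages, i.e. show the third one
    ->setInstanceId(1) // only content indexed with instance ID 1
;
$resultSet = $finder->find($query);

// Total number of matches, useful for rendering pagination links.
$total = $resultSet->getTotalCount();
echo "Found {$total} items\n";
```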
Highlighting and Snippets
It's common practice to highlight the found words in search results. You can obtain the highlighted title for each found item.
Obtaining it requires the stemmer, since highlighting takes morphology into account and covers all word forms. By default, words are highlighted with italics. You can change the highlight template by calling `$finder->setHighlightTemplate('<b>%s</b>')`.
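Title highlighting might be used as in the sketch below. `setHighlightTemplate()` is taken from the text above. The item method `getHighlightedTitle($stemmer)` is an assumption about the API.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Entity\Query;
use S2\Rose\Finder;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables
$stemmer = new PorterStemmerEnglish();

(new Indexer($storage, $stemmer))
    ->index(new Indexable('id_1', 'Test page title', '<p>Some content.</p>'));

$finder = new Finder($storage, $stemmer);
$finder->setHighlightTemplate('<b>%s</b>'); // bold instead of the default italics

foreach ($finder->find(new Query('title'))->getItems() as $item) {
    // The stemmer lets the item match every word form, not only the exact query word.
    echo $item->getHighlightedTitle($stemmer), "\n";
}
```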
Snippets are small text fragments containing found words that are displayed on a search results page. Rose processes the indexed content and selects the best-matching sentences.
Words in the snippets are highlighted the same way as in titles.
If building snippets takes a lot of time, try to use pagination to reduce the number of snippets processed.
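A sketch of reading snippets for a single page of results follows. The `getSnippet()` and `setLimit()` method names are assumptions about the API, and the limit of 10 is arbitrary.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Entity\Query;
use S2\Rose\Finder;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables
$stemmer = new PorterStemmerEnglish();

(new Indexer($storage, $stemmer))->index(
    new Indexable('id_1', 'Test page title', '<p>First sentence. Some content to quote. Last sentence.</p>')
);

$finder = new Finder($storage, $stemmer);

// Limiting the page size also limits how many snippets have to be built.
$query = (new Query('content'))->setLimit(10);

foreach ($finder->find($query)->getItems() as $item) {
    echo $item->getSnippet(), "\n";
}
```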
Instances
Instances can be helpful to restrict the scope of search.
For example, you can index blog posts with `instance_id = 1` and comments with `instance_id = 2`.
Then you can run queries with different restrictions:
- `(new Query('content'))->setInstanceId(1)` searches through blog posts,
- `(new Query('content'))->setInstanceId(2)` searches through comments,
- `(new Query('content'))` searches everywhere.
When indexing, if you omit instance_id or provide `instance_id === null`, a value of `0` is used internally.
Such content can only match queries without instance_id restrictions.
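The blog scenario above can be put together roughly as follows. The IDs, titles, and texts are made up for illustration, and the class namespaces and the `find()` method are assumptions about the API.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\Indexable;
use S2\Rose\Entity\Query;
use S2\Rose\Finder;
use S2\Rose\Indexer;
use S2\Rose\Stemmer\PorterStemmerEnglish;
use S2\Rose\Storage\Database\PdoStorage;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');
$storage->erase(); // create empty index tables
$stemmer = new PorterStemmerEnglish();

$indexer = new Indexer($storage, $stemmer);
$indexer->index(new Indexable('post_1', 'Blog post', '<p>Post content.</p>', 1));       // blog posts
$indexer->index(new Indexable('comment_1', 'Comment', '<p>Comment content.</p>', 2));   // comments
$indexer->index(new Indexable('page_1', 'Static page', '<p>Other content.</p>'));       // no instance ID

$finder = new Finder($storage, $stemmer);

$posts      = $finder->find((new Query('content'))->setInstanceId(1)); // blog posts only
$comments   = $finder->find((new Query('content'))->setInstanceId(2)); // comments only
$everything = $finder->find(new Query('content'));                     // no restriction
```

Per the rule above, `page_1` is only eligible for the unrestricted query.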
Content format and extraction
Rose is designed for websites and web applications. It supports HTML content by default. However, you can extend it to support other formats (e.g. plain text or Markdown) by creating a custom extractor.
Recommendations
PdoStorage has the capability to identify similar items within the entire set of indexed items.
Consider a scenario where you have a blog and its posts are indexed using Rose. This particular feature allows you to choose a set of other posts for each individual post, enabling visitors to explore related content.
The data structure within the full-text index is well-suited for the task of selecting similar posts. To put it simply, regular search entails selecting relevant posts based on words from a search query, whereas post recommendations involve selecting other posts based on the words present in a given post.
You can retrieve recommendations from the storage.
> [!NOTE]
> Recommendations are supported on MySQL and PostgreSQL databases. They are not implemented in SQLite due to limited SQL support.
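Retrieving similar items could look like this sketch. The `getSimilar()` method and the `S2\Rose\Entity\ExternalId` class are assumptions about the API, and the DSN, credentials, and `id_2` are placeholders. The example expects an index that has already been built in a MySQL database.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader path

use S2\Rose\Entity\ExternalId;
use S2\Rose\Storage\Database\PdoStorage;

// Placeholder connection settings; recommendations need MySQL or PostgreSQL.
$pdo = new \PDO('mysql:host=127.0.0.1;dbname=rose_demo;charset=utf8mb4', 'user', 'password');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$storage = new PdoStorage($pdo, 'rose_');

// Ask the storage for items that share words with the already indexed item 'id_2'.
$similarItems = $storage->getSimilar(new ExternalId('id_2'));

print_r($similarItems);
```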
Dependencies
- ext-json: any version (*)
- symfony/polyfill-mbstring: ^1.2
- psr/log: ^1.1|^2.0|^3.0