Pest Plugin for Prompt Testing
Test your AI prompts with confidence using Pest's elegant syntax.
This plugin brings LLM prompt testing to your Pest test suite, powered by promptfoo under the hood. Write fluent, expressive tests for evaluating AI model prompts using the familiar Pest API you already love.
Table of Contents
- Why Use This Plugin?
- Prerequisites
- Installation
- Quick Start
- Documentation
  - Core Functions
    - prompt()
    - provider()
    - assertion()
  - Evaluation Methods
    - describe()
    - usingProvider()
    - alwaysExpect()
    - expect()
    - and()
  - Assertion Methods
  - Provider Configuration
    - id()
    - label()
    - temperature()
    - maxTokens()
    - topP()
    - frequencyPenalty()
    - presencePenalty()
    - stop()
    - config()
- Usage Examples
  - Basic Example
  - Multiple Prompts
  - Multiple Providers
  - Multiple Test Cases
  - Provider Configuration
  - Global Provider Registration
  - Advanced Assertions
  - LLM-Based Evaluation
  - Complex Example
- CLI Options
  - --output
- Credits & License
Why Use This Plugin?
- Test prompts against multiple LLM providers - Compare OpenAI, Anthropic, and more in a single test
- Validate responses with content assertions - Check for specific text, JSON validity, HTML structure, and more
- Use LLM-based evaluation - Judge responses with natural language rubrics using AI itself
- Familiar Pest-style fluent API - Feels natural if you're already using Pest
- Automatic cleanup - Temporary files are managed for you
- Battle-tested - Built on promptfoo's proven evaluation framework
Prerequisites
Before you begin, make sure you have:
- PHP 8.3 or higher
- Pest 4.0 or higher
- Node.js and npm - Required for promptfoo execution via npx
- API keys for LLM providers - You'll need keys for the providers you want to test
Setting up API Keys
Set environment variables for the providers you'll use:
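For example, in a POSIX shell. The variable names follow promptfoo's conventions for OpenAI and Anthropic; the values are placeholders, and you only need the keys for the providers you actually use:

```shell
# Placeholder values: replace them with your own keys.
export OPENAI_API_KEY="sk-your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```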
If you're using Laravel or a similar framework with .env file support, you can add them there instead.
For more provider options and configuration, check out promptfoo's provider documentation.
Installation
Install the plugin via Composer. The package name is kevinpijning/pest-plugin-prompt.
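A typical command, assuming you want the plugin as a development dependency (the `--dev` flag is that assumption; drop it if your setup differs):

```shell
# Adds the plugin to require-dev in composer.json
composer require kevinpijning/pest-plugin-prompt --dev
```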
The plugin automatically registers with Pest via package discovery - no additional configuration needed!
Quick Start
Here's the simplest possible example to get you started:
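A minimal sketch of such a test, assuming `prompt()` is available as a global helper once the plugin is installed, that `usingProvider()` accepts a promptfoo provider ID string, and that `expect()` takes an associative array of variables. The prompt wording and test name are illustrative:

```php
<?php

// Assumes prompt() is a global helper registered by the plugin.
test('greeting mentions the user by name', function () {
    prompt('You are a helpful assistant. Greet {{name}} warmly.') // {{name}} is filled in per test case
        ->usingProvider('openai:gpt-4o-mini') // promptfoo provider ID
        ->expect(['name' => 'Alice'])         // assumed shape: variables as an associative array
        ->toContain('Alice');                 // the response must mention Alice
});
```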
What's happening here?
- We create a prompt with variable interpolation using {{name}}
- We specify OpenAI's GPT-4o-mini as our LLM provider
- We test with the variable name set to "Alice"
- We assert that the response contains "Alice"
When you run this test, the plugin will:
- Send the prompt to OpenAI with "Alice" substituted for {{name}}
- Receive the response
- Verify that "Alice" appears in the response
- Pass or fail the test accordingly
Documentation
Core Functions
prompt()
Create a new evaluation with one or more prompts. Use {{variable}} syntax for variable interpolation.
provider()
Register a global provider, much like a Pest dataset, so it can be reused across multiple tests. Providers registered with this function can be referenced by name in usingProvider().
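A sketch of how registration and reuse might look. It assumes `provider()` takes the name as its first argument and returns an object exposing the setters listed under Provider Configuration; the name `fast-openai`, the label, and the parameter values are hypothetical:

```php
<?php

// Hypothetical registration, e.g. in tests/Pest.php next to your datasets.
// Assumption: provider('name') returns a fluent Provider instance.
provider('fast-openai')
    ->id('openai:gpt-4o-mini')
    ->label('Fast OpenAI')
    ->temperature(0.2)
    ->maxTokens(200);

test('summary stays on topic', function () {
    prompt('Summarize in one sentence: {{text}}')
        ->usingProvider('fast-openai') // referenced by its registered name
        ->expect(['text' => 'Pest is a testing framework for PHP.'])
        ->toContain('Pest');
});
```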
assertion()
Register a reusable assertion group by name. Groups can be defined fluently or with a callback that receives the TestCase (and optional parameters), and then reused via to() / group() or magic toXxx methods.
Evaluation Methods
describe()
Add a description to your evaluation for better test output and debugging.
usingProvider()
Specify which LLM provider(s) to use for evaluation. You can pass provider IDs, Provider instances, callables, or registered provider names.
alwaysExpect()
Set default assertions and variables that apply to all test cases in the evaluation. This is useful when you want to ensure certain conditions are met for every test case without repeating the assertions.
With callback:
You can pass an optional callback function to configure the default test case:
Key points:
- alwaysExpect() returns a TestCase instance that supports all assertion methods
- Assertions added via alwaysExpect() apply to every test case in the evaluation
- Default variables can be set and will be merged with test case variables
- You can chain multiple assertions after alwaysExpect() or use a callback
- The default test case is separate from regular test cases and won't appear in the testCases() array
- If alwaysExpect() is called multiple times, subsequent calls will execute the callback on the existing default test case
Use cases:
- Ensure all responses meet quality standards (e.g., "always be professional")
- Set common variables that apply to all tests
- Enforce safety checks across all test cases
- Apply format requirements universally (e.g., "always contain JSON")
expect()
Create a test case with variables that will be substituted into your prompt template.
With callback:
You can pass an optional callback function that receives the created TestCase instance. This is useful for grouping multiple assertions or applying conditional logic.
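A sketch of the callback form, assuming the callback is passed as the second argument after the variables. The parameter is left untyped because the TestCase class's namespace is not stated here; the prompt and expected strings are illustrative:

```php
<?php

test('reply addresses the customer and the order', function () {
    prompt('Write a short reply to {{customer}} about a delayed order.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['customer' => 'Dana'], function ($testCase) {
            // $testCase is the TestCase created for these variables.
            $testCase->toContain('Dana');
            $testCase->toContain('order');
        });
});
```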
and()
Chain multiple test cases for the same evaluation. Each call to and() creates a new test case with different variables.
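A sketch of chained test cases, assuming `and()` takes the same associative array of variables as `expect()` and can be called directly after an assertion. The names are illustrative:

```php
<?php

test('greeting works for several names', function () {
    prompt('Greet {{name}} warmly.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['name' => 'Alice'])->toContain('Alice')   // first test case
        ->and(['name' => 'Bob'])->toContain('Bob')          // second test case
        ->and(['name' => 'Charlie'])->toContain('Charlie'); // third test case
});
```

Each test case is evaluated against the same prompt and provider, so a failure points at the specific variable set that triggered it.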
With callback:
You can pass an optional callback function that receives the newly created TestCase:
to() and group()
Group multiple assertions together using a callback or invokable class. Both to() and group() are aliases that execute a callback with the current test case, allowing you to organize assertions logically.
Using callbacks:
Using invokable classes:
You can also pass an invokable class (a class with an __invoke() method) to reuse assertion logic across multiple tests.
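A sketch of an invokable assertion group. The class name `MentionsRefundPolicy` and the rubric text are hypothetical, and the parameter is left untyped because the TestCase class's namespace is not stated here:

```php
<?php

// Hypothetical reusable assertion group.
final class MentionsRefundPolicy
{
    public function __invoke($testCase): void
    {
        // Receives the current TestCase and adds assertions to it.
        $testCase
            ->toContain('refund')
            ->toBeJudged('The tone is apologetic and professional.');
    }
}

test('support reply covers the refund policy', function () {
    prompt('Reply to a customer asking for a refund on {{product}}.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['product' => 'headphones'])
        ->to(MentionsRefundPolicy::class); // pass the class name (FQN)
});
```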
Key points:
- to() and group() are functionally identical - use whichever reads better in your context
- Accepts either a callable or an invokable class FQN (fully qualified name)
- The callback/invokable receives the current TestCase instance
- Useful for organizing related assertions together
- Can be chained multiple times
- Works with all assertion methods
Use cases:
- Group related assertions for better code organization
- Apply conditional logic based on test case variables
- Reuse assertion patterns across multiple test cases with invokable classes
- Create reusable assertion libraries for common quality checks
Assertion Methods
Available assertions: toContain(), toContainAll(), toContainAny(), toContainJson(), toContainHtml(), toContainSql(), toContainXml(), toEqual(), toBe(), toBeJudged(), startsWith(), toMatchRegex(), toBeJson(), toEqualJson(), toMatchJsonStructure(), toHaveJsonFragment(), toHaveJsonFragments(), toHaveJsonPath(), toHaveJsonPaths(), toHaveJsonType(), toBeHtml(), toBeSql(), toBeXml(), toBeSimilar(), toHaveLevenshtein(), toHaveRougeN(), toHaveFScore(), toHavePerplexity(), toHavePerplexityScore(), toHaveCost(), toHaveLatency(), toHaveValidFunctionCall(), toHaveValidOpenaiFunctionCall(), toHaveValidOpenaiToolsCall(), toHaveToolCallF1(), toHaveFinishReason(), toBeClassified(), toBeScoredByPi(), toBeRefused(), toPassJavascript(), toPassPython(), toPassWebhook(), toHaveTraceSpanCount(), toHaveTraceSpanDuration(), toHaveTraceErrorSpans(), and the not modifier.
toContain()
Assert that the response contains specific text. Case-insensitive by default.
toContainAll()
Assert that the response contains all of the specified strings.
toContainAny()
Assert that the response contains at least one of the specified strings.
toContainJson()
Assert that the response contains valid JSON.
toContainHtml()
Assert that the response contains valid HTML.
toContainSql()
Assert that the response contains valid SQL.
toContainXml()
Assert that the response contains valid XML.
toEqual()
Assert that the response exactly equals the expected value. This is useful for deterministic outputs where you expect an exact match. You can also check whether it matches the expected JSON format.
toBe()
This is a convenience alias of toEqual().
toBeJudged()
Use an LLM to evaluate the response against a natural language rubric. This is useful for subjective quality checks.
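A sketch of a rubric-based check, assuming `toBeJudged()` takes the rubric as a plain string. The prompt and rubric wording are illustrative:

```php
<?php

test('explanation is beginner friendly', function () {
    prompt('Explain {{topic}} to a beginner.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['topic' => 'recursion'])
        // The rubric is graded by an LLM, so phrase it as a clear pass/fail criterion.
        ->toBeJudged('The explanation avoids jargon and includes a concrete example.');
});
```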
startsWith()
Assert that the response starts with a specific prefix.
toMatchRegex()
Assert that the response matches a regular expression pattern.
toBeJson()
Assert that the response is valid JSON (not just contains JSON).
toEqualJson()
Assert that the JSON output exactly equals the expected value. Object key order is ignored, but array order is preserved. This is similar to Laravel's assertExactJson().
toMatchJsonStructure()
Assert that the JSON output contains all expected keys. This validates structure without checking values, similar to Laravel's assertJsonStructure().
toHaveJsonFragment()
Assert that the JSON output contains specific key-value pairs. Similar to Laravel's assertJsonFragment().
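A sketch combining `toBeJson()` with a fragment check, assuming `toHaveJsonFragment()` takes an associative array in the style of Laravel's assertJsonFragment(). The prompt and the expected pair are illustrative:

```php
<?php

test('model returns the country as JSON', function () {
    prompt('Return only a JSON object with the keys "city" and "country" for {{city}}.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['city' => 'Paris'])
        ->toBeJson()                                     // the whole response must parse as JSON
        ->toHaveJsonFragment(['country' => 'France']);   // assumed shape: key => value pairs
});
```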
toHaveJsonFragments()
Assert that the JSON output contains all specified fragments.
toHaveJsonPath()
Assert that a value exists at a specific JSON path. Supports dot notation, numeric array indices, and wildcards.
toHaveJsonPaths()
Assert that multiple JSON paths exist, optionally with expected values.
toHaveJsonType()
Assert that the value at a JSON path has the expected type. Supports: string, number, boolean, array, object, null.
toBeHtml()
Assert that the response is valid HTML.
toBeSql()
Assert that the response is valid SQL (not just contains SQL).
toBeXml()
Assert that the response is valid XML.
toBeSimilar()
Assert that the response is semantically similar to the expected value using embedding similarity.
toHaveLevenshtein()
Assert that the Levenshtein (edit) distance between the response and expected value is below a threshold.
toHaveRougeN()
Assert that the ROUGE-N score is above a threshold.
toHaveFScore()
Assert that the F-score is above a threshold.
toHavePerplexity()
Assert that the perplexity is below a threshold.
toHavePerplexityScore()
Assert that the normalized perplexity score is below a threshold.
toHaveCost()
Assert that the inference cost is below a maximum threshold.
toHaveLatency()
Assert that the response latency is below a maximum threshold (in milliseconds).
toHaveValidFunctionCall()
Assert that the response contains a valid function call matching the provided schema.
toHaveValidOpenaiFunctionCall()
Assert that the response contains a valid OpenAI function call.
toHaveValidOpenaiToolsCall()
Assert that the response contains valid OpenAI tool calls.
toHaveToolCallF1()
Assert that the F1 score comparing actual vs expected tool calls is above a threshold.
toHaveFinishReason()
Assert that the model stopped for the expected reason. You can use either a string or the FinishReason enum.
Standard Finish Reasons:
- stop: Natural completion (reached end of response, stop sequence matched)
- length: Token limit reached (max_tokens exceeded, context length reached)
- content_filter: Content filtering triggered due to safety policies
- tool_calls: Model made function/tool calls
Convenience Methods:
For each finish reason, there's a dedicated convenience method:
toBeClassified()
Assert that a HuggingFace classifier returns the expected class above a threshold.
toBeScoredByPi()
Use Pi Labs' preference scoring model as an alternative to LLM-as-a-judge.
toBeRefused()
Assert that the LLM output indicates the model refused to perform the requested task.
toPassJavascript()
Assert that a custom JavaScript function validates the output.
toPassPython()
Assert that a custom Python function validates the output.
toPassWebhook()
Assert that a webhook returns {pass: true}.
toHaveTraceSpanCount()
Assert that trace spans matching patterns meet min/max thresholds.
toHaveTraceSpanDuration()
Assert that trace span durations meet percentile and max duration thresholds.
toHaveTraceErrorSpans()
Detect errors in traces by status codes, attributes, and messages.
not Modifier
Negate any assertion by using the not modifier.
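A sketch of a negated assertion, assuming the modifier is exposed as a `not` property in the same way as Pest's own expectation API. The prompt and forbidden string are illustrative:

```php
<?php

test('reply does not leak internal notes', function () {
    prompt('Answer the customer question: {{question}}')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['question' => 'When will my order arrive?'])
        ->not->toContain('INTERNAL'); // assumed syntax: property-style modifier, as in Pest
});
```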
Provider Configuration
When creating or configuring providers, you can use these methods:
id()
Set the provider identifier (e.g., 'openai:gpt-4', 'anthropic:claude-3').
label()
Set a custom label for the provider (useful in test output).
temperature()
Control randomness in responses (0.0 to 1.0). Lower values make responses more deterministic.
maxTokens()
Set the maximum number of tokens to generate.
topP()
Set nucleus sampling parameter (0.0 to 1.0).
frequencyPenalty()
Penalize frequent tokens (-2.0 to 2.0).
presencePenalty()
Penalize new tokens based on presence in text (-2.0 to 2.0).
stop()
Set stop sequences where generation should stop.
config()
Set custom configuration options for the provider. Accepts either an array (replaces config) or a closure (receives current config for merging).
Extending Provider
The Provider class uses Pest's Extendable trait, allowing you to add custom methods:
Usage Examples
Basic Example
Multiple Prompts
Test multiple prompt variations against the same test cases.
Multiple Providers
Compare responses across different LLM providers.
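A sketch of a multi-provider comparison, assuming `usingProvider()` accepts several provider IDs as separate arguments. The Anthropic model ID is illustrative; check promptfoo's provider documentation for current identifiers:

```php
<?php

test('both providers greet the user by name', function () {
    prompt('Greet {{name}} warmly.')
        // Assumption: multiple IDs are passed as separate arguments.
        ->usingProvider(
            'openai:gpt-4o-mini',
            'anthropic:messages:claude-3-5-sonnet-20241022'
        )
        ->expect(['name' => 'Alice'])
        ->toContain('Alice'); // evaluated against each provider's response
});
```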
Multiple Test Cases
Test the same prompt with different variable values.
Default Test Cases
Use alwaysExpect() to set assertions that apply to all test cases.
Provider Configuration
Configure providers with specific parameters.
Global Provider Registration
Register providers once and reuse them across tests.
Advanced Assertions
Combine multiple assertion types.
LLM-Based Evaluation
Use AI to evaluate response quality.
Structured JSON Output Testing
Test structured JSON outputs from LLMs, particularly useful with OpenAI's Responses API and structured output features.
Complex Example
A comprehensive example showing multiple features together.
CLI Options
--output
Save promptfoo evaluation results to a directory. Useful for debugging and analysis.
The output directory will contain HTML reports and JSON data from promptfoo evaluations.
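A sketch of an invocation, assuming the option takes the directory in `--output=<dir>` form. The directory name `promptfoo-results` is hypothetical:

```shell
# Run the test suite and keep promptfoo's reports for later inspection.
vendor/bin/pest --output=promptfoo-results
```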
Parallel Test Support
This plugin supports parallel test execution with Pest's --parallel flag. Cache isolation and merging are handled automatically.
Credits & License
Created by: Kevin Pijning
Built on the shoulders of giants:
- Pest - The elegant PHP testing framework
- promptfoo - LLM evaluation framework
- Symfony Components - Process and YAML handling
License: MIT License
See the LICENSE file for full details.
Ready to start testing your prompts? Install the plugin and write your first test in under a minute. Happy testing!
Dependencies of pest-plugin-prompt
- pestphp/pest ^4.0.0
- pestphp/pest-plugin ^4.0.0
- symfony/yaml ^7.3