Package: laravel-html-crawler
Short description: A Laravel package for cleaning and transforming HTML content with a fluent interface
License: MIT
Homepage: https://github.com/cloudstudio/laravel-html-crawler
Laravel HTML Crawler
A Laravel package for cleaning and transforming HTML content. It provides a fluent interface to remove unwanted elements like CSS, scripts, and more, with options to preserve specific elements and even convert the cleaned HTML to Markdown.
Features
- Remove CSS (inline styles and <style> blocks)
- Remove JavaScript (inline scripts and <script> blocks)
- Preserve allowed tags through a configurable list or helper methods
- Convert to Markdown for quick text transformations
- Custom Regex Patterns to remove specific parts of the HTML
- Whitespace Normalization with an option to preserve newlines
Installation
Install the package using Composer by running composer require cloudstudio/laravel-html-crawler.
The package will automatically register itself in Laravel.
To publish the configuration file, run Laravel's php artisan vendor:publish command and select this package's configuration when prompted.
Usage
1. Basic HTML Cleaning
By default, the package removes disallowed tags (for example, it will strip <div> tags and any tags not explicitly allowed):
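As a rough illustration of this behaviour in plain PHP (not the package's own API), the sketch below uses the built-in strip_tags() with an allow-list. The helper name stripDisallowedTags and the sample input are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: keeps only the tags named in $allowedTags and strips
 * every other tag, leaving the text inside stripped tags in place.
 * It relies on PHP's built-in strip_tags(), not on the package.
 *
 * @param string[] $allowedTags Tag names without angle brackets, e.g. ['p', 'strong'].
 */
function stripDisallowedTags(string $html, array $allowedTags): string
{
    // Build the "<p><strong>" allow-list string that strip_tags() accepts.
    $allowList = implode('', array_map(
        static fn (string $tag): string => '<' . $tag . '>',
        $allowedTags
    ));

    return strip_tags($html, $allowList);
}

$html = '<div><p>Hello <strong>World</strong></p></div>';

// prints: <p>Hello <strong>World</strong></p>
echo stripDisallowedTags($html, ['p', 'strong']), PHP_EOL;
```

With the allow-list ['p', 'strong'], the <div> wrapper is dropped while its contents remain. Note that strip_tags() removes tags only, so the text inside a stripped element stays in the output.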
2. Preserving Allowed Tags
You can explicitly specify which tags to preserve:
Using setAllowedTags
Using Helper Methods
The package offers helper methods to preserve groups of tags:
3. Handling Scripts
Removing <script> by Default
By default, <script> blocks are removed:
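The effect can be approximated in plain PHP with a regular expression. This is a minimal sketch of the idea, not the package's implementation, and the function name removeScriptBlocks is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: removes every <script>...</script> element,
 * including its contents, using a non-greedy regular expression.
 */
function removeScriptBlocks(string $html): string
{
    // "s" lets "." match newlines inside the script body; "i" ignores tag case.
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $html) ?? $html;
}

$html = '<p>Hi</p><script type="text/javascript">alert("x");</script><p>Bye</p>';

// prints: <p>Hi</p><p>Bye</p>
echo removeScriptBlocks($html), PHP_EOL;
```

A regex is enough for a sketch like this; for malformed or deeply nested markup, a DOM parser is usually the more reliable choice.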
Preserving <script> with keepScripts()
If you wish to keep <script> blocks, use the keepScripts() method:
4. Handling CSS
By default, <style> blocks and CSS links are removed. To preserve them, use keepCss():
5. Using a Custom Regex Pattern
If you need to remove specific parts of the HTML using a regular expression:
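To show what pattern-based removal looks like, here is a small sketch built on PHP's preg_replace(). It is an assumption about the general technique rather than the package's method; the function name removeByPattern and the HTML-comment pattern are illustrative only.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: deletes every match of a PCRE pattern from the HTML.
 *
 * @throws InvalidArgumentException when the pattern cannot be applied.
 */
function removeByPattern(string $html, string $pattern): string
{
    $result = preg_replace($pattern, '', $html);

    // preg_replace() returns null when the pattern fails to compile or run.
    if ($result === null) {
        throw new InvalidArgumentException('Pattern failed: ' . preg_last_error_msg());
    }

    return $result;
}

$html = '<p>Keep</p><!-- drop me --><p>Also keep</p>';

// Example pattern: strip HTML comments ("s" lets "." span newlines).
// prints: <p>Keep</p><p>Also keep</p>
echo removeByPattern($html, '/<!--.*?-->/s'), PHP_EOL;
```

Keeping quantifiers non-greedy (.*?) limits each match to the nearest closing delimiter, which avoids deleting everything between the first and last occurrence.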
6. Converting to Markdown
You can convert the cleaned HTML to Markdown:
7. Handling Newlines
Control how newlines are handled in the HTML:
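The two modes described in the feature list (collapse all whitespace, or collapse it while preserving newlines) can be sketched in plain PHP as follows. The function name normalizeWhitespace and its $preserveNewlines flag are hypothetical and do not reflect the package's actual method names.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: collapses runs of whitespace into single spaces.
 * When $preserveNewlines is true, line breaks are kept and only spaces
 * and tabs within each line are collapsed.
 */
function normalizeWhitespace(string $text, bool $preserveNewlines = false): string
{
    if (!$preserveNewlines) {
        // Collapse every whitespace run, newlines included, into one space.
        return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
    }

    // Normalize Windows and old Mac line endings to "\n" first.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Collapse spaces and tabs inside each line, then trim the line.
    $lines = array_map(
        static fn (string $line): string => trim(preg_replace('/[ \t]+/', ' ', $line) ?? $line),
        explode("\n", $text)
    );

    return trim(implode("\n", $lines));
}

$text = "Hello   world\n\n  second\tline  ";

// prints: Hello world second line
echo normalizeWhitespace($text), PHP_EOL;

// prints "Hello world", an empty line, then "second line"
echo normalizeWhitespace($text, true), PHP_EOL;
```

Preserving newlines matters mostly when the cleaned output is later converted to Markdown or shown as plain text, where line breaks carry structure.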
8. Loading HTML from a URL
You can also load HTML directly from a URL:
Configuration
The package includes a configuration file that allows you to define default options. After publishing the configuration file, you will find it at config/html-crawler.php:
You can modify these values according to your needs.
Troubleshooting
If you encounter the error:
make sure your tests are running in a Laravel environment using orchestra/testbench. For package testing, install Testbench as a development dependency with composer require --dev orchestra/testbench.
Then, set up your base test case to extend Testbench (see the package documentation for more details).
Testing
To run the tests, you can use:
or if using PHPUnit:
Changelog
Please see the CHANGELOG for detailed information on recent changes.
Contributing
Please refer to CONTRIBUTING for details on how to contribute to this package.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
Credits
- Cloud Studio
- All Contributors
License
This package is open-sourced software licensed under the MIT license.
Dependencies of laravel-html-crawler
- illuminate/contracts ^11.0
- league/commonmark ^2.4
- league/html-to-markdown ^5.1.1
- spatie/laravel-package-tools ^1.14.0