Download the PHP package sebastiansulinski/path-extractor without Composer
On this page you can find all versions of the PHP package sebastiansulinski/path-extractor. These versions can be downloaded and installed without Composer, and any dependencies are resolved automatically.
Package: path-extractor
Short description: Parse an HTML document and extract paths from images, anchors and other tags.
License: MIT
Information about the package path-extractor
Path extractor
A package that extracts paths and attributes from image, anchor and other tags in the provided HTML.
Installation
Basic usage
Instantiating
You can instantiate Extractor either by using the new keyword or the static make method.
The constructor takes an optional argument, which represents the string to be parsed.
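A minimal sketch of both ways of instantiating, under a few assumptions that the text does not confirm: that the class is SSD\PathExtractor\Extractor (the same root namespace as the Tags classes mentioned later), that the classes are autoloadable from vendor/autoload.php, and that make accepts the same optional string as the constructor. The $html value is a made-up sample.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input.
$html = '<p><img src="/media/image.jpg" alt="Example"></p>';

// Option 1: the new keyword, passing the string to be parsed.
$extractor = new Extractor($html);

// Option 2: the static make method (assumed to take the same argument).
$extractor = Extractor::make($html);
```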
Specifying input html
Apart from being able to pass your string via the constructor, you can also use the Extractor::for method to set it on the instance.
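As an illustration, setting the input after construction might look like the sketch below. The namespace of Extractor and the autoloader path are assumptions, and the sample $html is hypothetical.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input.
$html = '<p><a href="/about">About</a></p>';

// Create the instance without input, then set the string to parse with for().
$extractor = new Extractor;
$extractor->for($html);
```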
Extracting images
To extract all images use the Extractor::extract(Image::class) method.
The above will return an array of \SSD\PathExtractor\Tags\Image class instances, each with the properties src and alt available.
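A short sketch of image extraction, assuming the Extractor namespace and autoloader path shown in the comments. The sample markup is hypothetical.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input with two images.
$html = '<p><img src="/media/one.jpg" alt="One"><img src="/media/two.png" alt="Two"></p>';

$extractor = new Extractor($html);
$images = $extractor->extract(Image::class);

// Each item exposes the src and alt properties described in the text.
foreach ($images as $image) {
    echo $image->src, ' | ', $image->alt, PHP_EOL;
}
```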
Extracting anchors
To extract all anchors use the Extractor::extract(Anchor::class) method.
The above will return an array of \SSD\PathExtractor\Tags\Anchor class instances, each with the properties href, target, title and nodeValue available.
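Anchor extraction follows the same pattern. The sketch below assumes the Extractor namespace and autoloader path shown in the comments and uses hypothetical markup.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Anchor;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input with a single anchor.
$html = '<p><a href="/files/report.pdf" target="_blank" title="Report">Download report</a></p>';

$extractor = new Extractor($html);
$anchors = $extractor->extract(Anchor::class);

// Each item exposes href, target, title and nodeValue.
foreach ($anchors as $anchor) {
    echo $anchor->href, ' | ', $anchor->target, ' | ', $anchor->title, ' | ', $anchor->nodeValue, PHP_EOL;
}
```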
Extracting scripts
To extract all scripts use the Extractor::extract(Script::class) method.
The above will return an array of \SSD\PathExtractor\Tags\Script class instances, each with the properties src, async and defer available. The last two are booleans, set to true or false depending on whether the attribute is present.
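A comparable sketch for scripts, again assuming the Extractor namespace and autoloader path shown in the comments. The markup is a made-up sample.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Script;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input: one script with async, one without.
$html = '<div><script src="/js/app.js" async></script><script src="/js/vendor.js"></script></div>';

$extractor = new Extractor($html);
$scripts = $extractor->extract(Script::class);

// src is the path; async and defer are reported as booleans.
foreach ($scripts as $script) {
    var_dump($script->src, $script->async, $script->defer);
}
```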
Limiting extensions
Sometimes you might want to extract only images or anchors with certain extensions.
To do this use the Extractor::withExtensions() method and pass the required extensions as an argument.
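The sketch below shows one plausible call. Two details are assumptions rather than documented facts: that withExtensions accepts an array of extensions without the leading dot, and that it returns the extractor so that extract can be chained. The Extractor namespace, autoloader path and sample markup are also assumed.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input with three different image types.
$html = '<p><img src="/media/a.jpg" alt="A"><img src="/media/b.gif" alt="B"><img src="/media/c.png" alt="C"></p>';

$extractor = new Extractor($html);

// Assumption: extensions are passed as an array and the call is chainable.
$images = $extractor->withExtensions(['jpg', 'png'])->extract(Image::class);

foreach ($images as $image) {
    echo $image->src, PHP_EOL;
}
```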
Prepending URL
Sometimes you might wish to prepend the protocol, domain name and even a port to the relative paths extracted from your html.
To do this, use the Extractor::withUrl() method.
The above will return an array containing two instances of \SSD\PathExtractor\Tags\Image - one with src set to https://mywebsite.com/media/image.jpg and the other to https://ssdtutorials.com/media/image2.jpg. Please note that paths which already contain a protocol and domain are not replaced.
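A sketch of such a call, with input reconstructed to match the two paths described in the text: one relative image path and one that already carries a protocol and domain. It assumes that withUrl takes the base URL as a string and is chainable, plus the Extractor namespace and autoloader path shown in the comments.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical input: a relative path and an absolute one.
$html = '<p><img src="/media/image.jpg" alt="Image 1">'
      . '<img src="https://ssdtutorials.com/media/image2.jpg" alt="Image 2"></p>';

$extractor = new Extractor($html);

// Assumption: the base URL is passed as a string and the call is chainable.
$images = $extractor->withUrl('https://mywebsite.com')->extract(Image::class);

foreach ($images as $image) {
    echo $image->src, PHP_EOL;
}
```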
Tidying / purifying input
If you'd like your input to undergo purification first, you can use the Extractor::withTidy() method.
This method takes 2 optional arguments: array $config = [], which allows you to overwrite the default tidy extension configuration, and string $encoding = 'utf8', should you need to change the encoding.
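For illustration, a call with both arguments might look like the sketch below. The wrap key is a standard HTML Tidy option chosen only as an example, not the package's default. Chaining from withTidy, the Extractor namespace and the autoloader path are assumptions.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical input with an unclosed paragraph for tidy to repair.
$html = '<p><img src="/media/image.jpg" alt="Image">';

$extractor = new Extractor($html);

// Example override: disable line wrapping and state the encoding explicitly.
// Assumption: withTidy is chainable.
$images = $extractor->withTidy(['wrap' => 0], 'utf8')->extract(Image::class);

foreach ($images as $image) {
    echo $image->src, PHP_EOL;
}
```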
By default config is set to
More on config options at HTML Tidy Configuration Options.
Invalid input exception
If you decide NOT to use tidy to purify your input (for instance because you purify it yourself before passing the HTML to the constructor or the for method) and the provided HTML contains invalid syntax, the \SSD\PathExtractor\InvalidHtmlException will be thrown, so make sure you catch it and act accordingly.
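A minimal sketch of guarding against this exception. The text does not say which call throws, so the whole sequence is wrapped in the try block. The malformed input is hypothetical, and the Extractor namespace and autoloader path are assumptions.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\InvalidHtmlException;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical malformed input, passed without tidying.
$html = '<div><img src="/media/image.jpg" alt="Image"></span>';

$images = [];

try {
    $extractor = new Extractor($html);
    $images = $extractor->extract(Image::class);
} catch (InvalidHtmlException $exception) {
    // Act accordingly: here the problem is logged and the empty result is kept.
    error_log('Invalid HTML: ' . $exception->getMessage());
}
```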
Accessing attributes of the \SSD\PathExtractor\Tags\Tag class instance
Each implementation of \SSD\PathExtractor\Tags\Tag has its own set of properties available.
Rendering tag for \SSD\PathExtractor\Tags\Tag class instance
Once you have extracted the collection of resources, you can return an HTML tag for each one by casting it to a string or by calling the tag() method on it.
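Both approaches can be sketched as follows, assuming the Extractor namespace and autoloader path shown in the comments. The sample markup is hypothetical.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input.
$html = '<p><img src="/media/image.jpg" alt="Image"></p>';

$extractor = new Extractor($html);

foreach ($extractor->extract(Image::class) as $image) {
    // Casting to string...
    $viaCast = (string) $image;

    // ...or calling tag() explicitly.
    $viaMethod = $image->tag();

    echo $viaCast, PHP_EOL, $viaMethod, PHP_EOL;
}
```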
Both of the above will return
You can also obtain an array representation of each instance by calling the Tag::toArray() method on it.
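A brief sketch of the array representation, assuming the Extractor namespace and autoloader path shown in the comments. The exact keys returned are not documented in the text, so the result is simply printed.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Anchor;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input.
$html = '<p><a href="/contact" title="Contact">Contact us</a></p>';

$extractor = new Extractor($html);

foreach ($extractor->extract(Anchor::class) as $anchor) {
    // Inspect whatever attributes the tag exposes as an array.
    print_r($anchor->toArray());
}
```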
Adding more tag types
If you need more tag types, e.g. link, simply add a new class that extends \SSD\PathExtractor\Tags\Tag and implements the abstract methods it requires.
Example of extracting only paths
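One way to collect only the paths is to read the src property from each extracted image, as in the sketch below. The Extractor namespace, autoloader path and sample markup are assumptions, and a plain loop is used so it works whether extract returns an array or another iterable.

```php
<?php
// Assumption: Extractor sits in the same root namespace as the Tags classes.
use SSD\PathExtractor\Extractor;
use SSD\PathExtractor\Tags\Image;

// Assumption: the package is autoloadable from this location.
require __DIR__ . '/vendor/autoload.php';

// Hypothetical sample input.
$html = '<p><img src="/media/one.jpg" alt="One"><img src="/media/two.jpg" alt="Two"></p>';

$extractor = new Extractor($html);

// Keep just the src of every extracted image.
$paths = [];
foreach ($extractor->extract(Image::class) as $image) {
    $paths[] = $image->src;
}

print_r($paths);
```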
All versions of path-extractor with dependencies
ext-dom Version *
ext-tidy Version *