Download the PHP package layered/url-preview without Composer
Package url-preview
Short Description Get detailed info for any URL on the internet! Scraper for HTML, OpenGraph, Schema data
License MIT
Information about the package url-preview
Page Meta 🕵
Page Meta is a PHP library that can retrieve detailed info on any URL from the internet! It uses data from HTML meta tags and OpenGraph, falling back to detailed HTML scraping.
Highlights
- Works for any valid URL on the internet!
- Follows page redirects
- Uses all scraping methods available: HTML tags, OpenGraph, Schema data
Potential use cases
- Display Info Cards for links in an article
- Rich preview for links in messaging apps
- Extract info from a user-submitted URL
How to use
Installation
Add layered/page-meta as a dependency in your project's composer.json file.
Usage
Create a UrlPreview instance, then call the loadUrl($url) method with your URL as the first argument. Preview data is retrieved with the get($section) or getAll() methods.
Behind the scenes
The library downloads the HTML source of the URL you provide, then uses specialized scrapers to extract pieces of information.
Core scrapers can be found in src/scrapers/. They extract general info for a page: title, author, description, page type, main image, etc.
If you would like to extract a new field, see the Extending the library section.
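The OpenGraph-first, HTML-fallback strategy described above can be sketched with PHP's built-in DOM extension. This is a minimal illustration under assumptions, not the library's scraper code. scrapeBasicMeta() is a hypothetical name, and it covers only the title, description and image.

```php
<?php
// Minimal sketch, not the library's scraper: prefer OpenGraph tags and fall
// back to plain HTML tags. scrapeBasicMeta() is a hypothetical helper name.

function scrapeBasicMeta(string $html): array
{
    $result = ['title' => null, 'description' => null, 'image' => null];
    if ($html === '') {
        return $result; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // Content attribute of the first <meta> matching $query, or null if absent/empty.
    $meta = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }
        $value = trim($nodes->item(0)->getAttribute('content'));
        return $value === '' ? null : $value;
    };

    // Text of the <title> element, used as the fallback title.
    $titleNodes = $xpath->query('//title');
    $htmlTitle = ($titleNodes !== false && $titleNodes->length > 0)
        ? trim($titleNodes->item(0)->textContent)
        : null;

    $result['title'] = $meta('//meta[@property="og:title"]') ?? $htmlTitle;
    $result['description'] = $meta('//meta[@property="og:description"]')
        ?? $meta('//meta[@name="description"]');
    $result['image'] = $meta('//meta[@property="og:image"]');

    return $result;
}
```

A page with both og:title and a title element yields the OpenGraph value. A page with only a title element falls back to that element.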
The User Agent and other extra headers can make a big difference when downloading HTML from a website. Some websites forbid scraping and hide their content when they detect a tool like this one. Make sure to read their developer docs and Terms of Service.
The default User Agent is blocked on sites like Twitter, Instagram, Facebook and others. A workaround is to use this one (thanks to PVGrad for the tip):
'HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
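How the library sends these headers internally is not described here. The sketch below illustrates the same idea in plain PHP. It builds options for PHP's built-in http:// stream wrapper that send a custom User-Agent and follow redirects.

It uses the standard header name User-Agent rather than the HTTP_USER_AGENT key shown above. buildHttpOptions() and fetchHtml() are hypothetical helpers, not part of the library.

```php
<?php
// Plain-PHP sketch, not the library's internals: turn a name => value header
// map into options for PHP's http:// stream wrapper.
// buildHttpOptions() and fetchHtml() are hypothetical helper names.

function buildHttpOptions(array $headers, int $maxRedirects = 5): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value; // e.g. "User-Agent: Mozilla/5.0 ..."
    }

    return [
        'http' => [
            'method' => 'GET',
            'header' => $lines,               // the wrapper accepts an array of header lines
            'follow_location' => 1,           // follow 3xx redirects
            'max_redirects' => $maxRedirects,
            'timeout' => 10,                  // read timeout in seconds
        ],
    ];
}

// Performs a real network request; returns null when the download fails.
function fetchHtml(string $url, array $headers): ?string
{
    $context = stream_context_create(buildHttpOptions($headers));
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}
```

For example, fetchHtml('https://example.com/', ['User-Agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)']) downloads the page with that User-Agent, following up to five redirects.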
Returned data
Returned data is an Array. See UrlPreview::getAll() for its format and for info on each returned field.
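The sketch below consumes that array, for example to build the link info cards mentioned earlier. It reads a few documented fields with null-coalescing fallbacks, because entries such as image may be missing. The function buildInfoCard() and the sample data are hypothetical; only the key names come from the getAll() description.

```php
<?php
// Hypothetical helper: flatten the documented getAll() structure into the
// few fields a link preview card needs. Missing keys fall back to defaults.

function buildInfoCard(array $data): array
{
    $site = $data['site'] ?? [];
    $page = $data['page'] ?? [];

    return [
        'title' => $page['title'] ?? ($site['name'] ?? ''),
        'description' => $page['description'] ?? '',
        'image' => $page['image']['url'] ?? null,   // image is only present when found
        'url' => $page['url'] ?? ($site['url'] ?? ''),
        'site' => $site['name'] ?? '',
        'author' => $data['author']['name'] ?? null,
    ];
}

// Sample input shaped like the documented format (values are made up).
$sample = [
    'site' => ['url' => 'https://example.com', 'name' => 'Example Blog', 'secure' => true],
    'page' => [
        'type' => 'article',
        'url' => 'https://example.com/example-article',
        'title' => 'Example article',
        'description' => 'A short summary.',
    ],
];

$card = buildInfoCard($sample);
```

Because ?? tolerates missing nested keys, the helper also works for pages without an image, author or site name.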
Public API
The UrlPreview class provides the following public methods:
__construct(array $headers): UrlPreview
Create the UrlPreview instance. $headers are extra headers to send when requesting the page URL
loadUrl(string $url): UrlPreview
Load and start the scrape process for any valid URL
getAll(): array
Get all data scraped from page
Return: Array with scraped data in the following format:
- site - info about the website
  - url - main site URL
  - name - site name, ex: 'Instagram' or 'Medium'
  - secure - Boolean true|false depending on whether the connection is HTTPS
  - responsive - Boolean true|false. True if the site has a viewport meta tag present. Basic check for responsiveness
  - icon - site icon
  - language - ISO 639-1 language code, ex: en, es
- page - info about the page at the current URL
  - type - page type, ex: website, article, profile, video, etc.
  - url - canonical URL for the page
  - title - page title
  - description - page description
  - image - Array containing image info, if present:
    - url - image URL
    - width - image width
    - height - image height
  - video - Array containing video info, if found on the page:
    - url - video URL
    - width - video width
    - height - video height
- author - info about the content author, ex:
  - name - Author's name on a blog, person's name on social network sites
  - handle - Social media site username
  - url - Author URL for more articles or Profile URL on social network sites
- app_links - Array containing apps linked to the page, like:
  - ios - iOS app
    - url - link for in-app action, ex: 'nflx://www.netflix.com/title/80014749'
    - app_store_id - Apple AppStore app ID
    - app_name - name of the app
    - store_url - link to installable app
  - android - Android app
    - url - link for in-app action, ex: 'nflx://www.netflix.com/title/80014749'
    - package - Android PlayStore app ID
    - app_name - name of the app
    - store_url - link to installable app
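The site fields secure, responsive and language are simple heuristics. The sketch below shows how such values could be derived from a URL and its HTML. It assumes the following rules, and the library's exact rules may differ:
- An https scheme means secure.
- A viewport meta tag means responsive.
- The lang attribute of the html element holds the language.

deriveSiteFlags() is a hypothetical name.

```php
<?php
// Sketch of heuristics matching the field descriptions above; not the
// library's code. deriveSiteFlags() is a hypothetical helper.

function deriveSiteFlags(string $url, string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    if ($html !== '') {
        $doc->loadHTML($html); // loadHTML() rejects empty input
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($doc);

    // secure: the page URL uses the https scheme.
    $secure = strtolower((string) parse_url($url, PHP_URL_SCHEME)) === 'https';

    // responsive: a <meta name="viewport"> tag exists (a basic check only).
    $viewport = $xpath->query('//meta[@name="viewport"]');
    $responsive = $viewport !== false && $viewport->length > 0;

    // language: two-letter prefix of <html lang="...">, e.g. "en-US" -> "en".
    $language = null;
    $htmlNodes = $xpath->query('/html[@lang]');
    if ($htmlNodes !== false && $htmlNodes->length > 0) {
        $lang = trim($htmlNodes->item(0)->getAttribute('lang'));
        if (preg_match('/^([a-zA-Z]{2})(?:[-_]|$)/', $lang, $m)) {
            $language = strtolower($m[1]);
        }
    }

    return ['secure' => $secure, 'responsive' => $responsive, 'language' => $language];
}
```

The viewport check only shows that the page declares a viewport, not that its layout actually adapts. That matches the "basic check" wording above.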
get(string $section): array
Get data in one scraped section: site, page, profile or app_links
Return: Array with the section's scraped data. See UrlPreview::getAll() for the data format
addListener(string $eventName, callable $listener, int $priority = 0): UrlPreview
Attach an event listener on UrlPreview for data processing or for the scrape process. Arguments:
- $eventName - the event to listen on. Available:
  - page.scrape - fired when the scraping process starts
  - data.filter - fired when data is requested by the getData() or getAll() methods
- $listener - a callable reference, which will receive the $event parameter with available data
- $priority - order in which the callable should be executed
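The listener mechanism follows a common event-dispatcher pattern. The sketch below is a self-contained stand-in, not the library's implementation, and rests on two assumptions:
- Higher $priority values run first. This is the convention of Symfony's EventDispatcher, whose addListener() takes the same arguments in the same order.
- Listeners may modify the data carried by the event.

MiniDispatcher and DataEvent are hypothetical classes.

```php
<?php
// Self-contained stand-in for a priority-ordered event dispatcher.
// Assumption: higher priority runs first; equal priorities keep insertion order.

class DataEvent
{
    public array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }
}

class MiniDispatcher
{
    /** @var array<string, array<int, callable[]>> listeners by event name, then priority */
    private array $listeners = [];

    public function addListener(string $eventName, callable $listener, int $priority = 0): self
    {
        $this->listeners[$eventName][$priority][] = $listener;
        return $this; // fluent, like UrlPreview::addListener() returning UrlPreview
    }

    public function dispatch(string $eventName, DataEvent $event): DataEvent
    {
        $byPriority = $this->listeners[$eventName] ?? [];
        krsort($byPriority); // highest priority first
        foreach ($byPriority as $listeners) {
            foreach ($listeners as $listener) {
                $listener($event); // listeners may modify $event->data in place
            }
        }
        return $event;
    }
}

// Usage: the priority 10 listener runs before the priority -10 one.
$dispatcher = new MiniDispatcher();
$dispatcher
    ->addListener('data.filter', function (DataEvent $e) { $e->data['trail'][] = 'low'; }, -10)
    ->addListener('data.filter', function (DataEvent $e) { $e->data['trail'][] = 'high'; }, 10);

$event = $dispatcher->dispatch('data.filter', new DataEvent(['trail' => []]));
// $event->data['trail'] is ['high', 'low']
```

Passing an event object lets every listener see changes made by listeners that ran earlier.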
Extending the library
If you need more scraped data for a URL, more functionality can be attached to the PageMeta library, for example returning the 'Terms and Conditions' link from pages.
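The function below sketches, without relying on the library, the scraping logic such an extension might run, for instance from a page.scrape listener. It scans the page's links for one whose text mentions "terms". findTermsLink() and its matching rules are assumptions, not the library's extension API.

```php
<?php
// Hypothetical helper: find a "Terms and Conditions"-style link in a page.
// Only absolute and root-relative hrefs are resolved, for brevity.

function findTermsLink(string $html, string $pageUrl): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        $text = strtolower(trim($link->textContent));
        $href = trim($link->getAttribute('href'));
        // Match link text such as "Terms", "Terms of Service", "Terms & Conditions".
        if ($href === '' || strpos($text, 'terms') === false) {
            continue;
        }
        // Absolute URL: return as-is.
        if (preg_match('#^https?://#i', $href)) {
            return $href;
        }
        // Root-relative URL (but not protocol-relative "//host/..."): prefix the page's origin.
        if ($href[0] === '/' && ($href[1] ?? '') !== '/') {
            $parts = parse_url($pageUrl);
            if (isset($parts['scheme'], $parts['host'])) {
                $port = isset($parts['port']) ? ':' . $parts['port'] : '';
                return $parts['scheme'] . '://' . $parts['host'] . $port . $href;
            }
        }
        // Other relative forms are returned unresolved in this sketch.
        return $href;
    }
    return null;
}
```

A production version would resolve every relative URL form and might also check the href itself, for example for "/terms", when the link text is an icon or an image.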
More
Please report any issues here on GitHub.
Any contributions are welcome.