Package copycat
Short Description A universal scraping tool that can be used for all kinds of data collection. You decide where to scrape from and what to extract, all with regular expressions. More info on the GitHub page.
License LGPL
Homepage http://github.com/gidlov/copycat
Copycat - A PHP Scraping Class
You may find more info on gidlov.com/en/code/copycat
For Laravel 5/4 Developers
In the require key of your composer.json file, add the following:
Then run the composer update command.
For Laravel 5 Developers
Add to providers in config/app.php, and to aliases in the same file.
For Laravel 4 Developers
Add to providers in app/config/app.php, and to aliases in the same file.
Yet another scraping class
I didn’t do much research before I wrote this class, so there is probably something similar out there, and certainly more polished solutions. A Python version of this class is under development.
Still, I needed a class that could pick out selected pieces of a web page with regular expressions, and then display or save them. I also needed to save files and/or images, and to either set a custom file name or build on the existing one.
It is also possible to use a search engine to look up the address to extract data from, provided you have entered an expression that matches that particular page.
Briefly
- Uses regular expressions; match one occurrence or all of them.
- Can download and save files with custom file names.
- Can work through anything from a single page to tens of thousands of pages in sequence.
- Can use search engines to find the right page.
- Can apply callback functions to all items.
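The "match one or all" distinction corresponds to PHP's own preg_match() and preg_match_all(). Below is a plain-PHP sketch of that idea. It is written for PHP 7.1 or later and is not Copycat's API. firstMatch() and allMatches() are hypothetical helper names, and the HTML is a made-up sample.

```php
<?php
// Plain-PHP sketch of "match one or all"; not Copycat's API.
// firstMatch()/allMatches() are hypothetical helpers; the HTML is a made-up sample.

/** First capture group of $pattern in $html, or null if nothing matches. */
function firstMatch(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

/** Every first-capture-group match of $pattern in $html (empty array if none). */
function allMatches(string $pattern, string $html): array
{
    return preg_match_all($pattern, $html, $m) > 0 ? $m[1] : [];
}

$html = '<li class="actor">Jake Gyllenhaal</li><li class="actor">Jena Malone</li>';
$pattern = '~<li class="actor">(.*?)</li>~';

echo firstMatch($pattern, $html), "\n"; // Jake Gyllenhaal
print_r(allMatches($pattern, $html));   // both names, in page order
```

The lazy (.*?) keeps each match from running past the first closing </li>. A greedy (.*) would swallow both list items into a single match.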
How to use this class
Include the class and instantiate your object, with custom cURL parameters if you need them.
These examples use IMDb as the target source.
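Copycat's own constructor arguments are not shown here. As a hedged illustration of what "custom cURL parameters" usually means in PHP, the sketch below lets caller-supplied options override a set of defaults before a page is fetched. It requires the cURL extension. The function names curlOptions() and fetchPage() and all default values are assumptions for illustration only.

```php
<?php
// Sketch of "custom cURL parameters": caller-supplied options override defaults.
// Requires the cURL extension. curlOptions()/fetchPage() and the default values
// are illustrative assumptions, not Copycat's own configuration.

function curlOptions(array $custom = []): array
{
    $defaults = [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => 30,   // seconds allowed for the whole request
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)',
    ];
    // Use + rather than array_merge(): CURLOPT_* keys are integers, which
    // array_merge() would renumber. With +, keys in $custom take precedence.
    return $custom + $defaults;
}

/** Fetch a page body, or return null on failure. Performs a network request. */
function fetchPage(string $url, array $custom = []): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, curlOptions($custom));
    $body = curl_exec($ch); // string on success, false on failure (RETURNTRANSFER)
    return is_string($body) ? $body : null;
}

// Example (not run here, since it needs network access):
// $html = fetchPage('http://imdb.com/title/tt0246578/', [CURLOPT_TIMEOUT => 10]);
```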
Say we want to retrieve the score of a particular film and, for simplicity, we happen to know the address of that film, Donnie Darko. This is what the code could look like.
That is basically all there is to it. We specify what to match, a name for the match, and an address. Our answer array will look as follows:
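The core idea is: a name, a pattern for what to match, and a page to match it against. The sketch below shows that pattern in plain PHP. scrapeFields() is a hypothetical helper, not Copycat's implementation. The markup is a simplified stand-in for an IMDb page (the real markup changes over time), and the rating in it is made up.

```php
<?php
// Sketch of "what to match, a name for it, and an address": one regex per named
// field, applied to a page's HTML. Hypothetical helper, not Copycat's code; the
// markup is a simplified stand-in for an IMDb page and the rating is made up.

/** Map each field name to the first capture group of its pattern, or null. */
function scrapeFields(string $html, array $patterns): array
{
    $result = [];
    foreach ($patterns as $name => $pattern) {
        $result[$name] = preg_match($pattern, $html, $m) === 1 ? ($m[1] ?? null) : null;
    }
    return $result;
}

$html = '<div><span itemprop="ratingValue">8.0</span></div>';

print_r(scrapeFields($html, [
    'score' => '~itemprop="ratingValue">(.*?)<~',
]));
```

Returning null for a pattern that finds nothing keeps every requested field present in the result. This makes missing data easy to spot.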
If we instead gave the method URLs() an associative array, such as array('Donnie Darko' => 'http://imdb.com/title/tt0246578/'), rather than a string, the answer would be:
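One plausible way to get this behavior is to let the results reuse the keys of the input array. A plain list then yields numeric keys, while an associative array yields labelled ones. This is a sketch under that assumption and is not taken from Copycat's source. To keep the example offline, the values are already-fetched HTML rather than URLs, and scrapeEach() is a hypothetical helper.

```php
<?php
// Sketch: results reuse the input array's keys, so an associative array gives
// labelled results. Values are already-fetched HTML (offline); in real use they
// would be URLs to fetch first. scrapeEach() is a hypothetical helper.

function scrapeEach(array $pages, string $name, string $pattern): array
{
    $results = [];
    foreach ($pages as $key => $html) { // $key is 0, 1, ... or a label
        $results[$key] = [$name => preg_match($pattern, $html, $m) === 1 ? $m[1] : null];
    }
    return $results;
}

$html = '<span itemprop="ratingValue">8.0</span>'; // made-up sample markup
$pattern = '~itemprop="ratingValue">(.*?)<~';

print_r(scrapeEach([$html], 'score', $pattern));                   // key 0
print_r(scrapeEach(['Donnie Darko' => $html], 'score', $pattern)); // key 'Donnie Darko'
```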
Also note that I’m using method chaining; it is supported, but whether to use it is a matter of taste.
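Method chaining in PHP works because each setter returns the object itself ($this). The sketch below shows that the chained and unchained styles configure an object identically. The class ScrapeJob and its methods are hypothetical and are not Copycat's API.

```php
<?php
// Sketch of a fluent interface: each setter returns $this, so calls can be chained.
// ScrapeJob and its methods are hypothetical, not Copycat's API.

class ScrapeJob
{
    /** @var array<string, string> field name => regex */
    private $patterns = [];
    /** @var array<int|string, string> URLs to visit */
    private $urls = [];

    public function match(array $patterns): self
    {
        $this->patterns = $patterns;
        return $this; // returning the object itself is what enables chaining
    }

    public function urls(array $urls): self
    {
        $this->urls = $urls;
        return $this;
    }

    public function describe(): string
    {
        return count($this->patterns) . ' pattern(s), ' . count($this->urls) . ' URL(s)';
    }
}

// Chained style:
$chained = (new ScrapeJob())->match(['score' => '~x(.*)~'])->urls(['http://example.com/']);

// Equivalent unchained style:
$job = new ScrapeJob();
$job->match(['score' => '~x(.*)~']);
$job->urls(['http://example.com/']);

echo $chained->describe(), "\n"; // 1 pattern(s), 1 URL(s)
```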
But we are unlikely to know, or be able to guess, IMDb’s URL for a particular movie, so when we don’t know it we look it up on Bing (Google tends to interrupt the sequence after an unknown number of queries, which is why I chose Bing).
Here we have introduced fillURLs(), which takes a search query, a regular expression that matches our destination page, and the keywords to search for. The result is the same as in the first example.
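The search-engine lookup can be sketched in two steps. First, build the search URL from the keywords. Second, pick the first result link that matches a regex for the destination site. This is a plain-PHP illustration, not Copycat's fillURLs(). The result HTML is a made-up, simplified sample (real Bing markup differs and changes), and searchUrl() and firstResult() are hypothetical names.

```php
<?php
// Sketch of resolving an unknown address via a search engine: build a search URL,
// then take the first result link matching the destination-site regex.
// Not Copycat's fillURLs(); the result HTML is a made-up sample and the helper
// names are hypothetical.

function searchUrl(string $base, string $keywords): string
{
    return $base . urlencode($keywords); // spaces become "+"
}

function firstResult(string $resultsHtml, string $destinationRegex): ?string
{
    return preg_match($destinationRegex, $resultsHtml, $m) === 1 ? $m[1] : null;
}

echo searchUrl('https://www.bing.com/search?q=', 'donnie darko imdb'), "\n";
// https://www.bing.com/search?q=donnie+darko+imdb

$resultsHtml = '<li><a href="https://example.com/review">Review</a></li>'
    . '<li><a href="http://www.imdb.com/title/tt0246578/">Donnie Darko (2001) - IMDb</a></li>';
$imdbTitle = '~<a href="(https?://www\.imdb\.com/title/tt\d+/)"~';

echo firstResult($resultsHtml, $imdbTitle), "\n";
// http://www.imdb.com/title/tt0246578/
```

In practice the search results page would be fetched with cURL first. The destination regex is what keeps unrelated result links (the example.com one above) from being chosen.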
Let’s capture more about this film: original title, rating and votes, release year, director, starring actors and, of course, the cover image, which we save. The image’s original file name is something like MV5BMTczMzE4Nzk3N15BMl5BanBnXkFtZTcwNDg5Mjc4NA@@._V1SX214.jpg, so we rename it to the film’s title instead.
The result of such an operation would be:
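Renaming a downloaded image after a scraped title mostly comes down to two things: keeping the original extension and removing characters that are not allowed in file names. The sketch below does both in plain PHP. fileNameFromTitle() and its character-replacement rule are assumptions for illustration, not Copycat's behavior. The example.com URL is a placeholder.

```php
<?php
// Sketch: build a readable local file name from a scraped title, keeping the
// image's original extension. fileNameFromTitle() and the replacement rule are
// illustrative assumptions, not Copycat's behavior.

function fileNameFromTitle(string $imageUrl, string $title): string
{
    // Take the extension from the URL path only, so query strings are ignored.
    $path = (string) parse_url($imageUrl, PHP_URL_PATH);
    $ext = pathinfo($path, PATHINFO_EXTENSION);

    // Replace runs of characters that are invalid in Windows or Unix file names.
    $safe = trim(preg_replace('~[\\\\/:*?"<>|]+~', '_', $title));

    return $ext === '' ? $safe : $safe . '.' . $ext;
}

$url = 'http://example.com/images/MV5BMTczMzE4Nzk3N15BMl5BanBnXkFtZTcwNDg5Mjc4NA@@._V1SX214.jpg';
echo fileNameFromTitle($url, 'Donnie Darko'), "\n"; // Donnie Darko.jpg
```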
Apply your callback functions to all value items and view the results.
To apply functions to selected elements only, replace _all_ with your key, like this:
Note that it is fine to use anonymous functions too.
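The callback idea can be sketched as follows. Callbacks listed under the key _all_ run on every field, callbacks listed under a field name run on that field only, and anonymous functions work just like named functions. This is a hypothetical re-implementation for illustration, not Copycat's code. In particular, the order shown (_all_ callbacks first) is an assumption.

```php
<?php
// Sketch: apply callbacks to scraped values, to every field ('_all_') or to a
// named field only. Hypothetical re-implementation, not Copycat's code; running
// '_all_' callbacks before field-specific ones is an assumption.

function applyCallbacks(array $row, array $callbacks): array
{
    foreach ($row as $field => $value) {
        $chain = array_merge($callbacks['_all_'] ?? [], $callbacks[$field] ?? []);
        foreach ($chain as $fn) {
            $value = $fn($value); // works for function names and closures alike
        }
        $row[$field] = $value;
    }
    return $row;
}

$row = ['title' => '  Donnie Darko ', 'year' => ' 2001 '];
$callbacks = [
    '_all_' => ['trim'],                                   // every field
    'year'  => ['intval'],                                 // this field only
    'title' => [function ($s) { return strtoupper($s); }], // anonymous function
];

print_r(applyCallbacks($row, $callbacks));
```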
Drawbacks
PHP itself is not well suited to long, time-consuming operations, since the process can be interrupted as soon as the user closes the web page, or when PHP's time limit is reached (although set_time_limit(0) is called in the constructor, so the time limit itself should not be a problem).
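For long runs started from a web request, PHP offers two built-in settings. set_time_limit(0) lifts the execution time limit (the constructor already does this, as noted above). ignore_user_abort(true) keeps the script running if the visitor closes the page. The sketch below applies both. prepareLongRun() is a hypothetical helper name. Hosts can disable these functions, and web-server or proxy timeouts still apply, so starting long jobs from the CLI (where max_execution_time defaults to 0) is often simpler.

```php
<?php
// Sketch: built-in PHP settings for a long scraping run started from a web request.
// prepareLongRun() is a hypothetical helper. Hosts may disable these functions,
// and web-server/proxy timeouts still apply regardless.

function prepareLongRun(): bool
{
    ignore_user_abort(true);  // keep running if the client disconnects
    return set_time_limit(0); // 0 = no time limit; false if the host forbids it
}

if (!prepareLongRun()) {
    error_log('Could not lift the time limit; long runs may be cut short.');
}
```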
Requirements
- PHP 5.3
- cURL extension
License
Copycat is released under LGPL.
Thanks
If this library is useful to you, say thanks by buying me a coffee :coffee:!