Download the PHP package xixizizixixi/fatgoose without Composer

On this page you can find all versions of the PHP package xixizizixixi/fatgoose. You can download and install these versions without Composer. Any dependencies are resolved automatically.

FAQ

After the download, add a single include: require_once('vendor/autoload.php');. After that, import the classes you need with use statements.

Example:
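A minimal sketch of those two steps, assuming the downloaded vendor folder sits next to your script. It imports BloomFilter from pleonasm/bloom-filter, the dependency listed for fatgoose on this page. The init, add, and exists calls follow that library's README as far as I know, so check them against the version you download. The namespace of fatgoose's own classes is not shown on this page, so none is used here.

```php
<?php
// Load the autoloader from the downloaded vendor folder.
// Assumption: vendor/ is in the same directory as this script.
require_once __DIR__ . '/vendor/autoload.php';

// Import the classes you need with use statements.
use Pleonasm\Bloom\BloomFilter;

// Assumed API (per the pleonasm/bloom-filter README):
// init(approximate item count, false-positive probability).
$filter = BloomFilter::init(1000, 0.01);
$filter->add('https://example.com/');

var_dump($filter->exists('https://example.com/'));
```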
If you use only one package, a project is not needed. But if you use more than one package, it is not possible to import the classes with use statements without a project.

In general, it is recommended to always use a project to download your libraries, since an application normally needs more than one library.
Some PHP packages are not free to download and are therefore hosted in private repositories. Credentials are needed to access such packages. If a package comes from a private repository, enter the credentials in the auth.json textarea.

  • Some hosting environments cannot be accessed by terminal or SSH, so Composer cannot be used there.
  • Composer can be complicated to use, especially for beginners.
  • Composer needs a lot of resources, which are sometimes not available on simple web hosting.
  • If you are using private repositories, you don't need to share your credentials. You can set up everything on our site and then provide a simple download link to your team members.
  • Simplify your Composer build process. Use our command line tool to download the vendor folder as a binary. This makes your build process faster, and you don't need to expose your credentials for private repositories.

Information about the package fatgoose

Some notes

Quick start: crawling

Quick start: monitoring

Arguments passed automatically to crawl callbacks

Arguments passed automatically to monitor callbacks

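The monitor watches the tasks in the monitor task table by fetching them periodically. After a successful fetch it calls the callback, and the callback decides whether the page has been updated.

If the callback returns an array (in the same form as the return value of onSuccess), new tasks are appended to the crawl task table with the task level increased by 1. If it returns true, the task in the crawl task table that corresponds to the current monitor task is reset to the not-crawled state (0) so that it is crawled again.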
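The return-value rules above can be sketched as follows. This is a hypothetical illustration, not fatgoose's actual code. The function name applyMonitorResult, the in-memory task table, the 'url', 'level', and 'status' keys, the STATUS_CRAWLED value, matching tasks by URL, and treating the returned array as a plain list of URLs are all assumptions. Only the not-crawled status 0 and the "level plus 1" rule come from the description.

```php
<?php
// Hypothetical sketch: apply a monitor callback's return value to an
// in-memory crawl task table. Not fatgoose's real implementation.

const STATUS_NOT_CRAWLED = 0; // from the description
const STATUS_CRAWLED = 1;     // assumed value

/**
 * @param array $crawlTasks  list of ['url' => string, 'level' => int, 'status' => int]
 * @param array $monitorTask ['url' => string, 'level' => int]
 * @param mixed $result      value returned by the monitor callback
 */
function applyMonitorResult(array $crawlTasks, array $monitorTask, $result): array
{
    if (is_array($result)) {
        // Array: append new crawl tasks one level deeper than the monitor task.
        foreach ($result as $url) {
            $crawlTasks[] = [
                'url' => $url,
                'level' => $monitorTask['level'] + 1,
                'status' => STATUS_NOT_CRAWLED,
            ];
        }
    } elseif ($result === true) {
        // true: reset the matching crawl task so it is fetched again.
        foreach ($crawlTasks as $i => $task) {
            if ($task['url'] === $monitorTask['url']) {
                $crawlTasks[$i]['status'] = STATUS_NOT_CRAWLED;
            }
        }
    }
    // Assumption: any other return value leaves the table unchanged.
    return $crawlTasks;
}

$tasks = [['url' => 'https://example.com/list', 'level' => 0, 'status' => STATUS_CRAWLED]];
$monitor = ['url' => 'https://example.com/list', 'level' => 0];

$tasks = applyMonitorResult($tasks, $monitor, true);
$tasks = applyMonitorResult($tasks, $monitor, ['https://example.com/item/1']);
print_r($tasks);
```

After both calls, the original task is back in the not-crawled state and one new level-1 task has been appended.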

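Deduplication (Bloom filter)

For each target website, a set of historical URLs is maintained. Before a URL is added to the crawl task table, it is checked against this set. It is added only if it is not in the set, and the new URL is then added to the set.

Technically this uses a Bloom filter, which is very efficient in time and space. It can produce false positives (it reports that an item is in the set when it is not) but never false negatives (if it reports that an item is not in the set, the item is definitely not in the set). The worst consequence of using a Bloom filter is therefore that a few pages are never crawled, which is perfectly acceptable.

Project: https://github.com/pleonasm/bloom-filter (refer to it directly for usage)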
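To make the check-then-add step concrete, here is a small self-contained sketch. TinyBloom is a toy Bloom filter written for this illustration only. fatgoose depends on pleonasm/bloom-filter, whose implementation and API differ. The class, the enqueueIfNew helper, the filter size, and the MD5-based hashing are all assumptions.

```php
<?php
// Toy Bloom filter for illustration only (not the pleonasm implementation).
final class TinyBloom
{
    private $bits = [];
    private $size;
    private $hashCount;

    public function __construct(int $size = 8192, int $hashCount = 4)
    {
        $this->size = $size;
        $this->hashCount = $hashCount;
    }

    /** Derive $hashCount bit positions for an item from salted MD5 digests. */
    private function positions(string $item): array
    {
        $positions = [];
        for ($i = 0; $i < $this->hashCount; $i++) {
            // 7 hex digits = 28 bits, which fits in a PHP int on any platform.
            $positions[] = hexdec(substr(md5($i . ':' . $item), 0, 7)) % $this->size;
        }
        return $positions;
    }

    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $this->bits[$p] = true;
        }
    }

    public function exists(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if (!isset($this->bits[$p])) {
                return false; // definitely never added
            }
        }
        return true; // added before, or a false positive
    }
}

/** Queue the URL only if the filter has not seen it; returns whether it was queued. */
function enqueueIfNew(TinyBloom $seen, array &$queue, string $url): bool
{
    if ($seen->exists($url)) {
        return false; // duplicate, or (rarely) a false positive: the page is skipped
    }
    $seen->add($url);
    $queue[] = $url;
    return true;
}

$seen = new TinyBloom();
$queue = [];
foreach (['https://example.com/a', 'https://example.com/b', 'https://example.com/a'] as $url) {
    enqueueIfNew($seen, $queue, $url);
}
print_r($queue);
```

The second occurrence of https://example.com/a is always rejected, because a Bloom filter never reports an added item as absent. A false positive could only cause a URL that was never queued to be skipped, which is the "missed page" cost described above.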

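If the Bloom filter object is built from the historical URL table every time, the process becomes very slow and memory-hungry once the table grows large. In that case, consider using a Bloom filter cache file: if the cache file exists, the object is created from it. Otherwise it is still built from the historical URL table.

Note: (1) The cache file is only suitable for a single process. With multiple processes, each one overwrites the others' file. (2) If the program terminates abnormally or is killed, the cache file becomes inconsistent with the historical URL table. In that case, delete the cache file manually so that a new one is generated.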
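The "cache file if present, otherwise rebuild" logic might look like the sketch below. It uses a stand-in filter state (a map of set bit positions) stored as JSON so that the example is self-contained. With pleonasm/bloom-filter you would serialize the real filter object instead. The function names, the cache path, and the hashing scheme are assumptions for illustration.

```php
<?php
// Stand-in filter state: an array whose keys are the set bit positions.

function bitPositions(string $url, int $size = 8192, int $hashCount = 4): array
{
    $positions = [];
    for ($i = 0; $i < $hashCount; $i++) {
        $positions[] = hexdec(substr(md5($i . ':' . $url), 0, 7)) % $size;
    }
    return $positions;
}

function addUrl(array &$bits, string $url): void
{
    foreach (bitPositions($url) as $p) {
        $bits[$p] = true;
    }
}

function mightContain(array $bits, string $url): bool
{
    foreach (bitPositions($url) as $p) {
        if (!isset($bits[$p])) {
            return false;
        }
    }
    return true;
}

/** Use the cache file when it exists; otherwise rebuild from the URL history. */
function loadBits(string $cacheFile, array $historyUrls): array
{
    if (is_file($cacheFile)) {
        $bits = json_decode(file_get_contents($cacheFile), true);
        if (is_array($bits)) {
            return $bits;
        }
    }
    $bits = [];
    foreach ($historyUrls as $url) {
        addUrl($bits, $url); // slow path: one pass over the whole history
    }
    return $bits;
}

function saveBits(string $cacheFile, array $bits): void
{
    // LOCK_EX protects a single write; it does not make the cache multi-process safe.
    file_put_contents($cacheFile, json_encode($bits), LOCK_EX);
}

$cacheFile = sys_get_temp_dir() . '/fatgoose-bloom-demo.json'; // hypothetical path
if (is_file($cacheFile)) {
    unlink($cacheFile); // start clean, as you would after an abnormal exit
}

$bits = loadBits($cacheFile, ['https://example.com/a']); // built from history
addUrl($bits, 'https://example.com/b');
saveBits($cacheFile, $bits);

$again = loadBits($cacheFile, []); // served from the cache file this time
var_dump(mightContain($again, 'https://example.com/b')); // bool(true)
```

The second loadBits call is given an empty history on purpose: the URLs are still found, which shows they came from the cache file. It also shows why a stale cache is dangerous, since the history table is never consulted while the file exists.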

Configuration array options

API reference


All versions of fatgoose with dependencies

Requires pleonasm/bloom-filter Version *
