This is a powerful PHP curl library used by many developers. It wraps the curl_multi_* functions and aims for high performance, maximum flexibility and ease of use, with negligible overhead.
Requirements: PHP 5.3+
Install with Composer by adding the following to composer.json:
{ "require" : { "phpdr.net/php-curlmulti" : "2.*" } }
Email: admin@phpdr.com
QQ Group: 215348766
- Extremely low CPU and memory usage.
- High performance (tested: crawling 2000+ HTML pages per second, and image downloads at 1000 MBps).
- A global concurrency limit plus separate concurrency limits for different task types.
- Running-info callback: all the information you need is returned, both overall and for every task.
- Tasks can be added from inside a task callback.
- User callback, in which you can do anything.
- Task control via the value returned from the process callback.
- Global error callback and per-task error callback; all error info is returned.
- Built-in retry up to a maximum number of tries per task.
- User variables can be passed through to callbacks freely.
- Global CURLOPT_* options and per-task CURLOPT_* options.
- Powerful cache, configurable globally and per task.
- All public configuration properties can be changed on the fly.
- A solid base for developing your own curl applications.
Without pthreads, PHP is a single-threaded language, so the library relies heavily on callbacks. There are only two commonly used methods: CurlMulti_Core::add() and CurlMulti_Core::start(). add() just adds a task to the internal task pool. start() runs the callback loop with up to CurlMulti_Core::$maxThread concurrent transfers and blocks until all added tasks (a typical task is a URL) have finished.

If you have a huge number of tasks, use CurlMulti_Core::$cbTask to specify a callback that calls add(). This callback is invoked whenever the number of running transfers is less than CurlMulti_Core::$maxThread and the internal task pool is empty.

When a task finishes, the 'process callback' given to add() is called immediately; then a task is taken from the internal task pool and added to the running transfers. When all added tasks have finished, start() returns.
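To make the cycle concrete, here is a small pure-PHP simulation of it. It is not the library's implementation and performs no network I/O: the class TaskLoopSketch and its refill() method are hypothetical, and each running task simply "finishes" one per loop iteration, at which point its process callback is called.

```php
<?php
// Hypothetical simulation of the add()/start() cycle; no curl, no network.
// Each loop iteration "finishes" the oldest running task.
class TaskLoopSketch
{
    public $maxThread = 2;
    public $cbTask = null;      // array(callable, param), like $cbTask
    private $pool = array();    // internal task pool
    private $running = array(); // tasks currently "in flight"

    public function add($url, $process)
    {
        $this->pool[] = array('url' => $url, 'process' => $process);
    }

    // Returns only when the pool is empty and no task is running.
    public function start()
    {
        $this->refill();
        while ($this->running) {
            $task = array_shift($this->running);
            call_user_func($task['process'], $task['url']); // process callback
            $this->refill(); // a slot is free: take the next task
        }
    }

    // Fill free slots from the pool; if the pool is empty, ask $cbTask for more.
    private function refill()
    {
        while (count($this->running) < $this->maxThread) {
            if (!$this->pool && $this->cbTask) {
                call_user_func($this->cbTask[0], $this->cbTask[1]);
            }
            if (!$this->pool) {
                break; // nothing left to run
            }
            $this->running[] = array_pop($this->pool); // 'stack' order
        }
    }
}

// Usage: feed five URLs through $cbTask, two at a time.
$loop = new TaskLoopSketch();
$urls = array('u1', 'u2', 'u3', 'u4', 'u5');
$done = array();
$process = function ($url) use (&$done) {
    $done[] = $url;
};
$loop->cbTask = array(function ($batch) use ($loop, &$urls, $process) {
    foreach (array_splice($urls, 0, $batch) as $url) {
        $loop->add($url, $process);
    }
}, 2);
$loop->start();
```

A process callback could also call add() itself; the loop picks such tasks up on the next refill, which mirrors adding tasks from inside a task callback.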
src/Core.php
Kernel class
src/Base.php
A wrapper around CurlMulti_Core that includes very useful tools and conventions. It is very easy to use. All spiders should inherit from this class.
src/Exception.php
CurlMulti_Exception
src/AutoClone.php
A powerful site-cloning tool.
Features:
- A showcase of software engineering and programming technique.
- Easy to use: only one public method, start(void).
- Low coupling, easy to extend. Cloning a site with CurlMulti is very fast.
- Each duplicate URL across all pages is processed only once.
- All URLs and URIs in pages are processed accurately and automatically.
- @import rules and images referenced in CSS are downloaded automatically, regardless of @import depth.
- Multiple URL prefixes can be processed, each configured individually.
- Sub-prefixes can be specified for a URL, each with its own config.
- 3xx redirects are handled automatically.
- Resources shared across sites are processed once. For example, if site A uses JS and CSS from site B, those files will not be processed again when B is cloned.
- Any number of sites can be stored in one directory without file conflicts.
- Download options support multi-type control.
Issues:
1. IE-specific CSS comments (conditional comments) are not processed, because no standard way to handle them has been found yet.
Example of a cloned site: http://manual.phpdr.net/
public $maxThread = 10
Maximum number of concurrent transfers; can be changed on the fly.
Any upper limit comes from the OS or libcurl, not from this library.
public $maxThreadType = array ()
Sets the concurrency for specific task types. The key is the type (as specified in add()) and the value is its concurrency. The sum of the values may exceed $maxThread. The concurrency for untyped tasks is $maxThread minus that sum, and is set to zero if negative. Zero means untyped tasks are never executed unless the config is changed on the fly.
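The rule for untyped tasks amounts to a clamped subtraction. The helper untypedParallel() below is a hypothetical illustration, not a library function.

```php
<?php
// Hypothetical helper: concurrency left for untyped tasks under $maxThreadType.
function untypedParallel($maxThread, array $maxThreadType)
{
    $rest = $maxThread - array_sum($maxThreadType);
    return $rest < 0 ? 0 : $rest; // negative values are clamped to zero
}

// 10 slots in total; 'img' and 'html' tasks have their own limits of 3 and 4.
$untyped = untypedParallel(10, array('img' => 3, 'html' => 4)); // 3
```

With array('img' => 8, 'html' => 4) the result would be 0, so untyped tasks would not run until the config changes.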
public $maxTry = 3
Maximum number of tries per task. A task that triggers a curl error or a user error is retried until $maxTry is reached; after that, $cbFail is called.
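A minimal sketch of the retry rule, assuming each failed attempt counts as one try. runWithRetry() and the result shape array('ok' => ..., 'error' => ..., 'content' => ...) are hypothetical; in the library, retries are handled internally.

```php
<?php
// Hypothetical sketch of the $maxTry rule; the library retries internally.
// $task returns array('ok' => bool, 'error' => ..., 'content' => ...).
function runWithRetry($task, $maxTry, $onFail)
{
    $error = null;
    for ($try = 1; $try <= $maxTry; $try++) {
        $result = call_user_func($task, $try);
        if ($result['ok']) {
            return $result; // success: no further tries
        }
        $error = $result['error'];
    }
    call_user_func($onFail, $error, $maxTry); // limit reached, like $cbFail
    return null;
}

// A task that fails on its first two tries and succeeds on the third.
$flaky = function ($try) {
    return $try < 3
        ? array('ok' => false, 'error' => "timeout on try $try")
        : array('ok' => true, 'content' => 'body');
};
$failed = array();
$onFail = function ($error, $tries) use (&$failed) {
    $failed[] = $error;
};
$result = runWithRetry($flaky, 3, $onFail);
```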
public $opt = array ()
Global CURLOPT_* options for all tasks. Overridden by the CURLOPT_* options passed to add().
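CURLOPT_* constants are integers, which matters when merging per-task options over global ones: array_merge() renumbers integer keys, while the + operator keeps them and lets the left-hand array win on conflicts. mergeOpt() is a hypothetical helper, and the keys 100 and 200 are stand-ins for real CURLOPT_* constants.

```php
<?php
// Hypothetical sketch: per-task options override global ones.
// CURLOPT_* constants are integers; array_merge() would renumber such keys,
// so the + operator is used instead (the left-hand array wins on conflicts).
function mergeOpt(array $globalOpt, array $taskOpt)
{
    return $taskOpt + $globalOpt;
}

// Keys 100 and 200 stand in for real CURLOPT_* constants.
$globalOpt = array(100 => 30, 200 => 'global-agent');
$taskOpt = array(100 => 5);
$opt = mergeOpt($globalOpt, $taskOpt); // array(100 => 5, 200 => 'global-agent')
```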
public $cache = array ('enable' => false, 'enableDownload'=> false, 'compress' => false, 'dir' => null, 'expire' =>86400, 'dirLevel' => 1, 'verifyPost' => false, 'overwrite' => false)
The options are self-explanatory. Cache entries are identified by URL. If a cache entry is found, the class does not access the network but returns the cached content directly.
public $taskPoolType = 'stack'
Either 'stack' or 'queue'. This option selects depth-first or breadth-first order. The default, 'stack', is depth-first.
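The difference between the two orders comes down to which end of the task pool the next task is taken from. takeTask() is a hypothetical helper for illustration.

```php
<?php
// Hypothetical illustration: 'stack' takes the most recently added task
// (depth-first), 'queue' takes the oldest one (breadth-first).
function takeTask(array &$pool, $taskPoolType)
{
    return $taskPoolType === 'queue' ? array_shift($pool) : array_pop($pool);
}

$order = array();
foreach (array('stack', 'queue') as $type) {
    $pool = array('a', 'b', 'c'); // added in this order
    $order[$type] = array();
    while ($pool) {
        $order[$type][] = takeTask($pool, $type);
    }
}
// $order['stack'] is c, b, a; $order['queue'] is a, b, c
```

With 'stack', tasks added by a page's process callback are taken before older pending ones, so the crawl goes deeper first.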
public $cbTask = array(0=>'callback',1=>'callback param')
When the number of running transfers is less than $maxThread and the task pool is empty, the class calls the callback specified by $cbTask. $cbTask[0] is the callback itself; $cbTask[1] is the parameter passed to it.
public $cbInfo = null
Callback for running info. Use print_r() inside the callback to inspect the info. It is called at most once per second.
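One way to get "at most once per second" behaviour is a time-based throttle. InfoThrottle is a hypothetical class, not part of the library; the current time is passed in explicitly so the logic is easy to check (in real code it would come from microtime(true)).

```php
<?php
// Hypothetical throttle: allow a callback at most once per $interval seconds.
class InfoThrottle
{
    private $last = null;
    private $interval;

    public function __construct($interval = 1.0)
    {
        $this->interval = $interval;
    }

    // Returns true if the callback may fire at time $now (in seconds).
    public function ready($now)
    {
        if ($this->last === null || $now - $this->last >= $this->interval) {
            $this->last = $now;
            return true;
        }
        return false;
    }
}

$throttle = new InfoThrottle(1.0);
$fired = array();
foreach (array(0.0, 0.4, 0.9, 1.0, 1.5, 2.1) as $t) {
    if ($throttle->ready($t)) {
        $fired[] = $t; // here the info callback would run
    }
}
```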
public $cbUser = null
Callback for user operations, called very frequently. You can do anything in it.
public $cbFail = null
Callback for failed tasks. It has lower priority than the 'fail callback' passed to add().
public function __construct()
Must be called in subclasses.
public function add(array $item, $process = null, $fail = null)
Adds a task to the task pool.
$item['url'] Must not be empty.
$item['opt']=array() CURLOPT_* options for this task, merged with and overriding the global $this->opt.
$item['args'] Second parameter passed to the callbacks $process, $fail and $this->cbFail.
$item['ctl']=array() Additional control: type, cache, ahead.
$item['ctl']['type'] Task type, used by $this->maxThreadType.
$item['ctl']['cache']=array() Task cache config, merged with and overriding $this->cache.
$item['ctl']['ahead'] The task is always given priority when tasks are moved to the running transfers, regardless of $this->taskPoolType.
$process Called when the task succeeds. The first parameter is array('info'=>array(),'content'=>'','ext'=>array()): 'info' is the HTTP info, 'content' is the content of the URL, and 'ext' holds some extended info. The second parameter is $item['args'].
$fail Task fail callback. The first parameter has two keys: 'info' is the HTTP info and 'error' is the full error information. The second parameter is $item['args'].
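The sketch below builds an $item with the documented keys and defines a process callback with the documented (result, args) signature, then calls it by hand with a fake result. normalizeItem() is hypothetical, and reading 'http_code' assumes that 'info' is the array returned by curl_getinfo().

```php
<?php
// Hypothetical: check and fill defaults for an $item as described for add().
function normalizeItem(array $item)
{
    if (empty($item['url'])) {
        throw new InvalidArgumentException("\$item['url'] must not be empty");
    }
    return $item + array('opt' => array(), 'args' => null, 'ctl' => array());
}

$item = normalizeItem(array(
    'url'  => 'http://example.com/list?page=1',
    'args' => array('page' => 1),           // passed as the 2nd callback argument
    'ctl'  => array('type' => 'html'),      // counted against $maxThreadType['html']
));

// $r = array('info' => HTTP info, 'content' => body, 'ext' => extended info)
$process = function ($r, $args) {
    $code = isset($r['info']['http_code']) ? $r['info']['http_code'] : 0;
    return sprintf('page %d: HTTP %d, %d bytes', $args['page'], $code, strlen($r['content']));
};

// Called by hand with a fake successful result.
$summary = $process(
    array('info' => array('http_code' => 200), 'content' => 'hello', 'ext' => array()),
    $item['args']
);
```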
public function start($persist=null)
Starts the loop. This is a blocking method. $persist is a callback: if it returns true, start() keeps blocking even after all tasks have finished. Put a sleep in the callback if needed.
function __construct($curlmulti = null)
Uses the default CurlMulti_Core or your own instance, if one is passed.
function hashpath($name, $level = 2)
Returns a hashed path. Every directory holds at most 4096 files.
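The 4096 figure matches 16^3, which suggests three hex digits of a hash per directory level, giving at most 4096 subdirectories per directory. The sketch below is built on that assumption; hashPathSketch() is hypothetical and the library's exact scheme may differ.

```php
<?php
// Hypothetical: each level is named by 3 hex digits of md5($name), so a
// directory branches into at most 16^3 = 4096 subdirectories.
// The original name is appended as the last component.
function hashPathSketch($name, $level = 2)
{
    $hash = md5($name);
    $parts = array();
    for ($i = 0; $i < $level; $i++) {
        $parts[] = substr($hash, $i * 3, 3);
    }
    $parts[] = $name;
    return implode('/', $parts);
}

$path = hashPathSketch('page.html'); // two 3-hex-digit directories, then page.html
```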
function substr($str, $start, $end = null, $mode = 'g')
Returns the substring between the start string and the end string; the start and end strings themselves are excluded.
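A minimal sketch of the same idea. substrBetween() is a hypothetical name (a global function cannot be named substr() in PHP, because the built-in already uses it), the $mode parameter is not modelled, and treating a null $end as "to the end of the string" is an assumption.

```php
<?php
// Hypothetical: text between the first $start and the following $end,
// with both delimiters excluded. Returns false if a delimiter is missing.
function substrBetween($str, $start, $end = null)
{
    $pos = strpos($str, $start);
    if ($pos === false) {
        return false;
    }
    $pos += strlen($start);
    if ($end === null) {
        return (string) substr($str, $pos); // assumption: null means "to the end"
    }
    $stop = strpos($str, $end, $pos);
    if ($stop === false) {
        return false;
    }
    return substr($str, $pos, $stop - $pos);
}

$title = substrBetween('<title>Hello</title>', '<title>', '</title>'); // "Hello"
```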
function cbCurlFail($error, $args)
Default fail callback.
function cbCurlInfo($info,$isFirst,$isLast)
Default CurlMulti_Core::$cbInfo
function encoding($html, $in = null, $out = 'UTF-8', $mode = 'auto')
A powerful function that converts the encoding of HTML and updates <head></head> in the HTML accordingly. $in can be detected from <head></head>.
function isUrl($str)
Checks whether $str is a full URL.
function uri2url($uri, $urlCurrent)
Returns the full URL of $uri as used in the HTML page at $urlCurrent.
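Resolving a reference against the current page is essentially RFC 3986 reference resolution. The sketch below is a simplified, hypothetical version (resolveUrlSketch() and removeDotSegments() are not library functions): it handles absolute URLs, "//host" and "/path" references, query-only references and relative paths with "." and "..", and it drops fragments.

```php
<?php
// Hypothetical, simplified resolver in the spirit of uri2url(); not full RFC 3986.

// Collapse "." and ".." segments in an absolute path ("/a/b/../c" -> "/a/c").
function removeDotSegments($path)
{
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    $out = array();
    foreach ($segments as $i => $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out); // never pop the leading "" (the root)
            }
            if ($i === $last) {
                $out[] = ''; // "/a/b/.." ends in a directory
            }
        } elseif ($seg === '.') {
            if ($i === $last) {
                $out[] = '';
            }
        } else {
            $out[] = $seg;
        }
    }
    return implode('/', $out);
}

function resolveUrlSketch($uri, $urlCurrent)
{
    $hash = strpos($uri, '#');
    if ($hash !== false) {
        $uri = substr($uri, 0, $hash); // fragments do not change the resource
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $uri)) {
        return $uri; // already absolute
    }
    $base = parse_url($urlCurrent);
    if (substr($uri, 0, 2) === '//') {
        return $base['scheme'] . ':' . $uri; // scheme-relative
    }
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $basePath = isset($base['path']) ? $base['path'] : '/';

    // Keep the query aside so a "/" inside it is not treated as a separator.
    $query = '';
    $q = strpos($uri, '?');
    if ($q !== false) {
        $query = substr($uri, $q);
        $uri = substr($uri, 0, $q);
    }

    if ($uri === '') {
        $path = $basePath; // "?page=2" or "#top": same path as the current page
    } elseif ($uri[0] === '/') {
        $path = $uri; // root-relative
    } else {
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $uri;
    }
    return $origin . removeDotSegments($path) . $query;
}
```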
function url2uri($url, $urlCurrent)
Returns the URI of $url relative to the current page $urlCurrent.
function urlDir($url)
Returns the directory of $url. $url should be the final URL after redirects; a final URL normally ends with '/'.
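A hypothetical sketch of what urlDir() computes; urlDirSketch() is not the library's code. It also shows why the final URL matters: http://example.com/docs and http://example.com/docs/ give different directories.

```php
<?php
// Hypothetical: directory part of a final URL. Query strings and fragments
// are ignored; a trailing '/' marks a directory.
function urlDirSketch($url)
{
    $url = preg_replace('/[?#].*$/s', '', $url);
    $parts = parse_url($url);
    if (!isset($parts['path']) || $parts['path'] === '') {
        return $url . '/'; // bare host: the root directory
    }
    $path = $parts['path'];
    $prefix = substr($url, 0, strlen($url) - strlen($path)); // scheme://host[:port]
    return $prefix . substr($path, 0, strrpos($path, '/') + 1);
}
```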
function getCurl()
Return CurlMulti_Core instance.