
A PHP wrapper for NCBI BLAST URL API

Primary language: PHP. License: GNU General Public License v3.0 (GPL-3.0).

BLASTphp

The BLASTphp library is a PHP wrapper for the NCBI BLAST URL API. It allows remote execution of NCBI BLAST through RESTful services. BLASTphp sends requests to NCBI BLAST over an HTTP/HTTPS interface and receives the response in HTML, Text, XML, XML2, JSON2, or Tabular (text) format. The default response format is HTML.

Since NCBI BLAST is a shared resource, usage limitations apply. BLASTphp is a lightweight program that consumes little bandwidth and few resources. Projects that involve a large number of BLAST searches should use the RESTful interface at Cloud BLAST or stand-alone BLAST. Currently NCBI provides a commercial BLAST server image hosted on Amazon Web Services (AWS), Google Compute Engine (GCE), and Microsoft Azure cloud servers. This allows users to run stand-alone searches with the BLAST+ applications, submit searches through a subset of the NCBI BLAST URL API, and perform searches with a simplified webpage. The server image includes a FUSE client that will download BLAST databases during the first search. The server image runs on Ubuntu Linux.

Please refer to the NCBI BLAST URL API documentation for setting BLAST parameters. If a parameter is not required and not provided, the default value will be used. That default value may depend on the BLAST search you are running. In the simplest case, NCBI BLAST can be executed by passing only the CMD, PROGRAM, DATABASE, and QUERY parameters to the NCBI BLAST URL API.

To use BLASTphp through a web server, first set PHP's maximum execution time to the request time of execution (RTOE) value reported by BLAST, because the default maximum execution time (30 seconds) is not enough. For example,

ini_set('max_execution_time', $RTOE);

In some cases the $RTOE value will not be enough for the script to finish. If so, increase the limit to $RTOE+60 or higher. A max_execution_time value of 0 removes the limit entirely, but this is not recommended: in the rare case that a script enters an infinite loop or deadlocks because of file-level locking, it will keep running, overload the server, and push memory and CPU usage above recommended thresholds.
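
The adjustment described above can be sketched as a small helper. The function name blast_time_limit() and the 60-second default margin are illustrative assumptions and are not part of BLASTphp.

```php
<?php
// Hypothetical helper (not part of BLASTphp): derive a PHP time limit
// from the RTOE value reported by NCBI BLAST, plus a safety margin.
function blast_time_limit(int $rtoe, int $margin = 60): int
{
    // Never return 0, because PHP treats 0 as "no limit".
    return max(1, $rtoe) + max(0, $margin);
}

$RTOE = 25; // example value; the real one comes from the BLAST response
ini_set('max_execution_time', (string) blast_time_limit($RTOE));
```

Keeping the margin as a parameter makes it easy to raise it for searches that regularly run longer than their RTOE estimate.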

The query sequence must be URL-encoded before it is passed in the QUERY parameter of a hand-built request string. For example,

$encoded_query = urlencode($sequence);

Here urlencode() is the built-in PHP function that converts non-alphanumeric characters to their equivalent URL codes.
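
As a minimal sketch of what this encoding does to a FASTA-formatted query, consider the following. The identifier and residues are made up for illustration.

```php
<?php
// Illustrative FASTA record; the identifier and residues are made up.
$sequence = ">example_query\nMKTAYIAKQR";

// urlencode() leaves letters, digits, "-", "_" and "." unchanged,
// turns spaces into "+", and percent-encodes everything else.
$encoded_query = urlencode($sequence);

echo $encoded_query, PHP_EOL; // %3Eexample_query%0AMKTAYIAKQR
```

The ">" of the FASTA header becomes %3E and the line break becomes %0A, so the whole record can travel safely inside a URL or form body.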

The following example script builds and submits a request to NCBI BLAST.

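<?php
// $sequence holds the raw (not yet URL-encoded) query sequence.
// http_build_query() URL-encodes every value itself, so passing an
// already encoded string here would encode it twice.
$data = array('CMD' => 'Put', 'PROGRAM' => 'blastp', 'DATABASE' => 'pdb', 'QUERY' => $sequence);
$options = array(
  'http' => array(
    'header'  => "Content-type: application/x-www-form-urlencoded\r\n",
    'method'  => 'POST',
    'content' => http_build_query($data)
  )
);
$context = stream_context_create($options);
// Submit the search; the response body is stored in $result.
$result = file_get_contents("https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi", false, $context);
?>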

The response may contain RID = VALUE, RTOE = VALUE, Informational, QBlastInfoBegin, QBlastInfoEnd, Status=WAITING, Status=FAILED, Status=UNKNOWN, and/or Status=READY entries, which are used to track the search and its result.
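
One way to read these tracking fields out of a response body is with regular expressions, as in the sketch below. The function names are hypothetical, the sample text is made up, and the patterns assume the fields appear exactly in the "RID = VALUE", "RTOE = VALUE", and "Status=VALUE" shapes listed above.

```php
<?php
// Hypothetical helpers (not part of BLASTphp) for reading the tracking
// fields out of a response body.

// Return ['rid' => string|null, 'rtoe' => int|null] from a Put response.
function parse_put_response(string $body): array
{
    $rid  = preg_match('/^\s*RID = (\S+)/m', $body, $m) ? $m[1] : null;
    $rtoe = preg_match('/^\s*RTOE = (\d+)/m', $body, $m) ? (int) $m[1] : null;
    return ['rid' => $rid, 'rtoe' => $rtoe];
}

// Return WAITING, FAILED, UNKNOWN or READY, or null if no status is present.
function parse_status(string $body): ?string
{
    return preg_match('/Status=(WAITING|FAILED|UNKNOWN|READY)/', $body, $m)
        ? $m[1]
        : null;
}

// Made-up sample text in the shape described above.
$sample = "QBlastInfoBegin\n    RID = EXAMPLE123\n    RTOE = 25\nQBlastInfoEnd\n";
$info = parse_put_response($sample);
echo $info['rid'], ' ', $info['rtoe'], PHP_EOL; // EXAMPLE123 25
```

Returning null for a missing field lets the caller tell an unexpected response apart from a valid one before it starts polling.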

The following example script retrieves the response from NCBI BLAST.

<?php
// $rid is the request ID (RID) returned when the search was submitted.
$options = array(
  'http' => array(
    'method' => 'GET'
  )
);
$context = stream_context_create($options);
$output = file_get_contents("https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=$rid", false, $context);
?>

After successful execution, NCBI BLAST returns the response in HTML format by default. To get a different response format, specify it in the URL with the FORMAT_TYPE parameter. For example, FORMAT_TYPE=Text requests plain text.
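
A retrieval URL with an explicit format could be assembled as sketched below. The helper name blast_result_url() is an assumption for illustration; the endpoint and the CMD, RID, and FORMAT_TYPE parameters are the ones used in this document.

```php
<?php
// Hypothetical helper (not part of BLASTphp): build the URL that fetches
// the result for a given RID in a chosen format.
function blast_result_url(string $rid, string $formatType = 'HTML'): string
{
    $params = [
        'CMD'         => 'Get',
        'RID'         => $rid,
        'FORMAT_TYPE' => $formatType,
    ];
    // http_build_query() URL-encodes the values; "&" is set explicitly so
    // the result does not depend on the arg_separator.output setting.
    return 'https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?'
        . http_build_query($params, '', '&');
}

echo blast_result_url('EXAMPLE123', 'Text'), PHP_EOL;
// https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&RID=EXAMPLE123&FORMAT_TYPE=Text
```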

The complete working PHP script 'blastphp.php' is included in the main directory of this repository.

Do not overload the NCBI servers. If you intend to perform more than 20 searches in a session, you should comply with the following guidelines:

  1. Do not contact the server more often than once every three seconds.
  2. Do not poll for any single RID more often than once a minute.
  3. Use the URL parameters email and tool, so that NCBI can track your project and contact you if there is a problem.
  4. Run scripts on weekends or between 9 pm and 5 am Eastern Time on weekdays if more than 50 searches will be submitted.
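
Guidelines 1 and 2 can be enforced in a script with a small throttle, sketched below. The class BlastThrottle and its method names are hypothetical and not part of BLASTphp. Timestamps are passed in explicitly so the logic can be checked without actually waiting.

```php
<?php
// Hypothetical throttle (not part of BLASTphp). It reports how long a
// script should wait so that it follows guidelines 1 and 2 above.
class BlastThrottle
{
    private const MIN_REQUEST_GAP = 3;   // seconds between any two requests
    private const MIN_POLL_GAP    = 60;  // seconds between polls of one RID

    private ?int $lastRequest = null;
    private array $lastPoll = [];        // RID => time of last poll

    // Seconds to wait before contacting the server at time $now.
    // Pass a RID when the request is a poll for that search.
    public function delay(int $now, ?string $rid = null): int
    {
        $wait = 0;
        if ($this->lastRequest !== null) {
            $wait = max($wait, $this->lastRequest + self::MIN_REQUEST_GAP - $now);
        }
        if ($rid !== null && isset($this->lastPoll[$rid])) {
            $wait = max($wait, $this->lastPoll[$rid] + self::MIN_POLL_GAP - $now);
        }
        return $wait;
    }

    // Record that a request (optionally a poll for $rid) was sent at $now.
    public function record(int $now, ?string $rid = null): void
    {
        $this->lastRequest = $now;
        if ($rid !== null) {
            $this->lastPoll[$rid] = $now;
        }
    }
}

$throttle = new BlastThrottle();
$throttle->record(1000, 'EXAMPLE123');              // poll sent at t = 1000
echo $throttle->delay(1001), PHP_EOL;               // 2  (three-second rule)
echo $throttle->delay(1010, 'EXAMPLE123'), PHP_EOL; // 50 (one-minute rule)
```

In a real script, one would call sleep($throttle->delay(time(), $rid)) before each request and $throttle->record(time(), $rid) right after sending it.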

NCBI BLAST often runs more efficiently if multiple queries are sent as one search than if each query is sent as an individual search. This is especially true for blastn, megablast, and tblastn. For short queries (less than a few hundred bases), it is suggested to merge them into one search of up to 10,000 bases.
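
The merging suggested above can be sketched as a batching helper that groups short sequences into multi-FASTA queries. The function name batch_queries() and its input layout (identifier => sequence) are assumptions for illustration; the 10,000-base cap is the figure given above.

```php
<?php
// Hypothetical helper (not part of BLASTphp): group short sequences into
// multi-FASTA batches whose combined length stays within $maxBases.
// $sequences maps an identifier to its sequence string.
function batch_queries(array $sequences, int $maxBases = 10000): array
{
    $batches = [];
    $current = '';
    $used = 0;
    foreach ($sequences as $id => $seq) {
        $len = strlen($seq);
        // Start a new batch when adding this sequence would exceed the cap.
        if ($used > 0 && $used + $len > $maxBases) {
            $batches[] = $current;
            $current = '';
            $used = 0;
        }
        $current .= ">$id\n$seq\n";
        $used += $len;
    }
    if ($current !== '') {
        $batches[] = $current;
    }
    return $batches;
}

// A small cap of 8 bases is used here only to make the split visible.
$batches = batch_queries(['q1' => 'ACGT', 'q2' => 'GGCC', 'q3' => 'TTAA'], 8);
echo count($batches), PHP_EOL; // 2
```

A sequence that is longer than the cap on its own is placed in a batch by itself rather than being split.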

The NCBI servers are a shared resource and not intended for projects that involve a large number of BLAST searches. Stand-alone BLAST and the RESTful API at a cloud provider are provided for such projects.

If there is a problem with a request, NCBI BLAST REST (REpresentational State Transfer) will usually return some sort of human-readable message indicating what went wrong – whether it’s an invalid input, or nothing was found for the given query, or the request was too broad and took too long to complete (more than 30 seconds, the NCBI standard time limit on web service requests), etc.

If the operation was successful, the HTTP status code will be 200 (OK). If the server encounters an error, it will return an HTTP status code that gives some indication of what went wrong. Depending on the output format, the response may also carry a more detailed human-readable message (for example, in a tag in XML). Codes in the 400 range indicate errors on the client side, and codes in the 500 range indicate a problem on the server side. The codes currently in use are:

| HTTP Status | Error Code | General Error Category |
|---|---|---|
| 200 | (none) | Success |
| 202 | (none) | Accepted (asynchronous operation pending) |
| 400 | Bad Request | Request is improperly formed (syntax error in the URL, POST body, etc.) |
| 404 | Not Found | The input record was not found (e.g. invalid RID) |
| 405 | Method Not Allowed | Request not allowed (such as invalid MIME type in the HTTP Accept header) |
| 504 | Timeout | The request timed out, from server overload or too broad a request |
| 501 | Unimplemented | The requested operation has not (yet) been implemented by the server |
| 500 | Server Error | Some problem on the server side (such as a database server down, etc.) |
| 500 | Unknown | An unknown error occurred |
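
A script can act on these codes by reading the status line of the response. The sketch below uses hypothetical helper names and is not part of BLASTphp. When file_get_contents() is used with the http wrapper, PHP makes the response headers available in $http_response_header, with the status line as the first element, and the 'ignore_errors' => true context option lets the body of a 4xx or 5xx response be read as well.

```php
<?php
// Hypothetical helpers (not part of BLASTphp) for inspecting the HTTP status.

// Extract the numeric status from a status line such as "HTTP/1.1 200 OK".
function http_status_code(string $statusLine): ?int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : null;
}

// Classify a status code using the ranges described above.
function classify_status(int $code): string
{
    if ($code === 200) return 'success';
    if ($code === 202) return 'pending';
    if ($code >= 400 && $code < 500) return 'client error';
    if ($code >= 500 && $code < 600) return 'server error';
    return 'other';
}

echo classify_status(http_status_code('HTTP/1.1 404 Not Found')), PHP_EOL; // client error
```

Separating client errors from server errors helps a script decide whether to fix the request or to retry later.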

Please feel free to send your queries, suggestions, and/or comments related to the BLASTphp program to ashok.bioinformatics@gmail.com or ashok@biogem.org.

BLASTphp is made available under version 3 of the GNU Lesser General Public License.

Ashok Kumar, T., and Rajagopal, B. (2017). BLASTphp: a PHP wrapper for NCBI BLAST API. International Journal for Computational Biology. 6(1): 31-33.