A Dockerfile for the open (older) version of Full-Text RSS by FiveFilters.org. It builds from my fork of that version, which fixes site config updates.
Mounting a volume at `/var/www/html/site_config/` is recommended, especially when using custom site configs.
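For example, a minimal Python sketch that assembles a `docker run` command with that volume mounted and a few `ftr_*` variables set. The image name `my-full-text-rss`, the host directory `/srv/ftr/site_config` and the container port 80 are assumptions for illustration (port 80 is what an Apache-based PHP image would typically serve on), so adjust them to your own build.

```python
import shlex

# Container path documented in this README; everything else is a placeholder.
SITE_CONFIG_PATH = "/var/www/html/site_config/"


def docker_run_args(image, site_config_dir, env=None, host_port=8080):
    """Build a `docker run` argument list for the Full-Text RSS container.

    `image` is a placeholder for whatever tag you built this Dockerfile as.
    Container port 80 is an assumption, not taken from the Dockerfile.
    """
    args = ["docker", "run", "-d",
            "-p", f"{host_port}:80",                      # host:container
            "-v", f"{site_config_dir}:{SITE_CONFIG_PATH}"]  # persist site configs
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]                 # one -e per ENV setting
    args.append(image)
    return args


if __name__ == "__main__":
    cmd = docker_run_args("my-full-text-rss", "/srv/ftr/site_config",
                          env={"ftr_caching": "true", "ftr_cache_time": "30"})
    print(shlex.join(cmd))  # shell-safe rendering of the argument list
```

Running it prints `docker run -d -p 8080:80 -v /srv/ftr/site_config:/var/www/html/site_config/ -e ftr_caching=true -e ftr_cache_time=30 my-full-text-rss`. Building an argument list rather than a string avoids quoting mistakes when values contain spaces.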
Since this old version runs on PHP 7.3, whose security support ends on 6 December 2021, additional protection measures (e.g. not exposing the service directly to the internet) are recommended!
Not affiliated with FiveFilters.org. The Dockerfile is licensed under the Unlicense.
ENV | Default¹ | Accepted | Description |
---|---|---|---|
ftr_enabled | true | true/false | Set this to false if you want to disable the service. |
ftr_debug | true | true/false/'user'/'admin' | Enable or disable debugging. When enabled, debug output is requested by adding `&debug` to the `makefulltextfeed.php` query string. |
ftr_default_entries | 5 | int | The number of feed items to process when no access key is supplied and no `&max=x` value is given in the query string. |
ftr_max_entries | 10 | int | The maximum number of feed items to process when no access key is supplied. |
ftr_content | 'user' | true/false/'user' | By default Full-Text RSS includes the extracted content in the output. You can exclude it by passing `&content=0` in the query string. |
ftr_html5_output | 'user' | true/false/'user' | Full-Text RSS used to rely on libxml to output HTML extracted from a web page. Since version 3.8 we use HTML5-PHP by default. |
ftr_summary | 'user' | true/false/'user' | By default Full-Text RSS does not include excerpts in the output. You can enable this by passing `&summary=1` in the query string. This will include a plain text excerpt from the extracted content. |
ftr_rewrite_relative_urls | true | true/false | When enabled, relative URLs found in the extracted content block are automatically rewritten as absolute URLs. |
ftr_exclude_items_on_fail | 'user' | true/false/'user' | Excludes items from the resulting feed if we cannot extract any content from the item URL. |
ftr_singlepage | true | true/false | If enabled, we will try to follow single-page links (e.g. print view) on multi-page articles (if defined in a site config file). |
ftr_multipage | true | true/false | If enabled, we will try to follow next-page links on multi-page articles (if defined in a site config file). |
ftr_caching | false | true/false | Enable this if you'd like to cache results on disk. |
ftr_cache_time | 10 | int | How long a response should be cached, in minutes. |
ftr_message_to_prepend | '' | str | HTML to insert at the beginning of each feed item when no access key is supplied. |
ftr_message_to_append | '' | str | HTML to insert at the end of each feed item when no access key is supplied. |
ftr_error_message | '[unable to retrieve full-text content]' | str | Error message shown when content extraction fails (when no access key is supplied). |
ftr_keep_enclosures | true | true/false | If enabled, we will try to preserve enclosures if present. |
ftr_detect_language | 'user' | 0 = ignore language; 1 = use article/feed metadata (e.g. HTML `lang` attribute); 2 = as above, but guess if not present; 3 = always guess; 'user' = user decides | Whether to find or guess the language of the article being processed. |
ftr_user_submitted_config | false | true/false | If enabled, a user can submit site config rules directly in the request using the `siteconfig` request parameter. Disabled (false) by default. |
ftr_remove_native_ads | false | true/false | Many news sites now carry native advertising: articles which have been paid for by a corporation to promote their brand or product. Enable this to have Full-Text RSS try to remove such native ads. |
ftr_admin_credentials | array('username'=>'admin', 'password'=>'') | username:password, e.g. admin:my-secret-password | Certain pages/actions, e.g. updating site patterns with our online tool, require admin credentials. |
ftr_allowed_urls | array() | 🤷♂️ | List of URLs (or parts of a URL) which the service will accept. |
ftr_blocked_urls | array() | 🤷♂️ | List of URLs (or parts of a URL) which the service will not accept. |
ftr_blocked_message | 'URL blocked' | str | Message shown when a request is blocked outright by ftr_allowed_urls or ftr_blocked_urls. |
ftr_key_required | false | true/false | Set this to true to restrict access to users with a key. |
ftr_api_keys | array() | 🤷♂️ | Keys let you split users into two groups (those with a key and those without) and restrict what users without a key can do. If you want everyone to access the service in the same way, leave this array empty and ignore the access key options below. |
ftr_default_entries_with_key | 5 | int | The number of feed items to process when a valid access key is supplied. |
ftr_max_entries_with_key | 10 | int | The maximum number of feed items to process when a valid access key is supplied. |
ftr_xss_filter | 'user' | true/false/'user' | Filters the extracted HTML for XSS attacks. We have not enabled this by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it's good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. |
ftr_favour_effective_url | 'user' | true/false/'user' | When we extract content for feed items, we often end up at a different URL than the one in the original feed. This is often a result of URL shorteners or tracking services being used by the feed publisher. We include the final (effective) URL we reached to get the content inside the `dc:identifier` field. If you enable this, we'll also use this URL in place of the original item URL in the new feed we produce. |
ftr_favour_feed_titles | 'user' | true/false/'user' | By default, when processing feeds, we assume item titles in the feed have not been truncated, so the titles extracted from web pages are not used in the generated feed. Set this to false to use the extracted titles instead. |
ftr_allowed_parsers | array('libxml', 'html5php') | 🤷♂️ | HTML parsers Full-Text RSS may use. By default it uses PHP's libxml extension to process HTML; while fast, it may not always produce good results on some sites. |
ftr_allow_parser_override | true | true/false | If enabled, users can pass `&parser=html5php` to override the default parser. |
ftr_cors | false | true/false | If enabled, we'll send the HTTP header `Access-Control-Allow-Origin: *`. |
ftr_proxy_servers | array() | 🤷♂️ e.g. array('example2'=>array('host'=>'127.0.0.1:8888', 'auth'=>'user:pass')) | Proxy servers that Full-Text RSS can route HTTP requests through. If no proxy server is listed, all requests are made directly. |
ftr_proxy | true | false = disable (no proxy is used); a server name, e.g. 'example1' = use that server; true (default) = use a random server from ftr_proxy_servers each time Full-Text RSS is called | How the proxy servers listed in ftr_proxy_servers should be used. |
ftr_allow_proxy_override | true | true/false | If enabled, users can disable or change the proxy server used. |
ftr_apc | true | true/false | If enabled, we will store site config files (when requested for the first time) in APC's user cache. [This Dockerfile does not install APC, so this setting has no effect.] |
ftr_smart_cache | true | true/false | When enabled, results are not cached to disk immediately: the cache key is stored in APC (prefixed with 'cache.'), and results are cached to disk only if the same key is requested again. |
ftr_cache_cleanup | 100 | 0 = the script never cleans the cache (rename cachecleanup.php and use it for scheduled cleanup, e.g. via cron); 1 = clean the cache every time the script runs (not recommended); 100 = clean the cache roughly once every 100 script runs | How often the cache is cleaned up. |
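Several settings in this table are toggled per request through the `makefulltextfeed.php` query string (`&max=x`, `&content=0`, `&summary=1`, `&parser=html5php`, `&debug`). A minimal Python sketch of building such a request URL follows. The helper name `build_feed_url`, the `url` parameter carrying the source feed, the script living at the web root and the `http://localhost:8080` address are assumptions for illustration, not taken from this repository.

```python
from urllib.parse import urlencode


def build_feed_url(base, feed_url, max_items=None, content=True,
                   summary=False, parser=None, debug=False):
    """Hypothetical helper: build a makefulltextfeed.php request URL.

    The "url" parameter name for the source feed is an assumption.
    """
    params = {"url": feed_url}
    if max_items is not None:
        params["max"] = max_items      # &max=x (capped by ftr_max_entries)
    if not content:
        params["content"] = 0          # &content=0 drops the extracted content
    if summary:
        params["summary"] = 1          # &summary=1 adds a plain-text excerpt
    if parser is not None:
        params["parser"] = parser      # e.g. "html5php"
    query = urlencode(params)          # percent-encodes the feed URL
    if debug:
        query += "&debug"              # bare flag without a value
    return f"{base.rstrip('/')}/makefulltextfeed.php?{query}"


if __name__ == "__main__":
    print(build_feed_url("http://localhost:8080", "https://example.com/feed.xml",
                         max_items=3, summary=True))
```

Presumably each option only takes effect when the server allows it: `&content=0` and `&summary=1` need `ftr_content`/`ftr_summary` set to 'user', `&parser` needs `ftr_allow_parser_override` enabled, and `&debug` depends on `ftr_debug`.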
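The four entry-count settings (`ftr_default_entries`, `ftr_max_entries` and their `_with_key` variants) can be read as a default plus a cap, chosen by whether a valid access key was supplied. The Python sketch below encodes that reading; `items_to_process` is a hypothetical helper, and treating the `_with_key` default as applying only when `&max=x` is absent is an assumption, not Full-Text RSS's actual code.

```python
# Defaults as listed in the table (this Dockerfile's values).
DEFAULTS = {
    "ftr_default_entries": 5,
    "ftr_max_entries": 10,
    "ftr_default_entries_with_key": 5,
    "ftr_max_entries_with_key": 10,
}


def items_to_process(requested_max, has_valid_key, cfg=DEFAULTS):
    """Hypothetical helper: how many feed items a request would get.

    requested_max is the value of &max=x, or None if it was not given.
    """
    if has_valid_key:
        default = cfg["ftr_default_entries_with_key"]
        ceiling = cfg["ftr_max_entries_with_key"]
    else:
        default = cfg["ftr_default_entries"]
        ceiling = cfg["ftr_max_entries"]
    if requested_max is None:
        return default                  # no &max=x in the query string
    return min(requested_max, ceiling)  # &max=x can never exceed the cap
```

With the defaults above, a keyless request for `&max=50` would be cut down to 10 items.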
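`ftr_allowed_urls`, `ftr_blocked_urls` and `ftr_blocked_message` work together as a gate on incoming requests. A minimal Python sketch of that gate, assuming plain substring matching ("parts of a URL") and that an empty allow list (the default `array()`) accepts every URL; `check_url` is a hypothetical name, not part of Full-Text RSS.

```python
def check_url(url, allowed_urls, blocked_urls, blocked_message="URL blocked"):
    """Return None if the URL is accepted, otherwise the message to show.

    Assumes substring matching and that an empty allow list accepts all URLs.
    """
    if allowed_urls and not any(part in url for part in allowed_urls):
        return blocked_message  # an allow list exists and the URL is not on it
    if any(part in url for part in blocked_urls):
        return blocked_message  # the URL contains a blocked fragment
    return None
```

Substring matching is permissive: a blocked entry like `ads.` also matches any URL that merely contains those characters, so entries should be specific.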
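How `ftr_proxy` picks a server from `ftr_proxy_servers` can be sketched in Python as follows. `pick_proxy` is a hypothetical helper; raising an error for an unknown server name is an assumption, since the table does not say what happens in that case.

```python
import random


def pick_proxy(proxy_setting, proxy_servers, rng=random):
    """Return (name, server) to route through, or None for a direct request.

    proxy_setting mirrors ftr_proxy: False, True, or a server name.
    proxy_servers mirrors ftr_proxy_servers: {name: {"host": ..., "auth": ...}}.
    """
    if proxy_setting is False or not proxy_servers:
        return None  # proxy disabled, or no servers listed: connect directly
    if proxy_setting is True:
        name = rng.choice(sorted(proxy_servers))  # random server on every call
        return name, proxy_servers[name]
    if proxy_setting not in proxy_servers:
        # Assumption: the table does not say what an unknown name does.
        raise ValueError(f"unknown proxy server: {proxy_setting!r}")
    return proxy_setting, proxy_servers[proxy_setting]
```

Passing an explicit `rng` (e.g. `random.Random(seed)`) makes the random choice reproducible in tests.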
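The `ftr_cache_cleanup` values read naturally as a probability per run: 0 never cleans, 1 always cleans, and 100 cleans on roughly 1 in 100 runs. A minimal Python sketch of that behaviour, assuming one uniform random draw per script run; `should_clean_cache` is a hypothetical helper name.

```python
import random


def should_clean_cache(cache_cleanup, rng=random):
    """Decide whether this run should clean the cache (hypothetical helper).

    0 -> never (clean via a scheduled job instead), 1 -> every run,
    N -> with probability 1/N, i.e. roughly once every N runs.
    """
    if cache_cleanup <= 0:
        return False
    # randint is inclusive on both ends, so this is True with probability 1/N
    # (and always True when N == 1).
    return rng.randint(1, cache_cleanup) == 1
```

Randomising the cleanup spreads its cost across requests without any shared counter, which is why "roughly" appears in the description.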
Footnotes

¹ As of commit 384d52f.