Need img data-src attribute
Closed this issue · 9 comments
Hi! Great work! But not enough attribute data-src in this attributes set
readability.php/src/Readability.php
Line 1521 in 5bd54a8
Sorry I didn't understand what you mean. Are you saying that I should add data-src
as a image url to extract?
Yes, I mean this )
Fixed via fbd6059
Big Thanx!
And some questions:
(NFRs thinking)
-
What do you think about separation of logic and result-data. I mean that now the result data is built into the service itself.
Now we need to create new instanse on each url.
May be it will be better to separate them? -
What do you think about injecting some user callbacks in parse method for custom dom manipulation in postProcessContent method?
Maybe will be better to open new issues on this..
- Really interesting! Calling the
->parse()
function twice right now will put the system in unexpected states... You could probably end up with old information, for example, in the->getDirection()
getter. What do you suggest? - This was discussed a couple of times and my conclusion was that if you need to tweak the result, I rather have a failing test case submitted by the user than a event manager system or custom dependency injector. If you need to modify the resulting HTML for whatever reason that is external to Readability, you can use the
->getDOMDocument()
getter that I've just added.
Ok, as I think, implementation may looks like below.
With some flexibility remarks
- I expect the result object as the result of the service operation
$html = '<html><body><h1>My HTML content</h1></body></html>';
try {
/** @var Result $result */
$result = $readability->parse($html);
} catch (ParseException $e) {
// error handling here
}
Result will be immutable DTO
final class Result
{
/**
* @var string
*/
private $title;
// ... other props
/**
* Result constructor.
* @param string $title
*/
public function __construct(string $title, ...$other_props_injection)
{
$this->title = $title;
// ... other props sets
}
/**
* @return string
*/
public function getTitle()
{
return $this->title;
}
// ... another getters
}
- We dont need event manager system or custom dependency injector.
We just need interface. And its collection for simple type-hinting and control
interface HandlerInterface
{
/**
* @param DOMDocument $document passing by reference ensures that we always work with the same object
* @return bool
*/
public function handle(DOMDocument &$document);
}
class HandlersCollection implements \IteratorAggregate
{
/**
* @var HandlerInterface[]
*/
protected $handlers = [];
/**
* @return Traversable
*/
public function getIterator()
{
return new \ArrayIterator($this->handlers);
}
/**
* @param HandlerInterface $handler
* @return HandlersCollection
*/
public function add(HandlerInterface $handler)
{
$this->handlers[] = $handler;
return $this;
}
// ... maybe other methods like remove etc.
}
And some implementations
class MyHandler implements HandlerInterface
{
/**
* @param DOMDocument $document
* @return bool
*/
public function handle(DOMDocument &$document)
{
foreach ($document->getElementsByTagName('img') as $img) {
// do something with images
}
return true;
}
}
class MyAnotherHandler implements HandlerInterface
{
/**
* @param DOMDocument $document passing by reference ensures that we always work with the same object
* @return bool
*/
public function handle(DOMDocument &$document)
{
// another dom manipulations
return true;
}
}
And so we can do this
$handlers = (new HandlersCollection())
->add(new MyHandler())
->add(new MyAnotherHandler());
try {
/** @var Result $result */
$result = $readability->parse($html, $handlers);
} catch (ParseException $e) {
// error handling here
}
and somewhere in postProcessContent method or after its call in parse method (last operation before setting result)
if ($handlers) {
/** @var HandlerInterface $handler */
foreach ($handlers as $handler) {
$handler->handle($domObject);
}
}
- And as I said - some flexibility remarks :)
$config = new Configuration();
$logger = new NullLogger();//for example
// logger is not part of configuration, config is only params
// and I want store them in different data structures
// and will be cool to inject my params into config constructor something like this:
$params = [
'MaxTopCandidates' => 5,
'SubstituteEntities' => true,
// ... etc
];
// I can store this params in php-array or in ini-file or in yml-file
// it will be a cleaner approach
$config = new Configuration($params);
// and this will give me the flexibility to assemble the Readability object in a di-container
// separate logger and config
$readability = new Readability($config, $logger);
passing by reference ensures that we always work with the same object - its really not nessesary...
-
DTO as result: Love it. Will work on it and release it on the next major version. It's going to be a backwards incompatible change but i really like the idea.
-
Interface handlers: I get what you're saying, but I don't agree on the purpose. What's the point of adding a handler at the end of the process if you can get the final, fully functional DOMDocument? What's so important that you cannot add it as part of the algorithm and you can't either do it yourself after the system returns you the result? I understand the idea of adding flexibility but I don't think it's going to add value to the project. Maaaaybe this would open a door for "plug ins" (other devs could create their own handlers and share them to other people as downloadable plugins) but, again, I don't see the point of doing it inside Readability instead of manipulating the final DOMDocument.
-
Using arrays as configuration: It used to be that way before v1. I switched to a more object oriented approach when I had issues misspelling the parameter names. I think it's a much cleaner, better approach to give the developer a configuration interface instead of a array processing function. You're more than welcome to work on a PR that adds this functionality to the Configuration object
Thanks again for your feedback, I really appreciate it :D
-
Yeah. It will break BC. And I'll be waiting for this changes )
-
Ok, the decision is yours. I just only say once again that this is not the addition of a functional, it is the possibility of a custom expansion of the functional to produce a result in the result-object, rather than getting DOM and further manipulating it. This is not a part of the library, but it allows the developer to influence the final result of the library work. Only if he wish it.
-
I don't talk about "array as config". Array as config really bad idea for manipulating config-data in service.
I'm just talking about the possibility to pass array of cofig params to the constructor. Usually we storing configs in arrays - native array, ini-files, yml-files...
It will looks like this.
For example I have yml-cofig:
readability:
maxTopCandidates: 3
stripUnlikelyCandidates: false
substituteEntities: true
My framework will read this file and it will be an array which I can pass in to constructor.
Which is what I do in the container:
$di->set('readability_config', function () {
return new Configuration($this->config->get('readability'));
});
// And I can set different sets of configurations and use them depending on the needs or context
$di->set('readability', function () {
return new Readability($this->get('readability_config'));
});
// or we'll do it in a simpler way (it does not matter how)
$di->set('readability', function () {
return new Readability(new Configuration($this->config->get('readability')));
});
And __construct will allow you to add only valid parameters, and all through native setters:
public function __construct(array $config = [])
{
foreach ($config as $key => $value) {
$method = "set{$key}";
if (isset($this->{$key})) { // or method_exists($this, $method)
call_user_func([$this, $method], $value); // or $this->{$method}($value)
}
}
}
We can still use the setters to configure the configuration. But also we can send all parameters at once in the constructor.
Ok, I'll make a PR with this changes some time later.
Best regards! And thank you for your work again! )