Spanish web content not displayed correctly: '?' is shown instead of the correct character
ElliotFer2000 opened this issue · 1 comment
Spanish words with accents are not displayed properly: accented characters are being replaced with a "?" character.
Why is this happening? How can I tell the scraper that I'm dealing with Spanish-language content?
Code:
$web = new \Spekulatius\PHPScraper\PHPScraper;
$web->go("https://www.marca.com");
return $web->outlineWithParagraphs;
I return the outline to the client in JSON format. The result I'm getting looks like this:
[
{
"tag": "h2",
"content": "Joao F?lix: \"El Bar?a siempre ha sido mi primera opci?n\""
}
]
I have already tried to solve the problem by putting this at the beginning of the script: setlocale(LC_ALL, 'es_AR')
"F?lix" and "opci?n" are not properly displayed in the response. They should be "Félix" and "opción": a "?" is shown instead of "é" and "ó".
When I return the result of this function, the characters display correctly:
utf8_encode(file_get_contents("https://www.marca.com"))
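The fact that utf8_encode fixes the output suggests the page bytes are in a single-byte encoding such as ISO-8859-1 rather than UTF-8. A minimal sketch of how that could be checked, assuming the mbstring extension is available; the sample string is made up for illustration and is not fetched from the site:

```php
<?php
// Hypothetical sample: "Félix" as ISO-8859-1 bytes (é is the single byte 0xE9).
$latin1 = "F\xE9lix";

// 0xE9 followed by "l" is not a valid UTF-8 sequence, so this is false.
$isValidBefore = mb_check_encoding($latin1, 'UTF-8');

// Re-encode from ISO-8859-1 to UTF-8; é becomes the two bytes 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$isValidAfter = mb_check_encoding($utf8, 'UTF-8');

var_dump($isValidBefore, $isValidAfter);
```

Running the same mb_check_encoding call on the output of file_get_contents would show whether the fetched document needs converting at all.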
I have tried requesting the document with file_get_contents, encoding the result, and then passing it to the $web->setContent function. I get the expected output this way.
$web = new PHPScraper;
$rawPageContent = utf8_encode(file_get_contents("https://www.marca.com"));
$web->setContent("https://www.marca.com",$rawPageContent);
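One caveat with this workaround: utf8_encode always assumes ISO-8859-1 input, would double-encode a page that is already UTF-8, and is deprecated as of PHP 8.2. A slightly more defensive sketch is shown here, assuming mbstring is installed. The helper name toUtf8, its fallback parameter, and the sample HTML are hypothetical and not part of PHPScraper:

```php
<?php
/**
 * Hypothetical helper (not part of PHPScraper): return $raw as UTF-8.
 */
function toUtf8(string $raw, string $fallback = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave untouched to avoid double encoding.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Look for a declared charset, e.g. <meta charset="iso-8859-15">.
    $charset = $fallback;
    if (preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $raw, $m)) {
        // Only accept names mbstring knows; otherwise keep the fallback.
        // A UTF-8 declaration is ignored here because the bytes are not valid UTF-8.
        foreach (mb_list_encodings() as $known) {
            if (strcasecmp($known, $m[1]) === 0 && $known !== 'UTF-8') {
                $charset = $known;
                break;
            }
        }
    }

    return mb_convert_encoding($raw, 'UTF-8', $charset);
}

// Made-up sample document in ISO-8859-1 ("opción" with ó as the byte 0xF3).
$html = "<meta charset=\"iso-8859-1\"><h2>opci\xF3n</h2>";
$converted = toUtf8($html);
```

The converted string could then be passed to $web->setContent in place of the utf8_encode result. This sketch only inspects the document body; it does not read the charset from the HTTP Content-Type header.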
Hello @ElliotFer2000,
It looks like the fetching isn't using the correct encoding. I managed to confirm the issue. Have you checked how this could be resolved?
Peter