Zend_Dom_Query::loadHTML() problems with UTF-8

To keep Zend_Dom_Query from damaging your charset, see the hack below. You may see wrong characters even if the document is UTF-8 and you specified this encoding when you created your Zend_Dom_Query instance. The problem is in the Zend_Dom_Query::queryXpath() method, on this line:

 
$domDoc->loadHTML($document);

Depending on how the meta tag in your HTML is written, later DOM queries may return garbage in their output. There is a dirty hack for this, mentioned here: https://ru.php.net/manual/en/domdocument.loadhtml.php#95251. To apply it, change the line above to:

 
$domDoc->loadHTML('<?xml encoding="' . $encoding . '">' . $document);

This assumes you are running ZF 1.11, the version in which encoding support was introduced in the component. Otherwise, either download the latest version of the component or set the encoding yourself.
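
If you want to see the effect outside of Zend Framework, here is a minimal sketch using plain DOMDocument and DOMXPath. The sample markup, the `extractText()` helper and the `greeting` id are made up for illustration and are not part of Zend_Dom_Query. It assumes the source file is saved as UTF-8 and that PHP's dom extension is enabled.

```php
<?php
// Hypothetical sample input: UTF-8 text with no charset declaration in the markup.
$document = '<html><body><p id="greeting">Привет, café</p></body></html>';
$encoding = 'UTF-8';

/**
 * Illustrative helper (not a Zend API): load an HTML string and
 * return the text content of the element with the given id.
 */
function extractText(string $html, string $id): string
{
    $domDoc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $domDoc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($domDoc);
    $node  = $xpath->query('//*[@id="' . $id . '"]')->item(0);

    return $node === null ? '' : $node->textContent;
}

// Without the hint the parser has to guess the encoding of the input.
$plain = extractText($document, 'greeting');

// With the hack: the same prefix as in the modified line above.
$hinted = extractText('<?xml encoding="' . $encoding . '">' . $document, 'greeting');

echo 'without hint: ', $plain, PHP_EOL;
echo 'with hint:    ', $hinted, PHP_EOL;
```

On the setups where this problem shows up, the first line should come out garbled and the second one intact. What you get without the hint depends on your libxml2 version, so treat the first line as a symptom to look for rather than a guaranteed result.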

  • Laura

    That was close, would have worked immediately if you had placed the quotes correctly, like this:

    $domDoc->loadHTML(‘ ‘.$document);

    Thanks, I was pulling my hair out trying to figure out why the special characters were getting corrupted.

  • LeNche

    Thank u so much :)))

  • Anon

    Thanks for this – ditto much hair-pulling…
