With newer libxml2, the `<?xml encoding="UTF-8">` prefix no longer enforces UTF-8. Non-ASCII content is instead decoded as ISO-8859-1 and comes out garbled. For example, `<p>中文</p>` should be preserved unchanged, but each of the six UTF-8 bytes of `中文` is read as a separate Latin-1 character, producing mojibake.

Switching to another trick mentioned in [1] fixes the issue, and the new trick still works with older libxml2 (tested with 2.11.5).

As a side note, DOMDocument::loadHTML uses libxml2's HTMLparser module [2][3].

[1] https://stackoverflow.com/questions/8218230/php-domdocument-loadhtml-not-encoding-utf-8-correctly
[2] https://github.com/php/php-src/blob/php-8.1.26/ext/dom/document.c#L1855
[3] https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html
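The failure mode described in the commit message can be reproduced in plain Python, without libxml2. The sketch below assumes the parser decodes the input bytes one-to-one as ISO-8859-1. It encodes the text as UTF-8 and then decodes those bytes as Latin-1. The helper names `misdecode_as_latin1` and `repair_latin1_mojibake` are hypothetical, chosen for illustration, and are not tt-rss or libxml2 code.

```python
# Minimal sketch (assumption: the parser treats each input byte as one
# ISO-8859-1 character). Pure Python; libxml2 is not involved here.


def misdecode_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_latin1_mojibake(garbled: str) -> str:
    """Reverse misdecode_as_latin1.

    ISO-8859-1 maps every byte value 0-255 to exactly one code point, so
    re-encoding as Latin-1 recovers the original UTF-8 bytes unchanged.
    """
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "<p>中文</p>"
    garbled = misdecode_as_latin1(original)
    # Each of the 6 UTF-8 bytes of "中文" became its own Latin-1 character,
    # so the garbled string is 4 characters longer than the original.
    print(len(original), len(garbled))
    print(repr(garbled))
    assert repair_latin1_mojibake(garbled) == original
```

Because ISO-8859-1 assigns a character to every byte value, the damage is reversible in this pure-Python case. The repair helper only illustrates that point and does not replace making the parser read the input as UTF-8.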
Tiny Tiny RSS
Web-based news feed aggregator, designed to allow you to read news from any location, while feeling as close to a real desktop application as possible.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Copyright (c) 2005 Andrew Dolgov (unless explicitly stated otherwise).