Mixing Unicode and non-latin-1 characters in DTML
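When Unicode strings are mixed with plain strings in DTML, each plain string is converted to Unicode on the assumption that it contains latin-1 characters (see http://www.zope.org/Members/htrd/howto/unicode). That assumption is wrong for utf-8 encoded strings. Latin-1 decoding never fails, so every multi-byte utf-8 sequence is silently turned into several wrong characters. For example, the utf-8 bytes for "é" come out as "Ã©".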
A better approach is to decode plain strings as utf-8 first and fall back to latin-1 only if that fails. Non-ASCII latin-1 text is rarely valid utf-8, so this supports two commonly used encodings, utf-8 and latin-1, at the same time.
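The decoding order can be checked in isolation. The sketch below uses a hypothetical helper, `decode_plain`, which is not part of Zope. It only relies on `bytes.decode`, so it runs on Python 2.6+ and Python 3.3+. It shows that utf-8 bytes decoded as latin-1 come out garbled. It also shows that latin-1 bytes containing non-ASCII characters are rejected by the utf-8 codec, which is why they reach the fallback.

```python
# Sketch of the "utf-8 first, latin-1 as fallback" rule.
# decode_plain is a hypothetical helper, not part of Zope.
# Runs on Python 2.6+ (where bytes is str) and on Python 3.3+.


def decode_plain(data):
    """Decode a plain (byte) string, trying utf-8 before latin-1."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this cannot fail.
        return data.decode('latin-1')


text = u'caf\u00e9'                    # "cafe" with e-acute (U+00E9)
utf8_bytes = text.encode('utf-8')      # ends with the two bytes C3 A9
latin1_bytes = text.encode('latin-1')  # ends with the single byte E9

# Old rule (always latin-1): utf-8 input turns into U+00C3 U+00A9.
print(utf8_bytes.decode('latin-1') == u'caf\u00c3\u00a9')  # True

# New rule: both encodings decode to the intended text.
print(decode_plain(utf8_bytes) == text)    # True
print(decode_plain(latin1_bytes) == text)  # True
```

The rule is a heuristic. Latin-1 text that happens to be valid utf-8 is decoded as utf-8. One example is the two characters "Ã©", whose latin-1 bytes C3 A9 are the utf-8 encoding of "é". Such sequences are rare in ordinary text.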
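Here is the workaround. The following patch to DT_Util.py monkeypatches the function string.join, which cDocumentTemplate.c uses later on. The patch installs the replacement just before cDocumentTemplate is imported. The replacement first tries an ordinary join. If that raises a UnicodeError, a non-ASCII plain string has been mixed with a unicode string. In that case, if the separator is empty, the replacement decodes the plain strings with utf-8 first and latin-1 as fallback. With any other separator, it re-raises the error.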
```diff
--- DT_Util.py
+++ DT_Util.patched.py
@@ -20,6 +20,29 @@
 from RestrictedPython.Guards import safe_builtins
 from RestrictedPython.Utilities import utility_builtins
 from RestrictedPython.Eval import RestrictionCapableEval
+
+def _join_unicode(words, sep=' '):
+    """join a list of plain strings into a single plain string,
+    a list of unicode strings into a single unicode string,
+    or a list containing a mix into a single unicode string with
+    the plain strings converted from utf-8, fallback to latin-1.
+    Try to preserve usual behaviour of string.join method
+    """
+    try:
+        return sep.join(words)
+    except UnicodeError:
+        if sep != '': raise
+        words = list(words)
+        for i in range(len(words)):
+            if isinstance(words[i], str):
+                try:
+                    words[i] = unicode(words[i], 'utf-8')
+                except UnicodeError:
+                    words[i] = unicode(words[i], 'latin-1')
+        return u''.join(words)
+
+import string
+string.join = _join_unicode
 from cDocumentTemplate import InstanceDict, TemplateDict, \
      render_blocks, safe_callable, join_unicode
```
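`_join_unicode` relies on the `unicode()` built-in and on `str.join` raising UnicodeError for mixed lists, so it only runs under Python 2. On Python 3, joining bytes and text raises TypeError instead. The sketch below restates the same joining rule as a hypothetical `join_mixed` function so the logic can be tried outside Zope on Python 2.6+ or 3.3+. Like the patch, it only covers the empty-separator case. It illustrates the fallback logic and is not a drop-in replacement for `string.join` in DT_Util.py.

```python
# Sketch of the joining rule used by _join_unicode, restated so that it
# runs on Python 2.6+ and Python 3.3+. join_mixed is hypothetical and,
# like the patch, only covers the empty-separator case.

text_type = type(u'')  # unicode on Python 2, str on Python 3


def join_mixed(words):
    """Join plain (byte) and unicode strings with an empty separator.

    Only plain strings -> plain string; otherwise a unicode string with
    each plain string decoded as utf-8, falling back to latin-1.
    """
    words = list(words)
    if not any(isinstance(w, text_type) for w in words):
        # Nothing to convert: behave like an ordinary join.
        return b''.join(words)
    for i, w in enumerate(words):
        if isinstance(w, bytes):
            try:
                words[i] = w.decode('utf-8')
            except UnicodeDecodeError:
                words[i] = w.decode('latin-1')
    return u''.join(words)


cafe = u'caf\u00e9'  # "cafe" with e-acute
parts = [u'Menu: ', cafe.encode('utf-8'), b' / ', cafe.encode('latin-1')]
print(join_mixed(parts) == u'Menu: caf\u00e9 / caf\u00e9')  # True
print(join_mixed([b'a', b'b']) == b'ab')                     # True
```

Unlike the patch, the sketch decides by type up front instead of waiting for a UnicodeError. For plain strings that are pure ASCII the result is the same, because ASCII bytes decode identically under utf-8.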