Mix unicode and non-latin1 characters in DTML

When Unicode strings are mixed with plain strings in DTML, the plain string is converted to Unicode on the assumption that it contains Latin-1 characters. That behaviour fails on UTF-8 encoded strings with a UnicodeError.

But we could try to decode plain strings as UTF-8 first and fall back to Latin-1 only on error. This makes it possible to support at least two commonly used encodings.
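
The idea can be sketched in isolation, independent of DTML. The helper names `decode_utf8_or_latin1` and `join_mixed` below are made up for this illustration and are not part of Zope; the sketch is written so that it should run on both Python 2 and Python 3.

```python
def decode_utf8_or_latin1(value):
    """Return text for a byte string: try UTF-8 first, then Latin-1.

    Hypothetical helper for illustration only. Values that are not
    byte strings are returned unchanged.
    """
    if isinstance(value, bytes):
        try:
            return value.decode('utf-8')
        except UnicodeError:
            # Latin-1 maps every byte to a character, so this cannot fail.
            return value.decode('latin-1')
    return value


def join_mixed(words, sep=u''):
    """Join a mix of byte strings and text strings into one text string."""
    return sep.join(decode_utf8_or_latin1(w) for w in words)


# A UTF-8 encoded and a Latin-1 encoded "a umlaut" decode to the same text.
utf8_bytes = b'\xc3\xa4'
latin1_bytes = b'\xe4'
mixed = join_mixed([u'gr\xfc', b'\xc3\x9f', b'e'])
```

The order of the two attempts matters: Latin-1 decoding accepts any byte sequence and therefore never raises, so it only makes sense as the fallback. Byte strings that are valid UTF-8 are always treated as UTF-8, even if they were meant as Latin-1.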

So here is the workaround: the following monkeypatch replaces the function string.join, which cDocumentTemplate.c calls later on.

@@ -20,6 +20,29 @@
 from RestrictedPython.Guards import safe_builtins
 from RestrictedPython.Utilities import utility_builtins
 from RestrictedPython.Eval import RestrictionCapableEval
+def _join_unicode(words, sep=' '):
+    """join a list of plain strings into a single plain string,
+    a list of unicode strings into a single unicode string,
+    or a list containing a mix into a single unicode string with
+    the plain strings converted from utf-8, fallback to latin-1.
+    Try to preserve usual behaviour of string.join method
+    """
+    try:
+        return sep.join(words)
+    except UnicodeError:
+        if sep != '': raise
+        words = list(words)
+        for i in range(len(words)):
+            if isinstance(words[i], str):
+                try:
+                    words[i] = unicode(words[i], 'utf-8')
+                except UnicodeError:
+                    words[i] = unicode(words[i], 'latin-1')
+        return u''.join(words)
+import string
+string.join = _join_unicode
 from cDocumentTemplate import InstanceDict, TemplateDict, \
          render_blocks, safe_callable, join_unicode
