Utility for working with unicode text.
The text.Text class handles conversions between unicode text and various other formats.
Text objects always store their text internally as unicode, but provide many properties for converting it to other formats.
    from core.util import text
    t = text.Text('Hi!')
    print t.encoding
    print t.text
    print t.ascii
    print t.bytes
    print t.url
__init__(self, t=None, **kwargs)
Pass a text string (str or unicode). Accepts the keyword argument 'enc' to name the encoding of the input. The default encoding is utf-8; if you pass text in a different encoding, you must specify it with enc.
    from core.util import text
    t = text.Text(someLatin1Text, enc='latin_1')
    print t.utf8
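For a byte-string input, enc amounts to decoding with the named codec before the text is stored as unicode. A minimal Python 3 sketch of that step, assuming this behavior; `to_unicode` is a hypothetical helper, not part of core.util.text:

```python
def to_unicode(t, enc='utf-8'):
    # Hypothetical helper: decode byte strings with `enc`; str is already unicode
    if isinstance(t, (bytes, bytearray)):
        return bytes(t).decode(enc)
    return t


some_latin1_text = 'caf\u00e9'.encode('latin_1')   # b'caf\xe9'
u = to_unicode(some_latin1_text, enc='latin_1')   # 'café'
utf8 = u.encode('utf-8')                          # b'caf\xc3\xa9'
```

In this sketch, decoding the Latin-1 bytes with the default utf-8 raises UnicodeDecodeError, which illustrates why enc must be given for other encodings.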
__iter__
Iterate through unicode characters.
__str__
Returns the text as ASCII, with non-ASCII characters unicode-escaped.
__unicode__
Returns the text as unicode.
encoding
The encoding specified in the constructor for this text.
ascii
Returns the ASCII equivalent of the text, with unicode backslash escapes where needed.
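In Python 3, the same kind of backslash-escaped ASCII can be produced with the standard backslashreplace codec error handler. A sketch, assuming ascii (and __str__) behave like this; `to_ascii` is a hypothetical name:

```python
def to_ascii(u):
    # 'backslashreplace' turns each non-ASCII character into \xNN, \uNNNN or \UNNNNNNNN
    return u.encode('ascii', 'backslashreplace')


print(to_ascii('caf\u00e9 \u2603'))  # b'caf\\xe9 \\u2603'
```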
utf8
UTF8 with backslash replacements where needed.
xml
ASCII with xml charref replacements where needed.
xml8
UTF8 with xml charref replacements where needed.
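The utf8, xml and xml8 forms map naturally onto Python's standard codec error handlers. A sketch assuming that mapping (it is not taken from the core.util.text source). Because UTF-8 can encode every code point except lone surrogates, the replacement in utf8 and xml8 only comes into play for surrogates:

```python
s = 'caf\u00e9 \u2603'                          # 'café ☃'
utf8 = s.encode('utf-8', 'backslashreplace')   # b'caf\xc3\xa9 \xe2\x98\x83'
xml = s.encode('ascii', 'xmlcharrefreplace')   # b'caf&#233; &#9731;'
xml8 = s.encode('utf-8', 'xmlcharrefreplace')  # same bytes as utf8 here

# A lone surrogate cannot be encoded as UTF-8, so the error handler applies
lone = '\ud800'
assert lone.encode('utf-8', 'backslashreplace') == b'\\ud800'
assert lone.encode('utf-8', 'xmlcharrefreplace') == b'&#55296;'
```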
text
Get or set text as unicode.
bytes
Get bytearray or set text from bytearray.
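The "get X, or set text from X" properties fit Python's property getter/setter pattern: the object keeps only unicode and converts on each access. A sketch of that design using bytes as the example; `MiniText` is a hypothetical stand-in, not the real text.Text:

```python
class MiniText:
    """Hypothetical stand-in for text.Text: stores unicode, converts on access."""

    def __init__(self, t='', enc='utf-8'):
        self.encoding = enc
        # Inside methods, `bytes` still names the builtin type, not the property
        self.text = t.decode(enc) if isinstance(t, (bytes, bytearray)) else t

    @property
    def bytes(self):
        # Getter: encode the stored unicode text on demand
        return bytearray(self.text, self.encoding)

    @bytes.setter
    def bytes(self, value):
        # Setter: decode incoming bytes and keep only the unicode result
        self.text = value.decode(self.encoding)


t = MiniText('caf\u00e9')
print(t.bytes)             # bytearray(b'caf\xc3\xa9')
t.bytes = b'\xe2\x98\x83'
print(t.text == '\u2603')  # True
```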
url
Get url-quoted text or set text from url-quoted text.
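A Python 3 sketch of the url round trip with urllib.parse, assuming the text is percent-encoded as UTF-8 (quote's default); Python 2 code would instead call urllib.quote on the UTF-8 bytes:

```python
from urllib.parse import quote, unquote

s = 'caf\u00e9 & cr\u00e8me'
q = quote(s)         # 'caf%C3%A9%20%26%20cr%C3%A8me'
back = unquote(q)    # setting text from url-quoted text reverses it
```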
url64
Get URL-safe base64 text, or set text from URL-safe base64.
base64
Get base64 text, or set text from base64.
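Both base64 variants can be sketched with Python's base64 module, assuming the text is encoded as UTF-8 first. The URL-safe alphabet replaces + and / with - and _, which the sample input below happens to exercise:

```python
import base64

raw = '\u00ff\u00ff'.encode('utf-8')                   # b'\xc3\xbf\xc3\xbf'
b64 = base64.b64encode(raw).decode('ascii')            # 'w7/Dvw=='
url64 = base64.urlsafe_b64encode(raw).decode('ascii')  # 'w7_Dvw=='

# Setting text from either form reverses the steps
assert base64.b64decode(b64).decode('utf-8') == '\u00ff\u00ff'
assert base64.urlsafe_b64decode(url64).decode('utf-8') == '\u00ff\u00ff'
```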
base32
Get base32 text, or set text from base32.
base16
Get base16 text, or set text from base16.
hex
Get hexlified text, or set text from hex.
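A sketch of base32, base16 and hex using the standard base64 and binascii modules, again assuming UTF-8 bytes underneath. Note that RFC 4648 base16 is uppercase, while binascii.hexlify emits lowercase:

```python
import base64
import binascii

raw = 'Hi\u00e9'.encode('utf-8')                # b'Hi\xc3\xa9'
b32 = base64.b32encode(raw).decode('ascii')
b16 = base64.b16encode(raw).decode('ascii')     # '4869C3A9'
hexed = binascii.hexlify(raw).decode('ascii')   # '4869c3a9'

# Decoding restores the original unicode text
assert base64.b32decode(b32).decode('utf-8') == 'Hi\u00e9'
assert base64.b16decode(b16).decode('utf-8') == 'Hi\u00e9'
assert binascii.unhexlify(hexed).decode('utf-8') == 'Hi\u00e9'
```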
rle
binhex4-style RLE compression/decompression.
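Python 2 exposed this scheme as binascii.rlecode_hqx and binascii.rledecode_hqx (removed in Python 3.11); whether rle wraps them is an assumption. A hand-written sketch of the same binhex4 run-length encoding, where 0x90 marks a run and a literal 0x90 is escaped:

```python
RUNCHAR = 0x90  # binhex4 run marker byte


def rle_encode(data):
    """Runs of 4+ identical bytes become (byte, 0x90, count);
    a literal 0x90 is escaped as (0x90, 0x00)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == RUNCHAR:
            out += bytes([RUNCHAR, 0])
            i += 1
            continue
        # Measure the run, capped at 255 so the count fits in one byte
        j = i + 1
        while j < n and data[j] == b and j - i < 255:
            j += 1
        if j - i > 3:
            out += bytes([b, RUNCHAR, j - i])
            i = j
        else:
            out.append(b)
            i += 1
    return bytes(out)


def rle_decode(data):
    """Inverse of rle_encode (assumes well-formed input)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == RUNCHAR:
            count = data[i + 1]
            i += 2
            if count == 0:
                out.append(RUNCHAR)                  # escaped literal 0x90
            else:
                out.extend([out[-1]] * (count - 1))  # byte was already emitted once
        else:
            out.append(b)
            i += 1
    return bytes(out)


packed = rle_encode(b'abbbbbbc\x90')   # b'ab\x90\x06c\x90\x00'
assert rle_decode(packed) == b'abbbbbbc\x90'
```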
gzip
Return text compressed with gzip, or set text from gzip.
bz
Return text compressed with bz, or set text from bz.
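Both properties presumably encode the text to bytes and then compress. A Python 3 sketch with the standard gzip and bz2 modules, assuming bz refers to bzip2 and UTF-8 is the byte encoding:

```python
import bz2
import gzip

raw = ('spam ' * 100).encode('utf-8')
gz = gzip.compress(raw)
bz = bz2.compress(raw)

assert gz[:2] == b'\x1f\x8b'   # gzip magic number
assert bz[:3] == b'BZh'        # bzip2 stream header
assert gzip.decompress(gz) == raw and bz2.decompress(bz) == raw
```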
An iterator over the characters in a Text object. It implements unicodedata properties for whichever character is current.
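A sketch of such an iterator in Python 3: each step advances the position and returns the iterator itself, so the properties always describe the current character. `CharIterator` is a hypothetical name, and only a few of the properties listed below are shown:

```python
import unicodedata


class CharIterator:
    """Hypothetical sketch: walk a string, exposing unicodedata
    lookups for whichever character is current."""

    def __init__(self, text):
        self._text = text
        self._pos = -1

    def __iter__(self):
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._text):
            raise StopIteration
        return self

    @property
    def char(self):
        return self._text[self._pos]

    @property
    def ord(self):
        # Inside the method, `ord` resolves to the builtin, not this property
        return ord(self.char)

    @property
    def name(self):
        return unicodedata.name(self.char, None)

    @property
    def category(self):
        return unicodedata.category(self.char)


for c in CharIterator('a\u00e9'):
    print(c.ord, c.name, c.category)
# 97 LATIN SMALL LETTER A Ll
# 233 LATIN SMALL LETTER E WITH ACUTE Ll
```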
ord
The ordinal for the current character.
name
Returns the name assigned to this Unicode character as a string. If no name is defined, returns None.
decimal
Returns the decimal value assigned to this Unicode character as an integer. If none is defined, returns None.
digit
Returns the digit value assigned to this Unicode character as an integer. If none is defined, returns None.
numeric
Returns the numeric value assigned to this Unicode character as a float. If none is defined, returns None.
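The three values differ in which characters they cover, as Python's unicodedata module shows. In this sketch, `numeric_values` is a hypothetical helper; passing None as the default reproduces the "returns None" behavior described above:

```python
import unicodedata


def numeric_values(ch):
    # The second argument is returned when no value is defined
    return (unicodedata.decimal(ch, None),
            unicodedata.digit(ch, None),
            unicodedata.numeric(ch, None))


numeric_values('7')        # (7, 7, 7.0)
numeric_values('\u00b2')   # (None, 2, 2.0)     SUPERSCRIPT TWO
numeric_values('\u00bd')   # (None, None, 0.5)  VULGAR FRACTION ONE HALF
```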
category
Returns the general category assigned to this Unicode character as string.
bidirectional
Returns the bidirectional class assigned to this Unicode character as string. If none is defined, an empty string is returned.
combining
Returns the canonical combining class assigned to this Unicode character as an integer. Returns 0 if no combining class is defined.
east_asian_width
Returns the east asian width assigned to this Unicode character as string.
mirrored
Returns 1 if the character has been identified as a 'mirrored' character in bidirectional text, 0 otherwise.
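The values these properties return can be checked directly against Python's unicodedata module. A quick sketch; `props` is a hypothetical helper and the sample characters are arbitrary:

```python
import unicodedata


def props(ch):
    # (bidirectional, combining, east_asian_width, mirrored)
    return (unicodedata.bidirectional(ch),
            unicodedata.combining(ch),
            unicodedata.east_asian_width(ch),
            unicodedata.mirrored(ch))


props('A')        # ('L', 0, 'Na', 0)
props('(')        # ('ON', 0, 'Na', 1)
props('\u0301')   # ('NSM', 230, 'A', 0)  COMBINING ACUTE ACCENT
props('\u4e2d')   # ('L', 0, 'W', 0)      CJK ideograph
```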
decomposition
Returns the character decomposition mapping assigned to this Unicode character as string. An empty string is returned if no such mapping is defined.
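The mapping string lists hexadecimal code points, optionally preceded by a tag such as `<compat>`. A sketch that turns it back into characters; `decompose` is a hypothetical helper, and it performs only one level of decomposition:

```python
import unicodedata


def decompose(ch):
    # Split '0065 0301' or '<compat> 0066 0069' into fields, dropping any <tag>
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith('<'):
        fields = fields[1:]
    return ''.join(chr(int(f, 16)) for f in fields)


unicodedata.decomposition('\u00e9')   # '0065 0301'  (e + combining acute)
unicodedata.decomposition('\ufb01')   # '<compat> 0066 0069'  (fi ligature)
decompose('\ufb01')                   # 'fi'
```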