Aimy

from core.util import text

Utility for working with unicode text.

Text

The text.Text class converts between unicode text and various other formats.

Text objects always store their text internally as unicode, and expose properties that convert it to and from other formats.

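from core.util import text
t = text.Text('Hi!')
print t.encoding  # the encoding given to the constructor (utf-8 by default)

t.text   # the text as unicode
t.ascii  # ascii, with unicode backslash escapes where needed
t.bytes  # the text as a bytearray
t.url    # the text, url-quoted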
__init__(self, t=None, **kwargs)

Pass a text string (str or unicode). Accepts the keyword argument 'enc' for encoding conversions. The default encoding is utf-8; if you pass text in any other encoding, you must name that encoding with 'enc'.

from core.util import text
t = text.Text(someLatin1Text, enc='latin_1')
t.utf8
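
What 'enc' does can be pictured with plain standard-library calls. The sketch below is not the Text implementation; it only illustrates, assuming 'enc' names a codec known to Python's codec registry, how Latin-1 input becomes unicode and is then re-encoded as UTF-8. The variable names are hypothetical and the syntax is Python 3.

```python
# Illustrative only: standard-library equivalent of decoding with a declared
# encoding and then re-encoding as UTF-8. This is not the Text class itself.
some_latin1_bytes = b'caf\xe9'  # the word 'café' encoded as Latin-1

decoded = some_latin1_bytes.decode('latin_1')  # bytes -> unicode text
as_utf8 = decoded.encode('utf-8')              # unicode text -> UTF-8 bytes

print(repr(decoded))
print(repr(as_utf8))
```

Decoding the same bytes as utf-8 would raise UnicodeDecodeError, which is why a non-default encoding has to be declared.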
	
__iter__

Iterate through unicode characters.

__str__

Returns the text as ascii, with unicode backslash escapes where needed.

__unicode__

Returns the text as unicode.

Read-only Properties

encoding

The encoding specified (in the constructor) for this text.

ascii

Returns the ascii equivalent of the text, with unicode backslashes where needed.

utf8

UTF8 with backslash replacements where needed.

xml

ASCII with xml charref replacements where needed.

xml8

UTF8 with xml charref replacements where needed.
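
The "replacements where needed" wording matches the behaviour of Python's codec error handlers. As a rough sketch, and assuming (not confirmed by this page) that the properties behave like str.encode with the 'backslashreplace' and 'xmlcharrefreplace' handlers, the four read-only conversions look like this in Python 3:

```python
# Illustrative only: standard-library error handlers that produce output of
# the kind described for ascii, utf8, xml and xml8. Not the Text class itself.
sample = u'caf\u00e9 \u2603'  # 'café' followed by a snowman character

ascii_escaped = sample.encode('ascii', 'backslashreplace')
utf8_escaped = sample.encode('utf-8', 'backslashreplace')
xml_ascii = sample.encode('ascii', 'xmlcharrefreplace')
xml_utf8 = sample.encode('utf-8', 'xmlcharrefreplace')

print(ascii_escaped)  # b'caf\\xe9 \\u2603'
print(xml_ascii)      # b'caf&#233; &#9731;'
```

UTF-8 can represent every character in this sample, so the two UTF-8 results contain no replacements at all.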

Read-write Properties

text

Get or set text as unicode.

bytes

Get bytearray or set text from bytearray.

url

Get url-quoted text or set text from url-quoted text.
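
For orientation, url quoting and unquoting with the standard library looks like the sketch below. It assumes the url property percent-encodes UTF-8 bytes, which this page does not state, and it uses the Python 3 location of the functions (urllib.parse).

```python
# Illustrative only: percent-encoding with the standard library.
# Not the Text class itself.
from urllib.parse import quote, unquote

original = u'caf\u00e9 & tea'

quoted = quote(original)    # non-ASCII and reserved characters become %XX
restored = unquote(quoted)  # reverses the quoting

print(quoted)  # caf%C3%A9%20%26%20tea
```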

url64

Get URL-safe base64 text, or set text from URL-safe base64.

base64

Get base64 text, or set text from base64.

base32

Get base32 text, or set text from base32.

base16

Get base16 text, or set text from base16.
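
The four base-N encodings differ only in alphabet and padding. The sketch below uses the standard base64 module to show the difference; it assumes the text is encoded as UTF-8 first, which this page does not confirm, and it is not the Text implementation.

```python
# Illustrative only: base64, URL-safe base64, base32 and base16 with the
# standard library. Not the Text class itself.
import base64

raw = u'Hi!'.encode('utf-8')

b64 = base64.b64encode(raw)  # b'SGkh'
b32 = base64.b32encode(raw)
b16 = base64.b16encode(raw)  # b'486921'

# Bytes chosen so that the standard alphabet needs '+' and '/'.
tricky = b'\xfb\xff'
standard = base64.b64encode(tricky)          # b'+/8='
url_safe = base64.urlsafe_b64encode(tricky)  # b'-_8='

# Each encoding round-trips back to the original bytes.
assert base64.b64decode(b64) == raw
assert base64.b32decode(b32) == raw
assert base64.b16decode(b16) == raw
assert base64.urlsafe_b64decode(url_safe) == tricky
```

The URL-safe variant replaces '+' and '/' with '-' and '_', so the result can be placed in a URL without further quoting.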

hex

Get hexlified text, or set text from hex.
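
"Hexlified" refers to the two-hex-digits-per-byte form produced by binascii.hexlify. A minimal standard-library sketch, assuming the text is encoded as UTF-8 before conversion (an assumption, not something this page states):

```python
# Illustrative only: hexlify and unhexlify with the standard library.
# Not the Text class itself.
import binascii

raw = u'caf\u00e9'.encode('utf-8')

hexed = binascii.hexlify(raw)       # b'636166c3a9'
restored = binascii.unhexlify(hexed)

print(hexed)
```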

rle

binhex4 style RLE-compression/decompression.
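
In binhex4-style run-length encoding, the byte 0x90 acts as a run marker: a byte followed by 0x90 and a count stands for that byte repeated count times, and a literal 0x90 is written as 0x90 0x00. The functions below are a simplified, hypothetical sketch of that scheme for illustration; they are not the code behind the rle property, and the names rle_encode and rle_decode are invented here.

```python
# Illustrative only: a simplified binhex4-style RLE codec operating on bytes.
RUN_MARKER = 0x90


def rle_encode(data):
    """Compress runs of three or more identical bytes (up to 255 per run)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if byte == RUN_MARKER:
            # A literal marker byte is escaped as marker + 0, one pair per byte.
            out.extend([RUN_MARKER, 0] * run)
        elif run >= 3:
            out.extend([byte, RUN_MARKER, run])
        else:
            out.extend([byte] * run)
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand data produced by rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == RUN_MARKER:
            if i + 1 >= len(data):
                raise ValueError('truncated run marker')
            count = data[i + 1]
            if count == 0:
                out.append(RUN_MARKER)  # escaped literal 0x90
            else:
                if not out:
                    raise ValueError('run marker with no preceding byte')
                # The byte itself was already written once.
                out.extend([out[-1]] * (count - 1))
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)


packed = rle_encode(b'aaaaab')  # b'a\x90\x05b'
```

Short runs are left uncompressed because the three-byte run form would not save any space.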

gzip

Return text compressed with gzip, or set text from gzip.

bz

Return text compressed with bz, or set text from bz.
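
Both compression properties follow the same get/set pattern. The sketch below shows the round trip with the standard gzip and bz2 modules in Python 3; it assumes "bz" means bzip2 and that the text is encoded as UTF-8 before compression, neither of which this page confirms.

```python
# Illustrative only: compress and decompress with the standard library.
# Not the Text class itself.
import bz2
import gzip

raw = (u'Hi! ' * 50).encode('utf-8')

gz_data = gzip.compress(raw)
bz_data = bz2.compress(raw)

# Decompression restores the original bytes exactly.
assert gzip.decompress(gz_data) == raw
assert bz2.decompress(bz_data) == raw
```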

TextIter

An iterator over the characters in a Text object. It exposes the unicodedata properties of whatever the current character is.

Read-only Properties

ord

The ordinal for the current character.

name

Returns the name assigned to this Unicode character as a string. If no name is defined, returns None.

decimal

Returns the decimal value assigned to this Unicode character as an integer. If none is defined, returns None.

digit

Returns the digit value assigned to this Unicode character as an integer. If none is defined, returns None.

numeric

Returns the numeric value assigned to this Unicode character as a float. If none is defined, returns None.

category

Returns the general category assigned to this Unicode character as string.

bidirectional

Returns the bidirectional class assigned to this Unicode character as a string. If none is defined, an empty string is returned.

combining

Returns the canonical combining class assigned to this Unicode character as an integer. Returns 0 if no combining class is defined.

east_asian_width

Returns the East Asian width assigned to this Unicode character as a string.

mirrored

Returns 1 if the character has been identified as a 'mirrored' character in bidirectional text, 0 otherwise.

decomposition

Returns the character decomposition mapping assigned to this Unicode character as a string. An empty string is returned if no such mapping is defined.
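
These properties mirror the functions of the standard unicodedata module. As a rough sketch of the values to expect, the hypothetical helper below (describe is not part of core.util.text) collects the same lookups for a single character, passing None as the default where the descriptions above say None is returned.

```python
# Illustrative only: the unicodedata lookups that the TextIter properties
# are described as exposing. Not the TextIter class itself.
import unicodedata


def describe(ch):
    """Return a dict of Unicode properties for one character."""
    return {
        'ord': ord(ch),
        'name': unicodedata.name(ch, None),
        'decimal': unicodedata.decimal(ch, None),
        'digit': unicodedata.digit(ch, None),
        'numeric': unicodedata.numeric(ch, None),
        'category': unicodedata.category(ch),
        'bidirectional': unicodedata.bidirectional(ch),
        'combining': unicodedata.combining(ch),
        'east_asian_width': unicodedata.east_asian_width(ch),
        'mirrored': unicodedata.mirrored(ch),
        'decomposition': unicodedata.decomposition(ch),
    }


# A digit, a vulgar fraction, an opening parenthesis and an accented letter.
for ch in u'5\u00bd(\u00e9':
    info = describe(ch)
    print(info['ord'], info['name'], info['category'], info['numeric'])
```

The fraction one half has a numeric value (0.5) but no decimal or digit value, which is the case the None defaults are for.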