PHP Cross Reference of phpBB 3.0 Beta 3

/includes/utf/ -> utf_tools.php (summary)

(no description)

Copyright: (c) 2006 phpBB Group
License: http://opensource.org/licenses/gpl-license.php GNU Public License
Version: $Id: utf_tools.php,v 1.26 2006/11/12 14:29:32 naderman Exp $
File Size: 1012 lines (31 kb)
Included or required: 0 times
Referenced: 0 times
Includes or requires: 0 files

Defines 28 functions

  utf8_encode()
  utf8_decode()
  utf8_strrpos()
  utf8_strrpos()
  utf8_strpos()
  utf8_strtolower()
  utf8_strtoupper()
  utf8_substr()
  utf8_strlen()
  utf8_strrpos()
  utf8_strpos()
  utf8_strtolower()
  utf8_strtoupper()
  utf8_substr()
  utf8_strlen()
  utf8_str_split()
  utf8_strspn()
  utf8_ucfirst()
  utf8_recode()
  utf8_encode_ncr()
  utf8_encode_ncr_callback()
  utf8_ord()
  utf8_chr()
  utf8_decode_ncr()
  utf8_decode_ncr_callback()
  utf8_case_fold()
  utf8_normalize_nfc()
  utf8_clean_string()

Functions
Functions that are not part of a class:

utf8_encode($str)   X-Ref
Implementation of PHP's native utf8_encode for people without XML support
This function exploits some nice things that ISO-8859-1 and UTF-8 have in common

param: string $str ISO-8859-1 encoded data
return: string UTF-8 encoded data
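
The property this description alludes to is that every ISO-8859-1 byte value equals its Unicode code point, so bytes below 0x80 pass through unchanged and the rest expand to exactly two UTF-8 bytes. A minimal sketch of that idea follows; the name `sketch_latin1_to_utf8` is hypothetical and this is not necessarily how utf_tools.php implements it.

```php
<?php
// Hypothetical sketch: convert ISO-8859-1 bytes to UTF-8.
function sketch_latin1_to_utf8($str)
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            // ASCII range: identical in both encodings.
            $out .= $str[$i];
        } else {
            // 0x80-0xFF: two-byte sequence 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}
```

For example, the single byte 0xE9 (é in ISO-8859-1) becomes the two bytes 0xC3 0xA9.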

utf8_decode($str)   X-Ref
Implementation of PHP's native utf8_decode for people without XML support

param: string $str UTF-8 encoded data
return: string ISO-8859-1 encoded data

utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos


utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos


utf8_strpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strpos


utf8_strtolower($str)   X-Ref
UTF-8 aware alternative to strtolower


utf8_strtoupper($str)   X-Ref
UTF-8 aware alternative to strtoupper


utf8_substr($str, $offset, $length = null)   X-Ref
UTF-8 aware alternative to substr


utf8_strlen($text)   X-Ref
Return the length (in characters) of a UTF-8 string


utf8_strrpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strrpos
Find position of last occurrence of a char in a string

author: Harry Fuecks
param: string haystack
param: string needle
param: integer (optional) offset (from left)
return: mixed integer position or FALSE on failure

utf8_strpos($str, $needle, $offset = null)   X-Ref
UTF-8 aware alternative to strpos
Find position of first occurrence of a string

author: Harry Fuecks
param: string haystack
param: string needle
param: integer offset in characters (from left)
return: mixed integer position or FALSE on failure
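
The key difference from strpos is that both the offset and the result are counted in characters rather than bytes. The sketch below shows one way to get that behaviour: search on bytes, then convert the byte position into a character position. The names `sketch_char_count` and `sketch_utf8_strpos` are hypothetical, the input is assumed to be valid UTF-8, and the offset is assumed to be smaller than 65536 (the PCRE repetition limit).

```php
<?php
// Hypothetical helper: counts characters by counting the bytes that are
// not UTF-8 continuation bytes (0x80-0xBF).
function sketch_char_count($bytes)
{
    return preg_match_all('/[^\x80-\xBF]/', $bytes);
}

// Hypothetical sketch of a character-based strpos.
function sketch_utf8_strpos($str, $needle, $offset = 0)
{
    $byte_offset = 0;
    if ($offset > 0) {
        // Translate the character offset into a byte offset.
        if (!preg_match('/^.{' . (int) $offset . '}/us', $str, $m)) {
            return false; // offset lies beyond the end of the string
        }
        $byte_offset = strlen($m[0]);
    }

    // A byte search is safe: in valid UTF-8 a match for a valid needle
    // always starts on a character boundary.
    $byte_pos = strpos($str, $needle, $byte_offset);
    if ($byte_pos === false) {
        return false;
    }

    // Convert the byte position back into a character position.
    return sketch_char_count(substr($str, 0, $byte_pos));
}
```

As with strpos, callers should compare the result against FALSE with `===`, because a match at position 0 is a valid result.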

utf8_strtolower($string)   X-Ref
UTF-8 aware alternative to strtolower
Make a string lowercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does
not exist in Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string
return: string string in lowercase

utf8_strtoupper($string)   X-Ref
UTF-8 aware alternative to strtoupper
Make a string uppercase
Note: The concept of a character's "case" only exists in some alphabets
such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does
not exist in Chinese script, for example. See Unicode Standard
Annex #21: Case Mappings

param: string
return: string string in uppercase

utf8_substr($str, $offset, $length = NULL)   X-Ref
UTF-8 aware alternative to substr
Return part of a string given character offset (and optionally length)

Note on arguments: compared to substr, if offset or length are
not integers, this version will not complain but will instead coerce
them to integers.

Note on returned values: substr documentation states false can be
returned in some cases (e.g. offset > string length)
mb_substr never returns false, it will return an empty string instead.
This adopts the mb_substr approach

Note on implementation: PCRE only supports repetitions of less than
65536, in order to accept up to MAXINT values for offset and length,
we'll repeat a group of 65535 characters when needed.

Note on implementation: calculating the number of characters in the
string is a relatively expensive operation, so we only carry it out when
necessary. It isn't necessary for positive offsets with no specified length.

author: Chris Smith<chris@jalakai.co.uk>
param: string
param: integer number of UTF-8 characters offset (from left)
param: integer (optional) length in UTF-8 characters from offset
return: mixed string or FALSE if failure
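
The notes above describe a PCRE-based approach that returns an empty string rather than FALSE. A reduced sketch of that idea is shown here; `sketch_utf8_substr` is a hypothetical name, and the sketch assumes non-negative offsets and lengths below 65536, so it does not reproduce the 65535-character group repetition or the negative-offset handling mentioned in the notes.

```php
<?php
// Hypothetical sketch: character-based substr using a /u pattern.
function sketch_utf8_substr($str, $offset, $length = null)
{
    // Coerce the arguments to integers instead of complaining.
    $offset = max(0, (int) $offset);
    $quant = ($length === null) ? '*' : '{0,' . max(0, (int) $length) . '}';

    // Skip $offset characters, then capture up to $length characters.
    if (!preg_match('/^.{' . $offset . '}(.' . $quant . ')/us', $str, $m)) {
        // Offset past the end of the string (or invalid UTF-8):
        // follow the mb_substr convention and return an empty string.
        return '';
    }
    return $m[1];
}
```

With the string "héllo", an offset of 1 and a length of 2 yield "él", whereas the byte-based substr would cut the two-byte é in half.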

utf8_strlen($text)   X-Ref
Return the length (in characters) of a UTF-8 string

param: string    $text        UTF-8 string
return: integer                Length (in chars) of given string
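
One common way to count characters in valid UTF-8, shown here only as an illustration, is to count every byte that is not a continuation byte (0x80 to 0xBF), since each character contributes exactly one such byte. The name `sketch_utf8_strlen` is hypothetical and the file's own implementation may differ.

```php
<?php
// Hypothetical sketch: length in characters of a valid UTF-8 string.
function sketch_utf8_strlen($text)
{
    // Every character has exactly one non-continuation (lead or ASCII) byte.
    return preg_match_all('/[^\x80-\xBF]/', $text);
}
```

For "héllo" this gives 5, while strlen reports 6 bytes.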

utf8_str_split($str, $split_len = 1)   X-Ref
UTF-8 aware alternative to str_split
Convert a string to an array

author: Harry Fuecks
param: string UTF-8 encoded
param: int number of characters to split string by
return: array of characters (or chunks of characters) in the string
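
A character-aware split can be sketched with a single `/u` pattern that matches chunks of up to `$split_len` characters. The function name `sketch_utf8_str_split` is hypothetical, and the sketch assumes valid UTF-8 and a chunk length below 65536.

```php
<?php
// Hypothetical sketch: split a UTF-8 string into chunks of characters.
function sketch_utf8_str_split($str, $split_len = 1)
{
    $split_len = max(1, (int) $split_len);
    // Each match is a run of 1..$split_len whole characters.
    preg_match_all('/.{1,' . $split_len . '}/us', $str, $m);
    return $m[0];
}
```

Note that this sketch returns an empty array for an empty string, which is a simplification and may not match the behaviour of str_split on every PHP version.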

utf8_strspn($str, $mask, $start = null, $length = null)   X-Ref
UTF-8 aware alternative to strspn
Find length of initial segment matching mask

author: Harry Fuecks
param: string
return: int
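
The strspn semantics (length of the leading run of characters that all appear in the mask) can be sketched with an anchored character class. The name `sketch_utf8_strspn` is hypothetical, the `$start` and `$length` parameters are left out for brevity, and the mask is treated as a literal set of characters.

```php
<?php
// Hypothetical sketch: length, in characters, of the initial segment of
// $str that consists only of characters found in $mask.
function sketch_utf8_strspn($str, $mask)
{
    if ($mask === '') {
        return 0; // an empty character class would not compile
    }
    // Escape the mask so its characters are taken literally.
    $class = preg_quote($mask, '/');
    if (!preg_match('/^[' . $class . ']+/u', $str, $m)) {
        return 0;
    }
    // Count characters, not bytes, in the matched prefix.
    return preg_match_all('/./us', $m[0]);
}
```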

utf8_ucfirst($str)   X-Ref
UTF-8 aware alternative to ucfirst
Make a string's first character uppercase

author: Harry Fuecks
param: string
return: string with first character as upper case (if applicable)

utf8_recode($string, $encoding)   X-Ref
Recode a string to UTF-8

If the encoding is not supported, the string is returned as-is

param: string    $string        Original string
param: string    $encoding    Original encoding (lowered)
return: string                The string, encoded in UTF-8

utf8_encode_ncr($text)   X-Ref
Replace all UTF-8 chars that are not in ASCII with their NCR

param: string    $text        UTF-8 string in NFC
return: string                ASCII string using NCRs for non-ASCII chars

utf8_encode_ncr_callback($m)   X-Ref
Callback used in utf8_encode_ncr()

Takes a UTF-8 char and replaces it with its NCR. Attention, $m is an array

param: array    $m            0-based numerically indexed array passed by preg_replace_callback()
return: string                A HTML NCR if the character is valid, or the original string otherwise
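
The encoder and its callback can be pictured as a `preg_replace_callback()` pass over every non-ASCII character. The sketch below uses the hypothetical names `sketch_encode_ncr` and `sketch_encode_ncr_callback`, assumes valid UTF-8 input, and always emits decimal NCRs; the real functions may validate more strictly.

```php
<?php
// Hypothetical sketch: replace each non-ASCII character with a decimal NCR.
function sketch_encode_ncr($text)
{
    return preg_replace_callback('/[^\x00-\x7F]/u', 'sketch_encode_ncr_callback', $text);
}

// $m is the match array; $m[0] holds one multi-byte UTF-8 character.
function sketch_encode_ncr_callback($m)
{
    $bytes = array_values(unpack('C*', $m[0]));
    $n = count($bytes);

    // Payload bits of the lead byte: 5 bits for 2-byte sequences,
    // 4 bits for 3-byte sequences, 3 bits for 4-byte sequences.
    $cp = $bytes[0] & (0xFF >> ($n + 1));

    // Each continuation byte contributes its low 6 bits.
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return '&#' . $cp . ';';
}
```

For example, "café" becomes "caf&#233;" because é is U+00E9, which is 233 in decimal.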

utf8_ord($chr)   X-Ref
Returns the Unicode code point of a UTF-8 character

param: string $chr UTF-8 char
return: integer UNICODE code point
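
Decoding a single UTF-8 character into its code point amounts to stripping the length marker from the lead byte and appending six bits from each continuation byte. A minimal sketch, assuming `$chr` holds exactly one valid UTF-8 character; the name `sketch_utf8_ord` is hypothetical.

```php
<?php
// Hypothetical sketch: code point of a single UTF-8 character.
function sketch_utf8_ord($chr)
{
    $b0 = ord($chr[0]);
    if ($b0 < 0x80) {
        return $b0; // 1 byte: 0xxxxxxx
    }
    if (($b0 & 0xE0) === 0xC0) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return (($b0 & 0x1F) << 6) | (ord($chr[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (($b0 & 0x0F) << 12)
            | ((ord($chr[1]) & 0x3F) << 6)
            | (ord($chr[2]) & 0x3F);
    }
    if (($b0 & 0xF8) === 0xF0) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return (($b0 & 0x07) << 18)
            | ((ord($chr[1]) & 0x3F) << 12)
            | ((ord($chr[2]) & 0x3F) << 6)
            | (ord($chr[3]) & 0x3F);
    }
    return false; // not a valid lead byte
}
```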

utf8_chr($cp)   X-Ref
Converts a Unicode code point (such as the numeric value of an NCR) to a UTF-8 char

param: integer $cp UNICODE code point
return: string UTF-8 char
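
The reverse direction, from code point to UTF-8 bytes, picks the sequence length from the size of the code point and distributes its bits across the lead and continuation bytes. A minimal sketch under the assumption that `$cp` is an integer between 0 and 0x10FFFF; the name `sketch_utf8_chr` is hypothetical and no check for surrogates or unassigned code points is made.

```php
<?php
// Hypothetical sketch: UTF-8 encoding of a single code point.
function sketch_utf8_chr($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}
```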

utf8_decode_ncr($text)   X-Ref
Convert Numeric Character References to UTF-8 chars

Notes:
- we do not convert NCRs recursively; if you pass &#38;#38; it will return &#38;
- we DO NOT check for the existence of the Unicode characters, therefore an entity
may be converted to a nonexistent code point

param: string    $text        String to convert, encoded in UTF-8 (no normal form required)
return: string                UTF-8 string where NCRs have been replaced with the actual chars

utf8_decode_ncr_callback($m)   X-Ref
Callback used in utf8_decode_ncr()

Takes an NCR (in decimal or hexadecimal) and returns a UTF-8 char. Attention, $m is an array.
It will ignore most invalid NCRs, but not all!

param: array    $m            0-based numerically indexed array passed by preg_replace_callback()
return: string                UTF-8 char
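
The non-recursive behaviour noted for utf8_decode_ncr() follows naturally from a single `preg_replace_callback()` pass, because replaced text is never rescanned. The sketch below illustrates that with the hypothetical names `sketch_decode_ncr` and `sketch_decode_ncr_callback`; the pattern and the range check are assumptions, not the file's actual rules.

```php
<?php
// Hypothetical sketch: replace decimal and hexadecimal NCRs in one pass.
function sketch_decode_ncr($text)
{
    return preg_replace_callback('/&#(x[0-9a-f]+|[0-9]+);/i', 'sketch_decode_ncr_callback', $text);
}

// $m[0] is the whole NCR, $m[1] the number with an optional leading "x".
function sketch_decode_ncr_callback($m)
{
    $cp = (strtolower($m[1][0]) === 'x') ? hexdec(substr($m[1], 1)) : (int) $m[1];
    if ($cp > 0x10FFFF) {
        return $m[0]; // outside the Unicode range: leave the NCR untouched
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}
```

Like the documented function, this sketch does not check whether the resulting code point is actually assigned.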

utf8_case_fold($text, $option = 'full')   X-Ref
Case folds a UTF-8 string so that strings differing only in case compare
as equal.

param: string $text text to be case folded
param: string $option determines how we will fold the cases
return: string case folded text
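
Case folding is typically a table lookup: each character with a case mapping is replaced by its folded form, and "full" folding may expand one character into several. The sketch below uses a deliberately tiny, hand-picked table; the name `sketch_case_fold`, the table contents and the handling of `$option` are assumptions for illustration only.

```php
<?php
// Hypothetical sketch: table-driven case folding with strtr().
function sketch_case_fold($text, $option = 'full')
{
    // ASCII letters A-Z fold to a-z in every mode.
    $map = array_combine(range('A', 'Z'), range('a', 'z'));

    // Tiny illustrative subset of the Unicode case folding data.
    $map["\xC3\x89"] = "\xC3\xA9"; // U+00C9 -> U+00E9 (simple folding)

    if ($option === 'full') {
        // Full folding may change the length of the string.
        $map["\xC3\x9F"] = 'ss';   // U+00DF sharp s -> "ss"
    }

    // strtr() with an array replaces each key once, longest keys first,
    // and never rescans the text it has already replaced.
    return strtr($text, $map);
}
```

With full folding, "Straße" and "STRASSE" both fold to "strasse", so they compare as equal.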

utf8_normalize_nfc($strings)   X-Ref
A wrapper function for the normalizer which takes care of including the class if required and modifies the passed strings
to be in NFC (Normalization Form C, canonical composition).

param: mixed    $strings Either an array of references to strings, a reference to an array of strings or a reference to a single string

utf8_clean_string($text)   X-Ref
This function is used to generate a "clean" version of a string.
Clean means that it is in a case-insensitive form (case folding) and that it is normalized (NFC).
Additionally, homographs of a character are transformed into one specific character (preferably ASCII
if it is an ASCII character).

Please be aware that if you change something within this function or within
functions used here, you need to rebuild/update the username_clean column in the users table and all other
columns that store a clean string; otherwise you will break this functionality.

param: $text    An unclean string, maybe user input (has to be valid UTF-8!)
return: Cleaned up version of the input string
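
The three steps described above (case folding, NFC normalization, homograph replacement) can be pictured as a small pipeline. The sketch below is only an illustration: `sketch_clean_string` is a hypothetical name, case folding is limited to ASCII, the homograph table contains three hand-picked Cyrillic letters, and NFC is applied only when the intl extension's `Normalizer` class happens to be available.

```php
<?php
// Hypothetical sketch of a "clean string" pipeline.
function sketch_clean_string($text)
{
    // Step 1: case folding (ASCII only in this sketch).
    $text = strtr($text, array_combine(range('A', 'Z'), range('a', 'z')));

    // Step 2: NFC normalization, if the intl extension is loaded.
    if (class_exists('Normalizer')) {
        $nfc = Normalizer::normalize($text, Normalizer::FORM_C);
        if (is_string($nfc)) {
            $text = $nfc;
        }
    }

    // Step 3: map look-alike characters to one representative,
    // preferring the ASCII character.
    $homographs = array(
        "\xD0\x90" => 'a', // U+0410 CYRILLIC CAPITAL LETTER A
        "\xD0\xB0" => 'a', // U+0430 CYRILLIC SMALL LETTER A
        "\xD0\xBE" => 'o', // U+043E CYRILLIC SMALL LETTER O
    );
    return strtr($text, $homographs);
}
```

Two usernames that look alike, such as "Admin" written with a Latin A and with a Cyrillic А, then produce the same clean string, which is why stored clean values must be rebuilt whenever any step changes.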



Generated: Wed Nov 22 00:35:05 2006 Cross-referenced by PHPXref 0.6