\LinHUniX\Gfx\Componenthtml2text

Takes HTML and converts it to formatted, plain text.

Thanks to Alexander Krug (http://www.krugar.de/) for pointing out and correcting an error in the regexp search array. Fixed 7/30/03.

Updated set_html() function's file reading mechanism, 9/25/03.

Thanks to Joss Sanglier (http://www.dancingbear.co.uk/) for adding several more HTML entity codes to the $search and $replace arrays. Updated 11/7/03.

Thanks to Darius Kasperavicius (http://www.dar.dar.lt/) for suggesting the addition of $allowed_tags and its supporting function (which I slightly modified). Updated 3/12/04.

Thanks to Justin Dearing for pointing out that a replacement for the <TH> tag was missing, and suggesting an appropriate fix. Updated 8/25/04.

Thanks to Mathieu Collas (http://www.myefarm.com/) for finding a display/formatting bug in the _build_link_list() function: email readers would show the left bracket and number ("[1") as part of the rendered email address. Updated 12/16/04.

Thanks to Wojciech Bajon (http://histeria.pl/) for submitting code to handle relative links, which I hadn't considered. I modified his code a bit to handle normal HTTP links and MAILTO links. Also for suggesting three additional HTML entity codes to search for. Updated 03/02/05.

Thanks to Jacob Chandler for pointing out another link condition for the _build_link_list() function: "https". Updated 04/06/05.

Thanks to Marc Bertrand (http://www.dresdensky.com/) for suggesting a revision to the word wrapping functionality; if you specify a $width of 0 or less, word wrapping will be ignored. Updated 11/02/06.

*** Big housecleaning updates below:

Thanks to Colin Brown (http://www.sparkdriver.co.uk/) for suggesting the fix to handle </li> and blank lines (whitespace). Christian Basedau (http://www.movetheweb.de/) also suggested the blank lines fix.

Special thanks to Marcus Bointon (http://www.synchromedia.co.uk/), Christian Basedau, Norbert Laposa (http://ln5.co.uk/), Bas van de Weijer, and Marijn van Butselaar for pointing out my glaring error in the <th> handling. Marcus also supplied a host of fixes.

Thanks to Jeffrey Silverman (http://www.newtnotes.com/) for pointing out that extra spaces should be compressed, a problem addressed with Marcus Bointon's fixes but that I had not yet incorporated.

Thanks to Daniel Schledermann (http://www.typoconsult.dk/) for suggesting a valuable fix with <a> tag handling.

Thanks to Wojciech Bajon (again!) for suggesting fixes and additions, including the <a> tag handling that Daniel Schledermann pointed out but that I had not yet incorporated. I haven't (yet) incorporated all of Wojciech's changes, though I may at some future time.

*** End of the housecleaning updates. Updated 08/08/07.

Summary

Public methods

html2text()
search_replace_cb_H1toH3()
search_replace_cb_H4toH6()
search_replace_cb_B()
search_replace_cb_STRONG()
search_replace_cb_A_HREF()
search_replace_cb_TH()
set_html()
get_text()
print_text()
p()
set_allowed_tags()
set_base_url()
_convert()
_build_link_list()

Public properties

$html
$text
$width
$search
$replace
$allowed_tags
$url
$_converted
$_link_list
$_link_count

Constants

No constants found

Protected

No protected methods found
No protected properties found

Private

No private methods found
No private properties found

Properties

$html

$html : string

Contains the HTML content to convert.

Type

string

$text

$text : string

Contains the converted, formatted text.

Type

string

$width

$width : integer

Maximum width of the formatted text, in columns.

Set this value to 0 (or less) to ignore word wrapping and not constrain text to a fixed-width column.

Type

integer
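
The effect of a non-positive $width can be sketched with PHP's built-in wordwrap(). The helper name wrap_text() is hypothetical and is not part of this class; it only illustrates the rule described above.

```php
<?php
// Hypothetical helper (not part of the class): wrap only when $width is positive.
function wrap_text(string $text, int $width): string
{
    if ($width <= 0) {
        // 0 or less: leave the text unconstrained.
        return $text;
    }
    // wordwrap() breaks at spaces so that no line exceeds $width columns
    // (words longer than $width are left intact).
    return wordwrap($text, $width);
}

$line = 'The quick brown fox jumps over the lazy dog';
echo wrap_text($line, 20), "\n"; // wrapped to at most 20 columns
echo wrap_text($line, 0), "\n";  // returned unchanged
```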

$search

$search : array

List of preg* regular expression patterns to search for, used in conjunction with $replace.

Type

array

$replace

$replace : array

List of pattern replacements corresponding to patterns searched.

Type

array
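
As a rough illustration of how paired $search and $replace arrays work with preg_replace(), the sketch below uses three made-up patterns. They are assumptions chosen for the example and are not the class's actual lists.

```php
<?php
// Illustrative patterns only; the class ships its own, longer lists.
$search = [
    '/<br\s*\/?>/i', // line break tags
    '/&nbsp;/i',     // non-breaking space entity
    '/&amp;/i',      // ampersand entity
];
$replace = [
    "\n",
    ' ',
    '&',
];

// preg_replace() pairs $search[i] with $replace[i] and applies them in order.
$html = 'Tom&nbsp;&amp;&nbsp;Jerry<br />The End';
$text = preg_replace($search, $replace, $html);
echo $text, "\n";
```

Because the two arrays are matched by position, adding a pattern to one without adding its counterpart to the other shifts every later replacement.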

$allowed_tags

$allowed_tags : string

Contains a list of HTML tags to allow in the resulting text.

Type

string

$url

$url : string

Contains the base URL that relative links should resolve to.

Type

string

$_converted

$_converted : boolean

Indicates whether content in the $html variable has been converted yet.

Type

boolean

$_link_list

$_link_list : string

Contains URL addresses from links to be rendered in plain text.

Type

string

$_link_count

$_link_count : integer

Number of valid links detected in the text, used for plain text display (rendered similar to footnotes).

Type

integer

Methods

html2text()

html2text(string  $source = '', boolean  $from_file = false) 

Constructor.

If the HTML source string (or file) is supplied, the object is created with that source already loaded; all that remains is to call get_text().

Parameters

string $source

HTML content

boolean $from_file

Indicates $source is a file to pull content from

search_replace_cb_H1toH3()

search_replace_cb_H1toH3(string  $text) : string

Search/replace callback for <h1> to <h3> heading tags.

Parameters

string $text

Returns

string

The converted text.

search_replace_cb_H4toH6()

search_replace_cb_H4toH6(string  $text) : string

Search/replace callback for <h4> to <h6> heading tags.

Parameters

string $text

Returns

string

The converted text.

search_replace_cb_B()

search_replace_cb_B(string  $text) : string

Search/replace callback for <b> tags.

Parameters

string $text

Returns

string

The converted text.

search_replace_cb_STRONG()

search_replace_cb_STRONG(string  $text) : string

Search/replace callback for <strong> tags.

Parameters

string $text

Returns

string

The converted text.

search_replace_cb_A_HREF()

search_replace_cb_A_HREF(string  $text) : string

Search/replace callback for <a href> links.

Parameters

string $text

Returns

string

The converted text.

search_replace_cb_TH()

search_replace_cb_TH(string  $text) : string

Search/replace callback for <th> table header cells.

Parameters

string $text

Returns

string

The converted text.

set_html()

set_html(string  $source, boolean  $from_file = false) 

Loads source HTML into memory, either from the $source string or from a file.

Parameters

string $source

HTML content

boolean $from_file

Indicates $source is a file to pull content from

get_text()

get_text() : string

Returns the text, converted from HTML.

Returns

string

print_text()

print_text() 

Prints the text, converted from HTML.

p()

p() 

Alias of print_text(); operates identically.

set_allowed_tags()

set_allowed_tags(string  $allowed_tags = '') 

Sets the allowed HTML tags to pass through to the resulting text.

Tags should be in the form "<p>", with no corresponding closing tag.

Parameters

string $allowed_tags
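
The tag format described above is the same one PHP's built-in strip_tags() accepts as its second argument. The sketch below shows that format in isolation; that the class hands $allowed_tags to strip_tags() is an assumption, not something this page states.

```php
<?php
// Opening tags only, concatenated; closing tags are implied.
$allowed_tags = '<p><b>';

$html = '<p>Hello <b>bold</b> <a href="/x">link</a></p>';

// strip_tags() removes every tag except those listed in $allowed_tags.
$text = strip_tags($html, $allowed_tags);
echo $text, "\n";
```

Here the <a> element is stripped (its text remains) while <p> and <b> pass through with their closing tags.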

set_base_url()

set_base_url(string  $url = '') 

Sets a base URL to handle relative links.

Parameters

string $url
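
One way relative links might be resolved against a base URL is sketched below. The function resolve_link() is hypothetical and the joining rule is an assumption; the class's own logic may differ in detail.

```php
<?php
// Hypothetical helper (not part of the class).
function resolve_link(string $link, string $base_url): string
{
    // Absolute HTTP(S) and mailto links are left untouched.
    if (preg_match('/^(https?:\/\/|mailto:)/i', $link)) {
        return $link;
    }
    // Otherwise join base and link with exactly one slash between them.
    return rtrim($base_url, '/') . '/' . ltrim($link, '/');
}

echo resolve_link('docs/a.html', 'http://example.com/'), "\n";
echo resolve_link('/docs/a.html', 'http://example.com'), "\n";
echo resolve_link('mailto:me@example.com', 'http://example.com'), "\n";
```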

_convert()

_convert() 

Workhorse function that does actual conversion.

First performs custom tag replacement specified by $search and $replace arrays. Then strips any remaining HTML tags, reduces whitespace and newlines to a readable format, and word wraps the text to $width characters.
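
The sequence described above can be sketched as a standalone function. convert_sketch() and the single pattern passed to it are assumptions made for illustration; the real method works on the object's own properties and uses a much larger pattern list.

```php
<?php
// Hypothetical stand-in for the conversion pipeline (not the class's code).
function convert_sketch(
    string $html,
    array $search,
    array $replace,
    string $allowed_tags = '',
    int $width = 70
): string {
    // 1. Custom tag replacement via the paired pattern arrays.
    $text = preg_replace($search, $replace, trim($html));
    // 2. Strip any remaining tags, except the allowed ones.
    $text = strip_tags($text, $allowed_tags);
    // 3. Reduce runs of spaces/tabs and excess blank lines.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    $text = trim($text);
    // 4. Word wrap, unless $width is 0 or less.
    if ($width > 0) {
        $text = wordwrap($text, $width);
    }
    return $text;
}

$html = '<h1>Title</h1><p>Some    <b>bold</b> text.</p>';
// Illustrative pattern: end of a heading or paragraph becomes a blank line.
$text = convert_sketch($html, ['/<\/(h1|p)>/i'], ["\n\n"]);
echo $text, "\n";
```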

_build_link_list()

_build_link_list(string  $link, string  $display) : string

Helper function called by preg_replace() on link replacement.

Maintains an internal list of links to be displayed at the end of the text, with numeric indices marking the point in the text where each link originally appeared. Also makes an effort at identifying and handling absolute and relative links.

Parameters

string $link

URL of the link

string $display

Part of the text to associate number with

Returns

string
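
The footnote-style numbering can be illustrated with a small standalone sketch. The class LinkListSketch, its method names, and the footer layout are all hypothetical, and it uses preg_replace_callback() rather than whatever mechanism the class itself uses to invoke the helper.

```php
<?php
// Hypothetical illustration of a numbered link list (not the class's code).
final class LinkListSketch
{
    private int $count = 0;
    private string $list = '';

    // Records the link and returns the display text tagged with its number.
    public function add(string $link, string $display): string
    {
        $this->count++;
        $this->list .= '[' . $this->count . '] ' . $link . "\n";
        return $display . ' [' . $this->count . ']';
    }

    // Returns the collected links, ready to append after the body text.
    public function footer(): string
    {
        return $this->list === '' ? '' : "\n\nLinks:\n------\n" . $this->list;
    }
}

$links = new LinkListSketch();
$html = 'See <a href="http://example.com/">the site</a> and '
      . '<a href="http://example.org/">the mirror</a>.';

// Each matched anchor is replaced by "text [n]" and its URL is recorded.
$text = preg_replace_callback(
    '/<a [^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/i',
    fn (array $m): string => $links->add($m[1], $m[2]),
    $html
);
echo $text . $links->footer();
```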