Archive for the ‘Common questions’ Category

Using the PHP Document Object Model (DOM) to get all page links

Wednesday, January 27th, 2010

Further to the article I wrote about parsing links from an HTML page, here is a more elegant and accurate way to get every link, using the Document Object Model (DOM).

/**
 * @author Jay Gilford
 */

/**
 * get_links()
 * 
 * @param string $url
 * @return array
 */
function get_links($url) {
    
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();
    
    // Load the URL's contents into the DOM (the @ suppresses any warnings from invalid HTML)
    @$xml->loadHTMLFile($url);
    
    // Empty array to hold all links to return
    $links = array();
    
    // Loop through each <a> tag in the DOM and add it to the links array
    foreach($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }
    
    //Return the links
    return $links;
}

The comments in the code above explain how it works. To call the function, simply use
$links = get_links('http://www.example.com');
changing the URL to the page you want the links from. Each entry in the returned array has a 'url' key and a 'text' key. You could also expand this code to return further details for each link, such as whether it is marked nofollow.
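
As a rough illustration of that kind of extension, here is a minimal sketch that also reports whether each link carries rel="nofollow". The function name get_links_with_rel() and the 'nofollow' key are my own assumptions for the example, and it reads from an HTML string with loadHTML() rather than a URL so it can be run without network access.

```php
<?php
/**
 * Hypothetical variant of get_links() that also flags nofollow links.
 *
 * @param string $html
 * @return array
 */
function get_links_with_rel($html) {
    $doc = new DOMDocument();

    // The @ suppresses warnings caused by malformed markup
    @$doc->loadHTML($html);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        // rel may hold several space-separated tokens, e.g. "nofollow noopener"
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))), -1, PREG_SPLIT_NO_EMPTY);

        $links[] = array(
            'url'      => $link->getAttribute('href'),
            'text'     => $link->nodeValue,
            'nofollow' => in_array('nofollow', $rel),
        );
    }
    return $links;
}

// Sample markup made up for this example
$html  = '<p><a href="/a" rel="nofollow">First</a> <a href="/b">Second</a></p>';
$links = get_links_with_rel($html);
print_r($links);
```

Splitting the rel value into tokens, rather than comparing the whole attribute, means a link with rel="nofollow noopener" is still detected.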

If you have any questions about this, feel free to contact me as always.

Also, please note that this requires PHP 5 or later, since DOMDocument is not available in earlier versions.

Number to text converting PHP class

Monday, November 2nd, 2009

One thing that gets asked quite a bit on forums is how to convert a number into words in PHP, so I thought I’d write a small class that can do this.
Number to text Class Download
Here is the code for the class:

class num2text {
    private $_original = 0;
    private $_parsed_number_text = '';
    private $_single_nums = array(1 => 'One', 2 => 'Two', 3 => 'Three', 4 => 'Four', 5 => 'Five', 6 => 'Six', 7 => 'Seven', 8 => 'Eight', 9 =>
        'Nine', );

    private $_teen_nums = array(0 => 'Ten', 1 => 'Eleven', 2 => 'Twelve', 3 => 'Thirteen', 4 => 'Fourteen', 5 => 'Fifteen', 6 => 'Sixteen', 7 =>
        'Seventeen', 8 => 'Eighteen', 9 => 'Nineteen', );

    private $_tens_nums = array(2 => 'Twenty', 3 => 'Thirty', 4 => 'Forty', 5 => 'Fifty', 6 => 'Sixty', 7 => 'Seventy', 8 => 'Eighty', 9 =>
        'Ninety', );

    private $_chunks_nums = array(1 => 'Thousand', 2 => 'Million', 3 => 'Billion', 4 => 'Trillion', 5 => 'Quadrillion', 6 => 'Quintillion', 7 =>
        'Sextillion', 8 => 'Septillion', 9 => 'Octillion', 10 => 'Nonillion', 11 => 'Decillion', );

    function __construct($number) {
        $this->_original = trim($number);
        $this->parse();
    }
    
    public function parse($new_number = NULL) {
        if($new_number !== NULL) {
            $this->_original = trim($new_number);
        }
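        if($this->_original == 0) {
            // Store the result so that __toString() also reports zero
            $this->_parsed_number_text = 'Zero';
            return $this->_parsed_number_text;
        }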
        
        $num = str_split($this->_original, 1);
        krsort($num);
        $chunks = array_chunk($num, 3);
        krsort($chunks);
        
        $final_num = array();
        foreach ($chunks as $k => $v) {
            ksort($v);
            $temp = trim($this->_parse_num(implode('', $v)));
            if($temp != '') {
                $final_num[$k] = $temp;
                if (isset($this->_chunks_nums[$k]) && $this->_chunks_nums[$k] != '') {
                    $final_num[$k] .= ' '.$this->_chunks_nums[$k];
                }
            }
        }
        $this->_parsed_number_text = implode(', ', $final_num);
        return $this->_parsed_number_text;
    }
    
    public function __toString() {
        return $this->_parsed_number_text;
    }

    private function _parse_num($num) {
        $temp = array();
        if (isset($num[2])) {
            if (isset($this->_single_nums[$num[2]])) {
                $temp['h'] = $this->_single_nums[$num[2]].' Hundred';
            }
        }

        if (isset($num[1])) {
            if ($num[1] == 1) {
                $temp['t'] = $this->_teen_nums[$num[0]];
            } else {
                if (isset($this->_tens_nums[$num[1]])) {
                    $temp['t'] = $this->_tens_nums[$num[1]];
                }
            }
        }
        
        if (!isset($num[1]) || $num[1] != 1) {
            if (isset($this->_single_nums[$num[0]])) {
                if (isset($temp['t'])) {
                    $temp['t'] .= ' '.$this->_single_nums[$num[0]];
                } else {
                    $temp['u'] = $this->_single_nums[$num[0]];
                }
            }
        }
        return implode(' and ', $temp);
    }
}

Using this class takes just two lines of code:

$n2s = new num2text('123456');
echo $n2s;

Should you wish to convert another number afterwards, you can use

echo $n2s->parse('87654321');

It really is that simple to use.
Please be aware that numbers must be passed in as strings for all values to work, because PHP converts very large integers to floats, which lose digits.
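
To show why strings matter, here is a small sketch of what happens to a 20-digit number when it is written as an integer literal rather than a string. The helper is_digit_string() is a hypothetical name used only for this illustration and is not part of the class.

```php
<?php
// Hypothetical helper: true only for strings made up entirely of digits
function is_digit_string($value) {
    return is_string($value) && ctype_digit($value);
}

// An integer literal beyond PHP_INT_MAX becomes a float, so casting it
// to a string produces scientific notation instead of the original digits
$as_number = (string) 12345678901234567890;

// The same value given as a string keeps every digit
$as_string = '12345678901234567890';

var_dump(is_digit_string($as_number)); // bool(false)
var_dump(is_digit_string($as_string)); // bool(true)
```

A check like this could be run before handing a value to the class, so that a number which has already lost its digits is caught early.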

How to get all links from a web page

Monday, October 26th, 2009

A question that gets asked all the time on forums is “How do I get all the links inside <a> tags on a web page?”, so here’s some code with a comment on each line.

/**
 * @author Jay Gilford
 */

// regular expression pattern to match all links on a page
$pattern = '%<a[^>]+href="(?P<url>[^"]+)"[^>*]*>(?P<text>[^< ]+)%si';

// Webpage URL to get links from
$url = 'http://www.jaygilford.com/';

// Fetch contents of whole page
$page_content = file_get_contents($url);

// Get all matches of links and put them into the $matches variable
preg_match_all($pattern, $page_content, $matches);

// Variable to hold all of our urls and their text
$urls = array();

// Loop through each array item
foreach($matches['url'] as $k=>$v) {
    // combine the url and text into its own key for ease of access
    $urls[$k] = array('url' => $v,'text' => $matches['text'][$k]);
}

// For display purposes only to show the contents of $urls
echo print_r($urls, true);
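
To see the shape of the resulting array without fetching a live page, here is a minimal sketch that runs the same idea against a sample HTML string. The function name extract_links_regex(), the sample markup, and the exact pattern are assumptions made for this example; the only parts taken from the code above are the named groups url and text.

```php
<?php
// Hypothetical wrapper around the regular expression approach,
// fed an HTML string instead of a URL
function extract_links_regex($html) {
    // Assumed pattern: the named groups "url" and "text" match the keys used above
    $pattern = '%<a[^>]+href="(?P<url>[^"]+)"[^>]*>(?P<text>[^<]+)%si';
    preg_match_all($pattern, $html, $matches);

    $urls = array();
    foreach ($matches['url'] as $k => $v) {
        // Pair each url with the link text found at the same position
        $urls[$k] = array('url' => $v, 'text' => $matches['text'][$k]);
    }
    return $urls;
}

// Sample markup made up for this example
$sample = '<p><a href="/about">About me</a> and <a class="x" href="http://example.com/">Example</a></p>';
$urls   = extract_links_regex($sample);
print_r($urls);
```

Note that a pattern like this only matches href values in double quotes, and it skips links whose content starts with another tag, such as an image.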

If you have any questions regarding this, feel free to contact me. Details can be found on the about page.