Using the PHP Document Object Model (DOM) to get all page links
Further to the article I wrote about parsing links from an HTML page, here is a more elegant and accurate solution for getting every link, using the Document Object Model (DOM).
/**
 * @author Jay Gilford
 */

/**
 * get_links()
 *
 * @param string $url
 * @return array
 */
function get_links($url) {

    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();

    // Load the url's contents into the DOM (the @ suppresses any errors from invalid HTML)
    @$xml->loadHTMLFile($url);

    // Empty array to hold all links to return
    $links = array();

    // Loop through each <a> tag in the DOM and add it to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
    }

    // Return the links
    return $links;
}
The comments in the code above explain each step. To call the function, simply use
$links = get_links('http://www.example.com');
changing the URL to the page whose links you want. You could also extend this code to capture further details about each link, such as the rel="nofollow" attribute and so forth.
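As a rough sketch of that kind of extension, the variant below also reports whether each link carries a nofollow value in its rel attribute. The function name get_links_with_rel() and the 'nofollow' array key are made up for this example, and it takes an HTML string instead of a URL (an assumption, so that it can be run without a network connection). It also swaps the @ operator for libxml_use_internal_errors(), which is one alternative way of keeping parser warnings quiet.

```php
<?php
/**
 * Hypothetical variant of get_links() that also reports rel="nofollow".
 * Takes an HTML string rather than a URL so it can be run offline.
 *
 * @param string $html
 * @return array
 */
function get_links_with_rel($html) {

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of using the @ operator
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();

    foreach ($doc->getElementsByTagName('a') as $link) {
        // rel may hold several space-separated tokens, e.g. "nofollow noopener"
        $rel = strtolower(trim($link->getAttribute('rel')));
        $tokens = ($rel === '') ? array() : preg_split('/\s+/', $rel);

        $links[] = array(
            'url'      => $link->getAttribute('href'),
            'text'     => trim($link->nodeValue),
            'nofollow' => in_array('nofollow', $tokens, true),
        );
    }

    return $links;
}

// Example input, invented for this sketch
$html = '<html><body><a href="/a" rel="nofollow noopener">First</a> <a href="/b">Second</a></body></html>';
print_r(get_links_with_rel($html));
```

Splitting the rel value into tokens, rather than comparing the whole string, means a link marked rel="nofollow noopener" is still picked up.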
If you have any questions about this, feel free to contact me, as always.
Also, please note that this requires PHP 5 or later in order for you to be able to use DOMDocument.
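If you are not sure whether your server meets that requirement, a quick check along these lines may help. This is only a sketch: the helper name get_links_requirements() is invented here. The allow_url_fopen check is an extra assumption on my part, based on the fact that loading a remote URL with loadHTMLFile() relies on PHP's URL stream wrappers.

```php
<?php
/**
 * Hypothetical helper: list reasons get_links() might not work on this server.
 *
 * @return array
 */
function get_links_requirements() {

    $problems = array();

    // DOMDocument is provided by the DOM extension
    if (!class_exists('DOMDocument')) {
        $problems[] = 'The DOM extension (DOMDocument) is not available.';
    }

    // Remote URLs are opened through PHP's stream wrappers
    if (!filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $problems[] = 'allow_url_fopen is disabled, so remote URLs cannot be loaded.';
    }

    return $problems;
}

// Print any problems found; prints nothing if everything looks fine
foreach (get_links_requirements() as $problem) {
    echo $problem, PHP_EOL;
}
```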