How to get all links from a web page
A question that gets asked all the time on forums is “How do I get all the links on a web page?”, meaning the URL and text of every <a> tag. Here’s some PHP code that does it, with a comment on each step.
<?php
/**
 * @author Jay Gilford
 */

// Regular expression pattern to match all links on a page
$pattern = '%<a[^>]+href="(?P<url>[^"]+)"[^>]*>(?P<text>[^<]+)</a>%si';

// Webpage URL to get links from
$url = 'http://www.jaygilford.com/';

// Fetch contents of whole page
$page_content = file_get_contents($url);

// Get all matches of links and put them into the $matches variable
preg_match_all($pattern, $page_content, $matches);

// Variable to hold all of our URLs and their text
$urls = array();

// Loop through each array item
foreach ($matches['url'] as $k => $v) {
    // Combine the URL and text into its own key for ease of access
    $urls[$k] = array('url' => $v, 'text' => $matches['text'][$k]);
}

// For display purposes only, to show the contents of $urls
echo print_r($urls, true);
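To see what a pattern like this does and does not catch, it can help to run it against a hard-coded string rather than a live page. The sketch below is only an illustration: the sample markup is made up, and the pattern is written out in full with the named groups `url` and `text` that the loop relies on. It assumes double-quoted href values and link text that contains no nested tags.

```php
<?php
// Link pattern with named groups "url" and "text".
$pattern = '%<a[^>]+href="(?P<url>[^"]+)"[^>]*>(?P<text>[^<]+)</a>%si';

// Made-up sample markup: two links the pattern handles, two it does not.
$html = '<p><a href="/about">About</a> and '
      . '<a class="ext" href="http://example.com/" rel="nofollow">Example</a> and '
      . "<a href='/single'>single quotes</a> and "
      . '<a href="/img"><img src="x.png"></a></p>';

preg_match_all($pattern, $html, $matches);

// Pair each URL with its link text, as in the main script.
$urls = array();
foreach ($matches['url'] as $k => $v) {
    $urls[$k] = array('url' => $v, 'text' => $matches['text'][$k]);
}

print_r($urls);
```

Only the first two links end up in $urls. The single-quoted href and the link that wraps an image are skipped, which is the kind of gap that makes a DOM parser the safer choice for arbitrary pages.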
If you have any questions about this, feel free to contact me. Details can be found on the about page.
January 6th, 2011 at 8:47 am
I would like to know how we can fetch nofollow links with the same script. Is it possible to do this? I need to get all the information about a URL.
Rodger
January 6th, 2011 at 11:37 am
Hi Rodger
You would be better off using the DOM to do this. Take a look at this article:
http://www.jaygilford.com/php/php-dom-get-all-pagelinks/
It should be clear from that how to get the nofollow links.
Jay
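For readers who want a starting point for the DOM suggestion, here is a minimal sketch. It is not taken from the linked article: the function name get_links_with_rel and the sample markup are hypothetical, and it assumes PHP's DOM extension is available. It collects every <a> that has an href and flags those whose rel attribute contains the nofollow token.

```php
<?php
// Hypothetical helper (not from the linked article): list every <a href>
// in an HTML string and note whether rel contains "nofollow".
function get_links_with_rel($html)
{
    if (trim($html) === '') {
        return array(); // nothing to parse
    }

    $dom = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // skip anchors that have no target
        }
        // rel is a space-separated list of tokens, e.g. "external nofollow"
        $rel = preg_split('/\s+/', strtolower($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        $links[] = array(
            'url'      => $a->getAttribute('href'),
            'text'     => trim($a->textContent),
            'nofollow' => in_array('nofollow', $rel, true),
        );
    }
    return $links;
}

// Made-up sample markup
$html = '<html><body><a href="/about">About</a> '
      . '<a href="http://example.com/" rel="external nofollow"><b>Example</b></a> '
      . '<a name="top">anchor</a></body></html>';

$links = get_links_with_rel($html);

// Keep only the nofollow links
$nofollow = array();
foreach ($links as $link) {
    if ($link['nofollow']) {
        $nofollow[] = $link;
    }
}

print_r($nofollow);
```

To run it on a live page, pass in the downloaded markup, for example get_links_with_rel(file_get_contents($url)). Unlike the regular expression, this also picks up the text of links that wrap other tags.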