How to get all links from a web page

October 26th, 2009

A question that comes up all the time on forums is “How do I get all the links on a web page?”, meaning everything inside <a> tags. Here’s some code that does it, with a comment on each line

/**
 * @author Jay Gilford
 */

// regular expression pattern to match all links on a page
$pattern = '%<a[^>]+href="(?P<url>[^"]+)"[^>]*>(?P<text>[^<]+)%si';

// Webpage URL to get links from
$url = 'http://www.jaygilford.com/';

// Fetch contents of whole page
$page_content = file_get_contents($url);

// Get all matches of links and put them into the $matches variable
preg_match_all($pattern, $page_content, $matches);

// Variable to hold all of our urls and their text
$urls = array();

// Loop through each array item
foreach($matches['url'] as $k=>$v) {
    // combine the url and text into its own key for ease of access
    $urls[$k] = array('url' => $v,'text' => $matches['text'][$k]);
}

// For display purposes only to show the contents of $urls
echo print_r($urls, true);
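
A regular expression like the one above can miss links that use single-quoted attributes or contain nested tags. As an alternative sketch (not the approach used in the code above), PHP's DOM extension can parse the markup instead. The function name extract_links() and the sample HTML are made up for illustration, and the DOM extension is assumed to be enabled.

```php
<?php
// Hypothetical helper: returns the same url/text structure as $urls above
function extract_links($html)
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so keep parser warnings quiet
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip anchors that have no href attribute at all
        if (!$a->hasAttribute('href')) {
            continue;
        }
        $urls[] = array(
            'url'  => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        );
    }
    return $urls;
}

// Sample markup; in practice this would come from file_get_contents($url)
$html = '<p><a href="/about">About</a> and '
      . '<a class="x" href="http://example.com/">Example <b>site</b></a></p>';

print_r(extract_links($html));
```

Because textContent flattens nested tags, the second link's text comes back as "Example site" rather than stopping at the first inner tag.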

If you have any questions regarding this feel free to contact me. Details can be found on the about page

Setting up virtual hosts in apache on a windows machine

February 21st, 2009

Navigate to your apache folder and find the conf folder (for WAMP this is C:\wamp\bin\apache\apache{version number here}\conf\ by default, for xampp this is C:\xampp\apache\conf\ by default)

Next open the httpd.conf file, and search for the line that has Include conf/extra/httpd-vhosts.conf on it. If there is a # before the line, delete it leaving only Include conf/extra/httpd-vhosts.conf. Save the file and exit it
Next, open the folder extra in the conf folder, and open the file httpd-vhosts.conf
You can delete the contents of the file, and replace with the following

NameVirtualHost 127.0.0.1

<VirtualHost 127.0.0.1>
	DocumentRoot "C:\wamp\www"
	ServerName localhost
    <Directory "C:\wamp\www">
        Options FollowSymLinks
        AllowOverride All
    </Directory>
</VirtualHost>

For my example above, I have used the WAMP default document root for the localhost (C:\wamp\www), as it is set by default. You can change the paths to any directory you like, making sure that both match each other. Now you will need to create a custom virtual host. Supposing you want to make the host myhost, so that typing in http://myhost/ brings up a new document root by default, you would add the following to the file after the above code


<VirtualHost 127.0.0.1>
	DocumentRoot "C:\virtualhosts\myhost"
	ServerName myhost
    <Directory "C:\virtualhosts\myhost">
        Options FollowSymLinks
        AllowOverride All
    </Directory>
</VirtualHost>

Note that three parts differ from the first block. The first two are the folder paths (in this case I’ve set both to C:\virtualhosts\myhost). The third is the ServerName, which is myhost

Now we need to make a final change to our hosts file. The location of the hosts file depends on your operating system. A full list of locations can be found at http://en.wikipedia.org/wiki/Hosts_file#Location_and_default_content

Most windows users will find it in %SystemRoot%\system32\drivers\etc\
Right click the file, and click properties. Make sure that the Read-only checkbox in the attributes is unticked and click ok. Then open the file for editing (using notepad is generally the easiest method)

We will need to add our new myhost to this file to allow the computer to map the name to an ip. Since the IP will be our local ip (127.0.0.1) we simply add
127.0.0.1 myhost to the end of the hosts file on a new line (the gap between the ip and the myhost is a tab by the way). Save this file, close, and restart your apache server
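
If the new host does not resolve, it can help to check the hosts entry on its own before looking at Apache. The following is a minimal sketch that parses hosts-file text and reports which IP a name maps to. The function name hosts_lookup() and the sample contents are assumptions made for illustration; on a real machine you would pass in the contents of your own hosts file.

```php
<?php
// Hypothetical helper: find the IP that a hosts file maps $host to
function hosts_lookup($contents, $host)
{
    foreach (preg_split('/\R/', $contents) as $line) {
        // Drop comments and surrounding whitespace
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }

        // First field is the IP, the rest are host names (tabs or spaces)
        $parts = preg_split('/\s+/', $line);
        $ip = array_shift($parts);

        foreach ($parts as $name) {
            if (strcasecmp($name, $host) === 0) {
                return $ip;
            }
        }
    }
    return null; // no entry found
}

// Sample hosts file text, standing in for the real file
$sample = "# sample hosts file\n127.0.0.1\tlocalhost\n127.0.0.1\tmyhost\n";

var_dump(hosts_lookup($sample, 'myhost'));
var_dump(hosts_lookup($sample, 'otherhost'));
```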

You should now be able to go into your browser and type in myhost, and the C:\virtualhosts\myhost folder will run like your normal document root does

If you have any issues in getting this to work, feel free to contact me

Completely customisable PHP pagination class

January 12th, 2009

If you need to paginate your database results quickly and reliably then this could be the class for you. It allows you complete access to all attributes of the pagination, from the link templates to the results padding, and auto querying.

Simple pagination class download


/*******************************************************************************
*                                  Pagination class                            *
*                             Created: 12th January 2009                       *
*                             Updated: 16th October 2011                       *
*                         ©Copyright Jay Gilford 2009 - 2011                   *
*                              http://www.jaygilford.com                       *
*                            email: jay [at] jaygilford.com                    *
*******************************************************************************/
 
class pagination
{
    ################################
    # PRIVATE VARS - DO NOT ALTER  #
    ################################
    private $_query = '';
    private $_current_page = 1;
    private $_padding = 2;
    private $_results_resource;
    private $_output;
 
    ################################
    #       RESULTS VARS           #
    ################################
    public $results_per_page = 10;          #Number of results to display at a time
    public $total_results = 0;              #Total number of records
    public $total_pages = 0;                #Total number of pages
 
    public $link_prefix = '/?page=';        #String for link to go before the page number
    public $link_suffix = '';               #String for link to go after the page number
    public $page_nums_separator = ' | ';    #String to go between the page number links
 
    ################################
    #      ERROR HOLDING VAR       #
    ################################
    public $error = null;
 
    ################################
    # PAGINATION TEMPLATE DEFAULTS #
    ################################
    public $tpl_first = '<a {link}>&laquo;</a> | ';
    public $tpl_last = ' | <a {link}>&raquo;</a> ';
 
    public $tpl_prev = '<a {link}>&lsaquo;</a> | ';
    public $tpl_next = ' | <a {link}>&rsaquo;</a> ';
 
    public $tpl_page_nums = '<a {link}>{page}</a>';
    public $tpl_cur_page_num = '{page}';
 
    /**
     * In the above templates {link} is where the link will be inserted and {page} is
     * where the page numbers will be inserted. Other than that, you can modify them
     * as you please
     *
     * NOTE: You should have a separator of some sort at the right of $tpl_first and
     * $tpl_prev as above in the defaults, and also have a separator of some sort
     * before the $tpl_next and $tpl_last templates
     **/
 
 
    ##################################################################################
 
 
    public function __construct($page, $query)
    {
        #Check page number is a positive integer greater than 0 and assign it to $this->_current_page
        if ((int)$page > 0)
            $this->_current_page = (int)$page;
 
        #Remove any LIMIT clause from the query string and store the query if it is not empty
        $query = trim(preg_replace('/[\s]+LIMIT[\s]+\d+([\s,]*,[^\d]*\d+)?/i', '', $query));
        if (empty($query)) {
            return false;
        } else {
            $this->_query = $query;
        }
    }
 
    /**
     * pagination::paginate()
     *
     * Processes all values and query strings and if successful
     * returns a string of html text for use with pagination bar
     *
     * @return string;
     */
    public function paginate()
    {
        $output = '';
 
        #########################################
        # GET TOTAL NUMBER OF RESULTS AND PAGES #
        #########################################
        $result = mysql_query($this->_query);
        if (!$result) {
            $this->error = __line__ . ' - ' . mysql_error();
            return false;
        }
        $this->total_results = mysql_num_rows($result);
        $this->total_pages = ceil($this->total_results / $this->results_per_page);
 
        ################################
        # IF TOTAL PAGES <= 1 RETURN 1 #
        ################################
        if ($this->total_pages <= 1) {
            $this->_results_resource = $result;
            $this->_output = '1';
            return $this->_output;
        }
 
        ########################
        # FREE RESULT RESOURCE #
        ########################
        mysql_free_result($result);
 
        ###################################################
        # CHECK CURRENT PAGE ISN'T GREATER THAN MAX PAGES #
        ###################################################
        if ($this->_current_page > $this->total_pages)
            $this->_current_page = $this->total_pages;
 
        ######################################
        # SET FIRST AND LAST PAGE VALUES AND #
        # ERROR CHECK AGAINST INVALID VALUES #
        ######################################
        $start = ($this->_current_page - $this->_padding > 0) ? $this->_current_page - $this->
            _padding : '1';
        $finish = ($this->_current_page + $this->_padding <= $this->total_pages) ? $this->
            _current_page + $this->_padding : $this->total_pages;
 
        ###########################################
        # CREATE LIMIT CLAUSE AND ASSIGN TO QUERY #
        ###########################################
        $limit = ' LIMIT ' . ($this->results_per_page * ($this->_current_page - 1)) .
            ',' . $this->results_per_page;
        $query = $this->_query . $limit;
 
        ##############################################
        # RUN QUERY AND ASSIGN TO $_results_resource #
        ##############################################
        $result = mysql_query($query);
        if ($result === false) {
            $this->error = __line__ . ' - ' . mysql_error();
            return false;
        }
        $this->_results_resource = $result;
 
        ###########################################
        # ADD FIRST TO OUTPUT IF CURRENT PAGE > 1 #
        ###########################################
        if ($this->_current_page > 1) {
            $output .= preg_replace('/\{link\}/i', 'href="' . $this->link_prefix . '1' . $this->
                link_suffix . '"', $this->tpl_first);
        }
 
        ##########################################
        # ADD PREV TO OUTPUT IF CURRENT PAGE > 1 #
        ##########################################
        if ($this->_current_page > 1) {
            $output .= preg_replace('/\{link\}/i', 'href="' . $this->link_prefix . ($this->
                _current_page - 1) . $this->link_suffix . '"', $this->tpl_prev);
        }
 
        ################################################
        # GET LIST OF LINKED NUMBERS AND ADD TO OUTPUT #
        ################################################
        $nums = array();
        for ($i = $start; $i <= $finish; $i++) {
            if ($i == $this->_current_page) {
                $nums[] = preg_replace('/\{page\}/i', $i, $this->tpl_cur_page_num);
            } else {
                $patterns = array('/\{link\}/i', '/\{page\}/i');
                $replaces = array('href="' . $this->link_prefix . $i . $this->link_suffix . '"', $i);
                $nums[] = preg_replace($patterns, $replaces, $this->tpl_page_nums);
            }
        }
        $output .= implode($this->page_nums_separator, $nums);
 
        ##################################################
        # ADD NEXT TO OUTPUT IF CURRENT PAGE < MAX PAGES #
        ##################################################
        if ($this->_current_page < $this->total_pages) {
            $output .= preg_replace('/\{link\}/i', 'href="' . $this->link_prefix . ($this->
                _current_page + 1) . $this->link_suffix . '"', $this->tpl_next);
        }
 
        ###############################################
        # ADD LAST TO OUTPUT IF CURRENT PAGE < FINISH #
        ###############################################
        if ($this->_current_page < $finish) {
            $output .= preg_replace('/\{link\}/i', 'href="' . $this->link_prefix . $this->total_pages . $this->link_suffix . '"', $this->
                tpl_last);
        }
 
        $this->_output = $output;
        return $output;
    }
 
 
    /**
     * pagination::padding()
     *
     * Sets the padding for the pagination string
     *
     * @param int $val
     * @return bool
     */
    public function padding($val)
    {
        if ((int)$val < 1)
            return false;
 
        $this->_padding = (int)$val;
        return true;
    }
 
 
    /**
     * pagination::resource()
     *
     * Returns the resource of the results query
     *
     * @return resource
     */
    function resource()
    {
        return $this->_results_resource;
    }
 
 
    /**
     * pagination::__tostring()
     * returns the last pagination output
     *
     * @return string
     */
    function __tostring()
    {
        if (trim($this->_output)) {
            return trim($this->_output);
        }else{
        	return '';
        }
    }
}
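
The arithmetic inside paginate() can be easier to follow in isolation. The sketch below pulls out the two calculations: the window of page numbers shown around the current page, and the LIMIT clause for the current page. The function names page_window() and limit_clause() are made up for this example and are not part of the class.

```php
<?php
// Hypothetical helper: first and last page numbers to show in the bar
function page_window($current, $total, $padding)
{
    $start  = max(1, $current - $padding);       // never below page 1
    $finish = min($total, $current + $padding);  // never past the last page
    return array($start, $finish);
}

// Hypothetical helper: LIMIT clause in the same "offset,count" form the class builds
function limit_clause($current, $per_page)
{
    return ' LIMIT ' . ($per_page * ($current - 1)) . ',' . $per_page;
}

list($start, $finish) = page_window(5, 10, 2);
echo "Pages $start to $finish\n";   // window around page 5 of 10
echo limit_clause(5, 10) . "\n";    // rows for page 5 at 10 per page
```

With a padding of 2, page 5 of 10 shows pages 3 to 7, and the query for that page skips the first 40 rows.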

Instructions on class usage:
To create an instance of the class, call it with your current page and the query you want to run
$paginator = new pagination($page_num_variable, 'SELECT * FROM `table_name`');
After that, you need to set up any of the parameters for the class such as
$paginator->results_per_page = 15;
$paginator->padding(5);
$paginator->link_prefix = '/results/page/';
$paginator->link_suffix = '/';
$paginator->page_nums_separator = ' -=- ';
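
To see how the prefix, suffix and templates fit together, here is a small sketch of the same substitution the class performs for a single page number. The template string '<a {link}>{page}</a>' is an assumption about the shape of a page-number template; adjust it to whatever markup you use.

```php
<?php
// Settings matching the example values above
$link_prefix = '/results/page/';
$link_suffix = '/';

// Assumed template shape: {link} becomes the href attribute, {page} the number
$tpl_page_nums = '<a {link}>{page}</a>';

$page = 3;
$patterns = array('/\{link\}/i', '/\{page\}/i');
$replaces = array(
    'href="' . $link_prefix . $page . $link_suffix . '"',
    (string)$page,
);

// Same preg_replace() call style the class uses for each page number
$link = preg_replace($patterns, $replaces, $tpl_page_nums);
echo $link;
```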

Once you have done that, you can call the paginate method
NOTE: you must be connected to your database in order to run this method as it uses the mysql_query function.
The paginate method returns the string of HTML for you to insert, and sets $paginator->resource() to the result resource of the query that is run for the pagination

You can either use the string returned by paginate(), or echo the class instance itself, and it will output the pagination
So either
$page_links = $paginator->paginate();
And then
echo $page_links;
wherever you want the links to go.

Alternatively you can use
$paginator->paginate();
and then
echo $paginator;

When you are using mysql_fetch_array() or mysql_fetch_assoc() to get your data from your database, you simply need to replace your old resource handle with $paginator->resource();
Example:

while($row = mysql_fetch_assoc($paginator->resource()))
{
    echo $row['field_name'];
}

If you have any questions, bugs or suggestions regarding this class, feel free to contact me either by comment or via one of the contact methods in the contact section

Jay

Gracefully handling errors in php using advanced techniques

October 11th, 2008

There are a few ways in which you can handle errors in PHP. You can do the not so smart thing and just turn them off altogether using

ini_set('error_reporting',0);
//or
error_reporting(0);

However this is not a good idea, and should never happen. If you want to hide all of your errors, you can set it so that you have them hidden, but log them to a file instead

ini_set('display_errors','off');
ini_set('log_errors','on');

That will then log all of your errors to your server error log by default. If you wish to set the log to a different file you can do so by using

ini_set('error_log','/path/to/your/error.log');

and that will log all of your errors to error.log in the /path/to/your/ directory.
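
To confirm the logging settings do what you expect, you can raise a test error and read the log back. This is a rough sketch that writes to a temporary file so it does not touch your real log; the file name and the message text are made up for the demonstration.

```php
<?php
// Temporary log file used only for this demonstration
$log_file = tempnam(sys_get_temp_dir(), 'errlog');

error_reporting(E_ALL);              // report everything...
ini_set('display_errors', '0');      // ...but never show it on screen
ini_set('log_errors', '1');          // write it to the log instead
ini_set('error_log', $log_file);     // and use our own log file

// Raise a warning on purpose; it should end up in the log file
trigger_error('Demo warning for the log', E_USER_WARNING);

// Read the log back to check the entry was written
$logged = file_get_contents($log_file);
echo $logged;

// Clean up the temporary file
unlink($log_file);
```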

Another option is to use a custom function to handle all errors. This can be useful should you wish to log all of your errors to a database so that you can use queries to view errors instead. The way that we tell PHP that we want to handle the errors ourselves is to use the set_error_handler() function and pass the name of the function we want to handle the errors.

set_error_handler('custom_error_func');

Then we make a function to handle the errors. For example if you wanted to add the error to your database table it would look something like this

function custom_error_func($errno, $errmsg, $filename, $line, $context)
{
    // Create query - NOTE: This is using the mressf() function that I created available at
    // http://www.jaygilford.com/php/sprintf-and-mysql_real_escape_string-all-in-one-function/
    $query = mressf("INSERT INTO `errors` VALUES(NULL,'%s','%s','%s','%s','%s')",
                    $errno,
                    $errmsg,
                    $filename,
                    $line,
                    print_r($context, true)); // $context is an array, so flatten it to a string first
    //run query
    mysql_query($query);
}
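
If you want to try a custom handler without a database, the sketch below collects errors in an array instead. It uses a closure rather than a function name, which set_error_handler() also accepts, and the $collected array and its keys are chosen purely for illustration.

```php
<?php
// Errors are gathered here instead of being written to a database
$collected = array();

// The handler receives the error level, message, file and line
set_error_handler(function ($errno, $errmsg, $filename, $line) use (&$collected) {
    $collected[] = array(
        'level'   => $errno,
        'message' => $errmsg,
        'file'    => $filename,
        'line'    => $line,
    );
    // Returning true tells PHP the error was dealt with,
    // so the built-in handler does not run as well
    return true;
});

// Raise a notice on purpose to exercise the handler
trigger_error('Something odd happened', E_USER_NOTICE);

// Put the previous handler back once we are done
restore_error_handler();

print_r($collected);
```

Swapping the array append for a database insert gives the same behaviour as the function above.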

If you want any of the above ini_set() settings applied permanently on your server, you can do so if you have access to the php.ini file, or if your host allows PHP settings in .htaccess. For the php.ini file, search for the setting (display_errors, for example) and change its value there. If you are using .htaccess, you may be able to use the php_flag or php_value directives to change the settings. For more information on setting values this way, see here

sprintf and mysql_real_escape_string all in one function

October 8th, 2008

As many PHP developers will know, there is the arduous task of sanitizing all of your data before you can add it to your queries for running in MySQL. So I decided to make a small function that is basically a clone of the sprintf function, with the added bonus of running mysql_real_escape_string on all of the arguments passed to it

function mressf()
{
    $args = func_get_args();
    if (count($args) < 2)
        return false;
    $query = array_shift($args);
    $args = array_map('mysql_real_escape_string', $args);
    array_unshift($args, $query);
    $query = call_user_func_array('sprintf', $args);
    return $query;
}

You would then call it as you would with your regular sprintf function such as

echo mressf("SELECT * FROM `table` WHERE `name` = '%s' AND `password` = '%s'", 'username', 'pass');

which would output

SELECT * FROM `table` WHERE `name` = 'username' AND `password` = 'pass'
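
The example values above contain nothing that needs escaping, so the effect is hard to see. The sketch below is a variation that takes the escaping function as its first argument, which lets it run without a database connection. The name escapef() is made up for this example, and addslashes() is only a stand-in so the output is visible; it is not a safe replacement for mysql_real_escape_string in real queries.

```php
<?php
// Hypothetical variant of mressf(): the escaping callback is passed in
function escapef($escaper, $format)
{
    // Everything after the first two arguments is a value to escape
    $args = array_slice(func_get_args(), 2);
    $args = array_map($escaper, $args);

    // vsprintf() is sprintf() taking its values as an array
    return vsprintf($format, $args);
}

// addslashes() is a stand-in escaper for demonstration only
$query = escapef(
    'addslashes',
    "SELECT * FROM `table` WHERE `name` = '%s'",
    "O'Reilly"
);

echo $query;
```

The single quote in the value is escaped with a backslash before it is placed into the query string.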