20070829

How to create a simple code checking process in PHP

The other day I needed to find every call to a function named "get_message" across all our PHP and XSL files. Finding the calls is easy with an advanced text editor such as Notepad++ and its "search in files" feature.

The problem was that I didn't only need to know where the calls are; I also needed to know which parameter is used in each call, for example get_message('HEADER') or get_message('KEYWORDS'), and afterwards to check the database to see whether each of those messages (HEADER and KEYWORDS) has been inserted there.

Of course, doing that manually is a very, very time-consuming task. For that reason I prepared a code-checking PHP script that can save your day with similar problems.

First, let's create a structure:


function check_dir ($dir_name, $recursive)
{
    $handle=opendir($dir_name);
    if ($handle)
    {
        while (false!==($file=readdir($handle)))
        {// First pass: check the files of this folder
            if (is_file($dir_name.'/'.$file))
            {// It's a file
                if (substr($file, -4)=='.php' && substr($file, 0, 1)!='.')
                {// PHP file detected (the second condition keeps Mac temporary files from being checked)
                    check_php($dir_name.'/'.$file);
                }
                if (substr($file, -4)=='.xsl' && substr($file, 0, 1)!='.')
                {// XSL file detected (same guard against Mac temporary files)
                    check_xsl($dir_name.'/'.$file);
                }
            }
        }
        if ($recursive)
        {// The parameter tells us to look into the subfolders
            rewinddir($handle); // Second pass over the same folder
            while (false!==($file=readdir($handle)))
            {// Loop through the entries
                if (is_dir($dir_name.'/'.$file))
                {// It's a folder
                    if ($file!='.' && $file!='..')
                    {// Check the next subfolder, avoiding the current and the parent folders
                        check_dir($dir_name.'/'.$file, $recursive);
                    }
                }
            }
        }
        closedir($handle);
    }
    else
    {
        echo 'I cannot open the folder '.$dir_name;
        echo chr(13).chr(10).'<br />';
    }
}



This function loops through an initial folder and, if the second parameter is set to true, continues down the directory hierarchy.

There are two functions, one to check the PHP files and one to check the XSL files; you can extend this checking to whatever kinds of files you want.
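One way to make that extension easier, sketched here rather than taken from the original script, is to pass a map from file extension to checking function. The names scan_tree and $handlers are hypothetical, and unlike check_dir this sketch also skips hidden folders (anything starting with a dot).

```php
<?php
// Hypothetical helper (not part of the original script): walks $dir and
// hands each file to the callback registered for its extension.
function scan_tree($dir, array $handlers, $recursive)
{
    $entries = scandir($dir);
    if ($entries === false) {
        echo 'I cannot open the folder ' . $dir . "\n";
        return;
    }
    foreach ($entries as $entry) {
        if ($entry[0] === '.') {
            continue; // skips '.', '..' and hidden or temporary dot entries
        }
        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            if ($recursive) {
                scan_tree($path, $handlers, $recursive);
            }
            continue;
        }
        // Look up the handler by lowercased extension, e.g. 'php' or 'xsl'
        $ext = strtolower(pathinfo($entry, PATHINFO_EXTENSION));
        if (isset($handlers[$ext])) {
            call_user_func($handlers[$ext], $path);
        }
    }
}
```

With the two functions of this article, a call such as scan_tree($root, array('php' => 'check_php', 'xsl' => 'check_xsl'), true) should visit the same kinds of files, and supporting a new file type would only mean adding one entry to the array.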

Let's see the check_php function:


function check_php($file_path)
{
    $file=fopen($file_path,'r');
    if (!$file)
    {// The file cannot be opened, skip it
        return;
    }
    while (false!==($line=fgets($file)))
    {// Read line by line until the end of the file
        if (preg_match('/get_message[ ]*\(\'([A-Z_]*)\'\)/', $line, $resultats)>0)
        {// Keep the captured parameter
            array_push($GLOBALS['messages_php'], $resultats[1]);
        }
        $GLOBALS['cont_lineas_php']++;
    }
    fclose($file);
}


As you can see in this simple but powerful code, we loop through the PHP source file line by line and, with only a single if condition, check whether the line contains a get_message call and take its parameter from the result of evaluating the regular expression.

This requires further explanation:

preg_match takes a regular expression as its first argument, checks it against the second argument ($line in the example) and puts the matched parts (if any) in an array (the third parameter, $resultats in our example): element 0 holds the whole matched text and element 1 the first parenthesized group. The function returns 1 if the pattern matches and 0 if it does not (it stops at the first match, so it never returns more than 1), or false if an error occurs.
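A small experiment makes the return value and the contents of the array easier to see. This is only a sketch with made-up input lines; the variable names are mine, the pattern is the one from check_php.

```php
<?php
$pattern = '/get_message[ ]*\(\'([A-Z_]*)\'\)/';

// Match: returns 1; [0] is the whole matched text, [1] the captured group.
$hit = preg_match($pattern, "\$t = get_message('HEADER');", $m_hit);
echo $hit, ' ', $m_hit[0], ' ', $m_hit[1], "\n"; // 1 get_message('HEADER') HEADER

// No match: returns 0 and leaves the array empty.
$miss = preg_match($pattern, "\$hola = 'hola';", $m_miss);
echo $miss, ' ', count($m_miss), "\n"; // 0 0

// Two calls on one line: still 1, and only the first call is captured.
$two = preg_match($pattern, "get_message('A').get_message('B')", $m_two);
echo $two, ' ', $m_two[1], "\n"; // 1 A
```

The last case shows a limitation of checking one match per line: a second call on the same line goes unnoticed. If that matters in your code base, preg_match_all is the standard function for collecting every match.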

The regular expression needs a little bit of explanation too. Creating a regex from scratch is relatively simple, but it's a tricky language and very difficult to maintain:

- The regular expression starts with "/" and ends with "/"; these are the delimiters (first mystery solved).
- "get_message" is a literal that we are looking for in the input.
- "[ ]*" means that we accept zero or more spaces just after the get_message name, so that we don't miss calls written like "get_message ('blah')" or "get_message   ('blah')" instead of the commonly used "get_message('blah')".
- \(\' and \'\) simply detect the argument opening with (' and closing with '). The backslash before each parenthesis is the regular expression escape character (an unescaped parenthesis would open or close a group); the backslash before each quote is there because the pattern is written inside a single-quoted PHP string.
- ([A-Z_]*) is the core of the regexp we are trying to create. It indicates that we are looking for any combination of uppercase letters "A-Z" and underscores "_": the square brackets define the set of allowed characters and the star means zero or more characters from that set. Last but not least, the enclosing parentheses make the enclosed part a captured result: only the parenthesized parts of the regexp can be retrieved separately after the evaluation, and this is the part stored in $resultats[1] when the match finishes.

With the theory explained, let's look at a couple of examples:
$keywords=get_message ('KEYWORDS'); --> returns 1 and $resultats[1] contains the value KEYWORDS
$description=get_message('DESCRIPTION'); --> returns 1 and $resultats[1] contains the value DESCRIPTION
$hello_world=get_message('HELLO_WORLD'); --> returns 1 and $resultats[1] contains the value HELLO_WORLD
$hola = 'hola'; --> returns 0
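The cases above, plus a few that the pattern does not catch, can be tried directly. The wrapper first_message_code is a hypothetical name used only for this experiment; it returns the captured code, or null when the line does not match.

```php
<?php
// Hypothetical wrapper around the pattern from check_php, for experimenting.
function first_message_code($line)
{
    if (preg_match('/get_message[ ]*\(\'([A-Z_]*)\'\)/', $line, $resultats) > 0) {
        return $resultats[1];
    }
    return null;
}

$lines = array(
    "\$keywords=get_message ('KEYWORDS');",
    "\$description=get_message('DESCRIPTION');",
    "\$hello_world=get_message('HELLO_WORLD');",
    "\$hola = 'hola';",
    '$x=get_message("HEADER");',     // double quotes: not matched
    "\$x=get_message('header');",    // lowercase letters: not matched
    "\$x=get_message('ERROR_404');", // digits: not matched
    "\$x=get_message(\$code);",      // variable argument: not matched
);
foreach ($lines as $line) {
    echo $line, ' => ', var_export(first_message_code($line), true), "\n";
}
```

If your message codes contain digits or are sometimes written with double quotes, the character class and the quote parts of the pattern would need to be widened accordingly.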

Additionally, we increase the global counter of PHP lines for the whole application with the line $GLOBALS['cont_lineas_php']++;

The check_xsl function is very similar to check_php; it only looks for a different regular expression and updates other variables:


function check_xsl($file_path)
{
$file=fopen($file_path,'r');
while(!feof($file))
{
$line=fgets($file);
if (preg_match('/[ ]*([A-Z_]*)0)
{
if (!stripos($line, ' {
array_push($GLOBALS['messages_xsl'], $resultats[1]);
}
}
$GLOBALS['cont_lineas_xsl']++;
}
fclose($file);
}



Now let's look at the part where we call the functions:


check_dir($_SERVER['DOCUMENT_ROOT'], false);
check_dir($_SERVER['DOCUMENT_ROOT'].'/actions/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/admin/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/ajax/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/batchs/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/captcha/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/conf/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/models/', true);


check_dir($_SERVER['DOCUMENT_ROOT'].'/rss/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/styles/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/utils/', true);
check_dir($_SERVER['DOCUMENT_ROOT'].'/views/', true);

echo 'PHP LINES: '.$GLOBALS['cont_lineas_php'];
echo chr(13).chr(10).'<br />';
echo 'XSL LINES: '.$GLOBALS['cont_lineas_xsl'];
echo chr(13).chr(10).'<br />';
echo chr(13).chr(10).'<br />';


These calls simply check all the directories I needed to check and output the number of lines of PHP and XSL that we have.

Finally, we want to do a more sophisticated analysis with the data we have collected: we want to see whether all the messages collected in the arrays $GLOBALS['messages_xsl'] and $GLOBALS['messages_php'] really exist in the database, and report any that do not (which is an error and must be fixed).


$unics=array_unique(array_merge($GLOBALS['messages_php'], $GLOBALS['messages_xsl']));
$no_existents=array_filter($unics, 'not_exist_message');
echo 'NONEXISTING MESSAGES IN THE DB: '.count($no_existents);
echo chr(13).chr(10).'<br />';
echo implode(', ', $no_existents); // separator first: the reversed order is rejected by PHP 8
echo chr(13).chr(10).'<br />';
echo chr(13).chr(10).'<br />';

echo 'UNIQUE MESSAGES IN GENERAL: '.count($unics);
echo chr(13).chr(10).'<br />';
echo implode(', ', $unics);
echo chr(13).chr(10).'<br />';
echo chr(13).chr(10).'<br />';


The first line merges the two arrays with array_merge, then removes the duplicates with array_unique and puts the result into the $unics variable.
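To see what that first line produces, here is a tiny stand-alone example with made-up message codes in place of the collected ones:

```php
<?php
// Made-up sample data standing in for the two collected arrays.
$messages_php = array('HEADER', 'KEYWORDS', 'HEADER');
$messages_xsl = array('KEYWORDS', 'TITLE');

$merged = array_merge($messages_php, $messages_xsl);
// array('HEADER', 'KEYWORDS', 'HEADER', 'KEYWORDS', 'TITLE')

$unics = array_unique($merged);
// Keeps the first occurrence of each value together with its original key:
// array(0 => 'HEADER', 1 => 'KEYWORDS', 4 => 'TITLE')

echo count($unics), "\n";         // 3
echo implode(', ', $unics), "\n"; // HEADER, KEYWORDS, TITLE
```

Note that the keys are no longer consecutive after array_unique; that does not matter for count and implode, which is all the report uses.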

The second line filters the $unics array with a custom function that checks the database for each message in particular. Here is the function:


function not_exist_message ($value)
{
    return(!exist_message($value));
}

function exist_message ($value)
{
    $sql='select count(*) num
    from messages m
    where codi="'.$value.'"
    ';

    $dbh=get_db_handler();
    $result = mysql_query($sql, $dbh);
    if(!$result)
    {
        return false;
    }
    $row = mysql_fetch_array($result, MYSQL_ASSOC);
    return($row['num']>0);
}


array_filter runs the function given as the second parameter over each element of the array, and the resulting array contains only the elements for which the function returns true. The function exist_message simply executes an SQL statement that counts the number of occurrences of the given message in the database.
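The filtering step can be tried without a database. In this sketch the $known array is a hypothetical stand-in for the messages table, and an anonymous function plays the role of not_exist_message:

```php
<?php
// Hypothetical stand-in for the database: the codes that "exist".
$known = array_flip(array('HEADER', 'KEYWORDS'));

$unics = array('HEADER', 'KEYWORDS', 'TITLE', 'POST_UPDATED');

// Keep only the elements for which the callback returns true.
$no_existents = array_filter($unics, function ($code) use ($known) {
    return !isset($known[$code]);
});

// array_filter preserves keys: array(2 => 'TITLE', 3 => 'POST_UPDATED')
echo count($no_existents), "\n";         // 2
echo implode(', ', $no_existents), "\n"; // TITLE, POST_UPDATED
```

A possible variation on the script, under the assumption that the messages table is small enough, would be to load all the codes with one query into such an array and filter in memory, instead of running one query per message.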

That's all. For completeness, here is the initial part with the includes and the declaration of variables:


require_once($_SERVER['DOCUMENT_ROOT'].'/conf/ompDB.php');

echo 'Code control report:';
echo chr(13).chr(10).'<br />';

$GLOBALS['cont_lineas_php']=0;
$GLOBALS['messages_php']=array();
$GLOBALS['cont_lineas_xsl']=0;
$GLOBALS['messages_xsl']=array();


And a sample of output generated by the execution of the php file:


Code control report:
PHP LINES: 7886
XSL LINES: 992

NONEXISTING MESSAGES IN THE DB: 28
HEADER_CENTRAL_EDITABLE_POSTS_CLEAR, STATIC_HEADER_HOW_IT_WORKS, STATIC_CONTENT_HOW_IT_WORKS, HEADER_MESSAGE_ADD_POST, STATIC_HEADER_FOLLOW_OFFERS, STATIC_CONTENT_FOLLOW_OFFERS, EXPLICATION_RSS_LINK_CITY, EXPLICATION_RSS_LINK_CATEGORY, SECURITY_ERROR_HEADER, SECURITY_ERROR, SECURITY_ERROR_NOT_EDITABLE, EDIT_YOUR_POST, POST_UPDATED, POST_UPDATED_LONG, POST_UPDATE_ERROR, POST_UPDATE_ERROR_LONG, HEADER_POST_SAVED_ERROR, ERROR_POST_UNKNOWN, ERROR_SITE_UNKNOWN, SUBJECT_SEND_TO_A_FRIEND, PREBODY_SEND_TO_A_FRIEND, POSTBODY_SEND_TO_A_FRIEND, MESSAGE_SENT_SUCCESFULLY, MESSAGE_SENT_ERROR, EXPLICATION_RSS, EXPLICATION_CITIES_RSS, EXPLICATION_CATE_RSS, EDITABLE_POST_LINK

UNIQUE MESSAGES IN GENERAL: 123
LANGUAGE_NOT_SUPPORTED, LANGUAGE_CHANGED, TITLE, ADD_POST_SINGLE, ADD_POST, EDITABLE_POSTS, EDITABLE_POSTS_CLEAR, HEADER_CENTRAL_EDITABLE_POSTS_CLEAR, HEADER_SEARCH, SEARCH_ERROR, STATIC_HEADER_WHO_WE_ARE, STATIC_CONTENT_WHO_WE_ARE, STATIC_HEADER_CONDITIONS, STATIC_CONTENT_CONDITIONS, STATIC_HEADER_WHAT_IS, STATIC_CONTENT_WHAT_IS, STATIC_HEADER_HOW_IT_WORKS, STATIC_CONTENT_HOW_IT_WORKS, DONE, CATEGORY_NEEDED, TITLE_NEEDED, DESCRIPTION_NEEDED, CONTACT_NEEDED, SECURITY_CODE_NEEDED, HEADER_MESSAGE_ADD_POST, INSERT_YOUR_POST, POST_SAVED, POST_SAVED_LONG, POST_SAVE_ERROR, POST_SAVE_ERROR_LONG, HEADER_CENTRAL_ADD_POST, STATIC_HEADER_FOLLOW_OFFERS, STATIC_CONTENT_FOLLOW_OFFERS, EXPLICATION_RSS_LINK_CITY, EXPLICATION_RSS_LINK_CATEGORY, HEADER_CENTRAL_EDIT_POST, SECURITY_ERROR_HEADER, SECURITY_ERROR, SECURITY_ERROR_NOT_EDITABLE, EDIT_YOUR_POST, POST_UPDATED, POST_UPDATED_LONG, POST_UPDATE_ERROR, POST_UPDATE_ERROR_LONG, HEADER_CENTRAL_EDITABLE_POSTS, HEADER_CENTRAL_HOME, HEADER_CENTRAL_POST, HEADER_POST_SAVED, HEADER_POST_SAVED_ERROR, GO_TO, HEADER_CENTRAL_CATEGORIES, REPORT_POSTED, REPORT_NOT_POSTED, FLAG_SHORT, MOTIVES, REPORT, SEND_SHORT, NAME_EMAIL, FRIEND_EMAIL, MESSAGE_EMAIL, SUBMIT_EMAIL, ACTIVE_POSTS, LAST_ACTIVE_POST, CITY, POST_DETAIL_FLAG_INCORRECT, POST_DETAIL_FLAG_CORRECT, ERROR_POST_UNKNOWN, ERROR_NAME_EMAIL, ERROR_FRIEND_EMAIL, ERROR_MESSAGE_EMAIL, ERROR_SITE_UNKNOWN, SUBJECT_SEND_TO_A_FRIEND, PREBODY_SEND_TO_A_FRIEND, POSTBODY_SEND_TO_A_FRIEND, MESSAGE_SENT_SUCCESFULLY, MESSAGE_SENT_ERROR, TIME_DAYS, TIME_DAY, TIME_HOURS, TIME_HOUR, TIME_MINUTES, TIME_MINUTE, TIME_FEW, SEARCH_BOX_TEXT, BUTTON_SEARCH, HEADER_CITIES, HEADER_AD, GLOBAL_SELECT_LANG, GLOBAL_SELECT_CITY, STATIC_WHO_WE_ARE, STATIC_BLOG, STATIC_CONDITIONS, STATIC_WHAT_IS_IPSO, HEADER_CATEGORIES, HOME_SELECT_CITY, HOME_CITIES, HOME_LANGUAGES, HEADER_TITLE_HOME_OFFER, VIEW_DETAIL, POST_CATEGORY, POSTED, NO_POST_IN_SEARCH, POST_TITLE, POST_TITLE_EXPLAIN, POST_CATEGORY_EXPLAIN, POST_DESCRIPTION, 
POST_DESCRIPTION_EXPLAIN, POST_CONTACT_CONDITIONS, POST_CONTACT_CONDITIONS_EXPLAIN, CAPTCHA, CAPTCHA_EXPLAIN, POST_SUBMIT_BUTTON, EXPLICATION_RSS, EXPLICATION_CITIES_RSS, EXPLICATION_CATE_RSS, EDITABLE_POST_LINK, NO_POST_IN_CATEGORY, RELATED_POSTS, FLAG_LONG, SEND_LONG, VIEWED, TIMES, CATEGORIES


Ways to improve:
- The line count should be improved so that it skips whitespace-only lines and commented lines.
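A first approximation of that improvement could look like the sketch below. The function name count_code_lines is hypothetical, and the comment detection is deliberately naive: it only knows PHP-style comments (//, # and /* ... */), so XSL comments would need their own rule.

```php
<?php
// Hypothetical helper: counts lines that are neither blank nor comment-only.
// Simplification: a /* ... */ block is assumed to start at the beginning of a
// line, and code following a closing */ on the same line is not counted.
function count_code_lines(array $lines)
{
    $count = 0;
    $in_block = false;
    foreach ($lines as $line) {
        $trimmed = trim($line);
        if ($in_block) {
            if (strpos($trimmed, '*/') !== false) {
                $in_block = false; // the block comment ends on this line
            }
            continue;
        }
        if ($trimmed === '') {
            continue; // whitespace-only line
        }
        if (strpos($trimmed, '//') === 0 || strpos($trimmed, '#') === 0) {
            continue; // single-line comment
        }
        if (strpos($trimmed, '/*') === 0) {
            // Block comment opens here; it may also close on the same line.
            $in_block = (strpos($trimmed, '*/', 2) === false);
            continue;
        }
        $count++; // anything else is treated as code
    }
    return $count;
}
```

It could be fed with the result of file($file_path), which returns the lines of a file as an array. A line with code followed by a trailing comment still counts as code.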

That's all folks