Matthew Steven Kelly

Massachusetts Data Breach Protection Law

February 24

Massachusetts is enacting a data protection law in response to a significant rise in electronic data breaches. The law affects any company that stores personal information of a Massachusetts resident. The four-page law can be read here: http://www.mass.gov/Eoca/docs/idtheft/201CMR1700reg.pdf

Companies will be required to develop, implement and maintain a comprehensive information security program that is written and contains administrative, technical and physical safeguards appropriate to safeguard the data.

Every comprehensive information security program shall include:

(1) Administrative Safeguards:

  1. Designating one or more employees to maintain the comprehensive information security program
  2. Taking reasonable steps to select and retain third-party service providers that are capable of maintaining appropriate security measures to protect such personal information consistent with these regulations and any applicable federal regulations
  3. Reviewing the scope of the security measures at least annually or whenever there is a material change in business practices that may reasonably implicate the security or integrity of records containing personal information
  4. Developing security policies for employees relating to the storage, access and transportation of records containing personal information outside of business premises
  5. Providing ongoing employee training that educates the employees on the proper use of the computer security system and the importance of personal information security.
  6. Documenting responsive actions taken in connection with any incident involving a breach of security, and mandatory post-incident review of events and actions taken, if any, to make changes in business practices relating to protection of personal information.

(2) Technical Safeguards:

  1. Preventing terminated employees from accessing records containing personal information
  2. Providing means for detecting and preventing security system failures by regular monitoring to ensure that the comprehensive information security program is operating in a manner reasonably calculated to prevent unauthorized access to or unauthorized use of personal information; and upgrading information safeguards as necessary to limit risks
  3. Implementing secure user authentication protocols including:
    1. (a) control of user IDs and other identifiers;
    2. (b) a reasonably secure method of assigning and selecting passwords, or use of unique identifier technologies, such as biometrics or token devices;
    3. (c) control of data security passwords to ensure that such passwords are kept in a location and/or format that does not compromise the security of the data they protect;
    4. (d) restricting access to active users and active user accounts only; and
    5. (e) blocking access to user identification after multiple unsuccessful attempts to gain access or the limitation placed on access for the particular system;
  4. Implementing secure access control measures that:
    1. (a) restrict access to records and files containing personal information to those who need such information to perform their job duties; and
    2. (b) assign unique identifications plus passwords, which are not vendor supplied default passwords, to each person with computer access, that are reasonably designed to maintain the integrity of the security of the access controls;
  5. Encryption of all:
    1. (a) Transmitted records and files containing personal information that will travel across public networks, and encryption of all data containing personal information to be transmitted wirelessly.
    2. (b) Personal information stored on laptops or other portable devices;
  6. On any system that is connected to the Internet and contains files with personal information on them:
    1. (a) Keeping reasonably up-to-date firewall protection and operating system security patches, reasonably designed to maintain the integrity of the personal information
    2. (b) Keeping reasonably up-to-date versions of system security agent software which must include malware protection and reasonably up-to-date patches and virus definitions, or a version of such software that can still be supported with up-to-date patches and virus definitions, and is set to receive the most current security updates on a regular basis.

(3) Physical Safeguards

  1. Providing reasonable restrictions upon physical access to records containing personal information, and storage of such records and data in locked facilities, storage areas or containers.

Definitions used in the law:

  • Breach of security, the unauthorized acquisition or unauthorized use of unencrypted data or, encrypted electronic data and the confidential process or key that is capable of compromising the security, confidentiality, or integrity of personal information, maintained by a person or agency that creates a substantial risk of identity theft or fraud against a resident of the commonwealth. A good faith but unauthorized acquisition of personal information by a person or agency, or employee or agent thereof, for the lawful purposes of such person or agency, is not a breach of security unless the personal information is used in an unauthorized manner or subject to further unauthorized disclosure.
  • Electronic, relating to technology having electrical, digital, magnetic, wireless, optical, electromagnetic or similar capabilities.
  • Encrypted, the transformation of data into a form in which meaning cannot be assigned without the use of a confidential process or key.
  • Owns or licenses, receives, stores, maintains, processes, or otherwise has access to personal information in connection with the provision of goods or services or in connection with employment.
  • Person, a natural person, corporation, association, partnership or other legal entity, other than an agency, executive office, department, board, commission, bureau, division or authority of the Commonwealth, or any of its branches, or any political subdivision thereof.
  • Personal information, a Massachusetts resident’s first name and last name or first initial and last name in combination with any one or more of the following data elements that relate to such resident: (a) Social Security number; (b) driver’s license number or state-issued identification card number; or (c) financial account number, or credit or debit card number, with or without any required security code, access code, personal identification number or password, that would permit access to a resident’s financial account; provided, however, that “Personal information” shall not include information that is lawfully obtained from publicly available information, or from federal, state or local government records lawfully made available to the general public.
  • Record or Records, any material upon which written, drawn, spoken, visual, or electromagnetic information or images are recorded or preserved, regardless of physical form or characteristics.
  • Service provider, any person that receives, stores, maintains, processes, or otherwise is permitted access to personal information through its provision of services directly to a person that is subject to this regulation.

Compliance Deadline:

  • Every person who owns or licenses personal information about a resident of the Commonwealth shall be in full compliance with 201 CMR 17.00 on or before March 1, 2010.

Yahoo Hot Jobs

December 27

Yahoo had an interesting article about what is “in” and what is “out” when creating a professional resume. To quote the article: “Fashion changes, and resume styles change, too. If you have solid skills and work experience but your resume isn’t getting any bites, you might need a resume makeover.”

http://hotjobs.yahoo.com/career-articles-the_new_resume_rules_what_s_in_and_what_s_out-1056

To summarize the article:

  1. Include a “Professional Summary” on the top of the resume instead of an “Objective”
  2. Make it easy on the eyes
  3. Customize the resume for the job
  4. One-page resumes are a myth
  5. Quantify your accomplishments
  6. Include website links to previous employers and possibly a brief description of them
  7. Include LinkedIn or other social networking site links at the top of the resume

PHP Best Practices

December 26

This was a very informative article: http://www.odi.ch/prog/design/php/guide.php

Best practices

This guide will give you solutions to common PHP design problems. It also provides a sketch of an application layout that I developed during the implementation of some projects.

php.ini quirks
Some settings in php.ini control how PHP interprets your scripts. This can lead to unexpected behaviour when moving your application from development to the production environment. The following measures reduce the dependency of your code on php.ini settings.

short_open_tag
Always use the long PHP tags: <?php ... ?>
Do not use the echo shortcut: <?= ... ?>

asp_tags
Do not use ASP-like tags: <% ... %>

magic_quotes_gpc
I recommend that you include code in a global include file which is run before any $_GET, $_POST or $_COOKIE parameter is read. That code should check whether the magic_quotes_gpc option is enabled and, if so, run all $_GET, $_POST and $_COOKIE values through the stripslashes function.
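The check described above could be sketched roughly as follows. The helper names strip_request_slashes() and magic_quotes_enabled() are made up for this illustration and are not part of the original guide. The function_exists() guard is an assumption added so the snippet also runs on PHP versions that no longer ship get_magic_quotes_gpc().

```php
<?php
// Hypothetical helper: strip slashes from a value or, recursively, from an array of values.
function strip_request_slashes($value)
{
    if (is_array($value)) {
        return array_map('strip_request_slashes', $value);
    }
    return stripslashes($value);
}

// Hypothetical helper: true only when the runtime still has magic quotes and they are on.
function magic_quotes_enabled()
{
    return function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
}

// Run once in the global include file, before any request value is read.
if (magic_quotes_enabled()) {
    $_GET    = strip_request_slashes($_GET);
    $_POST   = strip_request_slashes($_POST);
    $_COOKIE = strip_request_slashes($_COOKIE);
}
```

Because the helper recurses into arrays, form fields submitted as name[] are handled as well.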

register_globals
Never rely on this option being set. Always access all GET, POST and COOKIE values through the ‘superglobal’ $_GET, $_POST and $_COOKIE variables. For convenience declare $PHP_SELF = $_SERVER['PHP_SELF']; in your global include file after the magic_quotes_gpc quirk.

File uploads:
The maximum size of an uploaded file is determined by the following parameters:
  • file_uploads must be 1 (default)
  • memory_limit must be slightly larger than the post_max_size and upload_max_filesize
  • post_max_size must be large enough
  • upload_max_filesize must be large enough
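To see how these settings interact on a given server, something like the sketch below may help. The function names ini_size_to_bytes() and effective_upload_limit() are hypothetical. The sketch assumes the usual K/M/G shorthand and does not handle special values such as 0 or -1.

```php
<?php
// Convert a php.ini size such as "8M" or "512K" to bytes.
function ini_size_to_bytes($value)
{
    $value = trim($value);
    if ($value === '') {
        return 0;
    }
    $number = (int) $value;                     // leading digits, e.g. 8 for "8M"
    switch (strtolower(substr($value, -1))) {   // optional unit suffix
        case 'g': return $number * 1024 * 1024 * 1024;
        case 'm': return $number * 1024 * 1024;
        case 'k': return $number * 1024;
        default:  return $number;
    }
}

// The smaller of the two limits decides how large a single uploaded file may be.
// Special values such as 0 or -1 are not handled in this sketch.
function effective_upload_limit($postMaxSize, $uploadMaxFilesize)
{
    return min(ini_size_to_bytes($postMaxSize), ini_size_to_bytes($uploadMaxFilesize));
}

$limit = effective_upload_limit(ini_get('post_max_size'), ini_get('upload_max_filesize'));
echo "Largest accepted upload: $limit bytes\n";
```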

Have one single configuration file
You should define all configuration parameters of your application in a single (include) file. This way you can easily exchange this file to reflect settings for your local development site, a test site and the customer’s production environment. Common configuration parameters are:
  • database connection parameters
  • email addresses
  • options
  • debug and logging output switches
  • application constants
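A minimal sketch of such a file is shown here. Every constant name and value is a placeholder chosen for illustration; only FORM_MISSING and the file name config.inc.php appear elsewhere in this guide.

```php
<?php
// config.inc.php (all names and values below are hypothetical placeholders)

// Database connection parameters
define('DB_HOST', 'localhost');
define('DB_NAME', 'newsletter');
define('DB_USER', 'webapp');
define('DB_PASSWORD', 'change-me');

// Email addresses
define('MAIL_FROM', 'noreply@example.com');

// Debug and logging output switches
define('DEBUG', true);

// Application constants
define('FORM_MISSING', 1);
```

Swapping this one file is then enough to move between the development, test and production sites.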

Keep an eye on the namespace
As PHP does not have a namespace facility like Java packages, you must be very careful when choosing names for your classes and functions.
Avoid functions outside classes whenever possible and feasible. Classes provide some extra namespace for the methods and variables that live inside them.
If you declare global functions use a prefix. Some examples are dao_factory(), db_getConnection(), text_parseDate() etc.

Use a database abstraction layer
In PHP there are no database-independent functions for database access apart from ODBC (which nobody uses on Linux). You should not use the PHP database functions directly because this makes it expensive when the database product changes. Your customer may move from MySQL to Oracle one day, or you may need an XML database. You never know. Moreover, an abstraction layer can ease development, as the PHP database functions are not very user-friendly.

Use Value Objects (VO)
The Value Object is actually a J2EE pattern, but it can easily be implemented in PHP. A value object corresponds directly to a C struct. It’s a class that contains only member variables and no methods other than convenience methods (usually none). A VO corresponds to a business object and typically maps directly to a database table. Naming the VO member variables after the database fields is a good idea. Do not forget the ID column.
class Person {
var $id, $first_name, $last_name, $email;
}

Use Data Access Objects (DAO)
DAO is actually a J2EE pattern. It can easily be implemented in PHP and helps greatly in separating database access from the rest of your code. The DAOs form a thin layer. The DAO layer can be ‘stacked’ which helps for instance if you want to add DB caching later when tuning your application. You should have one DAO class for every VO class. Naming conventions are a good practice.
class PersonDAO {
    var $conn;

    #constructor: keeps a reference to the database connection
    function __construct(&$conn) {
        $this->conn =& $conn;
    }

    #inserts a new record (id == 0) or updates an existing one
    function save(&$vo) {
        if ($vo->id == 0) {
            $this->insert($vo);
        } else {
            $this->update($vo);
        }
    }

    function get($id) {
        #execute select statement
        #create new vo and call getFromResult
        #return vo
    }

    function delete(&$vo) {
        #execute delete statement
        #set id on vo to 0
    }

    #-- private functions

    function getFromResult(&$vo, $result) {
        #fill vo from the database result set
    }

    function update(&$vo) {
        #execute update statement here
    }

    function insert(&$vo) {
        #generate id (from Oracle sequence or automatically)
        #insert record into db
        #set id on vo
    }
}
A DAO typically implements the following methods:
  • save: inserts or updates a record
  • get: fetches a record
  • delete: removes a record

The DAO may define additional methods as required by your application’s needs. They should only perform actions that require the database (maybe only for performance reasons) and cannot be implemented in a different manner. Examples: isUsed(), getTop($n), find($criteria).

The DAO should only implement basic select / insert / update operations on one table. It must not contain the business logic. For example, the PersonDAO should not contain code to send email to a person. For n-to-n relationships create a separate DAO (and even a VO if the relationship has additional properties) for the relation table.

Write a factory function that returns the proper DAO given the class name of a VO. Caching is a good idea here.
function dao_getDAO($vo_class) {
    $conn = db_conn('default'); #get a connection from the pool
    switch ($vo_class) {
        case "person": return new PersonDAO($conn);
        case "newsletter": return new NewsletterDAO($conn);

    }
}
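One way the caching mentioned above might look is sketched below. The function name dao_getCachedDAO() is hypothetical. The two DAO classes and db_conn() are reduced to stand-ins so the snippet runs on its own.

```php
<?php
// Hypothetical stand-ins so the sketch runs on its own.
class PersonDAO     { public $conn; function __construct($conn) { $this->conn = $conn; } }
class NewsletterDAO { public $conn; function __construct($conn) { $this->conn = $conn; } }
function db_conn($name) { return "connection:$name"; } // placeholder for a real connection pool

// Factory that builds each DAO once and reuses it on later calls.
function dao_getCachedDAO($vo_class)
{
    static $cache = array();   // keeps its contents between calls

    if (!isset($cache[$vo_class])) {
        $conn = db_conn('default');
        switch ($vo_class) {
            case 'person':     $cache[$vo_class] = new PersonDAO($conn);     break;
            case 'newsletter': $cache[$vo_class] = new NewsletterDAO($conn); break;
            default:           return null; // unknown VO class
        }
    }
    return $cache[$vo_class];
}
```

A static array inside the function keeps the cache out of the global namespace, in line with the advice on global names earlier in the guide.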

Generate code
99% of the code for your VOs and DAOs can be generated automatically from your database schema when you use some naming conventions for your tables and columns. Having a generator script ready saves you time when you are likely to change the database schema during development. I successfully used a Perl script to generate my VOs and DAOs for a project. Unfortunately I am not allowed to post it here.

Business logic
Business logic directly reflects the use cases. The business logic deals with VOs, modifies them according to the business requirements and uses DAOs to access the persistence layer. The business logic classes should provide means to retrieve information about errors that occurred.
class NewsletterLogic {
    function __construct() {
    }

    function subscribePerson(&$person) {

    }

    function unsubscribePerson(&$person) {

    }

    function sendNewsletter(&$newsletter) {

    }
}
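The guide does not show how a logic class reports errors. One possible shape is sketched here, with hypothetical names throughout (Subscriber, SubscriptionLogic, getErrors()). The in-memory array stands in for a DAO so the example is self-contained.

```php
<?php
// Hypothetical value object and logic class; the subscriber list stands in for a DAO.
class Subscriber { public $id = 0; public $email = ''; }

class SubscriptionLogic
{
    private $errors = array();
    private $subscribers = array();   // a real application would go through a DAO here

    // Returns true on success; on failure the reasons are available via getErrors().
    function subscribePerson($person)
    {
        $this->errors = array();
        if ($person->email == '') {
            $this->errors[] = 'email missing';
        } elseif (isset($this->subscribers[$person->email])) {
            $this->errors[] = 'already subscribed';
        }
        if (count($this->errors) > 0) {
            return false;
        }
        $this->subscribers[$person->email] = $person;
        return true;
    }

    // Errors collected by the most recent call.
    function getErrors()
    {
        return $this->errors;
    }
}
```

The page controller can then check the return value and pass getErrors() on to the page.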

Page logic (Controller)
When a page is called, the page controller is run before any output is made. The controller’s job is to transform the HTTP request into business objects, then call the appropriate logic and prepare the objects used to display the response.
The page logic performs the following steps:

1. The cmd request parameter is evaluated.
2. Based on the action other request parameters are evaluated.
3. Value Objects (or a form object for more complex tasks) are created from the parameters.
4. The objects are validated and the result is stored in an error array.
5. The business logic is called with the Value Objects.
6. Return status (error codes) from the business logic is evaluated.
7. A redirect to another page is executed if necessary.
8. All data needed to display the page is collected and made available to the page as variables of the controller. Do not use global variables.

Note: it is a good idea to have a utility function that returns a parameter sent via GET or POST and provides a default value if the parameter is missing. The page logic is the only non-HTML include file in the actual page! The page logic file must include all other include files used by the logic (see base.inc.php below). Use the require_once PHP command to include non-HTML files.
class PageController {
    var $person; #$person is used by the HTML page
    var $errs;

    function __construct() {
        $action = Form::getParameter('cmd');
        $this->person = new Person();
        $this->errs = array();

        if ($action == 'save') {
            $this->parseForm();
            if (!$this->validate()) return;

            #hand the validated VO to the business logic
            $logic = new NewsletterLogic();
            $logic->subscribePerson($this->person);

            header('Location: confirmation.php');
            exit;
        }
    }

    function parseForm() {
        $this->person->name = Form::getParameter('name');
        $this->person->birthdate = Util::parseDate(Form::getParameter('birthdate'));

    }

    function validate() {
        if ($this->person->name == '') $this->errs['name'] = FORM_MISSING;
        #FORM_MISSING is a constant

        return (sizeof($this->errs) == 0);
    }
}
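The utility function mentioned in the note above is not shown in the guide. A sketch of what Form::getParameter() might look like follows; checking POST before GET and defaulting to an empty string are assumptions made for this example.

```php
<?php
class Form
{
    // Returns the named request parameter, looking in POST first and then GET,
    // or $default when the parameter was not sent at all.
    static function getParameter($name, $default = '')
    {
        if (isset($_POST[$name])) {
            return $_POST[$name];
        }
        if (isset($_GET[$name])) {
            return $_GET[$name];
        }
        return $default;
    }
}
```

With this in place, Form::getParameter('cmd') returns '' rather than raising a notice when no cmd parameter was sent.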

Presentation Layer
The top level page will contain the actual HTML code. You may include HTML parts that you reuse across pages like the navigation etc. The page expects the page logic to prepare all business objects that it needs. It’s a good idea to document the business objects needed at the top of the page.
The page accesses properties of those business objects and formats them into HTML.

Localization
Localization is a problem. You must choose among
a) duplicating pages
b) removing all hardcoded strings from your HTML.

As I work in a design company I usually take approach a). Approach b) is not feasible as it makes the HTML very hard to read and nearly impossible to edit in a visual web editor like Dreamweaver. Dynamic content is hard enough to edit with Dreamweaver. Removing all strings as well makes the page look quite empty…

So finish the project in one language first. Then copy the HTML pages that need translation. Use a naming convention like index_fr.php to designate the French version of the index page. Always use the two-letter ISO language codes. Do not invent your own language codes.

To keep track of the language the user selected you must choose among
a) storing the language setting in a session variable or cookie
b) reading the preferred language (locale) from the HTTP headers the browser sends you
c) appending the language to the URL of every link in your application

While a) seems a lot easier than c), it may be subject to session timeout. Option b) should only be implemented as an extension to a) or c).
Strings in a database must be localized too!
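Options a) and b) can be combined along the lines of the sketch below. The function names choose_language() and localized_page() are hypothetical. It is also an assumption that the default-language page carries no suffix (index.php) while translated copies follow the index_fr.php convention.

```php
<?php
// Pick a language: the stored session choice first (option a), then the browser's
// Accept-Language header (option b), then the default.
function choose_language($sessionLang, $acceptHeader, $supported, $default)
{
    if (in_array($sessionLang, $supported, true)) {
        return $sessionLang;
    }
    $best = $default;
    $bestQ = 0.0;
    foreach (explode(',', (string) $acceptHeader) as $entry) {
        $parts = explode(';', trim($entry));
        $lang = strtolower(substr(trim($parts[0]), 0, 2));   // "fr-CH" becomes "fr"
        $q = 1.0;                                            // weight when no q= is given
        foreach (array_slice($parts, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > $bestQ && in_array($lang, $supported, true)) {
            $best = $lang;
            $bestQ = $q;
        }
    }
    return $best;
}

// Map a page name to its translated copy, e.g. index_fr.php.
function localized_page($page, $lang, $default)
{
    return $lang === $default ? "$page.php" : "{$page}_{$lang}.php";
}
```

A page could call choose_language() with the session value and $_SERVER['HTTP_ACCEPT_LANGUAGE'], then store the result back in the session so a later timeout falls back to the browser preference.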

Making your application location independent
PHP has problems in some situations when include files are nested and reside in different folders and it is unclear at which directory level the file will be included. One can solve this by using absolute path names or using $_SERVER['DOCUMENT_ROOT'] as a starting point. However, this makes your code location dependent: it will not run anymore if you move it down your directory structure one level. Of course we do not like that.
I have found a convenient solution to this problem. The toplevel page (the one that is called by the browser) needs to know the relative path to the application root directory. Unfortunately there is no such function in PHP and the webapp context concept is completely absent in PHP. So we cannot automatically determine the application root reliably in all situations. (It is *really* impossible. Don’t even try. It’s not worth the effort.)
Let’s define a global variable called $ROOT in an include file in every directory that contains toplevel pages. The include file (call it root.inc.php) must be included by the page logic before any other include files. Now you can use the $ROOT variable to reference include files with their exact path!

Sample:
We have toplevel pages in /admin/pages/. The $ROOT variable must therefore be set to $ROOT = '../..';. The page logic included by pages in that folder would reference their include files like require_once("$ROOT/lib/base.inc.php");.

In my suggested folder outline (see below) we don’t even need that, since all toplevel pages reside in the webapp root directory anyway. So the webapp root directory is always the current directory.

Folder outline
I suggest you make one file per class and follow a naming convention. Make sure that all your include files end with .php to avoid disclosure of your code to malicious users, which is a major security problem. I suggest the following folder structure:
  • / : Webapp root directory. Contains the pages that are actually called by the browser.
  • /lib/ : Contains base.inc.php and config.inc.php
  • /lib/common/ : Contains libraries and tools reusable for other projects, like your database abstraction classes.
  • /lib/model/ : Contains the Value Object classes
  • /lib/dao/ : Contains the DAO classes and the DAO factory
  • /lib/logic/ : Contains the business logic classes
  • /parts/ : Contains partial HTML that is included by pages
  • /control/ : Contains the page logic.

For larger applications you may want additional sub-directories for the individual parts (e.g. /admin/, /pub/) of your application to make the root directory a little lighter. Each of them would have their own control sub-directory.

Provide a base.inc.php file that includes (require_once) in the right order:
  1. frequently used stuff (database layer) from /lib/common
  2. the config include file
  3. all classes from /lib/model
  4. all classes from /lib/dao

Of course you will have additional directories for your images, uploaded files, etc.

IT Security Staff

October 20

This was a very interesting article I came across: http://technet.microsoft.com/en-us/library/ee672311.aspx

It was part of Microsoft’s Security Newsletter, which always seems to have something interesting: http://technet.microsoft.com/en-us/dd162324.aspx

Google Patent

July 20

Some amazing information about the Google algorithm can be found here: http://www.seomoz.org/article/google-historical-data-patent

All the information is from these sources:

  1. The patent from US Patent and Trademark Office - US Patent #20050071741 – Information retrieval based on historical data
  2. From SEOChat Forums - Information Retrieval Based on Historical Data – Sandbox Explanation, Aging Delay?
  3. From Threadwatch - Google’s War on SEO – Documented
  4. From SearchEngineWatch Forums - Does New Google Patent Validate Sandbox Theory?
  5. From HighRankings Forum - New Google Patent, Must Read
  6. From SERoundtable - Sandbox Explained by Google? “Information retrieval based on historical data”
  7. From Search Science (Xan Porter) - New Google patent proves “sandbox” exists

Blocking domain masking

June 9

Any website on the internet can be subject to domain masking.

And every website should be set up to prevent it. For example, for some odd reason the site http://www.xsjl.cn/ is set up to mask my website. I have absolutely no idea why, but if you go to http://www.xsjl.cn/ in the browser it takes you to my site while keeping http://www.xsjl.cn/ in the address bar. To stop it, I added the following PHP code to the top of my pages.

<?php
$domain = preg_replace("/^(.*\.)?([^.]*\..*)$/", "$2", $_SERVER['HTTP_HOST']);
if ($domain != "matthewstevenkelly.com")
{
    // The Host header is supplied by the client, so escape it before echoing it back.
    $safe = htmlspecialchars($domain, ENT_QUOTES);
    $whois = "http://who.godaddy.com/whoischeck.aspx?Domain=".urlencode($domain);
    echo "<html><head><title>Error domain: ".$safe." is invalid!</title></head><body>Error domain: ".$safe." is invalid!<br><br><a href=\"".$whois."\">".htmlspecialchars($whois)."</a></body></html>";
    exit;
}
?>
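To try the host check outside of a live request, the same pattern can be wrapped in small functions, as in this sketch. The names base_domain() and is_expected_host() are hypothetical, and the case-insensitive comparison is an addition that the snippet above does not make.

```php
<?php
// Reduce a host name to its last two labels, using the same pattern as the snippet above.
function base_domain($host)
{
    return preg_replace('/^(.*\.)?([^.]*\..*)$/', '$2', $host);
}

// True when the request host belongs to the expected domain (compared case-insensitively).
function is_expected_host($host, $expected)
{
    return strtolower(base_domain($host)) === strtolower($expected);
}

var_dump(is_expected_host('www.matthewstevenkelly.com', 'matthewstevenkelly.com'));
var_dump(is_expected_host('www.xsjl.cn', 'matthewstevenkelly.com'));
```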

PHP Input Validation

February 4

Any time a user inputs data to your site, the input should be validated to ensure it cannot cause any harm to the system. The obvious characters that cause problems are double and single quotes, which are used in injection attacks to trick the server into executing malicious code. However, there are many other special characters and situations that can cause problems. This is especially important when taking input data and storing it in a database, emailing it off, etc.

PHP has built-in functions to handle these tasks, including preg_replace and substr. I created some functions below that I use for field validation:

They can be called like this:

<?php
echo trimLength("This is a long string that needs to be cut down to ten characters",10);
echo "<br>";
echo filterText("This @is$ t%ex&t w*ith \$bad character*()@'s that need filtered");
echo "<br>";
echo filterNumeric("1234ABCD");
echo "<br>";
echo filterEmail("fake's_email@^liar.com");
echo "<br>";
?>

substr is used to trim the length of text, as shown below. This is especially useful when inputting data into database fields such as varchar that have limited character lengths.

function trimLength($data,$len)
{
    if(strlen($data)>$len)
    {
        $data = substr($data,0,$len);
    }
    return $data;
}

For the rest of my filtering, I always use regular expressions to filter out bad characters. I do this because regular expressions allow you to filter characters by specifying what characters you allow, not what characters you want to reject. This is an important distinction because there are so many different character sets and special characters that if you only filter by character replacement, instead of character exclusion, you open yourself up to faulty characters entering the system. If you are currently using str_replace to remove apostrophes and quotes, consider upgrading to regular expressions.
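The difference can be seen side by side in a small sketch like this one. The sample input is made up for illustration.

```php
<?php
// Made-up input containing quotes plus other characters a blacklist might not anticipate.
$input = "Robert'); DROP TABLE users;-- `x`";

// Replacement approach: removes only the characters you thought of in advance.
$blacklisted = str_replace(array("'", '"'), '', $input);

// Allow-list approach: keeps only letters, digits, periods, commas and whitespace.
$whitelisted = preg_replace('/[^A-Za-z0-9.,\s]/', '', $input);

echo $blacklisted, "\n";
echo $whitelisted, "\n";
```

The first result still contains the parenthesis, semicolons, dashes and backticks, while the second contains only characters from the allowed set.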

This text filtering allows for periods, commas and spaces to be used in the text:

function filterText($data)
{
    return preg_replace("/[^A-Za-z0-9.,\s]/","",$data);
}

Only numbers are returned with this function:

function filterNumeric($data)
{
    return preg_replace("/[^0-9]/","",$data);
}

When you need to filter a URL, special characters such as ?, % and / have to be allowed:

function filterURL($data)
{
    return preg_replace("/[^A-Za-z0-9:_\%\-.\/\?,+]/","",$data);
}

This email filtering function doesn’t just filter the characters in an email address; it also validates that it is in username@domain.domaintype format:

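function filterEmail($data)
{
    // Without an "@" there is nothing to split, so reject the input right away.
    if (strpos($data, "@") === false)
    {
        return "";
    }
    list($username, $domain) = explode("@", $data, 2);
    $username = preg_replace("/[^a-z0-9._-]+/i", "", $username);
    $domain = preg_replace("/[^a-z0-9._-]+/i", "", $domain);
    // strpos() returning false or 0 both mean there is no usable "." in the domain.
    if ($username == "" || $domain == "" || !strpos($domain, "."))
    {
        return "";
    }
    return $username."@".$domain;
}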
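As an alternative to stripping characters, PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL checks the address and rejects it outright when it is malformed. This is a different technique from filterEmail() above, not a drop-in equivalent, and the wrapper name validateEmail() is hypothetical.

```php
<?php
// Returns the trimmed address when it is syntactically valid, or "" otherwise.
function validateEmail($data)
{
    $result = filter_var(trim($data), FILTER_VALIDATE_EMAIL);
    return $result === false ? "" : $result;
}
```

Rejecting bad input instead of repairing it avoids silently turning one address into a different one.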