by Andre Gous
A good formal definition comes to us from http://webopedia.lycos.com, which defines it as "the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships."
In a typical non-normalized database, values like city names and state abbreviations are stored once per address record. As a result, the values typically have to be typed in full, and it is difficult to keep the data clean and consistent. Also, there is a lot of unnecessary typing.
In a typical normalized database, there would be a separate container for city values, and a separate container for state values. A city would be associated with a state. As in the non-normalized database, in the address data entry screen, the user would have the opportunity to type the city name. However, as the user types, the typical normalized system will try to guess ahead, as to the value desired. For example, if the user types "sun" and there is only one city name (Sunnyvale) which starts with those letters, the typical normalized system will display "Sunnyvale, Ca." The user can then accept the value, by simply proceeding to the next field. The user does not need to indicate that the city is in California, because the link between Sunnyvale and California has been pre-established.
So, two benefits of the normalized approach are that the user needs to type less, and has cleaner data as a result.
The typical non-normalized system would be difficult to use as a basis for accurate report-writing. For example, asking for addresses in "Sunnyvale" would yield only some of the records intended to be in Sunnyvale. The other records would be invisible to the report, due to having being misspelled. Similarly, a report trying to list all of the addresses in "Ca" would not actually find all of the address records intended to be in California. It would miss "Calif, Ca and Cal."
Let's assume that, in an auto parts business, department "Datsun parts" changed its name to become "Nissan parts." In this example, the non-normalized database would require changing all of the "Datsun" values in the product records to "Nissan" instead. This would be a time-consuming process if there were thousands of "Datsun" product records. Also, if mass update process died halfway through, the database would be internally inconsistent.
In a normalized database, a product record would not contain the value of "Datsun", but would instead point to a record in the "Department" container, which would contain "Datsun." To change "Datsun" to "Nissan", the systems administrator would simply edit the "department" record. This change would affect all of the product records pointing to this "department" record. Suddenly, they would all be "Nissan" products.
Another good example, from the book "Oracle, a Beginner’s Guide" is where a person gets married and changes her last name. In most companies, this requires changing her name in many different places, including security, internal company directory, payroll, personnel, etc.
It would be ideal if a person’s name were stored centrally, so that the change needs to be made in only one place, and the effect of this update would immediately benefit all of the other systems which point to that record.
If the structure were normalized, where the phone types are not embedded in the type of field, no database changes would be necessary.
There would be a "type of phone" defined and stored in a central container, and every phone record would point to a record therein. The "phone type" container would have two records: "voice" and "fax." Some phone records would point to the voice type record, thereby effectively announcing "I am a voice line" and others would point to a fax type record thereby announcing "I am a fax line."
When cellular phones need to be supported, a "cellular" record would simply be added to the central phone type container. Then, when cellular phone numbers are entered, they would be pointed to the "cellular" record type, thereby effectively announcing "I am a cellular phone line." No program changes would be required to effect this change.
The normalized database would not store the city names in the address records, but instead would store the city names in a central "city" data container. In each address record would be a numerical pointer to a central city record. A number requiring two characters of internal memory space would be adequate for pointing to any one of 65536 different city records, presumably enough for all the cities in the U.S.
So, each address record would contain only a number, requiring 2 characters instead of 16.
As a result, the database would need only 2 x 100,000 = 200,000 characters of space, to support city names in the database. Compared to 1,600,000 characters, this is a huge savings in disk space. As a result, the program would be able to run on less expensive hardware, and be able to retrieve data faster.
A major benefit is in programmer productivity, since tasks are more complex but less time-consuming. Systems can be made and expanded faster and maintained more easily.
Normalized systems also make it easier to retrieve the data and to keep it clean.