BCNF in Database Design
The Boyce Codd Normal Form (BCNF) which is a refinement of the
3rd Normal Form, the 4th Normal Form and higher. In practice people rarely go beyond the 3NF and almost never beyond the 4NF.
Q10 What is the BCNF? I heard of it, but I did not really understand it.
Q10. An example may help. Suppose a student can have more than one major and we would like to keep track of the student's major(s) and students advisors in the following table:
Since StudentId repeats it can't be the Primary Key. We can choose either StudentId and
Major as the Primary Key or StudentId and Advisor as Primary Key. Say we choose
StudentId and Major as the Primary Key. But that means that the remaining field Advisor is a fact about both StudentId and Advisor. So we know that Hilbert is an advisor for math majors and advises student 100.
This table is in 3NF but it still has anomalies (inconstancies). It's in 2NF because the advisor is a fact about both the student advised and the major he/she advises. It's in 3Nf because advisor is a fact only about the primary key (StudentId, Major).
Suppose student 300 drops out of school. If we delete student 300 we lose the fact that
Dr. Ruth Advises psychology. This is a deletion anomaly. Also how can we know that
Dr. Freedman advises Economics until student major in Economics? This is an insertion anomaly. So we have inconsistent dependency.
An attribute is a determinant if it determines another attribute. For example StudentId determines Major. Advisor determines the major she/he advises.
A table is in BCNF if it's in 3rd NF and every determinant can be used as a Primary Key.
In our example Advisor attribute determines Major but is not a possible Primary Key. StudentId and Advisor together is a possible
(candidate) Primary Key.
Normalization in Database Design
First Normal Form (1st NF)
• The table cells must be of single value.
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
Do not use multiple fields in a single table to store similar data.
For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for
Vendor Code 1 and Vendor Code 2. But what happens when you a dd a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory t o vendors with an item number key, or vendors to inventory with a vendor code key.
Second Normal Form (2nd NF)
If it's in 1st NF and if the Primary Key is composite (multiple columns) then any fact in that table must be a fact about the entire composite Primary Key not just part of the Primary Key.
For example, if an inventory table has a primary key made up of two attributes PartId and WarehouseId. Suppose the inventory table has the warehouse address in it, since warehouse address is a fact about WarehoseId and not about the PartId the w arehouse address is in the wrong table. This is in violation of the 2nd Normal Form.
Third Normal Form (3rd NF)
• If it's in the 2nd NF and there are no non-key fields that depend on attributes in the table other than the Primary
Key.
Suppose in the Student table you had student birth date as an attribute and you also had student's age. Students age depends on the student's birth date (a fact about his/her birth date) so
3rd Normal Form is violated. Also, a student table that containsthe address of the Department of his/her major. That address is in the wrong table, because that address should be in the
Department table. That address is not a fact about the Student
Primary Key. A violation of 3rd Normal Form.
A (non-key) fact in a table should be about the key, the whole key, and nothing but the key.
EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a
Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP code s, sales representatives, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursing; however, many small tables may degrade performance or exceed open file and memory capacities. But the worst thing is to have to join too many tables in queries. Joining tables is the most expensive operation in time and memory cost.
It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed.
The Benefits of Building Database Right
The benefits of building it right are many. Just as you would never consider building a house without thoughtful design and planning, so should you approach database design. Picture the building architect who is designing to your specifications; you’ll be asked questions like “How will you use this?" "What are your future plans?" "What do you want this to look like?" “How do you want it to work?” Asking these same questions when you’re building a database will give you the same usability and “livability” that you’ll get out of a well-designed house.
• The benefits of a good foundation: once the blueprint is drawn and the plans are finished, it's time to start building the databases. You would never consider building a house unless it was underlain by a solid foundation, so the first thing you do is pour the foundation. The foundation to your applications and systems is the database. To ensure that your database will be a good foundation for the superstructure of applications and programs that rely on it, build it right in the beginning.
• The benefits of good record design: a well-normalized database makes it easier to maintain the data. In a well-normalized database, when you have to make a data modification you can make it once, make it right and make it fast. You’ll experience better performance on data retrieval because, in a well-normalized database, the relatively short record length allows for many records to be read from or written to the hard disk in a single physical I/O operation.
Virtual (in-memory) operations are enhanced, as well, because of the shorter record length; more records fit on a single page, therefore more records can be moved or processed in a single pass. Resource-intensive tasks such as disk I/O for the data in storage or for the page file will be greatly reduced. To optimize database operations and response time, practice good record design from the beginning.
• The benefits of implementing the design correctly: when you’re implementing a design, any flaw can have negative long-term repercussions. Back to our house-building analogy, a flaw in the electrical wiring when the house is being built could result in an electrical short which is expensive to address later or could lead to a disastrous event. You can have a perfectlyformed design, but if you implement it incorrectly, months or years down the road you’ll find yourself applying remedies to symptoms or focusing on more superficial problems — not the root cause of problems — as you try to compensate for the implementation flaw that you introduced in the beginning. To avoid haphazard code and techniques, implement it right in the beginning.
• The benefits of determining, documenting and implementing business requirements: If your database doesn’t support the business requirements, it will be — at best — a disappointment from the start, and at worst, could cost you your job. To avoid having to deal with a database that just doesn’t support the requirements, make sure that you understand what those requirements are, and not just today’s requirements. Ask about, document, and confirm the business requirements for tomorrow and next year and the year after, and the year after that. Only by understanding where the company plans to be in the future can you get the design right for today’s operations.
Restricted Design Restricts Business Operations
In order to get it right the first time, you have to understand the business requirements. Such was not the case on a project that involved a complete redesign of a production database that supported an Internet retail organization. In this case, the company had four Internet retail websites with a plan to increase this number and possibly evolve the business to a franchisebased operation. The four retail sites all relied on a shared database and application code base.
Due to how the database schema was constructed, early database design decisions were hampering organizational growth.
The database schema, designed by a third party, had served the company well to this point but failed to take into consideration the going-forward business plan. Not being technical people, the management group never recognized these design shortcomings which led to hard-coded limitations on the number of stores and the type of services that this company could offer.
Without fundamentally restructuring the underlying database and rewriting the code base, there was no way to expand the features and functionality of this database system to support business growth initiatives.
For this kind of situation, you should create a checklist for yourself of things to do and things to consider. First on the checklist is to identify the business requirements, both current and future.
You'll want to incorporate these into the new database schema. Enlist the help of end users to ensure that your database design will be able to support their needs and expectations. In this case, since this was a retail database, we studied the behavior of real-life shoppers in a brickand- mortar store, and translated that into the database schema, adding features that were unique to an online shopping experience.
Second on the checklist is to look at the physical implementation, specifically the way the database files have been placed on disk. Using a tool to provide proactive monitoring and management of the database is necessary. In this case, I used an operational profiling tool to test performance and query response both before and after the physical restructuring. My findings before restructuring were not encouraging. The original database had all objects stored in a single file group. This database included all types of data — tabular (numbers, short character strings, dates/times) and text/image (long product descriptions and LOTS of images), and that single file group was located on one physical/logical disk. This configuration adversely impacted performance and degraded the user experience for the Internet shoppers.
When restructuring a database like this — one whose content is in great part composed of binary images and extraordinarily long strings of text — it’s a best practice to separate the text and image data from the tabular data. Store the text and image data in one set of tables, and then use relationships to link each image or long text string to its associated tabular record(s).
You'll also want to separate the system (catalog) tables from the image and tabular data. You will want to place each one of these three types of tables in its own file group or table space.
Then, you will want to locate each file group on its own spindle. Using these simple principles of physical design, you'll significantly enhance database performance.
Third on the checklist is indexing for performance. Whenever you have a one-to-many relationship between two tables, you'll have a foreign key reference in the table on the many side of the relationship. In most database systems, the primary key is automatically indexed; that's not always the case with the foreign key. To ensure best performance on joins, make sure that every foreign key in the database is indexed. The next candidate for indexing is any column that will be used for sorting — any column that will consistently be used in an “order by” clause of a SQL query. You'll also want to index columns that will be used for restricting the returned data set, such as those that consistently appear in “where” clauses.
It's no easy task to determine, document and implement business requirements. It takes a lot of cross-functional collaboration, great attention to detail, and a validation of facts and assumptions in order to do it right. If you want to get it right the first time, define and adhere to a disciplined approach to gather and share the right information. Attention to detail while maintaining the big picture in your head is a must for good database design and getting it “right”.