Normalization == good?

Frice is
talking about database normalization
. Beginners usually do not have a good
grasp of normalization principles, and end up with database designs that create maintenance
nightmares and data inconsistency. On the other hand, intermediate level database
architects sometimes swing too far in the other direction and normalize everything
to a fault. I’ve debugged scalability problems in a wide range of systems,
and one of the most frequent causes of scalability and manageability bottlenecks in
the state management tier is excessive normalization. These are fun problems
to solve, because the DBAs have been conditioned with “normalization religion”
and often don’t see any way out of their problems. When they come to realize
that normalization is just a technique meant to serve a particular purpose, and not
a universal law, it can be liberating. In the most general sense, indexes and
data warehouses violate the “no redundant data” rule of normalization,
yet both are critical components of any scalable data management system. So
how do you know when to relax the normalization rules? One rule of thumb would
be, “normalize first, then denormalize as necessary for scalability, perf, or
data warehousing”. However, I think that experienced database architects
should be able to make better decisions than this, based on the characteristics of
the data. For example, normalization is well-suited for OLTP style loads, where
updates are frequent and queries (especially queries with joins) are infrequent.
But even here it might be wise to violate some normalization rules, particularly about
having tables with identical keys – vertical partitioning often makes sense
depending on the way your data is used. Columns that are update-only could be
separated from columns that are read-only for the same keys – this might make
sense for perf reasons, or even for manageability reasons, since you can round-robin
to a cluster of the read-only servers and keep a single master for writes. Partitioning
(horizontal and vertical), aggregation, and data duplication are all very powerful
weapons in the advanced database designer’s arsenal. The number of things
to consider is large, and by the time you’ve dealt with enough large data management
systems, you’ll get good at making decisions on a more comprehensive level without
needing to depend on the religion of normalization. Now, it’s probably
a good thing that “normalization religion” exists. Losing this religion
takes a certain level of experience, and if the religion were not so venerated, you
would have beginners shipping skunky schemas and claiming that it was OK, since “denormalization
is elite”.

Leave a Reply