The topic for this month’s T-SQL Tuesday #182 hosted by Rob Farley (Blog) is integrity. Read on to see my thoughts on integrity in the world of databases – in particular database corruption.
The CIA and ACID
What I’m going to say here is much less sensational than discussing the usage of LSD from security state officers on a mission. Rather I’m going to remind about the fundamental principles of relational database management systems:
- Atomicity
- Consistency
- Isolation
- Durability
Database corruption affects the consistency of the database. Normally when a transaction tries to change data SQL Server goes from a stable state to a state of flux (your change) and back to a stable state. With database corruption it is never returned to a stable consistent state.
Integrity is one of the three pillars of the CIA model of information security.
- Confidentiality – people see only the data they should see and no other
- Integrity – the trustworthiness of the data, kept accurate and not meddled with
- Availability – systems are up and functional as they should be, open for business
Database corruption affects the integrity of your data being housed in an RDBMS. We must be able to trust the data we query is accurate. Here I am not talking about an accurate interpretation but rather the actual data itself is untrustworthy. Database corruption can cause incorrect or incomplete data to return thus making it unreliable.
Database corruption can also contribute adversely to the availability of your database system. Not so much as a DDoS attack but more like an index is unusable or a table cannot be read. For applications querying this as part of their service’s process, this can impact end users. Taking the corrupted nonclustered index example, it can cause a query plan to execute slower. A corrupted table won’t return all the records (tuples to be acerbic and academic about it) thus resulting in a form of unavailability.
GIGO: Garbage In – Garbage Out
Another way to look at integrity within your database is to consider what you’re putting into it. If you feed your body junk food then you can expect to be out of shape. If you put cheap fuel into your highly tuned engine it won’t perform as well. And if you put garbage data into your database then you will get garbage data out of it.
From an integrity standpoint, this is why it’s very important for your system to be a gatekeeper against bad data. Just like how foreign keys disallow orphan / bastard records from being stored (ex. a relational trait for something which doesn’t exist in another relation), or table based unique and check constraints protect and define what can be entered. The logic can be a layer outside the database too such as in the application layer code – that’s fine as long as you keep bad data from being stored.
Bad Data Versus No Data
There’s a saying throughout my years of professional experience that I’ve grown fond of:
Bad Data is Worse Than No Data
Source – Trust Me Bro!
This is what I’d like to leave you with. If given the choice between having no data or bad data – choose no data.
For example, if you are making a business decision like whether to continue / increase / or decrease an investment, and you have bad data then you will likely make a bad decision. Perhaps through chance you manage to guess right but it won’t be logical or well-reasoned – it’ll be luck. Conversely, if you have no data, you can still make a bad decision – maybe even the same bad decision, but you will at least know that it is an unfounded decision not based in what we know or can quantify. Then your audience will have some caveat about how confident you are with the analysis.
This matters because as data professionals we need to adhere to logic, reason, and a scientific process. The alternative is deciding on something without understanding the details and then rationalizing why i.e. making the data fit your pre-defined conclusion rather than basing your conclusion off where your principles coupled with data lead you.
Conclusion
I think I summed it up just now. Integrity is critical to databases and data so please remember its importance, be careful not to load garbage into your database, and remember always that bad data is worse than no data.
Thanks for reading!
You Might Also Like