The early history of databases and DB2
So I debated about even writing this post. But the truth is that I find database history fascinating. I’m sure this post has nothing on the Wikipedia article, but I’m going to give it a go anyway. Note that this is my own, sometimes unsupported, view of database history and may include inaccuracies.
The early history of databases
As a DB2 DBA and a former IBMer, I’m biased towards IBM’s history of databases – in which IBM seems to claim that it developed the first computer database either for American Airlines in 1962 (SABRE) or for the moon mission in 1964 (see http://www.dbisoftware.com/db2nightshow/20110425-Z01-roger-miller.pdf). When I researched this blog post, I could find very little to support this outside of IBM sources. In my head I will continue to link early computer databases to the space program – if for no other reason than I dreamed of being an astronaut as a child, and this is the closest I will ever come!
I say first ‘computer’ database intentionally, because examples of organized collections of data (the true definition of a database) are everywhere before the computer age. Ancient libraries and encyclopedias were certainly examples – even simple filing cabinets could be considered such. But obviously when I say database for the rest of this post, I’m referring to something on a computer.
Early databases were hierarchical, and could only be navigated based on their pre-defined structure. You had to know the structure, including the fixed-width fields, in order to access the data, and you couldn’t run a query as we know it today – relationships other than one parent to one child were not easy to define. Even when they did manage to attach multiple children to a parent, you couldn’t relate children to each other. They were slightly better than flat files, but bore more resemblance to flat files than to the relational databases of today.
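To make that concrete, here is a minimal sketch of what navigating a hierarchical, fixed-width store might feel like. Everything in it is an assumption for illustration: the `P`/`C` record tags, the field names, and the widths are made up, and this is not how any particular IBM product laid out its data.

```python
# Hypothetical fixed-width layouts: field names and widths are assumptions.
PARENT_LAYOUT = [("flight_no", 6), ("origin", 3), ("dest", 3)]
CHILD_LAYOUT = [("seat", 4), ("passenger", 12)]


def parse_fixed(record, layout):
    """Slice a fixed-width record into named fields using known widths."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = record[pos:pos + width].strip()
        pos += width
    return fields


def load_hierarchy(lines):
    """Build parent -> children from lines tagged 'P' (parent) or 'C' (child)."""
    parents = []
    for line in lines:
        tag, body = line[0], line[1:]
        if tag == "P":
            parent = parse_fixed(body, PARENT_LAYOUT)
            parent["children"] = []
            parents.append(parent)
        elif tag == "C":
            # A child only has meaning under the most recent parent.
            parents[-1]["children"].append(parse_fixed(body, CHILD_LAYOUT))
    return parents


def seats_on(parents, flight_no):
    """Follow the predefined path: find the parent, then walk its children."""
    for parent in parents:
        if parent["flight_no"] == flight_no:
            return [child["seat"] for child in parent["children"]]
    return []


data = [
    "PAA100 JFKLAX",
    "C12A J SMITH     ",
    "C12B A JONES     ",
    "PAA200 ORDSFO",
    "C03C R LEE       ",
]
flights = load_hierarchy(data)
print(seats_on(flights, "AA100"))
```

Going from a flight to its seats is easy because that is the path the structure was built for. Asking "which flights is J SMITH on?" means scanning every parent and every child, because nothing relates the children to each other.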
The generally accepted father of the relational database (http://en.wikipedia.org/wiki/Relational_database) is E. F. Codd, who worked for IBM and published his paper on the topic, “A Relational Model of Data for Large Shared Data Banks”, in 1970. One of the interesting things to me is where the term “relational” comes from. Some may assume it came from the fact that you define relationships between the tables as one of the integral parts of a relational database, but in fact ‘relation’ is the mathematical term, used in relational algebra, for what we loosely call a table. Based on a user comment, a table as SQL implements it is actually a ‘bag’, since it can hold duplicate rows, while a true relation is a set of tuples. This shows that I have a computer business degree and not a computer science one. The premise that the relational database is not named for defining relationships between data is still solid though – it is named for the mathematical relation that relational algebra operates on. And as the commenter below stated, relational algebra is not about the relation or table itself, but about manipulating it.
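The set-versus-bag distinction is easier to see in a few lines of code. This is only a sketch using plain Python collections, with made-up rows and hypothetical helper names (`project`, `project_bag`); it is not how any DBMS stores or processes data.

```python
from collections import Counter

# Hypothetical rows of (employee, department); the values are made up.
rows = [("alice", "sales"), ("bob", "sales"), ("alice", "sales")]

relation = set(rows)   # a relation is a set of tuples: duplicates collapse
bag = Counter(rows)    # a bag (multiset) remembers how often each tuple occurs


def project(tuples, index):
    """Relational projection onto one attribute; the result is again a set."""
    return {t[index] for t in tuples}


def project_bag(counted, index):
    """Bag projection: keeps duplicate counts, roughly like SELECT without DISTINCT."""
    out = Counter()
    for t, n in counted.items():
        out[t[index]] += n
    return out


print(len(relation), sum(bag.values()))
print(project(relation, 1), project_bag(bag, 1))
```

The set keeps one copy of the duplicated row, while the bag still counts all three. Projecting the department column shows the same split: the relation gives back a single value, and the bag keeps a count for every row it came from.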
It seems that IBM did not recognize the power of E. F. Codd’s concept, and did not devote the resources to developing it into a product as early as it could have. When it was developed, E. F. Codd was not directly in charge, and thus relational databases today do not exactly match some of his key points – though part of the reason for that may be the gap between theory and reality, too.
It is interesting to note that no “Relational” DBMS of today actually meets all of the rules E.F. Codd set forth to define a relational database. http://en.wikipedia.org/wiki/Codd%27s_12_rules
Which came first – DB2 or Oracle?
Several Oracle DBAs I’ve met proudly proclaim that Oracle was the “first relational database”. Oracle became “commercially available” as a relational database in June of 1979 (http://en.wikipedia.org/wiki/Oracle_Corporation), which was before DB2. IBM had “System R” internally before that, and released “SQL/DS” in 1981 (later renamed to DB2 for VM/VSE). DB2 as such was released in 1983.
More recent DB2 history
What we call DB2 UDB or DB2 for LUW (Linux, UNIX, and Windows) first became available in the early 90s. At that time, it included OS/2 support as well.
It gets hairy when you try to describe the dramatic changes in the product since its inception. For many features, Oracle, DB2, or some other RDBMS could lay claim to being the “first” to implement them, or the first to implement them in a certain way. RDBMSes sure have come a long way since those early days.
I started as a DBA back in 2001 – shortly after DB2 UDB version 7 became available – and worked on versions as old as 5. Enhancements that I particularly remember are: compressed backups, online LOAD, LOAD without locking the whole tablespace, online REORG, the ability to shrink DMS tablespaces, all of the online memory changes and STMM, moving archive logging away from userexit and into the db cfg, drastic locking changes with both registry parameters and the new currently committed behavior, drastic improvements in DataPropagator, HADR, and TSA integration for HADR. They say the GUI is better with each version, and I do give it a try from time to time, but I’ve never really been fond of GUIs. There are other things I know about and think sound great, but that I just haven’t gotten a chance to try yet, like data compression, true XML, and pureScale.
Relational vs. NoSQL
This is a debate I hear from developers from time to time. They question the need to even use a relational database. I’m obviously very biased on this topic, but I just don’t think that the NoSQL databases I’ve seen provide the transaction control, concurrency, and flexibility that a relational database can provide. If all you’re using a database for is to store static settings, well, yeah, NoSQL might be better for that – but if you’re running an OLTP database with concurrent users, I just don’t see NoSQL as a viable option.
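For anyone unsure what I mean by transaction control, here is a minimal sketch. It uses Python’s built-in sqlite3 module rather than DB2, purely so it runs anywhere; the `account` table, the names, and the amounts are all made up, and it does not demonstrate concurrency at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit has to undo an earlier change.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


print(transfer(conn, "a", "b", 30), balances(conn))
print(transfer(conn, "a", "b", 500), balances(conn))
```

The second transfer fails on the debit, and the credit that already happened inside the same transaction is undone with it. That all-or-nothing behavior is the part I would not want to hand-roll in application code on an OLTP system.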
So, want to argue or agree with me on anything? Comments are always welcome, even if it’s to point out something I missed or something that I’m wrong on. If anyone has nice detailed links or documents on DB2 or database history, I’d love to read them.
Corrected/updated on 2/6/2012 based on user comment.