In case you have not heard yet, big BI projects have a high rate of failure. I could tell you the many reasons why these projects fail, but I’d rather explain why this one didn’t. I think we can learn a lot more from what went right in this project than from what can go wrong and why. In this first post, I want to explore the unique aspects and challenges of this project. In Part 2, I’ll dive into the reasons I believe we were “insanely great” (humbly borrowing Steve Jobs’ phrase!).
Challenge # 1
Our client had two data warehouses – a Consolidated Data Store (CDS) and an Enterprise Data Warehouse (EDW). The client wanted to retire the CDS and ensure that the EDW became the “go-to” data warehouse having “the single version of the truth”. This meant that all inputs (source systems that fed into the CDS) had to be migrated to the EDW along with all outputs (reports and extracts delivered to third-party applications).
Our one requirement was, “Do whatever the CDS is doing!” To determine how a report had to work on the EDW, we first had to understand how it worked with the CDS. As is the norm, there were no requirements documents explaining how the existing CDS reports worked. The source code for the reports was the only clue available to determine functionality.
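A requirement like “do whatever the CDS is doing” can be made testable by comparing the output of the old report with the output of the migrated one. The block below is only an illustrative sketch of that idea, not the tooling used on this project. It assumes both extracts can be loaded as lists of dictionaries; the function names, column names, and sample rows are all hypothetical.

```python
from collections import Counter


def row_signature(row, columns):
    """Reduce a row to a hashable tuple of the columns being compared."""
    return tuple(row.get(col) for col in columns)


def compare_extracts(cds_rows, edw_rows, columns):
    """Compare two extracts as multisets of rows.

    Returns two Counters: rows present in the CDS output but missing
    from the EDW output, and rows present only in the EDW output.
    Duplicate rows are counted, so a row that appears twice in one
    extract and once in the other is reported as a difference.
    """
    cds_counts = Counter(row_signature(r, columns) for r in cds_rows)
    edw_counts = Counter(row_signature(r, columns) for r in edw_rows)
    missing_in_edw = cds_counts - edw_counts  # keeps only positive counts
    extra_in_edw = edw_counts - cds_counts
    return missing_in_edw, extra_in_edw


# Hypothetical sample data: the second row differs between the two extracts.
cds = [{"account": "A1", "balance": 100}, {"account": "A2", "balance": 250}]
edw = [{"account": "A1", "balance": 100}, {"account": "A2", "balance": 255}]

missing, extra = compare_extracts(cds, edw, ["account", "balance"])
for signature, count in missing.items():
    print("only in CDS:", signature, "x", count)
for signature, count in extra.items():
    print("only in EDW:", signature, "x", count)
```

Comparing rows as multisets rather than by position means the check does not depend on sort order, which often differs between two warehouses even when the data agrees.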
Challenge # 2
There are two important and non-negotiable design constraints on a warehouse: understandability (from the end-user perspective) and speed (when it comes to presenting the data to the end-users). Hat tip to Ralph Kimball for this critical insight; don’t let anybody fool you into believing something else. Load time is also quite significant, as most warehouses need to support “near-real-time” data. You need to get all of these right or the warehouse is essentially useless. We faced several challenges with load as well as retrieval performance (which we were able to tackle quite successfully!).
Challenge # 3
Data modeling is a crucial activity in a warehouse. There are no simple answers when it comes time to make modeling decisions. You need to grapple with several design choices and balance load performance, retrieval performance, maintainability, and ease of use, all while understanding how end-users access data. Balancing these constraints can be extremely frustrating for an engineer who likes to get things done! Bad design decisions can lead to disastrous consequences! One of the advantages of working on the project for over a year was that I could truly understand the consequences of my design decisions! Some I regret and others I don’t.
Challenge # 4
Understanding a data warehouse is a challenging cognitive activity. Often, it would take 3-4 people to answer what we considered a simple requirement question. Once we got an answer, we had to investigate the territory and ensure that the “map matched the territory”. We also had to understand not one but TWO warehouses to be successful: the CDS and the EDW. The two warehouses had different data models, and it was challenging to juggle both models in our heads at the same time!
Challenge # 5
This project required extreme coordination and cooperation among different groups. We had:
- A Data Integration (DI) team that was responsible for the data warehouse.
- An operations team that was responsible for setting up the operational schedule for running the data warehouse jobs.
- A Business Intelligence (BI) team responsible for the reports and other “BI aspects” of the warehouse.
- Teams responsible for the “staging layer” on the warehouse.
With so many teams involved, we had some tough times trying to figure out who was responsible for an issue. When you are involved with a complex system, fixing the system is also a complicated activity!
Challenge # 6
Our development environment was usually a virtual disaster area! We grappled with poor data quality, and often we could not unit test our code to the extent I would have liked. We also could not understand the performance traits of our queries well enough to tune them beforehand. Our feedback loops were long, and we had to wait until code was deployed to the TEST environment before we could notice issues. This is neither an efficient nor an effective way to fix code.
Challenge # 7
This project had high visibility with the client. It was very important for them to have the migration complete before the end of this year, so the deadlines were firm. There was definitely a lot of pressure on everybody involved.
There are probably more challenges I am missing. This should give you a very good idea of what our team battled and hopefully help you appreciate what our team achieved! The next post will cover how we got through these tough challenges and came out on top.