End-to-End Data Lineage: Tracking the Elusive Holy Grail
Having been involved in metadata management for over 15 years now, I am still amazed that most folks get stuck chasing the elusive lineage holy grail. Many projects fail, go over budget, or just die on the vine because of an all-or-nothing position: “If I can’t have my cake and eat it too, I won’t do anything at all.” Sorry to be so blunt, but that’s the sad reality.
Let me explain.
In many instances when I speak with a client, I am ultimately faced with questions about magically stitching metadata across a cornucopia of tools, custom processes, languages, parameterized jobs, source code, and black boxes. Yes, black boxes!
The Facts.
While high-end ETL tools do provide robust lineage capabilities, they only do so for the data points they are set up for. This is what I call a point solution. Most third-party tools do not do a good job of making their metadata easily available in the first place, or even complete (externally, anyway). Many gaps exist. This is a fact. On top of that, we have to figure out all the intermediate processes, custom applications, and cross-platform issues. Shooting for complete coverage is time-consuming and tremendously expensive.
In many instances, unless a company is using a single point solution end to end, a complex mapping activity needs to occur. That is, if you want to do that. Most don’t. Most companies won’t invest in this strategic but non-revenue-generating effort. If you like uphill battles, this is a good one to undertake.
Even if you could somehow magically infer most of that metadata, the business rules, logic, and tribal knowledge won’t be found in an ETL tool or documentation.
At the end of the day you can’t make a silk purse out of a sow’s ear!
What is the Solution?
Well, it’s simple: streamline your expectations and take a phased approach. I repeat, phased approach! If the lineage metadata is available, you can have a pretty picture of the lineage. If it’s not, you cannot. Not easily, anyway. However, what you can do is have a robust metadata impact analysis capability that is automatic and requires little manual labor to set up.
Here is one scenario: I need to find the impact of changing a data element. I perform an impact analysis and find that this element is affected by an ETL process. If the lineage is available, I can see it. If it is not, I can always open up the ETL tool and drill into the gory details from there. The metadata tool will tell me which tool, and which job or package, impacts the element. It points me in the right direction.
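To make the scenario concrete, here is a minimal sketch of what an impact analysis boils down to: a walk over cataloged dependencies that reports which tool and job touches each affected element. Everything in it is an assumption for illustration only. The element names, the tool and job names, and the `impact_analysis` function are hypothetical and do not come from any particular metadata product.

```python
from collections import deque

# Hypothetical catalog entries: (source element, target element, tool, job).
# A real metadata repository would harvest these; here they are hard-coded.
EDGES = [
    ("crm.customer.email", "staging.customer.email", "ETL tool", "load_customers"),
    ("staging.customer.email", "warehouse.dim_customer.email", "ETL tool", "build_dim_customer"),
    ("warehouse.dim_customer.email", "report.customer_contacts", "BI tool", "contacts_report"),
]


def impact_analysis(element, edges=EDGES):
    """Return (impacted element, tool, job) for everything downstream of element."""
    # Index the edges by source element so each lookup is cheap.
    downstream = {}
    for src, dst, tool, job in edges:
        downstream.setdefault(src, []).append((dst, tool, job))

    # Breadth-first walk, so nearer impacts are listed first.
    seen = {element}
    queue = deque([element])
    impacted = []
    while queue:
        current = queue.popleft()
        for dst, tool, job in downstream.get(current, []):
            if dst not in seen:
                seen.add(dst)
                impacted.append((dst, tool, job))
                queue.append(dst)
    return impacted


if __name__ == "__main__":
    for asset, tool, job in impact_analysis("staging.customer.email"):
        print(f"{asset}: open {tool}, job {job}")
```

Notice what the sketch does not do: it does not explain the transformation logic inside `build_dim_customer`. It only tells you where to look, which is exactly the point.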
An Analogy
Go to Google and search for something you need. It will list possible matches, and one result happens to be a PDF document. The title looks interesting and the metadata leads you to believe it contains what you are looking for. You need to open the document to really be sure. Presto, you have what you need. Google isn’t really magic; it’s a powerful search engine, and it’s the world’s most popular one for searching the internet.
Conclusion
If you truly want to start leveraging your metadata, think in these terms. Set your expectations realistically. The biggest bang for your buck is going to come from cataloging and delivering powerful search and impact analysis features to your users. These are out-of-the-box capabilities.
If you are not managing or leveraging your metadata, then you are probably doing all of this manually, which is very time-consuming and unnecessary. You can always start there, and add business lineage over time through a governance and stewardship effort if desired. It’s a win-win.