Every organization has it's own unwritten dictionary of terms which are used to describe different aspects of the business. These terms are passed from person to person through experience and on a need-to-know basis until everyone in the organization seems to be speaking the same language. In reality this process is creating inefficiency and risk in your business. The solution is to build and maintain a data dictionary.
In this post I'm going to answer the question - What is a data dictionary? - and how using one can help you dramatically improve efficiency, speed up growth, and dominate your competition.
A data dictionary is a living document which lists and defines key terminology used to describe elements of the business. The goal is to get as many people in the organization to be using the same language and to clearly understand how each term is defined. This will save time, help analysts create better analyses and reports, and eliminate confusion. All of these benefits will save you money and time, two critical resources for any organization.
The data dictionary should be accessible to as many people in the organization as possible and there should be a single owner. I'd recommend including it as part of the onboarding process of new employees.
I'd recommend that either your CTO, Director of Analytics or most senior analyst take ownership of the document. If you have all 3 roles in your organization then give it to the analyst. You'd want the owner of the document to be a native English speaker.
The document should sit on the cloud in either a Google Sheet or in a service like Confluence. Make sure that only the owner has editing permissions.
Once you have a data dictionary in place every analysis that references certain business objects should use the definitions in the dictionary as a guide. If objects are referenced in an analysis and don't exist in the data dictionary then the analyst should raise this and it should be added to the backlog.
Below is a screenshot showing my basic data dictionary template. I recommend starting with this template and customize it according to your needs.
See a live example here.
My data dictionary template has the following pieces of information:
At the top of the sheet you should include the name of the owner of the document. The main idea behind this is to help new employees which get their hands on the spreadsheet to know who is responsible for adding and maintaining it.
The date when the document was last updated. This is to help put a bit of pressure on the owner of the document to keep things up to date. I'd recommend updating the document at least once a month. If no changes are needed then the owner of the document should still update the date. This will help maintain trust in the information shared in the sheet.
The first column in my data dictionary template holds the department associated with the terms we are about to list. It is a good idea to categorize your terms by department so that the data dictionary is organized. If you have a very large list of terms you may want to consider creating a separate sheet for each department.
This column holds the actual term which will be used. The number of terms is really up to you but one idea to consider is to build a type of hierarchy among your terms. An example would be visitors and then new and returning visitors as sub-sets of visitors. You can use multiple levels which can be very helpful if you're operation is very segment orientated.
Term owner is a very important column since it will force key stake holders to list and define their terms. Terms often become KPIs or at least are associated with KPIs so this process can help managers define or at least revisit their KPIs. Some managers will be very poor at coming up with relevant terms and definitions. Make sure you are patient and help them out with this process. Another major advantage of getting the stake holders to define their terms is it makes them partially responsible for the numbers shown in reports, analyses and dashboards.
The definition column is the toughest to complete because it forces you to use the right language so that everyone can understand the term. Keep things as clear as possible and rather write too much than not enough. Write out the definitions in such a way that someone that is new in the company could get a basic idea of what the term means.
The single source of truth or SSOT column holds the name of the data silo which will be used to calculate the value of the term listed. This column is primarily for the company's business analysts and ops people but it can be very helpful to the C and VP management to understand which tools / databases are holding key information.
This column, like the one mentioned above, is primarily for people in the organization that will need to run analyses and build dashboards that use the definitions listed in the dictionary. This column should include technical information on how to calculate the value of each term in the dictionary.
The more coherent you can make your organization the more efficient it will run. A data dictionary is a great way to help get everyone aligned so that the same language and logic is being used in meetings, analyses and most importantly, in decision making.
To get started open my spreadsheet and copy the structure in your own Google Sheet. Once you're ready with your first version move it to the cloud and start sharing it out with everyone in your organization. Get your analysts so update their dashboards with the new logic and make sure you're reviewing the data dictionary every now and then.
Are you using a data dictionary in your company? Do you have any tips on how to improve a data dictionary that you can share in the comments below?
Thanks for reading.