What’s a Controlled Vocabulary — and How Do I Get One?
Erica Hornung
Controlling descriptive metadata helps your business find the data it needs.
JPEG or jpg? Movie or Film? White or Caucasian? If you've ever searched for specific images, documents, or data only to find yourself going in circles, your business could benefit from one or more Controlled Vocabularies that standardize the way you describe and discover your data.
In a previous post, we explored the relationships between schemas, taxonomies, and ontologies. However, once you have your metadata model, the multitude of empty metadata fields can be overwhelming. How do you know what to input in each one? Enter the Controlled Vocabulary (CV).
A CV, sometimes called a business glossary, is a curated collection of terms and definitions that apply to your metadata model. You might have encountered CVs as dropdown menus, filter checkboxes, or suggested search terms – all features designed by data specialists. These terms, sometimes known as keywords or tags, are thoughtfully selected and standardized to align with your business needs and overall data strategy.
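To make the idea concrete, here is a minimal Python sketch of how a CV can sit behind a tag field or dropdown menu. It assumes a simple structure in which each preferred term lists the variants users might type, borrowing the JPEG/jpg and Movie/Film examples from the start of this post (choosing "Film" as the preferred term is arbitrary here). The vocabulary contents and the function names `build_lookup` and `normalize_tag` are hypothetical; real CV tools store terms, definitions, and relationships in their own formats.

```python
from __future__ import annotations

# Hypothetical sample vocabulary: each preferred term maps to variant labels
# (sometimes called non-preferred terms) that should resolve to it.
CONTROLLED_VOCABULARY: dict[str, list[str]] = {
    "JPEG": ["jpg"],
    "Film": ["movie", "motion picture"],
}


def build_lookup(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map each preferred term and each variant, compared case-insensitively, to its preferred term."""
    lookup: dict[str, str] = {}
    for preferred, variants in vocabulary.items():
        for label in [preferred, *variants]:
            lookup[label.casefold()] = preferred
    return lookup


def normalize_tag(tag: str, lookup: dict[str, str]) -> str | None:
    """Return the preferred term for a user-entered tag, or None if it is not in the CV."""
    return lookup.get(tag.strip().casefold())


if __name__ == "__main__":
    lookup = build_lookup(CONTROLLED_VOCABULARY)
    # The preferred terms double as the options for a dropdown menu.
    print(sorted(CONTROLLED_VOCABULARY))  # ['Film', 'JPEG']
    print(normalize_tag(" Movie ", lookup))  # Film
```

Mapping variants to a single preferred term is what lets a search for "jpg" and a search for "JPEG" return the same assets. A value outside the vocabulary comes back as None, which is a cue either to reject it or to consider adding it to the CV.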
CVs can be developed in several ways. Commercial and open-source vocabularies are available for businesses to adopt. Alternatively, you can create a wholly unique CV using well-established techniques such as card sorting and analysis of your existing metadata.
Whichever creation method you choose, your CV should be tailored to your business and adaptable to future changes. Let's explore each approach in turn.
One and Done(ish): Commercially Licensed CVs
Controlled Vocabularies might be referred to as business glossaries or thesauri depending on context, but the essential idea is developing a well-considered, tested, and verified list of metadata values that can be added to assets.
Creating a CV can be a lot of work, especially if your business has many moving parts or is unique in some way. The sheer volume of terms, definitions, and relationships to establish demands careful consideration and coordination across departments.
Fortunately, a knowledge management marketplace exists with pre-designed solutions tailored to diverse industries and niches. The AP News Taxonomy Terms, for example, is a licensed CV developed and continuously updated to meet the distinct information management needs of the news industry. This standardized vocabulary ensures consistency in how news topics, events, and themes are described and tagged, streamlining the delivery of relevant articles and content. Another example, the WAND Sales and Marketing Glossary, includes terms and definitions tailored for businesses navigating the complex landscape of sales, marketing, and customer engagement.
Adopting a prebuilt, industry-specific CV can speed up metadata enrichment and improve search accuracy, letting you bypass the initial groundwork and jump-start your metadata strategy. Many commercially licensed CVs also come with ongoing governance, which keeps the vocabulary relevant and up to date over time.
While commercially licensed CVs offer significant advantages, it's important to recognize the associated limitations. These pre-packaged solutions typically come at a cost, involving licensing fees that need to be factored into your budget. Moreover, while they provide a solid foundation, they might not align precisely with your unique business requirements. Each organization has its own distinct products, services, and terminology that might not be fully covered by a pre-existing CV. It's crucial to weigh the convenience of a pre-designed CV against costs and the need for customization and tailoring to your specific business context.
A Little of This, a Pinch of That: Mix-and-Match CVs
If a commercial CV isn’t a good fit for your business, it’s also possible to develop a customized CV based on mixing and matching terms from other freely available glossaries. This will require more effort to piece together and govern, but has the advantage of being free and more customizable.
There are many specialized glossaries curated by professional associations, industry-specific groups, or academic institutions, each focusing on a particular area of knowledge. The American Folklore Society Ethnographic Thesaurus, for instance, provides an extensive array of terms to capture cultural heritage and anthropological concepts. Homosaurus caters to the LGBTQ+ community, encompassing a wide range of DEIA terms associated with gender, sexuality, and identity. MovieLabs covers terminology related to film production and delivery systems.
A mix-and-match approach to CV development allows you to curate terms that best describe your business’s unique products, services, and operations, while still saving the effort of starting from scratch. Specialized glossaries are often crafted by experts in their respective fields, ensuring a high degree of accuracy and relevance. And of course, using free resources can be good for your budget.
On the other hand, freely available glossaries may lack consistent governance, which can lead to inaccuracies or outdated terminology. Furthermore, with so many specialized glossaries available, it can be tough to narrow down the most appropriate ones for your enterprise. Finally, the cost savings from using free glossaries can be offset by the extra time and effort required to sift through various sources, extract relevant terms, and merge them into a cohesive CV.
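Much of that extra effort goes into merging. The Python sketch below shows one way to combine terms from several glossaries while recording where each term came from and flagging terms whose definitions disagree, so that a person can decide which definition to keep. The glossary names, sample terms, `MergedTerm`, and `merge_glossaries` are all hypothetical, and matching terms by case-insensitive label is a simplifying assumption; real glossaries often need smarter matching.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MergedTerm:
    label: str  # label as first seen, used as the display form
    definition: str  # definition from the first glossary that had this term
    sources: list[str] = field(default_factory=list)
    conflicting_definitions: list[str] = field(default_factory=list)


def merge_glossaries(glossaries: dict[str, dict[str, str]]) -> dict[str, MergedTerm]:
    """Merge {glossary name: {term: definition}} into one vocabulary keyed by case-folded label."""
    merged: dict[str, MergedTerm] = {}
    for source, terms in glossaries.items():
        for label, definition in terms.items():
            key = label.strip().casefold()
            if key not in merged:
                # First time we see this term: keep its label and definition.
                merged[key] = MergedTerm(label.strip(), definition, [source])
                continue
            entry = merged[key]
            entry.sources.append(source)
            # Keep differing definitions for human review instead of silently overwriting.
            if definition != entry.definition:
                entry.conflicting_definitions.append(definition)
    return merged
```

Keeping the first definition and queuing the rest for review is only one possible policy; you might instead prefer a designated "primary" glossary, but either way the conflicts list makes the governance work visible rather than hiding it.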
Fully Customized Controlled Vocabularies
Creating a custom CV from scratch obviously requires a lot of work, but in return every term is chosen for its relevance to your business, which makes the vocabulary more efficient for end users. Especially if you have a unique business case, building and managing your own CV can be cost-effective in the long run. But what does this process look like?
User-Generated Metadata
The user-generated metadata already in your system can serve as a foundation for a fully customized CV, offering insight into how your users think about the data they use. By analyzing existing metadata, you can identify the recurring terms, synonyms, and concepts significant to your business. This approach aligns the CV with users' natural language, streamlining vocabulary development and drawing on collective intelligence to shape an appropriate CV. These concepts can then be refined in interactive card-sorting workshops.
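As a starting point for that analysis, the Python sketch below counts how often each tag appears after light normalization (case, surrounding whitespace, hyphens, and underscores) and lists the original spellings that collapsed into each one. It assumes your existing tags can be exported as a flat list of strings; the `summarize_tags` function and its normalization rules are illustrative assumptions. It only catches formatting variants: true synonyms such as movie and film, or abbreviations such as jpg and JPEG, still need human judgment.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def normalize(tag: str) -> str:
    """Lowercase a tag and collapse runs of whitespace, hyphens, and underscores into one space."""
    return re.sub(r"[\s_\-]+", " ", tag.strip().casefold())


def summarize_tags(tags: list[str]) -> list[tuple[str, int, list[str]]]:
    """Return (normalized tag, count, original spellings), most frequent first."""
    counts: Counter[str] = Counter()
    variants: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        key = normalize(tag)
        if not key:
            continue  # skip empty or whitespace-only tags
        counts[key] += 1
        variants[key].add(tag.strip())
    # most_common() orders ties by first appearance in the input.
    return [(key, n, sorted(variants[key])) for key, n in counts.most_common()]
```

The most frequent entries are natural candidates for preferred terms, and the spelling lists show which variants to map onto them.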
Card Sorting Workshops
Gathering stakeholders for a collaborative card-sorting workshop is a time-tested way to develop a custom CV. In these sessions, participants generate, organize, and categorize terms that describe your business's offerings using index cards, sticky notes, or online tools. Grouping the terms reveals relationships between concepts. Involving stakeholders with domain knowledge enriches the resulting CV with diverse perspectives, enhancing its usability.
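If the workshop is run as an open card sort, one common way to analyze the results is a pairwise similarity score: for each pair of cards, the share of participants who placed both in the same group. Pairs that most participants group together are good candidates for sitting under the same broader term. The Python sketch below assumes each participant's sort is recorded as a list of groups (sets of card labels) with every card placed exactly once; the `pair_similarity` function and any sample cards are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def pair_similarity(sorts: list[list[set[str]]]) -> dict[tuple[str, str], float]:
    """For each pair of cards, return the share of participants who grouped them together.

    `sorts` holds one entry per participant; each entry is that participant's list of groups.
    Pairs are stored alphabetically, e.g. ("Film", "Trailer").
    """
    together: Counter[tuple[str, str]] = Counter()
    for groups in sorts:
        for group in groups:
            # Sorting first gives each pair one canonical (alphabetical) key.
            for a, b in combinations(sorted(group), 2):
                together[(a, b)] += 1
    n = len(sorts)
    return {pair: count / n for pair, count in together.items()}
```

Pairs that never landed in the same group are simply absent from the result, which you can read as a similarity of zero. Many card-sorting tools present these scores as a similarity matrix; this sketch only computes the underlying numbers.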
Iterate, Refine, Govern
Developing a fully customized CV isn't the end of the process. Refining, maintaining, and governing a CV is an ongoing, iterative effort that involves user testing, stress testing, and continual improvement of terms and definitions to keep them aligned with industry trends, user preferences, and business goals. Ongoing governance ensures a CV that evolves with your organization, consistently benefiting users and operations.
All Roads Lead to Better Searching
In data management, a well-constructed CV can guide your users through a metadata model, ensuring that your data remains organized and discoverable. A CV makes sorting and browsing your data much easier, so users can find what they need faster. Whether you choose a pre-built commercial solution tailored to your industry, combine terms from various free glossaries to create a unique vocabulary, or craft a fully customized CV, the key goal is always to align the CV with your business objectives and user base. The effort invested in curating a CV can lead to better data search, streamlined workflows, and ultimately, a more efficient and effective business operation. So, start charting your path to better data management today and harness the power of a CV to elevate your business to new heights.