Develop Data Catalog

(Activity) for Tier: Data Management

PURPOSE

Data cataloging is the process of creating a collection of common metadata about datasets to enable efficient management, search, inventory, and fitness assessment of data across the enterprise. A Data Catalog combines that metadata with data management and search tools to help users find data in the enterprise, inventory available data, and evaluate its fitness, meaning whether the breadth and depth of the data suit its intended uses. A Data Catalog is essential because it consolidates the details of an organization’s data assets into a common metadata format and shows where each enterprise data entity is located. A successful Data Catalog creates a single source of truth at the enterprise level.
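As a rough illustration of "a collection of metadata combined with search tools", the sketch below models a catalog entry and a keyword search in Python. The `CatalogEntry` fields, the `search` function, and the sample datasets are all hypothetical and chosen only for illustration; a real Data Catalog product will define its own metadata format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """One dataset described in a common metadata format (hypothetical fields)."""
    name: str
    location: str  # where the data entity lives in the enterprise
    description: str
    terms: List[str] = field(default_factory=list)


def search(catalog: List[CatalogEntry], keyword: str) -> List[CatalogEntry]:
    """Return entries whose name, description, or terms mention the keyword."""
    needle = keyword.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or any(needle in term.lower() for term in entry.terms)
    ]


# Sample entries, invented for this example.
catalog = [
    CatalogEntry("customer_orders", "warehouse.sales.orders",
                 "Orders placed by customers", ["order_id", "customer_id"]),
    CatalogEntry("hr_roster", "warehouse.hr.roster",
                 "Active employees", ["employee_id"]),
]
matches = search(catalog, "customer")
```

Because every entry shares the same fields, one search function can cover every dataset, and each match reports where the data is located.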

WHEN

Prior to the start of a Sprint that utilizes new data.

PARTICIPATING ROLES

INPUTS

ENTRY CRITERIA

  • The data dictionary communicates the structure and content of the data and provides meaningful descriptions for individually named data objects.
  • The Enterprise Data Catalog provides the structure and format constraints for the data dictionary.

SUB-ACTIVITIES

  1. Collect Terms

    • Compile a list of terms applicable to the dataset.
    • Deconflict that list against the terms already in the Data Catalog to eliminate redundancy and ensure each term has proper context.
    • Align the terms in the data dictionary with the applicable fields as necessary.
  2. Define Terms

    • Create definitions for the terms that accurately depict what the dataset provides to the project, product, or enterprise.
  3. Get Alignment

    • Hold a definition decision board among the teams in the Configuration Control Board (CCB). Reach agreement on definitions across teams, or adopt new definitions where necessary.
    • Have each Product Owner sign off on the use of each definition. Conceptually, these terms are independent of any single system or data source, and the CCB will ensure compliance with that principle.
  4. Ensure Automation

    • As the database grows over time, the data dictionary and Enterprise Data Catalog should be able to track that growth automatically, along with the relationships the database has to other objects in the Data Catalog.
  5. Ensure Access

    • The Product Owner, with approval from the CCB, will determine role-based access to the data dictionary within the Data Catalog for the enterprise.
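The deconfliction in the Collect Terms step can be sketched as a comparison between candidate terms and the terms already in the Data Catalog. This is a minimal sketch under stated assumptions: the `normalize` and `deconflict` helpers, the term-to-definition dictionaries, and the sample terms are hypothetical, and the matching rule (case and separator folding) is only one possible choice.

```python
from typing import Dict, List, Tuple


def normalize(term: str) -> str:
    """Fold case and separators so 'Customer ID' and 'customer_id' compare equal."""
    return "_".join(term.lower().replace("-", " ").replace("_", " ").split())


def deconflict(
    candidates: Dict[str, str], catalog: Dict[str, str]
) -> Tuple[Dict[str, str], List[str], Dict[str, Tuple[str, str]]]:
    """Split candidate terms into new, redundant, and conflicting groups.

    Both arguments map a term to its definition (an assumed shape).
    """
    existing = {normalize(term): text for term, text in catalog.items()}
    new: Dict[str, str] = {}
    redundant: List[str] = []
    conflicts: Dict[str, Tuple[str, str]] = {}
    for term, definition in candidates.items():
        key = normalize(term)
        if key not in existing:
            new[key] = definition        # not yet in the catalog
        elif existing[key].strip().lower() == definition.strip().lower():
            redundant.append(key)        # same term, same definition
        else:
            # Same term, different definition: needs a decision.
            conflicts[key] = (existing[key], definition)
    return new, redundant, conflicts


# Sample terms, invented for this example.
catalog_terms = {
    "customer_id": "Unique identifier for a customer.",
    "order_date": "Date the order was placed.",
}
candidate_terms = {
    "Customer ID": "Unique identifier for a customer.",
    "Order-Date": "Date the order shipped.",
    "region": "Sales region code.",
}
new, redundant, conflicts = deconflict(candidate_terms, catalog_terms)
```

In this sketch, redundant terms can be dropped, new terms move on to Define Terms, and conflicting definitions are the natural agenda for the definition decision board in Get Alignment.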

OUTPUTS

EXIT CRITERIA

  • The data cataloging activity is complete provided that every metadata term for the subject data source has a definition validated by the CCB.
  • The data dictionary is searchable within the Enterprise Data Catalog, and the data lineage has associated mappings in the Enterprise Data Catalog.
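The exit criteria above can be expressed as a simple checklist function. The sketch below is illustrative only: the `Term` record, the `unmet_exit_criteria` function, and the shape of the lineage mappings are assumptions, not part of any particular catalog tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Term:
    """A metadata term for the subject data source (hypothetical record)."""
    name: str
    definition: str = ""
    ccb_validated: bool = False


def unmet_exit_criteria(
    terms: List[Term],
    searchable_in_catalog: bool,
    lineage_mappings: Dict[str, str],
) -> List[str]:
    """Return a list of unmet exit criteria; an empty list means complete."""
    problems = []
    for term in terms:
        if not term.definition.strip():
            problems.append(f"term '{term.name}' has no definition")
        elif not term.ccb_validated:
            problems.append(f"term '{term.name}' is not validated by the CCB")
    if not searchable_in_catalog:
        problems.append("data dictionary is not searchable in the catalog")
    if not lineage_mappings:
        problems.append("data lineage has no associated mappings")
    return problems


# Sample data, invented for this example.
terms = [
    Term("customer_id", "Unique identifier for a customer.", ccb_validated=True),
    Term("region", "Sales region code."),
]
lineage = {"region": "crm.accounts.region"}  # assumed field-to-source mapping
problems = unmet_exit_criteria(terms, True, lineage)
```

Returning the list of unmet criteria, rather than a single true or false value, tells the team exactly what remains before the activity can close.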

NEXT ACTIVITY

SEE ALSO

Process Guidance Version: 10.4