Plan Data Quality

(Activity) for Tier: Data Management

PURPOSE

For data that is to participate in data quality initiatives, create or update a data quality plan. The primary purpose of creating a data quality plan is to aid the Data Steward in conducting a data quality assessment. The goal of the assessment is to determine if data meets identified quality objectives. Data that does not meet quality objectives is cleansed in a later activity. This activity is concerned with creating or updating the data quality plan. Assessments of the data and data cleansing activities are separate activities.

Note

Keep the stakeholders' use of the data in mind. Strictly enforcing data quality rules for their own sake may not be helpful.

WHEN

New features are requested that have a data component, improvement opportunities are identified in the Conduct Data Quality Assessment activity or in other activities, or issues with the data are identified.

PARTICIPATING ROLES

ENTRY CRITERIA

  • It is determined that a set of data is to participate in data quality initiatives.

SUB-ACTIVITIES

  1. Create a Data Quality Plan

    • Once a dataset has been selected to participate in data quality initiatives, create the plan and store it in an organizationally approved configuration management tool, such as Microsoft Word, a wiki, or another tool, and document this within team procedures.

      • Write the data quality objective(s) of the assessment. It may be helpful to include or reference the expected use case(s) of the data.
      • Determine the tools or methods the Data Steward will use to conduct the data quality assessment.
      • Write the criteria so a Data Steward can determine if the data meets quality objectives, mostly meets the data quality objectives, or does not meet the data quality objectives.
      • Ensure the data quality objectives support the intended use of the data.
      • If there are expected exceptions, list them. Expected exceptions are exceptions to the rule(s) that do not need to be reported, also known as known false positives. Listing them prevents unnecessary issue work items.
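The plan elements above (objective, method, three-level criteria, and expected exceptions) can be captured in a small structure. The following is a minimal sketch only: the class and field names, and the use of a pass rate with two thresholds, are assumptions made for illustration and are not prescribed by this guidance.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    MEETS = "meets"
    MOSTLY_MEETS = "mostly meets"
    DOES_NOT_MEET = "does not meet"


@dataclass
class QualityRule:
    """One criterion in the plan. All field names are hypothetical."""
    objective: str          # the data quality objective this rule supports
    method: str             # tool or method the Data Steward uses
    meets_at: float         # pass rate at or above which the objective is met
    mostly_meets_at: float  # pass rate at or above which it is mostly met
    expected_exceptions: set = field(default_factory=set)  # known false positives


def classify(rule, failing_ids, total_records):
    """Classify an assessment result, ignoring expected exceptions."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    # Expected exceptions are not reported, so they do not count as failures.
    reportable = set(failing_ids) - rule.expected_exceptions
    pass_rate = 1 - len(reportable) / total_records
    if pass_rate >= rule.meets_at:
        return Outcome.MEETS
    if pass_rate >= rule.mostly_meets_at:
        return Outcome.MOSTLY_MEETS
    return Outcome.DOES_NOT_MEET


# Illustrative rule with made-up thresholds and record identifiers.
rule = QualityRule(
    objective="Every order has a valid quantity",
    method="range check on the quantity column",
    meets_at=0.99,
    mostly_meets_at=0.95,
    expected_exceptions={"A1"},
)
result = classify(rule, failing_ids={"A1", "A2", "A3"}, total_records=100)
```

Writing the thresholds down in the plan, rather than leaving them to judgment at assessment time, is what makes the assessment repeatable.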

      Note

      Be aware of timing issues. For example, updating a plan before data is available could result in a negative quality assessment.

    • Examples:

      • Are there expected ranges for data?
      • Is there orphaned data?
      • Completeness: What data is missing or unusable?
      • Conformity: What data is stored in a non-standard format?
      • Consistency: What data values give conflicting information?
      • Accuracy: What data is incorrect or out of range?
      • Duplicates: What data records are repeated?
      • Integrity: What data is missing?
      • Are there compliance concerns?
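Several of the example questions above translate directly into simple checks. The sketch below is illustrative only: the sample records, column names, postal code pattern, and quantity range are all hypothetical, and a real plan would use whichever tool or method it names.

```python
import re
from collections import Counter

# Hypothetical sample data; every field name and value is an assumption.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": "A-001", "customer_id": 1, "quantity": 3, "postal_code": "12345"},
    {"order_id": "A-002", "customer_id": 2, "quantity": None, "postal_code": "1234"},
    {"order_id": "A-003", "customer_id": 9, "quantity": 500, "postal_code": "54321"},
    {"order_id": "A-001", "customer_id": 1, "quantity": 3, "postal_code": "12345"},
]


def missing(rows, column):
    """Completeness: positions of rows where the column is None or empty."""
    return [i for i, r in enumerate(rows) if r.get(column) in (None, "")]


def nonconforming(rows, column, pattern):
    """Conformity: positions of rows whose value does not fully match the pattern."""
    rx = re.compile(pattern)
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not rx.fullmatch(str(r[column]))]


def out_of_range(rows, column, low, high):
    """Accuracy: positions of rows whose value falls outside [low, high]."""
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not low <= r[column] <= high]


def duplicates(rows, key):
    """Duplicates: key values that appear more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphans(rows, foreign_key, parent_keys):
    """Orphaned data: positions of rows whose foreign key has no parent."""
    return [i for i, r in enumerate(rows) if r[foreign_key] not in parent_keys]


customer_ids = {c["id"] for c in customers}
findings = {
    "completeness": missing(orders, "quantity"),
    "conformity": nonconforming(orders, "postal_code", r"\d{5}"),
    "accuracy": out_of_range(orders, "quantity", 1, 100),
    "duplicates": duplicates(orders, "order_id"),
    "orphans": orphans(orders, "customer_id", customer_ids),
}
```

Each check returns the offending rows rather than a pass or fail flag, so the Data Steward can compare them against the plan's expected exceptions before reporting.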

      Note

      Avoid providing excessive documentation on how to use the tooling. Focus on the data and its quality rather than on the tooling.

      Note

      Consider leveraging the Data Catalog for storing links to the plan and the frequency of conducting the audits/assessments.

  2. Update the Data Quality Plan

    • As the architecture evolves, as new requirements are identified, and as assessments are conducted and results collected, changes to the plan may be required.
    • Update the plan as necessary to meet quality objectives.

  3. Frequency

    • Document how often the Data Steward conducts an assessment, or which event triggers the need for one, and record this within your team procedures.

    Note

    The assessment frequency may affect the data's suitability for data science experiments, and it helps ensure that audits are conducted at the expected intervals.

    Note

    The frequency does not need to be stored in the data quality plan document. If a team uses a data catalog that supports custom attributes or custom metadata, there are organizational benefits to storing the frequency as a queryable field.
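To show why a queryable frequency field is useful, the sketch below models catalog entries as plain dictionaries and lists the datasets whose assessment is due. It does not use any particular data catalog product; the attribute names, dates, and placeholder links are hypothetical, and only time-based frequencies are covered, not event triggers.

```python
from datetime import date, timedelta

# Hypothetical catalog entries with custom attributes; all names are assumptions.
catalog = [
    {
        "dataset": "orders",
        "plan_link": "(link to the data quality plan)",
        "assessment_frequency_days": 30,
        "last_assessed": date(2024, 1, 1),
    },
    {
        "dataset": "customers",
        "plan_link": "(link to the data quality plan)",
        "assessment_frequency_days": 90,
        "last_assessed": date(2024, 1, 10),
    },
]


def assessments_due(entries, today):
    """Return the datasets whose next assessment date is on or before today."""
    due = []
    for entry in entries:
        interval = timedelta(days=entry["assessment_frequency_days"])
        if entry["last_assessed"] + interval <= today:
            due.append(entry["dataset"])
    return due


due_now = assessments_due(catalog, today=date(2024, 2, 15))
```

With the frequency held as structured metadata, a query like this can run across every cataloged dataset, which is harder when the value lives only inside each plan document.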

EXIT CRITERIA

  • The plan is repeatable and supports the intended use of the data.

SEE ALSO

Process Guidance Version: 10.4