AI ZTA Journey

Why data tagging?

Data tagging gives your data an identity by associating it with metadata, which makes it much easier to find and secure.

The New Oil

Given the volume of data that organizations store, they need a strategy to tag and organize it efficiently.

Below are a few data tagging models that most organizations follow:

Hierarchical Model

Organize tags into a hierarchical model, with broader categories at the top and specific tags at the bottom

Flat Model

Each tag is equally important, and there is no inherent relationship between tags

Segment Model

Involves tagging data based on segments

Jargon Model

Uses terms that are recognizable within an organization or department as tags
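
As a rough illustration of the difference between the hierarchical and flat models, the sketch below stores hierarchical tags as slash-separated paths and flat tags as a plain set. The tag names and both helper functions are hypothetical and only meant to show the idea.

```python
# Hypothetical tags, for illustration only.
hierarchical_tags = {"finance/invoices/2024", "hr/policies"}
flat_tags = {"finance", "invoices", "2024"}


def matches_hierarchical(tags, prefix):
    """Return tags that equal the prefix or sit below it in the hierarchy."""
    return sorted(t for t in tags if t == prefix or t.startswith(prefix + "/"))


def matches_flat(tags, wanted):
    """Flat tags have no parent/child relationship, so matching is exact."""
    return wanted in tags


print(matches_hierarchical(hierarchical_tags, "finance"))
print(matches_flat(flat_tags, "finance"))
```

With a hierarchy, a search for a broad category also finds the more specific tags beneath it; with flat tags, each tag has to be matched on its own.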

Data tagging can be classified into different types based on the format of the data being tagged. Below are examples of these types; each can be further classified based on functionality.

NER (named entity recognition) – identifies entities (names, places, and objects) in a body of text
POS (part-of-speech) tagging – labels each word in a sentence with its grammatical role
2D bounding box – draws a box around an object in an image so that it can be recognized
Semantic segmentation – tags every individual pixel in an image
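
To make the contrast between a 2D bounding box and semantic segmentation concrete, here is a minimal sketch that expands one box annotation into a per-pixel label grid. The `box_to_mask` helper, the tiny image size, and the "car" label are assumptions for illustration; a real segmentation mask would follow the outline of the object rather than the box.

```python
def box_to_mask(width, height, box, label):
    """Build a per-pixel label grid from one bounding box.

    box is (x_min, y_min, x_max, y_max), with the max edges exclusive.
    Pixels outside the box get the label "background".
    """
    x_min, y_min, x_max, y_max = box
    return [
        [label if x_min <= x < x_max and y_min <= y < y_max else "background"
         for x in range(width)]
        for y in range(height)
    ]


# A 2D bounding box is one tag for a whole region ...
annotation = {"label": "car", "box": (1, 1, 3, 2)}

# ... while a segmentation mask carries one tag per pixel.
mask = box_to_mask(4, 3, annotation["box"], annotation["label"])
for row in mask:
    print(row)
```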

Recommended Data Tagging/Classification Journey steps:

Identify data types/usage needs

Data types: integers (whole numbers); floating-point numbers (e.g., 3.14); strings (text, e.g., “Hello World”); Booleans (true/false); characters (single letters); or arrays (collections of data)
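
One possible way to identify these data types programmatically is sketched below. The `describe_type` helper and the type names it returns are assumptions for illustration, not part of any tool named in this deck.

```python
def describe_type(value):
    """Map a Python value to one of the data type names listed above."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "floating-point"
    if isinstance(value, str):
        # Python has no separate character type; treat 1-length strings as characters.
        return "character" if len(value) == 1 else "string"
    if isinstance(value, (list, tuple)):
        return "array"
    return "unknown"


for sample in [42, 3.14, "Hello World", True, "A", [1, 2, 3]]:
    print(repr(sample), "->", describe_type(sample))
```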

Gather input from relevant stakeholders

A well-defined nomenclature helps employees navigate and retrieve files
Tags must be recognizable to the end user
Use keywords, departments, projects, managers, teams, or other relevant identifiers
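
A naming convention is easier to enforce when it is written down as a rule. The sketch below assumes one hypothetical convention (lowercase words joined by hyphens, fields joined by colons); the pattern, the helpers, and the sample values are illustrative only.

```python
import re

# Assumed convention: lowercase words joined by hyphens, fields joined by colons.
TAG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*(:[a-z0-9]+(-[a-z0-9]+)*)*$")


def normalize(field):
    """Lowercase a field and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", field.lower()).strip("-")


def build_tag(department, project, keyword):
    """Join the normalized identifiers into one tag: department:project:keyword."""
    return ":".join(normalize(f) for f in (department, project, keyword))


tag = build_tag("Finance", "Q3 Audit", "Invoice")
print(tag)
print(bool(TAG_PATTERN.match(tag)))
```

Whatever convention the stakeholders agree on, checking new tags against it keeps the nomenclature consistent as more people start tagging.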

Construct a model

A data tagging model gives structure to data and contributes to data classification
Refer to the four data tagging models (hierarchical, flat, segment, and jargon) when choosing a structure

Conduct usability evaluations

Take into account ease of accessibility and time spent retrieving files

Automate data tagging process
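
Automation can start with simple rules. The following is a minimal sketch of a rule-based tagger that derives tags from a file path; the rule tables, the tag names, and the `auto_tag` function are hypothetical, and a production setup would more likely rely on a dedicated tool.

```python
from pathlib import PurePosixPath

# Hypothetical rules mapping file extensions and folder names to tags.
EXTENSION_TAGS = {".csv": "tabular", ".png": "image", ".txt": "text"}
FOLDER_TAGS = {"finance": "dept:finance", "hr": "dept:hr"}


def auto_tag(path):
    """Derive tags from a file path using the rule tables above."""
    p = PurePosixPath(path)
    tags = set()
    ext_tag = EXTENSION_TAGS.get(p.suffix.lower())
    if ext_tag:
        tags.add(ext_tag)
    for part in p.parts[:-1]:  # every folder above the file itself
        folder_tag = FOLDER_TAGS.get(part.lower())
        if folder_tag:
            tags.add(folder_tag)
    return sorted(tags)


print(auto_tag("shared/Finance/2024/report.csv"))
```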

Tagging vs. Classification

Tagging labels data based on its metadata; it is intended to improve accessibility and organization
Classification labels data based on the sensitivity of its contents; it is intended to secure sensitive data, and data loss prevention tools can use it to flag that data
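
The distinction can be sketched in a few lines: tagging reads metadata, while classification inspects content. The pattern below assumes a US Social Security number format, and the sensitivity levels ("restricted", "internal") and function names are hypothetical.

```python
import re

# Assumed pattern: a US Social Security number written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tag_file(name, owner):
    """Tagging: labels come from metadata, not from the content."""
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else "none"
    return {"owner": owner, "format": extension}


def classify_content(text):
    """Classification: the sensitivity level comes from the content itself."""
    return "restricted" if SSN_PATTERN.search(text) else "internal"


print(tag_file("payroll.CSV", "hr"))
print(classify_content("Employee SSN: 123-45-6789"))
print(classify_content("Lunch menu for Friday"))
```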

Red Hat Solutions

OCP Secrets

Red Hat’s data classification feature

Use secrets in conjunction with OCP RBAC to securely store sensitive data (this is crucial for managing data governance and compliance within a containerized environment)
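
As a sketch of how a secret and RBAC fit together, the code below builds a Secret manifest and a Role that allows reading only that secret, then prints both as JSON. The manifest fields follow the standard Kubernetes Secret and Role resources; the names, namespace, and value are placeholders, and the two helper functions are hypothetical.

```python
import base64
import json


def make_secret(name, namespace, values):
    """Build a Secret manifest; values under `data` must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


def make_reader_role(role_name, namespace, secret_name):
    """Build a Role that allows reading only the named Secret."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],
            "resources": ["secrets"],
            "resourceNames": [secret_name],
            "verbs": ["get"],
        }],
    }


# Placeholder names and value, for illustration only.
secret = make_secret("db-credentials", "demo", {"password": "example-only"})
role = make_reader_role("db-credentials-reader", "demo", "db-credentials")
print(json.dumps([secret, role], indent=2))
```

Note that base64 is an encoding, not encryption, and that a Role only takes effect once a RoleBinding assigns it to a user, group, or service account.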

OCP “Red Hat Discovery” and “Node Feature Discovery Operator”

Data discovery tools

Collects and scans information about the OCP cluster, including details about deployed applications, nodes, and system configurations

The Node Feature Discovery Operator automatically detects hardware features on cluster nodes and adds relevant labels for easier management
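
Node Feature Discovery publishes the features it finds as node labels under the `feature.node.kubernetes.io/` prefix. The sketch below filters a node's labels down to those entries; the sample label set and the `hardware_features` helper are illustrative assumptions, not output captured from a real cluster.

```python
# Node Feature Discovery publishes its labels under this prefix.
NFD_PREFIX = "feature.node.kubernetes.io/"

# Hypothetical label set for one node, for illustration only.
node_labels = {
    "kubernetes.io/hostname": "worker-0",
    "feature.node.kubernetes.io/cpu-cpuid.AVX2": "true",
    "feature.node.kubernetes.io/pci-0300_10de.present": "true",
}


def hardware_features(labels):
    """Return only discovered-feature labels, with the prefix removed."""
    return {
        key[len(NFD_PREFIX):]: value
        for key, value in labels.items()
        if key.startswith(NFD_PREFIX)
    }


print(hardware_features(node_labels))
```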

OCP “Label Studio”

Labels diverse data types like images, text, audio and video, enabling users to create bounding boxes, polygons and other annotations