Data Products and Governance

Modeling business data as governed products with ownership, lifecycles, and automated governance through the matrix.

Data governance in Poliglot starts with a simple idea: the data in your organization is the state of your business operating system. A data product is a logical grouping of that state, owned and managed by a specific entity, with its own lifecycle, access policies, and quality constraints. This guide covers how to model data products, how the matrix approach operationalizes governance, and how governance actions become increasingly automated as more workloads incorporate them.

The data product model is based on the DPROD standard, which extends W3C DCAT. For the full DPROD type reference, see the Engine Reference.

What Is a Data Product

A data product is not an API. It's a governed unit of business data. It answers: "What data does this part of the organization manage, who is responsible for it, and what are the rules?"

Consider a company with an HR system, a payroll system, and a benefits platform. All three deal with employee data, but they serve different purposes with different owners:

  • Employee Records: owned by HR, contains personal info, job titles, org structure. Classified as confidential.
  • Compensation Data: owned by Finance, contains salary, bonuses, equity. Classified as restricted.
  • Benefits Enrollment: owned by the Benefits team, contains plan selections, dependents. Classified as confidential.

These are three data products even though they might all originate from the same HRIS. The boundaries follow organizational ownership and governance requirements, not system architecture.

What Determines the Boundaries

The organization decides. There's no fixed rule. Some guidelines:

  • Domain ownership: who is responsible for this data? Different owners typically mean different data products.
  • Sensitivity classification: data with different sensitivity levels should be separate products so access policies can be applied independently.
  • Lifecycle: data that changes at different rates or has different retention requirements benefits from separate governance.
  • Consumption patterns: if different consumers need different subsets with different access rules, splitting makes governance simpler.

A data product can be sparse (a single dataset from one service) or dense (multiple datasets spanning several systems). Data products can also be shared across matrices. A "Customer Records" data product owned by the CRM team might be consumed by the sales matrix, the support matrix, and the billing matrix.

The Data Product Model

Four concepts form the chain from governance to technical access:

Data Product (dprod:DataProduct): the governance unit. Declares purpose, ownership, sensitivity classification, and what datasets it exposes.

Dataset (dcat:Dataset): a specific collection of data within the product. Declares what schema the data conforms to and how it's distributed.

Distribution (dcat:Distribution): a specific representation of a dataset. The same dataset could be available as JSON over REST, CSV via export, or XML via SOAP.

Data Service (rars-svc:RemoteDataService): the technical endpoint. This is your actual API that RARS calls at runtime.

When to Use the Full Chain

Use all four layers when you need formal governance: sensitivity classification, schema conformance declarations, or multiple representations of the same data.

hr:EmployeeDataProduct
    a dprod:DataProduct ;
    rdfs:label "Employee Records" ;
    dprod:purpose "Employee records for HR operations and organizational reporting." ;
    dprod:informationSensitivityClassification dprod:ConfidentialInformation ;
    dprod:outputDataset hr:EmployeesDataset ;
    dprod:outputPort hr:EmployeeAPI .

hr:EmployeesDataset
    a dcat:Dataset ;
    rdfs:label "Employees" ;
    dct:conformsTo hr:Employee ;
    dcat:distribution hr:EmployeesJsonDistribution .

hr:EmployeesJsonDistribution
    a dcat:Distribution ;
    rdfs:label "Employees (JSON)" ;
    dct:format <https://www.iana.org/assignments/media-types/application/json> ;
    dcat:accessService hr:EmployeeAPI .

hr:EmployeeAPI
    a rars-svc:RemoteDataService ;
    rdfs:label "HR System API" ;
    rars-svc:endpointUrl "https://hr-system.example.com/api/employees"^^xsd:anyURI ;
    rars-svc:protocol rars-http:HTTP .
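
Because a dataset can carry more than one distribution, a second representation can be attached without touching the data product itself. The following is a minimal sketch that extends the snippet above, with prefix declarations omitted in the same way. The name hr:EmployeesCsvDistribution is hypothetical, and the sketch assumes the same service can also serve CSV, which this guide does not confirm:

```turtle
# The dataset now lists both representations.
hr:EmployeesDataset
    dcat:distribution hr:EmployeesJsonDistribution ,
                      hr:EmployeesCsvDistribution .

# Hypothetical CSV representation of the same dataset.
hr:EmployeesCsvDistribution
    a dcat:Distribution ;
    rdfs:label "Employees (CSV)" ;
    dct:format <https://www.iana.org/assignments/media-types/text/csv> ;
    dcat:accessService hr:EmployeeAPI .   # assumption: same endpoint serves CSV
```

The data product, its classification, and its schema declaration stay unchanged; only the dataset gains a distribution.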

When to Keep It Simple

Most integrations don't need the full chain. If you have one API serving one kind of data with no complex governance requirements, go straight from data product to service:

tasks:TaskDataProduct
    a dprod:DataProduct ;
    rdfs:label "Task Management" ;
    dct:description "Task records managed through the project management API." ;
    dprod:purpose "Task tracking, assignment, and status management." ;
    dprod:outputPort tasks:TaskAPI .

tasks:TaskAPI
    a rars-svc:RemoteDataService ;
    rdfs:label "Task API" ;
    rars-svc:endpointUrl "https://your-api.example.com/api/v1"^^xsd:anyURI ;
    rars-svc:protocol rars-http:HTTP .

This is enough for most integrations. Add datasets and distributions later if governance requirements grow.

How Governance Gets Operationalized

The power of modeling data products in the matrix is that governance becomes operational. Instead of governance being a separate process (spreadsheets of data owners, manual access reviews, periodic audits), it's embedded in the same system that executes your business logic.

IAM Controls Access

Access to data flows through the classes and actions that interact with data products. When you define an action like hr:ListEmployees that calls the Employee API, IAM policies control who (which agents, which roles) can invoke that action. The governance is on the action and the classes it operates on, not directly on the data product:

# Only HR agents can list employees
hr:EmployeeAccessPolicy
    a rars-iam:ResourcePolicy ;
    rars-iam:effect rars-iam:Allow ;
    rars-iam:action rars-act:InvokeAction ;
    rars-iam:resource hr:ListEmployees ;
    rars-iam:role hr:HRRole .

# Only HR agents can read employee data
hr:EmployeeReadPolicy
    a rars-iam:ResourcePolicy ;
    rars-iam:effect rars-iam:Allow ;
    rars-iam:action rars-os:ReadStatement ;
    rars-iam:resource hr:Employee ;
    rars-iam:role hr:HRRole .

As more matrices integrate with the Employee data product, each one's access is governed by the same IAM model. A sales matrix that needs to look up employee names gets a read-only policy on a subset of properties. The HR matrix gets full access. No manual access review is needed: the policies are in the spec, validated at assembly, and enforced at runtime.

Constraints Validate Data Quality

SHACL constraints on the classes that datasets conform to define what valid data looks like. When data flows in from an external service through a response mapping, the mapped resources are validated against the same shapes as any other data in the graph:

hr:EmployeeShape
    a sh:NodeShape ;
    sh:targetClass hr:Employee ;
    sh:property [
        sh:path hr:email ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:pattern "^.+@.+\\..+$" ;
        sh:severity sh:Violation ;
        sh:message "Every employee must have a valid email address."
    ] ;
    sh:property [
        sh:path hr:department ;
        sh:class hr:Department ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "Employees without a department assignment may not appear in org reports."
    ] .

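Data quality enforcement is automatic. Every time employee data enters the graph (from a sync action, a manual update, or an agentic workflow), these constraints apply. Data that fails a constraint is flagged at the severity the shape declares: in the example above, a missing or malformed email is a violation, while a missing department is only a warning. RARS can self-correct violations based on the constraint messages.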
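
To make "flagged" concrete: in standard SHACL, a failed constraint is reported as a sh:ValidationResult inside a sh:ValidationReport. The sketch below shows what a report for a malformed email could look like. It assumes RARS surfaces results in the standard SHACL report vocabulary, which this guide does not state. The resource hr:employee-1042 and the email value are hypothetical, and prefix declarations are omitted as in the other snippets:

```turtle
[] a sh:ValidationReport ;
    sh:conforms false ;
    sh:result [
        a sh:ValidationResult ;
        sh:focusNode hr:employee-1042 ;        # hypothetical employee resource
        sh:resultPath hr:email ;
        sh:value "jane.doe-at-example.com" ;   # no "@", so the pattern fails
        sh:resultSeverity sh:Violation ;       # taken from sh:severity on the shape
        sh:sourceConstraintComponent sh:PatternConstraintComponent ;
        sh:resultMessage "Every employee must have a valid email address."
    ] .
```

The sh:resultMessage carries the text from the shape's sh:message, which is what makes a well-written constraint message useful for correction.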

Provenance Tracks Lineage

Every piece of data that enters the graph through an action is tracked as an observation with provenance: which agent produced it, which action was invoked, when it happened, and what process it was part of. This means data lineage is automatic for any data that flows through the matrix.

If an auditor asks "where did this employee record come from?", the provenance model answers: it was produced by the HR Agent, via the hr:SyncEmployees action, at this timestamp, as part of this scheduled workflow.
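
This guide does not show the vocabulary RARS uses to record provenance. As a stand-in, the sketch below expresses the same answer in W3C PROV-O (prov: denotes http://www.w3.org/ns/prov#). Every resource name, the timestamp, and the use of dct:isPartOf to link the run to its workflow are hypothetical. Prefix declarations are omitted as in the other snippets:

```turtle
# The synced employee record, with its lineage attached.
hr:employee-1042
    a hr:Employee , prov:Entity ;
    prov:wasGeneratedBy hr:sync-run-0001 ;
    prov:wasAttributedTo hr:HRAgent ;
    prov:generatedAtTime "2025-01-15T02:00:00Z"^^xsd:dateTime .   # placeholder timestamp

# The invocation that produced it.
hr:sync-run-0001
    a prov:Activity ;
    prov:wasAssociatedWith hr:HRAgent ;
    prov:qualifiedAssociation [
        a prov:Association ;
        prov:agent hr:HRAgent ;
        prov:hadPlan hr:SyncEmployees      # the action that was invoked
    ] ;
    dct:isPartOf hr:NightlySyncWorkflow .  # hypothetical scheduled workflow
```

Each part of the auditor's question (who, via what, when, as part of which process) maps to one statement in the record.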

Automated Governance at Scale

The mechanisms above (IAM for access, constraints for quality, provenance for lineage) are all enforced automatically as part of normal matrix operation. The more workloads you run through the matrix, the more governance coverage you get without additional work.

For governance operations that aren't built in (data classification audits, retention policy enforcement, PII scanning), you can build actions that automate them. For example, a dprod:AuditDataProduct action could scan datasets for unclassified fields, and a dprod:EnforceRetention action could identify and flag data past its retention period. The platform provides the mechanisms; the specific governance workflows are up to you.
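
Part of a classification audit can be expressed with the same constraint mechanism used for data quality. The sketch below is a hypothetical shape that flags any data product declared without a sensitivity classification. It assumes shapes are also evaluated against data product declarations, which this guide does not confirm. The gov: prefix and shape name are made up for illustration, and prefix declarations are omitted as in the other snippets:

```turtle
# Hypothetical shape: every data product should declare a classification.
gov:ClassifiedDataProductShape
    a sh:NodeShape ;
    sh:targetClass dprod:DataProduct ;
    sh:property [
        sh:path dprod:informationSensitivityClassification ;
        sh:minCount 1 ;
        sh:severity sh:Warning ;
        sh:message "Data product has no sensitivity classification."
    ] .
```

Under that assumption, tasks:TaskDataProduct from the simple example earlier would be flagged, while hr:EmployeeDataProduct would pass. This covers product-level classification only; scanning individual fields would still need a dedicated action.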

Shared Data Products

Data products are not locked to a single matrix. Multiple matrices can reference the same data product when they need access to the same governed data through different lenses.

The CRM matrix owns the Customer data product and provides full CRUD actions. The billing matrix imports the CRM matrix and uses the Customer data product in read-only mode through its own actions. The support matrix does the same. Each matrix has its own IAM policies controlling what it can do with the shared data, but the data product definition lives in one place.

# In the billing matrix: import CRM and reference its data product
billing:
    a rars-mtx:Matrix ;
    rars-mtx:imports <https://acme.com/spec/crm#> .

# Billing actions reference the CRM service
billing:GetCustomerBillingIntegration
    a rars-act:ServiceIntegration ;
    rars-act:service crm:CustomerAPI ;  # Shared service from CRM matrix
    ...
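
The billing matrix's side of the access control could follow the same pattern as the HR policies shown earlier. This is a sketch only: crm:Customer and billing:BillingRole are hypothetical names, and the result is read-only only if no other policy grants the billing role write access:

```turtle
# Hypothetical: billing agents may read CRM customer data.
billing:CustomerReadPolicy
    a rars-iam:ResourcePolicy ;
    rars-iam:effect rars-iam:Allow ;
    rars-iam:action rars-os:ReadStatement ;
    rars-iam:resource crm:Customer ;
    rars-iam:role billing:BillingRole .
```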

Services and Endpoints

The data service is the technical access layer. Multiple actions can share a single service when they hit the same API:

# Both actions use the same service, different request templates
tasks:ListTasksIntegration
    a rars-act:ServiceIntegration ;
    rars-act:service tasks:TaskAPI ;
    rars-act:requestTemplate tasks:ListTasksRequest ;
    ...

tasks:CreateTaskIntegration
    a rars-act:ServiceIntegration ;
    rars-act:service tasks:TaskAPI ;
    rars-act:requestTemplate tasks:CreateTaskRequest ;
    ...

When actions hit genuinely different systems, define separate services. A matrix that integrates both a project API and a notification service would have two services, potentially under two different data products.
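
A sketch of that two-service layout, reusing the properties from the earlier examples. The projects: and notify: names, the purpose strings, and both endpoint URLs are placeholders, and prefix declarations are omitted as in the other snippets:

```turtle
# Project data: one data product, one service.
projects:ProjectDataProduct
    a dprod:DataProduct ;
    rdfs:label "Projects" ;
    dprod:purpose "Project planning and tracking." ;
    dprod:outputPort projects:ProjectAPI .

projects:ProjectAPI
    a rars-svc:RemoteDataService ;
    rdfs:label "Project API" ;
    rars-svc:endpointUrl "https://projects.example.com/api/v1"^^xsd:anyURI ;
    rars-svc:protocol rars-http:HTTP .

# Notifications: a separate system, so a separate product and service.
notify:NotificationDataProduct
    a dprod:DataProduct ;
    rdfs:label "Notifications" ;
    dprod:purpose "Delivery of user-facing notifications." ;
    dprod:outputPort notify:NotificationAPI .

notify:NotificationAPI
    a rars-svc:RemoteDataService ;
    rdfs:label "Notification API" ;
    rars-svc:endpointUrl "https://notify.example.com/api/v1"^^xsd:anyURI ;
    rars-svc:protocol rars-http:HTTP .
```

Each action's service integration then points at whichever service it calls, so ownership and policies for the two systems can evolve independently.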

Network Access

With Poliglot-hosted contexts, RARS makes outbound HTTP requests over the public internet. Your service must be deployed to a publicly accessible host with a stable URL.

Private networking: Poliglot is building support for customer-deployed engine planes that run inside your own cloud VPC. The Poliglot control plane manages orchestration while the engine plane runs in your infrastructure with direct private network access to your services. Contact sales@poliglot.io for private deployment options.

Authentication

Authentication is configured on the action's service integration (via request templates and secrets), not on the data product model. Data products declare what data exists and who governs it. How RARS authenticates with the service is an operational concern handled by request mapping and secret management.
