data_tools.schema.CanonicalPath

class data_tools.schema.CanonicalPath(origin: str, source: str, event: str, name: str)

Bases: object

__init__(origin: str, source: str, event: str, name: str)

Construct a canonical path representing a path to a file in any abstract data source.

Parameters:

origin (str) – Identifies the origin (code) of this data, usually the data pipeline version.
source (str) – The producer of the data pointed to by this canonical path, usually a pipeline stage.
event (str) – The event that this data belongs to.
name (str) – The name of this data.

Methods

`__init__`(origin, source, event, name)	Construct a canonical path representing a path to a file in any abstract data source.
`to_path`()
`to_string`()	Obtain the string representation of this canonical path
`unwrap`()	Decompose this CanonicalPath into its constituent elements.
`unwrap_canonical_path`(canonical_path)	Unwrap a canonical path into its elements.

Attributes

`event`
`name`
`origin`
`source`

property event: str

property name: str

property origin: str

property source: str

to_path() → Path

to_string() → str

Obtain the string representation of this canonical path.

unwrap() → List[str]

Decompose this CanonicalPath into its constituent elements. Analogous to os.path.split, except that the result is a list of all elements rather than a (head, tail) pair.
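
To make the relationship between the constructor arguments and the methods above more concrete, here is a minimal sketch of a class with the same shape. It is a hypothetical stand-in, not the library's implementation: the class name CanonicalPathSketch, the "/" separator, the element order, the use of PurePosixPath, and the event value "drive_cycle_01" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class CanonicalPathSketch:
    """Hypothetical stand-in for CanonicalPath; not the library's code."""

    origin: str
    source: str
    event: str
    name: str

    def unwrap(self) -> List[str]:
        # Assumed element order: origin, source, event, name.
        return [self.origin, self.source, self.event, self.name]

    def to_string(self) -> str:
        # Assumed format: elements joined with "/".
        return "/".join(self.unwrap())

    def to_path(self) -> PurePosixPath:
        # PurePosixPath keeps "/" as the separator on every platform.
        return PurePosixPath(*self.unwrap())


# "drive_cycle_01" is an invented event name used only for this example.
path = CanonicalPathSketch(
    "pipeline_2024_11_01", "ingest", "drive_cycle_01", "TotalPackVoltage"
)
print(path.to_string())
```

The real class may order, join, or validate its elements differently; check to_string() and unwrap() on an actual instance before relying on any of these assumptions.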

static unwrap_canonical_path(canonical_path: str) → List[str]

Unwrap a canonical path into its elements.

For example, “pipeline_2024_11_01/ingest/TotalPackVoltage” would be unwrapped to [“pipeline_2024_11_01”, “ingest”, “TotalPackVoltage”].

The first element should always be a reference to the origin (code) that produced this data. The second element should always refer to the stage (processing step) that produced this data. The last element should always be the name of this data.

Parameters:

canonical_path (str) – The path to be decomposed.

Returns:

A List[str] of path elements.
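
The decomposition described above can be sketched as a simple split on the separator. This is an illustration only, not the library's implementation: the function name unwrap_sketch, the "/" separator, and the choice to drop empty segments are assumptions inferred from the example in this section.

```python
from typing import List


def unwrap_sketch(canonical_path: str) -> List[str]:
    """Hypothetical stand-in for CanonicalPath.unwrap_canonical_path.

    Assumes elements are separated by "/" and drops empty segments
    produced by leading, trailing, or repeated separators.
    """
    return [part for part in canonical_path.split("/") if part]


elements = unwrap_sketch("pipeline_2024_11_01/ingest/TotalPackVoltage")

# Per the description above: first element is the origin, second is the
# stage that produced the data, and the last is the name of the data.
origin, source, name = elements[0], elements[1], elements[-1]
print(elements)
```

Indexing the name with elements[-1] rather than a fixed position follows the wording above, which pins down only the first, second, and last elements.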