Dataset Transforms

Also known as derived datasets.

(experimental)

Intake lets you define data sources that take another source in the same catalog as their input, so that a catalog can offer processed data to its users alongside the original data.

Example

This example is taken from the Intake test suite.

Text to come, watch this space…

API

intake.source.derived.DerivedSource(*args, …)

Base source deriving from another source in the same catalog

intake.source.derived.GenericTransform(…)

intake.source.derived.DataFrameTransform(…)

Transform where the input and output are both Dask-compatible dataframes

intake.source.derived.Columns(*args, **kwargs)

Simple dataframe transform to pick columns

class intake.source.derived.DerivedSource(*args, **kwargs)

Base source deriving from another source in the same catalog

Target picking and parameter validation are performed here, but you probably want to subclass from one of the more specific classes like DataFrameTransform.

class intake.source.derived.GenericTransform(*args, **kwargs)
optional_params = {'allow_dask': True}

Apply an arbitrary function to transform an input

transform: function to perform transform

function(container_object) -> output, or a fully-qualified dotted string pointing to it

transform_params: dict

The keys are the names of keyword arguments to pass to the transform function. Each value is one of: a concrete value to pass as-is; a param object, which can be made into a widget but must have a default value; or a spec from which such a param object can be built.

allow_dask: bool (optional, default True)

Whether to_dask() is expected to work; calling it will in turn call the target’s to_dask()
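The way a transform given as a dotted string and a dict of keyword arguments fit together can be sketched in plain Python. This is a minimal illustration of the idea, not Intake's own code: the helper names resolve_transform and apply_transform are hypothetical, and builtins.sorted stands in for a user-defined function.

```python
import importlib


def resolve_transform(transform):
    """Return a callable, given either a callable or a dotted-path string."""
    if callable(transform):
        return transform
    # Split "package.module.func" into the module path and the attribute name.
    module_name, _, func_name = transform.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def apply_transform(container, transform, transform_params=None):
    """Call the transform on the container, passing the params as kwargs."""
    func = resolve_transform(transform)
    return func(container, **(transform_params or {}))


# "builtins.sorted" is only a stand-in for a real, user-defined transform.
result = apply_transform([3, 1, 2], "builtins.sorted", {"reverse": True})
```

The dotted-string form is what makes a transform usable from a catalog file, where a function object cannot be written directly.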

read()

Load entire dataset into a container and return it

to_dask()

Return a dask container for this data source
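The relationship between read(), to_dask() and allow_dask can be illustrated with a toy model. Everything here is hypothetical and independent of Intake: ListSource and ToyTransform are stand-ins, an iterator plays the role of a lazy Dask container, and raising ValueError when allow_dask is False is an assumption about reasonable behaviour rather than a statement of what the library does.

```python
class ListSource:
    """Hypothetical stand-in for a target source; not an Intake class."""

    def __init__(self, rows):
        self.rows = rows

    def read(self):
        # Eager path: return the whole dataset at once.
        return list(self.rows)

    def to_dask(self):
        # Stand-in for a lazy container; a real source would return a Dask object.
        return iter(self.rows)


class ToyTransform:
    """Toy model of a derived source wrapping a single target."""

    def __init__(self, target, transform, transform_params=None, allow_dask=True):
        self.target = target
        self.transform = transform
        self.transform_params = transform_params or {}
        self.allow_dask = allow_dask

    def read(self):
        # Transform the fully loaded output of the target.
        return self.transform(self.target.read(), **self.transform_params)

    def to_dask(self):
        if not self.allow_dask:
            # Assumed behaviour: refuse rather than silently fall back to read().
            raise ValueError("to_dask() is disabled for this transform")
        # Delegate to the target's to_dask(), then transform the result.
        return self.transform(self.target.to_dask(), **self.transform_params)


def scale(values, factor=1):
    """Example transform: multiply every value by ``factor``."""
    return [v * factor for v in values]


derived = ToyTransform(ListSource([1, 2, 3]), scale, {"factor": 10})
eager = derived.read()
via_lazy_path = derived.to_dask()
```

Setting allow_dask to False is useful when the transform function only makes sense on an in-memory object, so that callers get a clear error instead of a failure inside the transform.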

class intake.source.derived.Columns(*args, **kwargs)

Simple dataframe transform to pick columns

Given as an example of how to make a specific dataframe transform. Note that you could use DataFrameTransform directly, by writing a function to choose the columns instead of a method as here.
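A column-choosing function of the kind mentioned above might look like the following sketch. The name pick_columns and the sample data are made up for illustration, and pandas is assumed to be installed; the same indexing expression should also work on a Dask dataframe.

```python
import pandas as pd


def pick_columns(df, columns):
    """Return only the requested columns, in the order given."""
    return df[list(columns)]


# Illustrative data only; not taken from any Intake catalog.
frame = pd.DataFrame({"state": ["Ohio"], "slug": ["oh"], "code": [39]})
subset = pick_columns(frame, ["state", "slug"])
```

With a function like this, its dotted path would be supplied as the transform and the column list as a keyword argument, following the parameter descriptions given for GenericTransform.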