Optimizes a Pipeline DAG with automatic caching.
This trait provides methods to chain an object with Estimators, LabelEstimators, and other Chainables to construct Pipelines.
An Estimator has a fitRDD method that takes input data and emits a Transformer.
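As a minimal sketch, the Estimator/Transformer contract can be modeled without Spark by using a plain Seq in place of an RDD. The trait shapes, the fit method (a Seq-based stand-in for fitRDD), and the MeanCenterer example are hypothetical simplifications, not the library's actual signatures.

```scala
// Simplified model: a Seq stands in for a Spark RDD (assumption).
trait Transformer[A, B] {
  def apply(in: A): B
  // Applying to a whole dataset maps the single-item function over it.
  def apply(in: Seq[A]): Seq[B] = in.map(x => apply(x))
}

// An Estimator inspects a dataset and returns a fitted Transformer.
trait Estimator[A, B] {
  def fit(data: Seq[A]): Transformer[A, B]
}

// Hypothetical example: learns the mean of the data and subtracts it.
object MeanCenterer extends Estimator[Double, Double] {
  def fit(data: Seq[Double]): Transformer[Double, Double] = {
    val mean = data.sum / data.size
    new Transformer[Double, Double] {
      def apply(in: Double): Double = in - mean
    }
  }
}

val centered = MeanCenterer.fit(Seq(1.0, 2.0, 3.0)).apply(Seq(1.0, 2.0, 3.0))
```

The fitted Transformer captures the learned mean, so it can later be applied to new items that were not part of the training data.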
This is the result of fitting a Pipeline.
This transformer performs a no-op on its input.
A LabelEstimator has a fitRDDs method that takes input data and input labels and emits a Transformer.
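A minimal sketch of the LabelEstimator contract, again using Seq in place of RDDs (an assumption). The fit signature and the SimpleLinearRegression example are hypothetical; the example fits one-dimensional ordinary least squares, so the emitted Transformer predicts slope * x + intercept.

```scala
trait Transformer[A, B] { def apply(in: A): B }

// A LabelEstimator fits on inputs paired with labels (assumed Seq-based signature).
trait LabelEstimator[A, B, L] {
  def fit(data: Seq[A], labels: Seq[L]): Transformer[A, B]
}

// Hypothetical example: 1-D ordinary least squares, y ~ slope * x + intercept.
object SimpleLinearRegression extends LabelEstimator[Double, Double, Double] {
  def fit(data: Seq[Double], labels: Seq[Double]): Transformer[Double, Double] = {
    require(data.size == labels.size, "each input needs exactly one label")
    val n = data.size.toDouble
    val meanX = data.sum / n
    val meanY = labels.sum / n
    // Covariance of (x, y) and variance of x, both unnormalized.
    val cov = data.zip(labels).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = data.map(x => (x - meanX) * (x - meanX)).sum
    val slope = cov / varX
    val intercept = meanY - slope * meanX
    new Transformer[Double, Double] {
      def apply(in: Double): Double = slope * in + intercept
    }
  }
}

// Training points lie exactly on y = 2x + 1.
val model = SimpleLinearRegression.fit(Seq(0.0, 1.0, 2.0), Seq(1.0, 3.0, 5.0))
```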
Node-level optimization, such as selecting a linear solver.
Represents a node-level optimizable Estimator and its optimization rules.
Represents a node-level optimizable LabelEstimator and its optimization rules.
Represents a node-level optimizable Transformer and its optimization rules.
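To illustrate node-level optimization, the sketch below models an optimizable node as a default implementation plus a rule that picks a concrete implementation after inspecting a sample of its input. Everything here is hypothetical: the Solver types, OptimizableNode, and the maxExactDims threshold are illustrative assumptions, and the decision uses only the feature dimension rather than a real cost model.

```scala
// Hypothetical stand-ins for two interchangeable solver implementations.
sealed trait Solver
case object ExactSolver extends Solver
case object IterativeSolver extends Solver

// An optimizable node: a default choice plus a rule that can pick a better
// implementation after looking at a sample of its input.
trait OptimizableNode[A] {
  def default: Solver
  def optimize(sample: Seq[A]): Solver
}

object LinearSolverNode extends OptimizableNode[Array[Double]] {
  // Assumed threshold; a real optimizer would estimate cost more carefully.
  val maxExactDims = 1000
  def default: Solver = IterativeSolver
  def optimize(sample: Seq[Array[Double]]): Solver = {
    val dims = sample.headOption.map(_.length).getOrElse(0)
    if (dims <= maxExactDims) ExactSolver else IterativeSolver
  }
}

val small = LinearSolverNode.optimize(Seq(Array.fill(10)(0.0)))
val large = LinearSolverNode.optimize(Seq(Array.fill(5000)(0.0)))
```

Keeping the choice inside the node means the rest of the pipeline does not change when a different solver is selected.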
A Pipeline takes data as input (a single item or an RDD) and outputs some transformation of that data.
This class is a lazy wrapper around the output of a pipeline that was passed an RDD as input.
This class is a lazy wrapper around the output of a pipeline that was passed a single datum as input.
PipelineEnv is an environment shared by multiple Pipelines, containing variables such as the prefix state table and the current Pipeline optimizer.
A PipelineResult is a lazy wrapper around the result of applying a Pipeline to data.
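The lazy-result behavior can be sketched with a memoized thunk: applying a pipeline only builds the wrapper, and the work runs the first time the value is requested. LazyResult, runPipeline, and the evaluation counter are hypothetical names used for illustration; the real result classes also carry the DAG needed to optimize the computation before running it.

```scala
// Hypothetical lazy result wrapper: the pipeline's work runs only on the
// first get(), and the value is memoized afterwards.
final class LazyResult[T](compute: () => T) {
  private lazy val value: T = compute()
  def get(): T = value
}

var evaluations = 0
def runPipeline(xs: Seq[Int]): LazyResult[Seq[Int]] =
  new LazyResult(() => { evaluations += 1; xs.map(_ * 2) })

val result = runPipeline(Seq(1, 2, 3))
val before = evaluations   // nothing has run yet
val first = result.get()
val second = result.get()  // reuses the memoized value
```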
Represents a DAG transformation rule: A transformation from one DAG to a differently-executed but logically equivalent DAG.
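A minimal sketch of a DAG rewrite rule, assuming a toy graph of integer node ids, each with an operator name and input ids. The Node, Graph, Rule, and RemoveIdentity names are hypothetical. The example rule bypasses no-op "identity" nodes, which changes how the DAG executes without changing what it computes.

```scala
// Hypothetical minimal DAG: node id -> (operator name, ids of its inputs).
final case class Node(op: String, deps: Seq[Int])
final case class Graph(nodes: Map[Int, Node], sinks: Seq[Int])

// A rule maps a DAG to a logically equivalent DAG.
trait Rule { def apply(g: Graph): Graph }

// Example rule (assumption: "identity" nodes have exactly one input):
// bypass every no-op node by pointing its consumers at its input.
object RemoveIdentity extends Rule {
  def apply(g: Graph): Graph = {
    val ids = g.nodes.collect { case (id, Node("identity", Seq(in))) => id -> in }
    // Follow chains of identities until reaching a non-identity node.
    def resolve(id: Int): Int = ids.get(id).map(resolve).getOrElse(id)
    val kept = (g.nodes -- ids.keys).map { case (id, n) =>
      id -> n.copy(deps = n.deps.map(resolve))
    }
    Graph(kept, g.sinks.map(resolve))
  }
}

val g = Graph(
  Map(0 -> Node("source", Nil), 1 -> Node("identity", Seq(0)), 2 -> Node("scale", Seq(1))),
  sinks = Seq(2))
val optimized = RemoveIdentity(g)
```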
Transformers are operators that may be applied both to single input items and to RDDs of input items.
A chain of two Transformers in a row (as a Transformer).
A chain of a Transformer followed by an Estimator (as an Estimator).
A chain of a Transformer followed by a LabelEstimator (as a LabelEstimator).
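The chaining rules above can be sketched with an overloaded andThen: Transformer then Transformer yields a Transformer, and Transformer then Estimator yields an Estimator whose fit first transforms the training data. This is a simplified model with Seq in place of RDDs; the method name andThen and the AddOne, Twice, and MaxScaler examples are assumptions for illustration, and the LabelEstimator case would follow the same pattern with an extra labels argument.

```scala
trait Transformer[A, B] { self =>
  def apply(in: A): B
  // The batch form maps the single-item form (Seq stands in for an RDD).
  def apply(in: Seq[A]): Seq[B] = in.map(x => apply(x))
  // Transformer followed by Transformer is again a Transformer.
  def andThen[C](next: Transformer[B, C]): Transformer[A, C] =
    new Transformer[A, C] { def apply(in: A): C = next(self(in)) }
  // Transformer followed by Estimator is an Estimator: fitting transforms
  // the training data first, then fits the downstream estimator on it.
  def andThen[C](est: Estimator[B, C]): Estimator[A, C] =
    new Estimator[A, C] {
      def fit(data: Seq[A]): Transformer[A, C] = self.andThen(est.fit(self(data)))
    }
}
trait Estimator[A, B] { def fit(data: Seq[A]): Transformer[A, B] }

object AddOne extends Transformer[Int, Int] { def apply(in: Int): Int = in + 1 }
object Twice extends Transformer[Int, Int] { def apply(in: Int): Int = in * 2 }
// Hypothetical estimator: learns the maximum and divides by it.
object MaxScaler extends Estimator[Int, Double] {
  def fit(data: Seq[Int]): Transformer[Int, Double] = {
    val max = data.max.toDouble
    new Transformer[Int, Double] { def apply(in: Int): Double = in / max }
  }
}

val chained = AddOne andThen Twice  // (x + 1) * 2
val fitted = (AddOne andThen MaxScaler).fit(Seq(1, 2, 3))  // max of Seq(2, 3, 4) is 4
```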
A mix-in that attaches a weight to a node that represents how often it must iterate over its input.
A mix-in that attaches a weight to an operator that represents how often it must iterate over its input.
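One way such weights can inform caching: if a consumer makes w passes over its input, an uncached upstream output is recomputed once per pass. The sketch below counts direct reads per node and marks anything read more than once as a cache candidate. The Op type, readsOf, cacheCandidates, and the threshold are hypothetical; it ignores transitive recomputation and memory limits, so it is not the auto-caching optimizer's actual algorithm.

```scala
// Hypothetical operator: `weight` is how many passes it makes over its inputs
// (e.g. an iterative solver making 10 passes has weight 10).
final case class Op(name: String, weight: Int, inputs: Seq[String])

// Direct reads of a node's output: the sum of its consumers' weights.
def readsOf(node: String, ops: Seq[Op]): Int =
  ops.filter(_.inputs.contains(node)).map(_.weight).sum

// Simple heuristic (assumption): cache anything that would be read more than once.
def cacheCandidates(ops: Seq[Op]): Set[String] =
  ops.map(_.name).filter(n => readsOf(n, ops) > 1).toSet

val ops = Seq(
  Op("load", 1, Nil),
  Op("featurize", 1, Seq("load")),
  Op("solve", 10, Seq("featurize")),
  Op("evaluate", 1, Seq("featurize")))
val toCache = cacheCandidates(ops)
```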
The default Pipeline optimizer used when executing pipelines.
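A common design for rule-based optimizers, sketched here under assumptions, groups rules into batches and reruns each batch until the plan stops changing or an iteration cap is reached. Batch, execute, and the toy removeOneIdentity rule on a list of operator names are hypothetical; the default optimizer's actual batches and caps may differ.

```scala
// Hypothetical batch: a named group of rules run to a fixed point,
// bounded by maxIterations.
final case class Batch[T](name: String, maxIterations: Int, rules: Seq[T => T])

def execute[T](plan: T, batches: Seq[Batch[T]]): T =
  batches.foldLeft(plan) { (current, batch) =>
    var prev = current
    var next = batch.rules.foldLeft(current)((p, rule) => rule(p))
    var iterations = 1
    // Rerun the batch until nothing changes or the cap is reached.
    while (next != prev && iterations < batch.maxIterations) {
      prev = next
      next = batch.rules.foldLeft(prev)((p, rule) => rule(p))
      iterations += 1
    }
    next
  }

// Toy plan: a linear chain of operator names. This rule removes only the
// first "identity" per application, so it needs several iterations.
val removeOneIdentity: List[String] => List[String] = ops => {
  val i = ops.indexOf("identity")
  if (i < 0) ops else ops.patch(i, Nil, 1)
}
val plan = List("load", "identity", "featurize", "identity", "solve")
val optimized = execute(plan, Seq(Batch("simplify", 10, Seq(removeOneIdentity))))
val capped = execute(plan, Seq(Batch("once", 1, Seq(removeOneIdentity))))
```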
A rule to merge equivalent nodes in the DAG.
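Merging equivalent nodes is a form of common subexpression elimination. The sketch below visits nodes in id order (assumed to be a topological order), rewrites each node's inputs to their canonical ids, and reuses an earlier node when operator and inputs match. The Node type and mergeEquivalent are hypothetical, and the approach is only valid for deterministic operators.

```scala
import scala.collection.mutable

final case class Node(op: String, deps: Seq[Int])

// Returns the surviving nodes and a map from every original id to its
// canonical (surviving) id.
def mergeEquivalent(nodes: Map[Int, Node]): (Map[Int, Node], Map[Int, Int]) = {
  val seen = mutable.Map.empty[Node, Int]
  val canonical = mutable.Map.empty[Int, Int]
  val kept = mutable.Map.empty[Int, Node]
  for (id <- nodes.keys.toSeq.sorted) {
    val n = nodes(id)
    // Inputs were visited earlier, so their canonical ids are already known.
    val rewritten = n.copy(deps = n.deps.map(d => canonical(d)))
    seen.get(rewritten) match {
      case Some(existing) => canonical(id) = existing
      case None =>
        seen(rewritten) = id
        canonical(id) = id
        kept(id) = rewritten
    }
  }
  (kept.toMap, canonical.toMap)
}

val nodes = Map(
  0 -> Node("load", Nil),
  1 -> Node("featurize", Seq(0)),
  2 -> Node("featurize", Seq(0)),  // duplicate of node 1
  3 -> Node("combine", Seq(1, 2)))
val (merged, canon) = mergeEquivalent(nodes)
```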
Extracts the prefixes of all nodes whose state should be saved for reuse by other Pipeline apply and fit calls.
A rule that loads any state saved in the PipelineEnv.state prefix state table for the nodes whose results we may want to load or save.
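The idea behind prefix-keyed state can be sketched as follows: a node's prefix describes its operator together with the prefixes of all its inputs, so two nodes with equal prefixes compute the same result and can share saved state. Prefix, stateTable, loadOrCompute, and the "load:train.csv" label are hypothetical stand-ins, not the actual PipelineEnv API.

```scala
import scala.collection.mutable

// Hypothetical prefix: an operator plus the prefixes of all its inputs,
// i.e. the entire upstream computation that produced a node's output.
final case class Prefix(op: String, inputs: Seq[Prefix])

// Hypothetical shared table from prefix to saved result.
val stateTable = mutable.Map.empty[Prefix, Seq[Double]]
var computations = 0

// Load saved state when the prefix is known; otherwise compute and save it.
def loadOrCompute(prefix: Prefix)(compute: => Seq[Double]): Seq[Double] =
  stateTable.getOrElseUpdate(prefix, { computations += 1; compute })

val features = Prefix("normalize", Seq(Prefix("load:train.csv", Nil)))

// Two separate pipelines that happen to share the same featurization prefix.
val first = loadOrCompute(features)(Seq(0.0, 0.5, 1.0))
val second = loadOrCompute(Prefix("normalize", Seq(Prefix("load:train.csv", Nil))))(Seq(9.9))
```

Because case-class equality is structural, the second lookup matches the first even though its Prefix was built independently, and its compute argument is never evaluated.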
A rule to remove all nodes and sources in a graph that do not lead to any sink and are therefore unused.
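Unused-branch removal amounts to a backward reachability pass: start at the sinks, follow input edges, and drop every node never reached. The Node type and removeUnused function are hypothetical simplifications of the rule described above.

```scala
final case class Node(op: String, deps: Seq[Int])

// Keep only nodes reachable by walking backwards from the sinks; anything
// else (including unused sources) cannot affect an output.
def removeUnused(nodes: Map[Int, Node], sinks: Seq[Int]): Map[Int, Node] = {
  @annotation.tailrec
  def visit(stack: List[Int], seen: Set[Int]): Set[Int] = stack match {
    case Nil => seen
    case id :: rest if seen(id) => visit(rest, seen)
    case id :: rest => visit(nodes(id).deps.toList ++ rest, seen + id)
  }
  val live = visit(sinks.toList, Set.empty)
  nodes.filter { case (id, _) => live(id) }
}

val nodes = Map(
  0 -> Node("load", Nil),
  1 -> Node("featurize", Seq(0)),
  2 -> Node("debugStats", Seq(0)),  // never feeds a sink
  3 -> Node("otherSource", Nil),    // unused source
  4 -> Node("solve", Seq(1)))
val pruned = removeUnused(nodes, sinks = Seq(4))
```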