Chains a label estimator onto the end of this pipeline, producing a new pipeline. If this pipeline has already been executed, its existing stages will not need to be fit again.
The estimator to chain onto the end of this pipeline.
The training data to use (the estimator will be fit on the result of passing this data through the current pipeline).
The labels to use when fitting the LabelEstimator. Must be zippable with the training data.
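A minimal, Spark-free sketch of what chaining a label estimator involves: push the training data through the existing pipeline, fit the estimator on the transformed data paired with the labels, then append the fitted stage. The traits Transformer and LabelEstimator, the helper chainLabelEstimator, and the use of Seq in place of an RDD are all assumptions for illustration, not the library's actual API. The sketch also does not model reuse of an already-executed pipeline.

```scala
// Hypothetical, Spark-free model of chaining a label estimator.
// Seq stands in for RDD; these traits are NOT the library's real API.
object LabelChainSketch {
  // A fitted stage: maps one input to one output.
  trait Transformer[A, B] { self =>
    def apply(in: A): B
    def andThen[C](next: Transformer[B, C]): Transformer[A, C] =
      new Transformer[A, C] { def apply(in: A): C = next(self(in)) }
  }

  // A stage that must be fit on (input, label) pairs before it can transform.
  trait LabelEstimator[B, C, L] {
    def fit(data: Seq[B], labels: Seq[L]): Transformer[B, C]
  }

  // Fit `est` on data pushed through `pipeline`, then append the fitted stage.
  def chainLabelEstimator[A, B, C, L](
      pipeline: Transformer[A, B],
      est: LabelEstimator[B, C, L],
      data: Seq[A],
      labels: Seq[L]): Transformer[A, C] = {
    // Seq analogue of "zippable": every training item needs exactly one label.
    require(data.length == labels.length, "data and labels must have equal length")
    val transformed = data.map(x => pipeline(x))
    pipeline.andThen(est.fit(transformed, labels))
  }

  // Example stage: scale each input by 10.
  val scale: Transformer[Double, Double] = new Transformer[Double, Double] {
    def apply(in: Double): Double = in * 10.0
  }

  // Example estimator: threshold halfway between the two class means.
  val thresholdLearner: LabelEstimator[Double, Boolean, Boolean] =
    new LabelEstimator[Double, Boolean, Boolean] {
      def fit(data: Seq[Double], labels: Seq[Boolean]): Transformer[Double, Boolean] = {
        val pos = data.zip(labels).collect { case (x, true) => x }
        val neg = data.zip(labels).collect { case (x, false) => x }
        val cut = (pos.sum / pos.size + neg.sum / neg.size) / 2.0
        new Transformer[Double, Boolean] { def apply(in: Double): Boolean = in > cut }
      }
    }

  def main(args: Array[String]): Unit = {
    val model = chainLabelEstimator(scale, thresholdLearner,
      Seq(0.1, 0.2, 0.8, 0.9), Seq(false, false, true, true))
    println(model(0.7)) // true: 7.0 is above the learned cut of 5.0
    println(model(0.3)) // false
  }
}
```

Note that the estimator only ever sees the transformed values (1, 2, 8, 9 here), never the raw inputs.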
Chains an estimator onto the end of this pipeline, producing a new pipeline. If this pipeline has already been executed, its existing stages will not need to be fit again.
The estimator to chain onto the end of this pipeline.
The training data to use (the estimator will be fit on the result of passing this data through the current pipeline).
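The unsupervised case differs only in that the estimator is fit on the transformed data alone. Again this is a hedged sketch: a transformer is modeled as a plain Scala function, Estimator, chainEstimator and MeanCenter are hypothetical names, and Seq stands in for an RDD.

```scala
// Hypothetical sketch of chaining an unsupervised estimator; not the library's API.
object EstimatorChainSketch {
  // A fitted stage is modeled as an ordinary function.
  type Transformer[A, B] = A => B

  // An unsupervised estimator: fit on data alone, returns a fitted transformer.
  trait Estimator[B, C] {
    def fit(data: Seq[B]): Transformer[B, C]
  }

  // Fit `est` on the output of `pipeline`, then compose the fitted stage after it.
  def chainEstimator[A, B, C](pipeline: Transformer[A, B],
                              est: Estimator[B, C],
                              data: Seq[A]): Transformer[A, C] =
    pipeline.andThen(est.fit(data.map(pipeline)))

  // Example estimator: learns the mean of its input and subtracts it.
  object MeanCenter extends Estimator[Double, Double] {
    def fit(data: Seq[Double]): Transformer[Double, Double] = {
      val mean = data.sum / data.size
      x => x - mean
    }
  }

  def main(args: Array[String]): Unit = {
    val square: Transformer[Double, Double] = x => x * x
    // MeanCenter is fit on the squares 1, 4, 9 (mean 14/3), not on 1, 2, 3.
    val model = chainEstimator(square, MeanCenter, Seq(1.0, 2.0, 3.0))
    println(model(3.0)) // roughly 4.333 (= 9 - 14/3)
  }
}
```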
Chains a pipeline onto the end of this one, producing a new pipeline. If either this pipeline or the one being chained has already been executed, it will not need to be fit again.
The pipeline to chain.
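Chaining two pipelines amounts to composing their functions and concatenating their stages, without modifying either input. This sketch models a pipeline as a hypothetical immutable case class holding stage names and a function; it does not capture the reuse of already-fitted pipelines described above.

```scala
// Hypothetical sketch: a pipeline as named stages plus a function; not the library's API.
object PipelineChainSketch {
  final case class Pipeline[A, B](stages: Vector[String], run: A => B) {
    // Chain `next` onto the end of this pipeline, producing a new pipeline.
    // Both inputs are immutable, so neither is changed.
    def andThen[C](next: Pipeline[B, C]): Pipeline[A, C] =
      Pipeline(stages ++ next.stages, run.andThen(next.run))

    def apply(in: A): B = run(in)
  }

  // Split on whitespace, dropping empty tokens.
  val tokenize = Pipeline[String, Seq[String]](
    Vector("tokenize"), s => s.split("\\s+").toSeq.filter(_.nonEmpty))

  val countTokens = Pipeline[Seq[String], Int](Vector("count"), _.size)

  def main(args: Array[String]): Unit = {
    val both = tokenize.andThen(countTokens)
    println(both.stages)     // Vector(tokenize, count)
    println(both("a  b c"))  // 3
  }
}
```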
The application of this Transformer to a single input item. This method MUST be overridden by ML developers.
The output value.
The application of this Transformer to an RDD of input items. This method may optionally be overridden by ML developers.
The bulk RDD input to pass into this transformer.
The bulk RDD output for the given input.
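The split between a required single-item method and an optional bulk method can be sketched as an abstract class whose bulk version defaults to mapping the single-item version. Assumptions: Seq stands in for an RDD, the bulk method is given the hypothetical name applyBulk rather than an overloaded apply, and DedupedLength is an invented example of an override that changes how much work is done but not the result.

```scala
// Hypothetical sketch of the single-item / bulk split; Seq stands in for RDD.
object TransformerSketch {
  abstract class Transformer[A, B] {
    // Single-item application: every concrete transformer must implement this.
    def apply(in: A): B

    // Bulk application: by default just maps the single-item version,
    // but subclasses may override it.
    def applyBulk(in: Seq[A]): Seq[B] = in.map(x => apply(x))
  }

  // Only the required single-item method is implemented here.
  object Lowercase extends Transformer[String, String] {
    def apply(in: String): String = in.toLowerCase
  }

  // Optional bulk override: evaluate each distinct input once, then reuse it.
  // The output equals the default applyBulk; only the amount of work differs.
  class DedupedLength extends Transformer[String, Int] {
    var calls = 0 // counts single-item evaluations, for illustration only
    def apply(in: String): Int = { calls += 1; in.length }
    override def applyBulk(in: Seq[String]): Seq[Int] = {
      val cache: Map[String, Int] = in.distinct.map(s => s -> apply(s)).toMap
      in.map(cache)
    }
  }
}
```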
Finalize a hash to incorporate the length and make sure all bits avalanche.
Mixes a block of data into an intermediate hash value.
May optionally be used as the last mixing step. It is slightly faster than mix because it does no further mixing of the resulting hash; for the last element this is unnecessary, since the hash is thoroughly mixed during finalization anyway.
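The relationship between these three steps can be checked against scala.util.hashing.MurmurHash3, from which these methods are copied. The sketch below rebuilds mix as mixLast followed by the extra rotate-and-multiply-add on the hash that mixLast skips, and hashes a sequence of Int blocks using mixLast only for the final block. The helper names mixViaMixLast and hashBlocks are illustrative.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative helpers built on the public MurmurHash3 mixing steps.
object MixSketch {
  // mix = mixLast followed by one more rotate and multiply-add on the hash;
  // that extra step is the "further mixing" mixLast leaves out.
  def mixViaMixLast(hash: Int, data: Int): Int =
    Integer.rotateLeft(MurmurHash3.mixLast(hash, data), 13) * 5 + 0xe6546b64

  // Hash a non-empty sequence of Int blocks: mix all but the last block,
  // use the cheaper mixLast for the final one, then finalize with the length
  // (finalization performs the avalanche that makes the skipped step unnecessary).
  def hashBlocks(blocks: Seq[Int], seed: Int): Int = {
    require(blocks.nonEmpty, "need at least one block")
    var h = seed
    blocks.init.foreach(b => h = MurmurHash3.mix(h, b))
    h = MurmurHash3.mixLast(h, blocks.last)
    MurmurHash3.finalizeHash(h, blocks.length)
  }
}
```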
The number of features (hash buckets) in the output space that n-grams are mapped into using the hashing trick.
Valid n-gram orders; must be consecutive positive integers.
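The two parameters can be illustrated with a pair of hypothetical helpers: one enforcing that the orders are consecutive positive integers, and one folding an arbitrary Int hash into the range [0, numFeatures), a common way to implement the hashing trick. The names validateOrders, bucketOf and numFeatures are assumptions, not taken from the source.

```scala
// Hypothetical helpers illustrating the two parameters; not the library's API.
object ParamSketch {
  // Orders such as Seq(1, 2, 3) are valid; Seq(1, 3) or Seq(0, 1) are not.
  def validateOrders(orders: Seq[Int]): Unit = {
    require(orders.nonEmpty, "at least one n-gram order is required")
    require(orders.min > 0, "n-gram orders must be positive")
    require(orders.sorted == (orders.min to orders.max),
      "n-gram orders must be consecutive integers without duplicates")
  }

  // Hashing trick: fold an arbitrary Int hash into [0, numFeatures).
  def bucketOf(hash: Int, numFeatures: Int): Int = {
    require(numFeatures > 0, "numFeatures must be positive")
    val raw = hash % numFeatures // may be negative for negative hashes
    if (raw < 0) raw + numFeatures else raw
  }
}
```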
A method that converts this object into a Pipeline. Must be implemented by anything that extends Chainable.
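A sketch of the contract described above: a trait whose only abstract member is toPipeline, with chaining written once in terms of it. Chainable, Pipeline and Stage here are simplified hypothetical stand-ins, not the library's definitions.

```scala
// Hypothetical sketch of a toPipeline-based Chainable contract.
object ChainableSketch {
  final case class Pipeline[A, B](run: A => B) {
    def apply(in: A): B = run(in)
  }

  // Implementers only say how to view themselves as a Pipeline;
  // chaining is then defined once, in terms of toPipeline.
  trait Chainable[A, B] {
    def toPipeline: Pipeline[A, B]
    def andThen[C](next: Chainable[B, C]): Pipeline[A, C] =
      Pipeline(toPipeline.run.andThen(next.toPipeline.run))
  }

  // A single-stage transformer implements toPipeline by wrapping its function.
  final case class Stage[A, B](f: A => B) extends Chainable[A, B] {
    def toPipeline: Pipeline[A, B] = Pipeline(f)
  }
}
```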
Converts the n-grams of a sequence of terms to a sparse vector representing their frequencies, using the hashing trick: https://en.wikipedia.org/wiki/Feature_hashing
Instead of fully constructing the n-grams, it computes a rolling MurmurHash3 over their terms. This makes it more efficient than using NGramsFeaturizer followed by HashingTF, while it should still return exactly the same feature vector. The MurmurHash3 methods are copied from scala.util.hashing.MurmurHash3.
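The rolling idea can be sketched as follows: for each start position the running hash is extended one term at a time, so every n-gram starting there is hashed without being materialized, and a naive version that does build each n-gram is included for comparison. Assumptions: an n-gram's hash is MurmurHash3 over its terms' ## values (mix for all but the last term, mixLast for the last, then finalizeHash with the n-gram length), the seed is MurmurHash3.seqSeed, and frequencies are returned as a Map from bucket index to count rather than a sparse vector. These details are illustrative and may differ from the actual featurizer.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Illustrative rolling n-gram hashing; names and hashing details are assumptions.
object NGramsHashingSketch {
  // Hashing trick: fold a hash into [0, numFeatures).
  private def bucket(hash: Int, numFeatures: Int): Int = {
    val raw = hash % numFeatures
    if (raw < 0) raw + numFeatures else raw
  }

  // Rolling version: extend the prefix hash one term at a time.
  def rolling(terms: IndexedSeq[String], orders: Range, numFeatures: Int): Map[Int, Double] = {
    val counts = mutable.Map.empty[Int, Double]
    for (i <- terms.indices) {
      var h = MurmurHash3.seqSeed
      var n = 1
      while (n <= orders.max && i + n <= terms.length) {
        val termHash = terms(i + n - 1).##
        if (n >= orders.min) {
          // mixLast + finalizeHash yields the hash of terms(i until i + n) ...
          val b = bucket(MurmurHash3.finalizeHash(MurmurHash3.mixLast(h, termHash), n), numFeatures)
          counts(b) = counts.getOrElse(b, 0.0) + 1.0
        }
        // ... while mix carries the prefix forward for the next, longer n-gram.
        h = MurmurHash3.mix(h, termHash)
        n += 1
      }
    }
    counts.toMap
  }

  // Reference version: materialize every n-gram, then hash it the same way.
  def naive(terms: IndexedSeq[String], orders: Range, numFeatures: Int): Map[Int, Double] = {
    val ngrams = for {
      n <- orders
      i <- 0 to terms.length - n
    } yield terms.slice(i, i + n)
    ngrams
      .map { g =>
        var h = MurmurHash3.seqSeed
        g.init.foreach(t => h = MurmurHash3.mix(h, t.##))
        bucket(MurmurHash3.finalizeHash(MurmurHash3.mixLast(h, g.last.##), g.length), numFeatures)
      }
      .groupBy(identity)
      .map { case (b, hits) => b -> hits.size.toDouble }
  }
}
```

Because the rolling version reuses the prefix hash, each start position costs one mixing step per order up to the maximum, instead of re-hashing every n-gram from its first term.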
Individual terms are hashed using Scala's .## method. We may want to convert to MurmurHash3 for strings, as discussed for Spark's ML Pipelines in https://issues.apache.org/jira/browse/SPARK-10574
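To compare the two term-hashing options: for a String, ## returns the same value as hashCode (Java's polynomial string hash), whereas MurmurHash3.stringHash applies MurmurHash3 to the characters. Java's string hash has well-known structured collisions such as "Aa" and "BB", which is one plausible motivation for switching. The helper names below are hypothetical.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical helpers contrasting the two ways of hashing a term.
object TermHashSketch {
  // For a String, ## equals hashCode (s(0)*31^(n-1) + ... + s(n-1)).
  def hashWithHashHash(term: String): Int = term.##

  // Scala's MurmurHash3-based string hash.
  def hashWithMurmur(term: String): Int = MurmurHash3.stringHash(term)

  def main(args: Array[String]): Unit = {
    // "Aa" and "BB" share the same ## value (2112).
    for (t <- Seq("Aa", "BB")) {
      println(s"$t ## = ${hashWithHashHash(t)}, stringHash = ${hashWithMurmur(t)}")
    }
  }
}
```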