The type-safe method that ML developers need to implement when writing new Estimators.
The estimator's training data.
A new transformer.
Constructs a pipeline that fits this estimator to the training data, then applies the resultant transformer to the Pipeline input.
The training data.
A pipeline that fits this estimator and applies the result to inputs.
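To make the fit/withData contract concrete, here is a minimal, self-contained sketch in Scala. The simplified signatures are assumptions for illustration: the real library operates on distributed datasets and `withData` composes a lazily-evaluated Pipeline, whereas this sketch fits eagerly and returns the transformer directly. The `MeanCenterer` example is hypothetical.

```scala
// Sketch of the Estimator contract described above (simplified signatures).
trait Transformer[A, B] extends Serializable {
  def apply(in: A): B
}

trait Estimator[A, B] {
  // The type-safe method implementers of new Estimators provide:
  // learn a Transformer from the training data.
  protected def fit(data: Seq[A]): Transformer[A, B]

  // Fit this estimator to the training data; the resulting transformer
  // is then applied to later inputs.
  def withData(data: Seq[A]): Transformer[A, B] = fit(data)
}

// Toy estimator: learn the training mean, then subtract it from each input.
class MeanCenterer extends Estimator[Double, Double] {
  protected def fit(data: Seq[Double]): Transformer[Double, Double] = {
    val mean = data.sum / data.size
    new Transformer[Double, Double] {
      def apply(in: Double): Double = in - mean
    }
  }
}
```

For example, `new MeanCenterer().withData(Seq(1.0, 2.0, 3.0))` fits once (mean = 2.0) and yields a transformer for which `apply(5.0)` returns 3.0.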
Estimates a Stupid Backoff n-gram language model, which was introduced in the following paper:
Brants, Thorsten, et al. "Large language models in machine translation." EMNLP-CoNLL, 2007.
The results are scores indicating the likelihood of each n-gram, but they are not normalized probabilities. The score for an n-gram is defined recursively:
S(w_i | w_{i-n+1}^{i-1}) :=
    freq(w_{i-n+1}^{i}) / freq(w_{i-n+1}^{i-1})   if freq(w_{i-n+1}^{i}) > 0
    \alpha * S(w_i | w_{i-n+2}^{i-1})             otherwise
S(w_i) := freq(w_i) / N, where N is the total number of tokens in the training corpus.
The pre-computed unigram counts of the training corpus.
The backoff hyperparameter \alpha, which the score is multiplied by once per backoff step.
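To make the recursion concrete, below is a hedged Scala sketch of the scoring rule. The `Map`-based count table, the `score` signature, and the default \alpha are illustrative assumptions rather than this estimator's actual representation (the paper reports \alpha = 0.4 as a reasonable choice).

```scala
// Sketch: recursive Stupid Backoff scoring over pre-computed n-gram counts.
// `counts` maps an n-gram (as a token sequence) to its corpus frequency,
// for all orders up to n; `numTokens` is N, the total token count.
def score(ngram: Seq[String],
          counts: Map[Seq[String], Long],
          numTokens: Long,
          alpha: Double = 0.4): Double =
  if (ngram.size <= 1) {
    // Base case: S(w_i) = freq(w_i) / N
    counts.getOrElse(ngram, 0L).toDouble / numTokens
  } else {
    val full = counts.getOrElse(ngram, 0L)
    if (full > 0) {
      // freq(w_{i-n+1}^i) / freq(w_{i-n+1}^{i-1}); in a consistent count
      // table the context count is nonzero whenever the full count is.
      full.toDouble / counts(ngram.init)
    } else {
      // Back off: drop the earliest context word and discount by alpha.
      alpha * score(ngram.tail, counts, numTokens, alpha)
    }
  }
```

For an unseen trigram such as `Seq("the", "cat", "sat")`, the call falls back to the bigram score of `Seq("cat", "sat")` scaled by \alpha, bottoming out at the unigram relative frequency in the worst case.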