The type-safe method that ML developers need to implement when writing new Estimators.
The estimator's training data.
A new transformer.
Constructs a pipeline that fits this estimator to the training data, then applies the resultant transformer to the Pipeline input.
The training data.
A pipeline that fits this estimator and applies the result to inputs.
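To make the fit/withData contract concrete, here is a minimal, self-contained sketch in Scala. The simplified signatures are assumptions for illustration: the real library operates on distributed datasets and `withData` composes a lazily-evaluated Pipeline, whereas this sketch fits eagerly and returns the transformer directly. The `MeanCenterer` example is hypothetical.

```scala
// Sketch of the Estimator contract described above (simplified signatures).
trait Transformer[A, B] extends Serializable {
  def apply(in: A): B
}

trait Estimator[A, B] {
  // The type-safe method implementers of new Estimators provide:
  // learn a Transformer from the training data.
  protected def fit(data: Seq[A]): Transformer[A, B]

  // Fit this estimator to the training data; the resulting transformer
  // is then applied to later inputs.
  def withData(data: Seq[A]): Transformer[A, B] = fit(data)
}

// Toy estimator: learn the training mean, then subtract it from each input.
class MeanCenterer extends Estimator[Double, Double] {
  protected def fit(data: Seq[Double]): Transformer[Double, Double] = {
    val mean = data.sum / data.size
    new Transformer[Double, Double] {
      def apply(in: Double): Double = in - mean
    }
  }
}
```

For example, `new MeanCenterer().withData(Seq(1.0, 2.0, 3.0))` fits once (mean = 2.0) and yields a transformer for which `apply(5.0)` returns 3.0.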
Estimates a Stupid Backoff n-gram language model, which was introduced in the following paper:
Brants, Thorsten, et al. "Large language models in machine translation." EMNLP-CoNLL, 2007.
The results are scores indicating the likelihood of each n-gram, but they are not normalized probabilities. The score for an n-gram is defined recursively:
S(w_i | w_{i-n+1}^{i-1}) :=
    freq(w_{i-n+1}^{i}) / freq(w_{i-n+1}^{i-1})   if freq(w_{i-n+1}^{i}) > 0
    \alpha * S(w_i | w_{i-n+2}^{i-1})             otherwise
S(w_i) := freq(w_i) / N, where N is the total number of tokens in the training corpus.
The pre-computed unigram counts of the training corpus.
The backoff hyperparameter \alpha, which the score is multiplied by once per backoff step.
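To make the recursion concrete, below is a hedged Scala sketch of the scoring rule. The `Map`-based count table, the `score` signature, and the default \alpha are illustrative assumptions rather than this estimator's actual representation (the paper reports \alpha = 0.4 as a reasonable choice).

```scala
// Sketch: recursive Stupid Backoff scoring over pre-computed n-gram counts.
// `counts` maps an n-gram (as a token sequence) to its corpus frequency,
// for all orders up to n; `numTokens` is N, the total token count.
def score(ngram: Seq[String],
          counts: Map[Seq[String], Long],
          numTokens: Long,
          alpha: Double = 0.4): Double =
  if (ngram.size <= 1) {
    // Base case: S(w_i) = freq(w_i) / N
    counts.getOrElse(ngram, 0L).toDouble / numTokens
  } else {
    val full = counts.getOrElse(ngram, 0L)
    if (full > 0) {
      // freq(w_{i-n+1}^i) / freq(w_{i-n+1}^{i-1}); in a consistent count
      // table the context count is nonzero whenever the full count is.
      full.toDouble / counts(ngram.init)
    } else {
      // Back off: drop the earliest context word and discount by alpha.
      alpha * score(ngram.tail, counts, numTokens, alpha)
    }
  }
```

For an unseen trigram such as `Seq("the", "cat", "sat")`, the call falls back to the bigram score of `Seq("cat", "sat")` scaled by \alpha, bottoming out at the unigram relative frequency in the worst case.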