Loads the Amazon Product Reviews dataset for binary classification. Each review is a JSON string with at least two fields: "reviewText" (the review body) and "overall" (the numeric rating).
This data loader produces an RDD of labeled reviews.
SparkSession to use (needed for SQL)
Directory of the training data
Lowest rating at which a review is considered positive.
A Labeled Dataset that contains the data strings and labels.
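
The labeling rule described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the loader's actual Spark implementation: the function names `label_review` and `label_reviews`, the sample records, and the threshold of 3.5 are all hypothetical. It also assumes the rating field is spelled "overall", and that positive reviews are labeled 1 and all others 0.

```python
import json


def label_review(line, threshold):
    """Parse one JSON review string and return a (label, text) pair."""
    record = json.loads(line)
    # Assumption: a rating at or above the threshold is positive (1),
    # anything below it is negative (0).
    label = 1 if float(record["overall"]) >= threshold else 0
    return label, record["reviewText"]


def label_reviews(lines, threshold):
    """Label every review in an iterable of JSON strings."""
    return [label_review(line, threshold) for line in lines]


# Hypothetical sample records; extra fields are ignored.
sample = [
    '{"reviewText": "Works great", "overall": 5.0}',
    '{"reviewText": "Broke after a day", "overall": 2.0, "asin": "X"}',
]
labeled = label_reviews(sample, threshold=3.5)
```

In a Spark setting the same per-record rule would presumably be applied over the rows read from the data directory, yielding the RDD of labeled reviews mentioned above.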