from datasets import load_dataset_builder
ds_builder = load_dataset_builder("imdb")
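Basic information
The description shows that this is a movie review dataset for binary sentiment classification, with positive and negative labels.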
print(ds_builder.info.description)
# Large Movie Review Dataset.
# This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
print(ds_builder.info.features)
# {'text': Value(dtype='string', id=None), 'label': ClassLabel(names=['neg', 'pos'], id=None)}
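The `label` column stores integers, and the `ClassLabel` feature keeps the class names in index order (`['neg', 'pos']`), so 0 means negative and 1 means positive. In `datasets` this lookup is what `ClassLabel.int2str` does. The minimal sketch below performs the same mapping with a plain list so it runs without downloading anything; `LABEL_NAMES` is copied from the printed features, and `decode_labels` is a hypothetical helper, not part of the library.

```python
# Names copied from the printed ClassLabel: index 0 -> 'neg', index 1 -> 'pos'
LABEL_NAMES = ["neg", "pos"]


def decode_labels(labels):
    """Map integer class ids to their names (hypothetical helper)."""
    return [LABEL_NAMES[i] for i in labels]


# Decode the labels of the three filtered reviews shown below
print(decode_labels([0, 1, 1]))  # ['neg', 'pos', 'pos']
```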
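from datasets import load_dataset

# Assumption: the original does not show how `ds` was created; the train split is loaded here,
# so the rows returned below may differ from the ones printed
ds = load_dataset("imdb", split="train")
# Keep reviews that mention "U.S" and are shorter than 500 characters
ds1 = ds.filter(lambda x: 'U.S' in x['text'] and len(x['text']) < 500)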
ds1[:3]
{
'text': [
'It is not un-common to see U.S. re-makes of foreign movies that fall flat on their face, but here is the flip side!!! This is an awful re-make of the U.S. movie "Wide Awake" by the British!<br /><br />"Wide Awake" is strange but entertaining and funny! "Liam" on the other hand is just strange. I must give credit to "Liam" for one thing, and that is making it clear that I made the right choice in changing my religion!',
'I saw this movie on Comedy Central a few times. This movie was pretty good. It\'s an interesting adventure with the life of Sunny Davis, who is arranged to marry the king of Ohtar, so that the U.S. can get an army base there to balance power in the Middle East. Some good jokes, including "Sunnygate." I also just loved the ending theme. It gave me great political spirit. Ten out of ten was my rating for this movie.',
'"Antwone Fisher" tells of a young black U.S. Navy enlisted man and product of childhood abuse and neglect (Luke) whose hostility toward others gets him a stint with the base shrink (Washington) leading to introspection, self appraisal, and a return to his roots. Pat, sanitized, and sentimental, "Antwone Fisher" is a solid feel-good flick about the reconciliation of past regrets and closure. Good old Hollywood style entertainment family values entertainment with just a hint of corn. (B)'],
'label': [0, 1, 1]
}
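The lambda passed to `filter` is an ordinary predicate: with the default (non-batched) setting, `filter` calls it once per example, a dict with `text` and `label`, and keeps the rows for which it returns `True`. The check `'U.S' in text` is a case-sensitive substring test, so it also matches "U.S." and "U.S.A.". The length limit counts characters, including the `<br />` HTML line breaks visible in the first review. The minimal sketch below reproduces the same logic on a plain list of dicts; `keep_short_us_review`, its `strip_br` flag, and the sample records are hypothetical and only illustrate how those tags affect the length cut-off.

```python
def keep_short_us_review(example, max_len=500, strip_br=False):
    """Same test as the lambda above: mentions 'U.S' and is shorter than max_len characters."""
    text = example["text"]
    if strip_br:
        # Hypothetical cleanup: replace each <br /> tag with a space before measuring length
        text = text.replace("<br />", " ")
    return "U.S" in text and len(text) < max_len


# Made-up sample records, shaped like IMDB examples
examples = [
    {"text": "A U.S. remake.<br /><br />Not good.", "label": 0},
    {"text": "No country mentioned here.", "label": 1},
    {"text": "U.S" + "x" * 600, "label": 1},
]

# Plain-Python equivalent of ds.filter(...): keep rows where the predicate is True
kept = [ex for ex in examples if keep_short_us_review(ex)]
print([ex["label"] for ex in kept])  # [0]

# 3 + 490 + 12 = 505 characters with tags, 3 + 490 + 2 = 495 after stripping them
borderline = {"text": "U.S" + "a" * 490 + "<br /><br />", "label": 1}
print(keep_short_us_review(borderline))                 # False
print(keep_short_us_review(borderline, strip_br=True))  # True
```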