Merge pull request #5 from hiepph/text-emojize-example


Add an example: Output emoji visualization from a single text input

Files changed (2)

examples/README.md +32 -24
examples/text_emojize.py +63 -0

examples/README.md CHANGED

@@ -1,31 +1,39 @@
 # torchMoji examples
-## Initialization
-[create_twitter_vocab.py](create_twitter_vocab.py)
-Create a new vocabulary from a tsv file.
-[tokenize_dataset.py](tokenize_dataset.py)
-Tokenize a given dataset using the prebuilt vocabulary.
-[vocab_extension.py](vocab_extension.py)
-Extend the given vocabulary using dataset-specific words.
-[dataset_split.py](dataset_split.py)
 Split a given dataset into training, validation and testing.
-## Use pretrained model/architecture
-[score_texts_emojis.py](score_texts_emojis.py)
-Use torchMoji to score texts for emoji distribution.
-[encode_texts.py](encode_texts.py)
 Use torchMoji to encode the text into 2304-dimensional feature vectors for further modeling/analysis.
 ## Transfer learning
-[finetune_youtube_last.py](finetune_youtube_last.py)
-Finetune the model on the SS-Youtube dataset using the 'last' method.
-[finetune_insults_chain-thaw.py](finetune_insults_chain-thaw.py)
-Finetune the model on the Kaggle insults dataset (from blog post) using the 'chain-thaw' method.
-[finetune_semeval_class-avg_f1.py](finetune_semeval_class-avg_f1.py)
-Finetune the model on the SemeEval emotion dataset using the 'full' method and evaluate using the class average F1 metric.

 # torchMoji examples
+## Initialization
+[create_twitter_vocab.py](create_twitter_vocab.py)
+Create a new vocabulary from a tsv file.
+[tokenize_dataset.py](tokenize_dataset.py)
+Tokenize a given dataset using the prebuilt vocabulary.
+[vocab_extension.py](vocab_extension.py)
+Extend the given vocabulary using dataset-specific words.
+[dataset_split.py](dataset_split.py)
 Split a given dataset into training, validation and testing.
+## Use pretrained model/architecture
+[score_texts_emojis.py](score_texts_emojis.py)
+Use torchMoji to score texts for emoji distribution.
+[text_emojize.py](text_emojize.py)
+Use torchMoji to output emoji visualization from a single text input (mapped from `emoji_overview.png`)
+```sh
+python examples/text_emojize.py --text "I love mom's cooking\!"
+# => I love mom's cooking! 😋 😍 💓 💛 ❤
+```
+[encode_texts.py](encode_texts.py)
 Use torchMoji to encode the text into 2304-dimensional feature vectors for further modeling/analysis.
 ## Transfer learning
+[finetune_youtube_last.py](finetune_youtube_last.py)
+Finetune the model on the SS-Youtube dataset using the 'last' method.
+[finetune_insults_chain-thaw.py](finetune_insults_chain-thaw.py)
+Finetune the model on the Kaggle insults dataset (from blog post) using the 'chain-thaw' method.
+[finetune_semeval_class-avg_f1.py](finetune_semeval_class-avg_f1.py)
+Finetune the model on the SemeEval emotion dataset using the 'full' method and evaluate using the class average F1 metric.

examples/text_emojize.py ADDED

@@ -0,0 +1,63 @@

+# -*- coding: utf-8 -*-
+""" Use torchMoji to predict emojis from a single text input
+"""
+from __future__ import print_function, division, unicode_literals
+import example_helper
+import json
+import csv
+import argparse
+import numpy as np
+import emoji
+from torchmoji.sentence_tokenizer import SentenceTokenizer
+from torchmoji.model_def import torchmoji_emojis
+from torchmoji.global_variables import PRETRAINED_PATH, VOCAB_PATH
+# Emoji map in emoji_overview.png
+EMOJIS = ":joy: :unamused: :weary: :sob: :heart_eyes: \
+:pensive: :ok_hand: :blush: :heart: :smirk: \
+:grin: :notes: :flushed: :100: :sleeping: \
+:relieved: :relaxed: :raised_hands: :two_hearts: :expressionless: \
+:sweat_smile: :pray: :confused: :kissing_heart: :heartbeat: \
+:neutral_face: :information_desk_person: :disappointed: :see_no_evil: :tired_face: \
+:v: :sunglasses: :rage: :thumbsup: :cry: \
+:sleepy: :yum: :triumph: :hand: :mask: \
+:clap: :eyes: :gun: :persevere: :smiling_imp: \
+:sweat: :broken_heart: :yellow_heart: :musical_note: :speak_no_evil: \
+:wink: :skull: :confounded: :smile: :stuck_out_tongue_winking_eye: \
+:angry: :no_good: :muscle: :facepunch: :purple_heart: \
+:sparkling_heart: :blue_heart: :grimacing: :sparkles:".split(' ')
+def top_elements(array, k):
+    ind = np.argpartition(array, -k)[-k:]
+    return ind[np.argsort(array[ind])][::-1]
+if __name__ == "__main__":
+    argparser = argparse.ArgumentParser()
+    argparser.add_argument('--text', type=str, required=True, help="Input text to emojize")
+    argparser.add_argument('--maxlen', type=int, default=30, help="Max length of input text")
+    args = argparser.parse_args()
+    # Tokenizing using dictionary
+    with open(VOCAB_PATH, 'r') as f:
+        vocabulary = json.load(f)
+    st = SentenceTokenizer(vocabulary, args.maxlen)
+    # Loading model
+    model = torchmoji_emojis(PRETRAINED_PATH)
+    # Running predictions
+    tokenized, _, _ = st.tokenize_sentences([args.text])
+    # Get sentence probability
+    prob = model(tokenized)[0]
+    # Top emoji id
+    emoji_ids = top_elements(prob, 5)
+    # map to emojis
+    emojis = map(lambda x: EMOJIS[x], emoji_ids)
+    print(emoji.emojize("{} {}".format(args.text,' '.join(emojis)), use_aliases=True))
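
The last steps of the script (rank the scores, keep the top few indices, join the matching aliases onto the input text) can be tried without the model or the `emoji` package. The following is a minimal stdlib-only sketch, not the code from this PR: it swaps `np.argpartition` for `heapq.nlargest`, and the names `ALIASES`, `top_indices`, and `append_aliases`, the six-entry alias list, and the scores are all hypothetical and chosen for illustration.

```python
import heapq

# Hypothetical six-entry alias list; the script in this PR uses the 64 aliases in EMOJIS.
ALIASES = [":joy:", ":unamused:", ":weary:", ":sob:", ":heart_eyes:", ":pensive:"]


def top_indices(scores, k):
    """Return the indices of the k largest scores, highest score first."""
    # heapq.nlargest stands in for the argpartition + argsort pair in top_elements.
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def append_aliases(text, scores, k=5):
    """Append the aliases of the k highest-scoring entries to the text."""
    ids = top_indices(scores, k)
    return "{} {}".format(text, " ".join(ALIASES[i] for i in ids))


# Made-up scores for illustration, not model output.
scores = [0.05, 0.10, 0.02, 0.30, 0.45, 0.08]

print(top_indices(scores, 3))
# => [4, 3, 1]
print(append_aliases("I love mom's cooking!", scores, k=3))
# => I love mom's cooking! :heart_eyes: :sob: :unamused:
```

In the script itself, the resulting alias string is then passed to `emoji.emojize` to turn the aliases into emoji characters.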