A Recurrent Neural Network (RNN) that generates a new Simpsons TV script for a scene at Moe's Tavern, trained on part of the Simpsons dataset of scripts from 27 seasons.
Open and view the project using the .zip file provided or at my GitHub repository.
The project is hosted on GitHub.
The starter project can be downloaded from here
The project will be evaluated by a Udacity code reviewer according to the project rubric.
You will need the following tools to develop and run the project:
- Start by installing Python and Anaconda
- Create a new conda environment:
conda create --name generate-scripts python=3
- Enter your new environment:
- Mac/Linux:
>> source activate generate-scripts
- Windows:
>> conda activate generate-scripts
- Install the requirements using the following command:
pip install -r requirements.txt
If you don't have Conda, a requirements.txt file is provided to install all of the necessary packages using pip.
Open a CLI in the root directory of the project and run the following commands:
python -m venv --system-site-packages .\venv # Creates a virtual environment
.\venv\Scripts\activate # Activates the environment
pip install -r requirements.txt # Installs the required packages
pip list # Show packages installed within the virtual environment
To run the project:
- Activate the Conda or Python virtual environment as mentioned above
- Start the Jupyter Notebook server by running the following command:
jupyter notebook
- Open your browser and visit localhost:8888 (or the port indicated in the terminal); you should see all of the contents of the project. Open the
dlnd_tv_script_generation.ipynb
notebook
- After completing the development, press the ▶️ play icon to start the execution of cells. The output will be visible right below each respective cell.
The notebook contains the following functions and configurations:
- create_lookup_tables
creates two dictionaries: vocab_to_int and int_to_vocab
- token_lookup
returns a dictionary that tokenizes the provided punctuation symbols
- get_inputs
creates TF placeholders for the neural network: input, targets, and learning rate
- Enough epochs
to get near a minimum in the training loss; there is no real upper limit here. Just make sure the training loss is low and no longer improving much with more training.
- Batch size
should be large enough to train efficiently, but small enough for the data to fit in memory. There is no real "best" value; it usually depends on GPU memory.
- The size of the RNN cells
(number of units in the hidden layers) should be large enough to fit the data well. Again, no real "best" value.
- The sequence length
(seq_length)
should be about the length of the sentences you want to generate and should match the structure of the data.
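As an illustration, the two preprocessing helpers above might look like the following minimal sketch. This is an assumption about their implementation, not the notebook's actual code, and the punctuation token names are placeholders chosen for the example:

```python
from collections import Counter


def create_lookup_tables(text):
    """Build vocab_to_int and int_to_vocab dictionaries from a list of words."""
    word_counts = Counter(text)
    # Most frequent words receive the lowest integer ids
    sorted_vocab = sorted(word_counts, key=word_counts.get, reverse=True)
    vocab_to_int = {word: i for i, word in enumerate(sorted_vocab)}
    int_to_vocab = {i: word for word, i in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


def token_lookup():
    """Map punctuation symbols to tokens so each becomes its own vocabulary entry."""
    return {
        '.': '||Period||',
        ',': '||Comma||',
        '"': '||Quotation_Mark||',
        ';': '||Semicolon||',
        '!': '||Exclamation_Mark||',
        '?': '||Question_Mark||',
        '(': '||Left_Parentheses||',
        ')': '||Right_Parentheses||',
        '--': '||Dash||',
        '\n': '||Return||',
    }
```

Replacing punctuation with tokens before building the lookup tables keeps, for example, "moe!" and "moe" from becoming two separate vocabulary entries.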
Follow the instructions in the notebook; they will lead you through the project. You'll be editing the dlnd_tv_script_generation.ipynb file.
Once you're done with the app, stop it gracefully using the following steps:
- Select File -> Close and Halt inside the Jupyter Notebook
- Press Ctrl+C in the CLI
>> conda deactivate # Deactivate the environment
>> conda remove --name generate-scripts --all # Delete the environment
>> deactivate # Deactivate the Python virtual environment
- RNN and LSTM Lecture by Andrej Karpathy
- Word Embeddings by Siwei Lai, Kang Liu, Liheng Xu, Jun Zhao
- A recent Medium article that might help in understanding a similar project
- TensorFlow