Understanding a large amount of text by modeling & visualizing topics

Architecture

Setup

In the S3 console, create a bucket in the US East (N. Virginia) region (region code us-east-1). Give it a globally unique name, for example large-text-understanding-{username}.
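If you prefer scripting this step, the following is a minimal sketch with boto3. The helper names are hypothetical, and the client is passed in (for example `boto3.client("s3", region_name="us-east-1")`) so the functions can be tested without AWS access. The name check is simplified and does not enforce every S3 naming rule. Note that in us-east-1 S3 rejects an explicit `LocationConstraint`, so the bucket configuration is only sent for other regions.

```python
import re

# Simplified S3 bucket-name check: 3-63 chars of lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
# (Does not check every rule, e.g. no consecutive dots or IP-style names.)
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def bucket_name_for(username: str) -> str:
    """Build the suggested large-text-understanding-{username} bucket name."""
    name = f"large-text-understanding-{username.lower()}"
    if not _BUCKET_RE.match(name):
        raise ValueError(f"not a valid S3 bucket name: {name!r}")
    return name


def create_bucket(s3_client, name: str, region: str = "us-east-1") -> None:
    """Create the bucket; us-east-1 must NOT receive a LocationConstraint."""
    if region == "us-east-1":
        s3_client.create_bucket(Bucket=name)
    else:
        s3_client.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

Usage: `create_bucket(boto3.client("s3", region_name="us-east-1"), bucket_name_for("alice"))`.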

Enable CORS on the bucket using the following configuration:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <ExposeHeader>x-amz-server-side-encryption</ExposeHeader>
    <ExposeHeader>x-amz-request-id</ExposeHeader>
    <ExposeHeader>x-amz-id-2</ExposeHeader>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
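Recent versions of the S3 console accept CORS rules as JSON rather than XML. The sketch below expresses the same rule as the XML above, both as the dictionary boto3's `put_bucket_cors` expects and as the JSON array the console editor takes. The function names are hypothetical; `s3_client` is assumed to be a boto3 S3 client.

```python
import json


def cors_configuration() -> dict:
    """Same rule as the XML policy: GET from any origin, three exposed headers."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["GET"],
                "MaxAgeSeconds": 3000,
                "ExposeHeaders": [
                    "x-amz-server-side-encryption",
                    "x-amz-request-id",
                    "x-amz-id-2",
                ],
                "AllowedHeaders": ["*"],
            }
        ]
    }


def console_json() -> str:
    """JSON text for the S3 console CORS editor (a bare list of rules)."""
    return json.dumps(cors_configuration()["CORSRules"], indent=2)


def apply_cors(s3_client, bucket: str) -> None:
    """Attach the CORS rule to the bucket through the S3 API."""
    s3_client.put_bucket_cors(Bucket=bucket, CORSConfiguration=cors_configuration())
```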

Launch the large-text-understanding CloudFormation stack.
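The stack can also be launched from code. This is a minimal sketch only: the template URL is not given in this README, so it is a parameter you must supply, and both the IAM capabilities and any template parameters are assumptions about what the template needs. `cfn_client` is assumed to be a boto3 CloudFormation client; the helper names are hypothetical.

```python
from typing import Dict, Optional


def stack_request(template_url: str,
                  stack_name: str = "large-text-understanding",
                  parameters: Optional[Dict[str, str]] = None) -> dict:
    """Build keyword arguments for CloudFormation create_stack."""
    request = {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Assumption: the template creates IAM resources (e.g. a SageMaker
        # execution role), which CloudFormation requires you to acknowledge.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }
    if parameters:
        # Assumption: parameter names depend on the template and are unknown here.
        request["Parameters"] = [
            {"ParameterKey": k, "ParameterValue": v} for k, v in parameters.items()
        ]
    return request


def launch_stack(cfn_client, template_url: str, **kwargs) -> str:
    """Create the stack and block until CREATE_COMPLETE (the waiter raises on failure)."""
    response = cfn_client.create_stack(**stack_request(template_url, **kwargs))
    cfn_client.get_waiter("stack_create_complete").wait(StackName=response["StackId"])
    return response["StackId"]
```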

Follow the tutorial in the Jupyter notebook

Go to the SageMaker console and, once the notebook instance is ready, open Jupyter.
Navigate to topic-modeling-visualizations/ and open Topic Modeling Tutorial.ipynb.
Follow the steps in the notebook.
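A notebook instance is ready when its status is `InService`. The sketch below filters instance summaries by that status and then waits on the instance and requests a presigned Jupyter URL. The notebook instance name is whatever the stack created (not stated here), so the name filter is hypothetical; `sm_client` is assumed to be a boto3 SageMaker client.

```python
from typing import Iterable, List, Mapping


def ready_instances(summaries: Iterable[Mapping], name_contains: str) -> List[str]:
    """Names of matching notebook instances whose status is InService.

    `summaries` is the "NotebookInstances" list returned by
    sm_client.list_notebook_instances(NameContains=...).
    """
    return [
        s["NotebookInstanceName"]
        for s in summaries
        if name_contains in s["NotebookInstanceName"]
        and s["NotebookInstanceStatus"] == "InService"
    ]


def jupyter_url(sm_client, instance_name: str) -> str:
    """Wait until the instance is InService, then return a presigned Jupyter URL."""
    sm_client.get_waiter("notebook_instance_in_service").wait(
        NotebookInstanceName=instance_name
    )
    response = sm_client.create_presigned_notebook_instance_url(
        NotebookInstanceName=instance_name
    )
    return response["AuthorizedUrl"]
```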

Use the webapp to explore topics and documents

Follow the instructions in the notebook to launch the webapp.

Topic view

Document view

Clean up

If you created an AWS Cloud9 environment, delete it.
Delete the large-text-understanding CloudFormation stack.
Empty and delete the S3 bucket you created.
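The last two clean-up steps can be scripted as follows. This is a sketch assuming boto3 clients are passed in; the helper names are hypothetical. S3 refuses to delete a non-empty bucket, so every object version and delete marker is removed first, in batches of at most 1000 (the `delete_objects` limit). For unversioned buckets, `list_object_versions` reports each object with the version ID `"null"`, which `delete_objects` accepts.

```python
from typing import Dict, List


def delete_batches(page: Dict, size: int = 1000) -> List[List[Dict[str, str]]]:
    """Turn one list_object_versions page into delete_objects batches."""
    objects = [
        {"Key": item["Key"], "VersionId": item["VersionId"]}
        for item in page.get("Versions", []) + page.get("DeleteMarkers", [])
    ]
    return [objects[i:i + size] for i in range(0, len(objects), size)]


def empty_and_delete_bucket(s3_client, bucket: str) -> None:
    """Remove every object version and delete marker, then the bucket itself."""
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket):
        for batch in delete_batches(page):
            response = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
            # Per-object failures are reported in "Errors" rather than raised.
            if response.get("Errors"):
                raise RuntimeError(f"failed to delete: {response['Errors']}")
    s3_client.delete_bucket(Bucket=bucket)


def delete_stack(cfn_client, stack_name: str = "large-text-understanding") -> None:
    """Delete the stack and block until DELETE_COMPLETE."""
    cfn_client.delete_stack(StackName=stack_name)
    cfn_client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```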

angelarw/topic-modeling-visualizations
