Understanding a large amount of text by modeling & visualizing topics
Architecture
Setup
- In the S3 console, create a bucket in the US East (N. Virginia) region (region code us-east-1). Use a unique name, e.g. large-text-understanding-{username}.
- Enable CORS on the bucket using the policy below:
  <?xml version="1.0" encoding="UTF-8"?>
  <CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
      <AllowedOrigin>*</AllowedOrigin>
      <AllowedMethod>GET</AllowedMethod>
      <MaxAgeSeconds>3000</MaxAgeSeconds>
      <ExposeHeader>x-amz-server-side-encryption</ExposeHeader>
      <ExposeHeader>x-amz-request-id</ExposeHeader>
      <ExposeHeader>x-amz-id-2</ExposeHeader>
      <AllowedHeader>*</AllowedHeader>
    </CORSRule>
  </CORSConfiguration>
- Launch the CloudFormation stack.
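The bucket and CORS steps above can also be scripted instead of clicked through in the console. The following is a minimal sketch using boto3, not part of the original tutorial; the helper names and the example bucket suffix `jdoe` are assumptions made for illustration, and the CORS dictionary simply mirrors the XML policy above.

```python
REGION = "us-east-1"


def cors_configuration():
    """Return the XML CORS policy above as the dict shape boto3 expects."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["GET"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
                "ExposeHeaders": [
                    "x-amz-server-side-encryption",
                    "x-amz-request-id",
                    "x-amz-id-2",
                ],
            }
        ]
    }


def create_bucket_with_cors(s3_client, bucket_name):
    """Create the bucket, then attach the CORS rule to it."""
    # In us-east-1 no CreateBucketConfiguration/LocationConstraint is passed.
    s3_client.create_bucket(Bucket=bucket_name)
    s3_client.put_bucket_cors(
        Bucket=bucket_name, CORSConfiguration=cors_configuration()
    )


def main():
    # boto3 is imported here so the helpers above stay importable without it.
    import boto3

    # Hypothetical bucket name: replace "jdoe" with your own username.
    client = boto3.client("s3", region_name=REGION)
    create_bucket_with_cors(client, "large-text-understanding-jdoe")
```

Calling `main()` with AWS credentials configured would create the bucket and apply the rule. Passing the client in as an argument keeps the helper easy to exercise with a stub before touching a real account.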
Follow the tutorial in the Jupyter notebook
- Go to the SageMaker console and open the Jupyter notebook instance once it is ready.
- Navigate into topic-modeling-visualizations/ and open Topic Modeling Tutorial.ipynb.
- Follow the steps detailed in the notebook.
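If you would rather not refresh the console while waiting for the notebook instance, its status can be polled. This is a rough sketch under assumptions: the function name, the polling interval, and the instance name in the usage note are hypothetical, since the name is set by the CloudFormation stack and is not stated here. It relies on the SageMaker `describe_notebook_instance` call, which reports a `NotebookInstanceStatus` field.

```python
import time


def wait_until_in_service(sagemaker_client, instance_name,
                          poll_seconds=30, max_polls=40):
    """Poll the notebook instance until it reports InService."""
    for _ in range(max_polls):
        response = sagemaker_client.describe_notebook_instance(
            NotebookInstanceName=instance_name
        )
        status = response["NotebookInstanceStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # FailureReason is only present when the instance failed.
            raise RuntimeError(response.get("FailureReason", "instance failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(f"{instance_name} not InService after {max_polls} polls")
```

With credentials configured, this could be called as `wait_until_in_service(boto3.client("sagemaker", region_name="us-east-1"), "my-notebook-instance")`, where the instance name is a placeholder for whatever the stack created.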
Use the webapp to explore topics and documents
Follow the instructions in the notebook to launch the webapp.
Topic view
Document view
Clean up
- If you created a Cloud9 environment, delete it.
- Delete the large-text-understanding CloudFormation stack.
- Delete the S3 bucket you created.
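The stack and bucket deletions above can be sketched with boto3 as well. This is an illustrative helper, not the tutorial's own tooling: the function name is an assumption, the bucket name in the usage note is a placeholder, and the Cloud9 step is left out because the environment ID is not known here. A bucket has to be empty before S3 will delete it, so the objects are removed first.

```python
def clean_up(cloudformation_client, s3_resource, stack_name, bucket_name):
    """Delete the CloudFormation stack, then empty and delete the bucket."""
    # Starts the deletion; it does not wait for the stack to finish deleting.
    cloudformation_client.delete_stack(StackName=stack_name)

    bucket = s3_resource.Bucket(bucket_name)
    # S3 refuses to delete a non-empty bucket, so remove every object first.
    bucket.objects.all().delete()
    bucket.delete()
```

For example, with credentials configured: `clean_up(boto3.client("cloudformation", region_name="us-east-1"), boto3.resource("s3"), "large-text-understanding", "large-text-understanding-jdoe")`. This sketch assumes bucket versioning is off; a versioned bucket would also need its object versions removed.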