This repo contains an AWS SAM definition and a sample Streamlit app to play your podcast.
- SageMaker endpoint running Llama-2-7b-chat (tested on ml.g5.2xlarge)
- AWS CLI
- AWS SAM CLI (see https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html)
- boto3 installed
pip install boto3
- Note: If you use AWS Cloud9, items 2, 3, and 4 above are preinstalled.
- Streamlit
pip install streamlit
- Navigate to the SAM folder
cd SAM
- Build the SAM package
sam build --use-container
- Deploy the package.
sam deploy --guided
Give your stack a name, and use the region where your SageMaker endpoint is deployed. Use the defaults for the rest of the options.
- The following resources will be deployed: an AWS Step Functions state machine, an Amazon S3 bucket, 7 Lambda functions, and various IAM roles and policies.
- Navigate to the deployed state machine and choose New Execution
- Open the included `sampleStepFunctionInput.json` file, update the following fields, and save:
  - Replace `bucket` with the bucket name deployed by the SAM application.
  - Replace `llmEndpoint` with your SageMaker endpoint name.
  - Replace `numDays` with the number of days from today that you want to process from the RSS feed.
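If you prefer to script the edit instead of changing the file by hand, the following is a minimal sketch. It assumes `bucket`, `llmEndpoint`, and `numDays` are top-level keys in the JSON file and that `numDays` is a number; the helper name `update_input` is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def update_input(path, bucket, llm_endpoint, num_days):
    """Rewrite the three fields in the Step Functions input file."""
    input_path = Path(path)
    payload = json.loads(input_path.read_text())
    # Assumption: the three fields are top-level keys in the file.
    payload["bucket"] = bucket
    payload["llmEndpoint"] = llm_endpoint
    payload["numDays"] = num_days
    # Any other keys in the file are written back unchanged.
    input_path.write_text(json.dumps(payload, indent=2) + "\n")
    return payload


# Example call (placeholder values, substitute your own):
# update_input("sampleStepFunctionInput.json", "my-bucket", "my-endpoint", 7)
```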
- Execute the Step Function from the command line:
aws stepfunctions start-execution --state-machine-arn <YOUR STATE MACHINE ARN> --input "$(jq -R . sampleStepFunctionInput.json --raw-output)"
- Depending on the number of new announcements in the time range that you specified in `numDays` and your SageMaker endpoint instance type, execution can take 5 to 15 minutes or longer.
- Monitor the execution by running the command:
aws stepfunctions describe-execution --execution-arn <YOUR EXECUTION ARN>
Get the execution ARN from the output of the start-execution command above.
- Once complete, grab the `runID` and bucket name from the output of the describe-execution command. You will need these if you want to run the Streamlit app below.
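The start-and-monitor steps can also be driven from Python. This is a rough sketch, not part of the repo: `run_and_wait` and `poll_seconds` are hypothetical names, while `start_execution` and `describe_execution` are the boto3 Step Functions client calls that correspond to the two CLI commands. The client is passed in so the function can be exercised without AWS access.

```python
import time

# Statuses after which a Step Functions execution no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def run_and_wait(sfn_client, state_machine_arn, input_json, poll_seconds=30):
    """Start an execution, then poll describe_execution until it finishes."""
    started = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=input_json,  # JSON text, e.g. the contents of sampleStepFunctionInput.json
    )
    execution_arn = started["executionArn"]
    while True:
        described = sfn_client.describe_execution(executionArn=execution_arn)
        if described["status"] in TERMINAL_STATES:
            return described
        time.sleep(poll_seconds)


# Usage (requires boto3 and AWS credentials):
# import boto3
# sfn = boto3.client("stepfunctions")
# with open("sampleStepFunctionInput.json") as f:
#     result = run_and_wait(sfn, "<YOUR STATE MACHINE ARN>", f.read())
# print(result["status"], result.get("output"))
```

On success, the `output` field of the response is a JSON string, so `json.loads(result["output"])` is where the run ID and bucket name would be read from, assuming the state machine returns them in its final output.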
You have 2 options:
- Download the `podcast.mp3` file from the S3 bucket.
- Run the included Streamlit app, `sampleApp.py`.
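For the download option, a minimal boto3 sketch might look like the following. The exact object key of the podcast is not stated here, so the sketch first lists the bucket and keeps keys ending in `podcast.mp3`; the helper names `find_podcast_keys` and `download_podcast` are hypothetical. `get_paginator("list_objects_v2")` and `download_file` are boto3 S3 client calls.

```python
def find_podcast_keys(s3_client, bucket, filename="podcast.mp3"):
    """Return every object key in the bucket that ends with the given filename."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(filename):
                keys.append(obj["Key"])
    return keys


def download_podcast(s3_client, bucket, key, destination="podcast.mp3"):
    """Download one object to a local file and return the local path."""
    s3_client.download_file(bucket, key, destination)
    return destination


# Usage (requires boto3 and AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# for key in find_podcast_keys(s3, "<YOUR BUCKET NAME>"):
#     print(key)
```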
To run the Streamlit app, do the following:
- In your terminal, open `sampleApp.py`.
- Replace `bucketName` with your S3 bucket.
- Replace `runId` with the UUID you copied earlier.
- Save and close.
- Run
streamlit run ./sampleApp.py
- Open the URL in your browser. If using Cloud9, your EC2 instance must be in a public subnet with a public IP, and a security group rule must allow traffic to the Streamlit port (usually 8501).
1. Empty the S3 bucket, or you will get an error when you delete the SAM application.
2. sam delete --stack-name <your stack name here>
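Emptying the bucket can be scripted before deleting the stack. The sketch below is one possible approach, assuming the bucket is not versioned (a versioned bucket would also need its object versions removed); `empty_bucket` is a hypothetical helper name. It uses the boto3 S3 client calls `list_objects_v2` (through a paginator) and `delete_objects`, each of which handles up to 1000 keys per request.

```python
def empty_bucket(s3_client, bucket):
    """Delete every object in the bucket and return how many were deleted."""
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket):
        # Each page holds at most 1000 keys, which matches the delete_objects limit.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Usage (requires boto3 and AWS credentials):
# import boto3
# print(empty_bucket(boto3.client("s3"), "<YOUR BUCKET NAME>"))
```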
- processed/ - Contains the metadata and scraped content from each RSS feed item, populated by the `processRSS` function.
- dialogs/ - Contains the LLM outputs generated by the `generateDialog` function for each RSS feed item.
- topic_audio/ - Contains the individual mp3 objects for each RSS feed item, generated with Amazon Polly by the `generateTopicAudio` function.
- transition_audio/ - Contains the individual mp3 objects for the transitions between topic categories, generated by the `generateTransitions` function.
- other_audio/ - Contains the individual mp3 objects for the intro and outro, generated by the `generateIntro` and `generateOutro` functions.
- playlist.json - Sequenced manifest of each mp3 object, output by the `generatePlaylist` function. Used by the Streamlit app to provide titles and links.
- podcast.mp3 - Final mp3 object combining all individual mp3 files, output by the `generatePlaylist` function.