A simple Mastra template that processes PDF files and generates comprehensive questions from their content using OpenAI GPT-4o.
This template demonstrates a straightforward workflow:
- Input: PDF URL
- Download: Fetch the PDF file
- Extract Text: Parse PDF using pure JavaScript (no system dependencies!)
- Generate Questions: Create questions using OpenAI GPT-4o
- Node.js 20.9.0 or higher
- OpenAI API key (that's it!)
- Clone and install dependencies:
  git clone <repository-url>
  cd template-pdf-questions
  pnpm install
- Set up environment variables:
  cp env.example .env
  # Edit .env and add your OpenAI API key
- Run the example:
  export OPENAI_API_KEY="your-real-api-key-here"
  npx tsx example.ts

Using the workflow:

import { mastra } from './src/mastra/index';

const run = await mastra.getWorkflow('pdfToQuestionsWorkflow').createRunAsync();

// Using a PDF URL
const result = await run.start({
  inputData: {
    pdfUrl: 'https://example.com/document.pdf',
  },
});

console.log(result.result.questions);

Using the agent:

import { mastra } from './src/mastra/index';

const agent = mastra.getAgent('pdfQuestionsAgent');

// The agent can handle the full process with natural language
const response = await agent.stream([
  {
    role: 'user',
    content: 'Please download this PDF and generate questions from it: https://example.com/document.pdf',
  },
]);

// Print the response as it streams in
for await (const chunk of response.textStream) {
  console.log(chunk);
}

Using individual tools:

import { mastra } from './src/mastra/index';
import { RuntimeContext } from '@mastra/core/runtime-context';
import { pdfFetcherTool } from './src/mastra/tools/pdf-fetcher-tool';
import { textExtractorTool } from './src/mastra/tools/text-extractor-tool';
import { questionGeneratorTool } from './src/mastra/tools/question-generator-tool';

// Step 1: Download PDF
const pdfResult = await pdfFetcherTool.execute({
  context: { pdfUrl: 'https://example.com/document.pdf' },
  runtimeContext: new RuntimeContext(),
});

// Step 2: Extract text
const textResult = await textExtractorTool.execute({
  context: { pdfBuffer: pdfResult.pdfBuffer },
  runtimeContext: new RuntimeContext(),
});

// Step 3: Generate questions
const questionsResult = await questionGeneratorTool.execute({
  context: {
    extractedText: textResult.extractedText,
    maxQuestions: 10,
  },
  mastra,
  runtimeContext: new RuntimeContext(),
});

console.log(questionsResult.questions);

Example output:

{
  status: 'success',
  result: {
    questions: [
      "What is the main objective of the research presented in this paper?",
      "Which methodology was used to collect the data?",
      "What are the key findings of the study?",
      // ... more questions
    ],
    success: true
  }
}

Components:
- pdfToQuestionsWorkflow: Main workflow orchestrating the process
- questionGeneratorAgent: Mastra agent specialized in generating educational questions
- pdfQuestionsAgent: Complete agent that can handle the full PDF-to-questions pipeline
- simpleOCR: Pure JavaScript PDF text extraction (no system dependencies)

Tools:
- pdfFetcherTool: Downloads PDF files from URLs and returns buffers
- textExtractorTool: Extracts text from PDF buffers using the simpleOCR parser (direct text extraction, not image-based OCR)
- questionGeneratorTool: Generates comprehensive questions from extracted text

Workflow steps:
- download-pdf: Downloads the PDF from the provided URL
- extract-text: Extracts text using the JavaScript PDF parser (pdf2json)
- generate-questions: Creates comprehensive questions using the question generator agent
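
To make the data flow between the three workflow steps concrete, here is a minimal sketch that chains them as plain async functions. It is not the template's actual implementation (which uses Mastra workflow steps); the names `PipelineSteps`, `PipelineResult`, and `runPdfToQuestions` are hypothetical, and the step functions are injected so the sketch runs without network access or an API key.

```typescript
// Hypothetical step signatures, loosely mirroring download-pdf,
// extract-text, and generate-questions. These are assumptions for
// illustration, not the template's real types.
interface PipelineSteps {
  downloadPdf: (pdfUrl: string) => Promise<Uint8Array>;
  extractText: (pdfBuffer: Uint8Array) => Promise<string>;
  generateQuestions: (extractedText: string, maxQuestions: number) => Promise<string[]>;
}

interface PipelineResult {
  questions: string[];
  success: boolean;
}

// Runs the three steps in order, passing each step's output to the next.
async function runPdfToQuestions(
  steps: PipelineSteps,
  pdfUrl: string,
  maxQuestions = 10,
): Promise<PipelineResult> {
  const pdfBuffer = await steps.downloadPdf(pdfUrl);
  const extractedText = await steps.extractText(pdfBuffer);

  // With no text (for example, a scanned PDF) there is nothing to ask about.
  if (extractedText.trim().length === 0) {
    return { questions: [], success: false };
  }

  const questions = await steps.generateQuestions(extractedText, maxQuestions);
  const limited = questions.slice(0, maxQuestions);
  return { questions: limited, success: limited.length > 0 };
}
```

Injecting the steps keeps the sketch testable with stubs; in the template itself, the workflow and tools listed above play these roles.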
- ✅ Zero System Dependencies: Pure JavaScript solution
- ✅ Simple Setup: Only requires OpenAI API key
- ✅ Fast Text Extraction: Direct PDF parsing (no OCR needed for text-based PDFs)
- ✅ Educational Focus: Generates comprehensive learning questions
- ✅ Multiple Interfaces: Workflow, Agent, and individual tools available
This template uses a pure JavaScript approach that works for most PDFs:
- Text-based PDFs (90% of cases): Direct text extraction using pdf2json
  - ⚡ Fast and reliable
  - 🔧 No system dependencies
  - ✅ Works out of the box
- Scanned PDFs: Would require OCR, but most PDFs today contain embedded text
- Simplicity: No GraphicsMagick, ImageMagick, or other system tools needed
- Speed: Direct text extraction is much faster than OCR
- Reliability: Works consistently across different environments
- Educational: Easy for developers to understand and modify
- Single Path: One clear workflow with no complex branching
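
Because this approach only reads embedded text, it can help to detect when extraction returned little or nothing, which usually points to a scanned PDF. The sketch below is one possible heuristic, not part of the template: `classifyExtraction`, `PdfKind`, and the `MIN_TEXT_CHARS` threshold are hypothetical names and values chosen for illustration.

```typescript
type PdfKind = 'text-based' | 'likely-scanned';

// Assumed threshold: fewer visible characters than this is treated as
// "no usable embedded text". Tune it for your own documents.
const MIN_TEXT_CHARS = 20;

function classifyExtraction(
  extractedText: string,
  minChars: number = MIN_TEXT_CHARS,
): PdfKind {
  // Count only non-whitespace characters so blank lines and page
  // breaks do not make an empty document look like it has content.
  const visibleChars = extractedText.replace(/\s+/g, '').length;
  return visibleChars >= minChars ? 'text-based' : 'likely-scanned';
}
```

A caller could run this on the extracted text and skip question generation when the result is `'likely-scanned'`, reporting that the PDF appears to have no embedded text.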
OPENAI_API_KEY=your_openai_api_key_here

You can customize the question generation by modifying the questionGeneratorAgent:

import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';

export const questionGeneratorAgent = new Agent({
  name: 'Question Generator Pro',
  // Customize the instructions below to change question style and depth
  instructions: `
    You are an expert educational content creator...
  `,
  model: openai('gpt-4o'),
});

src/mastra/
├── agents/
│   └── questionGeneratorAgent.ts    # Question generation agent
├── tools/
│   └── simpleOCR.ts                 # Pure JavaScript PDF parser
├── workflows/
│   └── pdfToQuestionsWorkflow.ts    # Main workflow
└── index.ts                         # Mastra configuration

# Run with a test PDF
export OPENAI_API_KEY="your-api-key"
npx tsx example.ts

- Make sure you've set the OPENAI_API_KEY environment variable
- Check that your API key is valid and has sufficient credits
- Verify the PDF URL is accessible and publicly available
- Check network connectivity
- Ensure the URL points to a valid PDF file
- Some servers may require authentication or have restrictions
- The PDF might be password-protected
- Very large PDFs might take longer to process
- Scanned PDFs without embedded text won't work (rare with modern PDFs)
- Solution: Use a smaller PDF file (under ~5-10 pages)
- Automatic Truncation: The tool automatically uses only the first 4000 characters for very large documents
- Helpful Errors: Clear messages guide you to use smaller PDFs when needed
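
The truncation behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the 4000-character limit comes from the text above, while `prepareTextForQuestions`, `truncationNotice`, and the wording of the message are hypothetical.

```typescript
// Limit stated in this README; everything else here is an assumption.
const MAX_CHARS = 4000;

interface PreparedText {
  text: string;
  truncated: boolean;
  originalLength: number;
}

// Keeps only the first `maxChars` characters of the extracted text.
// Lengths are counted in UTF-16 code units, as JavaScript strings do.
function prepareTextForQuestions(
  extractedText: string,
  maxChars: number = MAX_CHARS,
): PreparedText {
  const originalLength = extractedText.length;
  if (originalLength <= maxChars) {
    return { text: extractedText, truncated: false, originalLength };
  }
  return { text: extractedText.slice(0, maxChars), truncated: true, originalLength };
}

// Builds a user-facing note when truncation happened, otherwise null.
function truncationNotice(prepared: PreparedText): string | null {
  if (!prepared.truncated) return null;
  return (
    `Only the first ${prepared.text.length} of ${prepared.originalLength} ` +
    'characters were used; try a smaller PDF for full coverage.'
  );
}
```

Returning the original length alongside the truncated text lets the caller explain how much of the document was skipped.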
- Single dependency for PDF processing (pdf2json)
- No system tools or complex setup required
- Works immediately after pnpm install
- Multiple usage patterns (workflow, agent, tools)
- Direct text extraction (no image conversion)
- Much faster than OCR-based approaches
- Handles reasonably-sized documents efficiently
- Pure JavaScript/TypeScript
- Easy to understand and modify
- Clear separation of concerns
- Simple error handling with helpful messages
- Generates multiple question types
- Covers different comprehension levels
- Perfect for creating study materials
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request