- Objective: Create a diverse set of problem categories (e.g., math, algorithms, data structures, string manipulation, etc.).
- Action: Use GPT-4 to generate a list of categories and subcategories. For example:
- Math Problems: Arithmetic, Algebra, Calculus
- Algorithms: Sorting, Searching, Dynamic Programming
- Data Structures: Arrays, Linked Lists, Trees
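As a rough illustration, the category list can be held in a plain mapping so later stages can iterate over it. This is only a sketch: the entries are the examples given above, and the names `TAXONOMY` and `flatten` are hypothetical.

```python
# Category taxonomy using only the examples listed above. A real run would
# have GPT-4 propose many more categories and subcategories.
TAXONOMY = {
    "Math Problems": ["Arithmetic", "Algebra", "Calculus"],
    "Algorithms": ["Sorting", "Searching", "Dynamic Programming"],
    "Data Structures": ["Arrays", "Linked Lists", "Trees"],
}


def flatten(taxonomy):
    """Return a list of (category, subcategory) pairs."""
    return [(cat, sub) for cat, subs in taxonomy.items() for sub in subs]


pairs = flatten(TAXONOMY)
print(len(pairs))  # 9
```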
- Objective: Generate unique problem statements for each category.
- Action:
- Give GPT-4 the category and subcategory names and ask it to generate multiple problem statements for each.
- For instance, ask GPT-4 to generate 500 unique problems for each subcategory. At that rate, the 200,000-problem target requires 400 subcategories, so the category list needs to grow well beyond the nine example subcategories above.
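One way to approach the request for 500 problems per subcategory is to split it into smaller batches, since a single response is unlikely to hold that many statements. The sketch below only builds prompt text and batch sizes. The prompt wording, the batch size of 40, and the function names are assumptions, and no API is called.

```python
def build_prompt(category, subcategory, count, avoid=()):
    """Build one request for `count` problem statements (hypothetical wording)."""
    lines = [
        f"Write {count} distinct problem statements.",
        f"Category: {category}",
        f"Subcategory: {subcategory}",
        "Return one problem per line with no solutions.",
    ]
    if avoid:
        # Listing earlier problems is one simple way to discourage repeats.
        lines.append("Do not repeat any of these earlier problems:")
        lines.extend(f"- {p}" for p in avoid)
    return "\n".join(lines)


def batch_sizes(total, batch):
    """Split `total` into request sizes of at most `batch`."""
    full, rest = divmod(total, batch)
    return [batch] * full + ([rest] if rest else [])


sizes = batch_sizes(500, 40)  # assumed batch size of 40 per request
prompt = build_prompt("Algorithms", "Sorting", sizes[0])
```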
- Objective: For each problem statement, create a structured solution in pseudocode or structured text.
- Action:
- Use GPT-4 to write the pseudocode for the problems generated. You can instruct it to use structured text formats, such as:
  - Step 1: Initialize variables
  - Step 2: Loop through the array
  - Step 3: Apply condition
  - Step 4: Return the result
- Iterate this process until you generate solutions for all 200,000 problems.
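If the model is asked for the "Step N:" format shown above, its replies can be checked mechanically before they enter the dataset. The following is a minimal sketch of such a check. The regular expression and the function names are assumptions about how the format might be parsed, not a fixed specification.

```python
import re

# Matches "Step <n>: <text>" up to the next step marker or end of input.
STEP_RE = re.compile(r"Step\s+(\d+):\s*(.*?)(?=\s*Step\s+\d+:|\Z)", re.DOTALL)


def parse_steps(text):
    """Return a list of (number, description) tuples found in `text`."""
    return [(int(n), body.strip()) for n, body in STEP_RE.findall(text)]


def is_well_formed(steps):
    """True if there is at least one step and numbering runs 1, 2, 3, ..."""
    numbers = [n for n, _ in steps]
    return bool(steps) and numbers == list(range(1, len(steps) + 1))


reply = (
    "Step 1: Initialize variables Step 2: Loop through the array "
    "Step 3: Apply condition Step 4: Return the result"
)
steps = parse_steps(reply)
print(is_well_formed(steps))  # True
```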
- Objective: Ensure the solutions are accurate and well-structured.
- Action:
- Use GPT-4 to validate the pseudocode by translating it into runnable code in a programming language (e.g., Python).
- Run a validation script on a subset of these problems to ensure the correctness of the logic.
- Refine any solutions that do not meet the required standards.
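A validation script along these lines could load the Python translation of a solution and run it against a few known input/output pairs. This is a sketch under assumptions: the function `validate_solution`, the sample problem, and the test cases are all hypothetical, and `exec` is used only for brevity.

```python
def validate_solution(source, func_name, cases):
    """Run `func_name` from `source` against (args, expected) pairs.

    Returns a list of failure messages; an empty list means every case passed.
    NOTE: exec() runs arbitrary code. Model-generated code should be executed
    in an isolated sandbox or subprocess, which this sketch omits.
    """
    namespace = {}
    try:
        exec(source, namespace)
    except Exception as exc:
        return [f"failed to load: {exc!r}"]
    func = namespace.get(func_name)
    if not callable(func):
        return [f"{func_name} is not defined"]
    failures = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{args!r} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{args!r}: expected {expected!r}, got {got!r}")
    return failures


# Hypothetical translation of "find the largest value in a list".
candidate = """
def max_of_list(values):
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best
"""
cases = [(([3, 1, 4],), 4), (([-2, -7],), -2)]
print(validate_solution(candidate, "max_of_list", cases))  # []
```

Solutions that return a non-empty failure list are the ones to send back for refinement.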
- Objective: Introduce variation in difficulty and complexity across the dataset.
- Action:
- Modify a portion of the problem statements and solutions to introduce different levels of difficulty.
- Ask GPT-4 to add variations such as edge cases, performance optimizations, or alternative approaches.
- Objective: Format the dataset in a consistent manner suitable for training or other applications.
- Action:
- Create a CSV or JSON structure where each entry contains the problem statement, category, subcategory, and structured solution.
- Ensure that each problem has metadata like difficulty level, problem type, and any tags for specific concepts.
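For the JSON option, one possible record layout is sketched below as a dataclass serialized to a single JSON line. The field names, the default values, and the sample entry are assumptions chosen to mirror the fields listed above, not a required schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProblemRecord:
    problem_statement: str
    category: str
    subcategory: str
    structured_solution: list  # ordered solution steps
    difficulty: str = "unrated"  # assumed default label
    problem_type: str = ""
    tags: list = field(default_factory=list)


record = ProblemRecord(
    problem_statement="Return the largest value in a list of integers.",
    category="Data Structures",
    subcategory="Arrays",
    structured_solution=[
        "Initialize variables",
        "Loop through the array",
        "Apply condition",
        "Return the result",
    ],
    difficulty="easy",
    problem_type="implementation",
    tags=["iteration"],
)

# One JSON object per line (JSON Lines) keeps a large file easy to stream.
line = json.dumps(asdict(record), ensure_ascii=False)
```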
- Objective: Ensure the final dataset meets your quality standards.
- Action:
- Perform a quality check on random samples across all categories.
- Use GPT-4 to cross-check and improve consistency in structured text formatting.
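Sampling for the quality check can be done per category so that small categories are not missed. A minimal sketch, assuming each record is a dictionary with a `category` key; the function name and the fixed seed are illustrative choices.

```python
import random
from collections import defaultdict


def sample_per_category(records, per_category, seed=0):
    """Pick up to `per_category` random records from every category."""
    rng = random.Random(seed)  # fixed seed so the review sample is repeatable
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec["category"]].append(rec)
    sample = []
    for category in sorted(by_category):
        group = by_category[category]
        sample.extend(rng.sample(group, min(per_category, len(group))))
    return sample


records = [{"category": "Algorithms", "id": i} for i in range(10)]
records += [{"category": "Math Problems", "id": i} for i in range(10, 12)]
review = sample_per_category(records, per_category=3)
```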
- Objective: Scale the process to generate the full 200,000-problem dataset.
- Action:
- Automate the generation, validation, and formatting processes using scripts that call GPT-4 for different stages.
- Parallelize the work across multiple concurrent GPT-4 requests to speed up generation.
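The parallel stage might be sketched with a thread pool, since the work is dominated by waiting on network responses. Here `call_model` is a placeholder that returns a fake reply. A real version would call the GPT-4 API and would probably need retries and rate limiting, which are left out.

```python
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt):
    """Placeholder for a real GPT-4 request; returns a fake reply."""
    return f"[reply to: {prompt}]"


def run_parallel(prompts, max_workers=8):
    """Send prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, prompts))


prompts = [f"problem {i}" for i in range(5)]
replies = run_parallel(prompts)
```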
- Objective: Finalize the dataset and prepare it for use.
- Action:
- Perform any necessary final cleaning, deduplication, and formatting.
- Export the dataset into the desired format and ensure it's well-documented.
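Deduplication could start with a normalized hash of each problem statement, as in the sketch below. It assumes records are dictionaries with a `problem_statement` key. It only catches duplicates that differ in case, punctuation, or spacing; paraphrased duplicates would need a similarity-based method.

```python
import hashlib
import re


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record for each normalized problem statement."""
    seen = set()
    unique = []
    for rec in records:
        digest = hashlib.sha256(
            normalize(rec["problem_statement"]).encode("utf-8")
        ).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique


records = [
    {"problem_statement": "Reverse a string."},
    {"problem_statement": "reverse  a String"},
    {"problem_statement": "Sort a list of integers."},
]
unique = deduplicate(records)
print(len(unique))  # 2
```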
- Objective: Create comprehensive documentation for the dataset.
- Action:
- Document the process, dataset structure, and any specific details about problem categories, complexity levels, and structured text formats.
- Include guidelines on how to use the dataset effectively.