Pinned Repositories
atlas-converse
This is a simplistic synthetic conversation set generator that employs the OpenAI GPT-3.5-turbo model to create detailed, topic-specific text content, which is then cleansed, reformatted, and converted into a JSON Lines (JSONL) format for easy use in data analysis, machine learning, or other natural language processing tasks.
atlas-mathematical-computations
This repository is dedicated to generating a variety of mathematical problems along with their solutions, spanning basic operations like addition, subtraction, and multiplication, to serve as a comprehensive dataset for educational purposes, algorithm testing, or machine learning model training.
atlas-reasoning
This script leverages OpenAI's GPT-3 to generate reasoning tasks based on topics and subtopics, which are then cleansed, formatted, and consolidated from individual text files into a single CSV file.
atlas-storyteller
A simple Python script that generates short stories based on several themes. Uses GPT-3.5.
atlas-voice
This software is an intuitive and powerful tool that enables users to record, save, and play audio while also associating each audio with specific text and intelligently removing silent segments from the recordings.
compressed-parquet-aud-url-extractor
Reads a compressed Parquet file, lets the user select rows from it, and extracts them into a .txt file. Also deduplicates the URLs.
compressed-parquet-img-url-extractor
Reads a compressed Parquet file, lets the user select rows from it, and extracts them into a .txt file. Also deduplicates the URLs.
FolderGen
An elegant VBA-based Excel document for hyperlink validity testing, link extraction from a cell, and data collection from a URL into a folder system. Also includes PDF-to-Word and PDF-to-text conversion.
PDF-to-Image-Cluster
This project is designed to automate the process of downloading and processing large datasets from the web: specifically, it scrapes and downloads .snappy.parquet files, converts them to CSV, extracts URLs, downloads associated PDFs, performs OCR on the PDFs to extract text and bounding boxes, and finally organizes and archives the data.
Templates-ComfyUI-
Templates for viewing how a prompt varies across the samplers available in ComfyUI. Includes a variety of sizes, with single-seed and random-seed templates.
atlasunified's Repositories
atlasunified/Templates-ComfyUI-
Templates for viewing how a prompt varies across the samplers available in ComfyUI. Includes a variety of sizes, with single-seed and random-seed templates.
atlasunified/atlas-reasoning
This script leverages OpenAI's GPT-3 to generate reasoning tasks based on topics and subtopics, which are then cleansed, formatted, and consolidated from individual text files into a single CSV file.
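The consolidation step described above (individual text files merged into one CSV) could look roughly like the following. This is a minimal sketch, not the repository's code: the function name `consolidate_to_csv`, the `source` and `task` column names, and the one-task-per-file layout are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def consolidate_to_csv(text_dir, csv_path):
    """Merge every .txt file in text_dir into one CSV, one row per file.

    Hypothetical helper: the column names and the one-task-per-file
    layout are assumptions, not taken from the repository.
    """
    rows = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        # Collapse runs of whitespace so each task fits in one CSV cell.
        cleaned = " ".join(path.read_text(encoding="utf-8").split())
        if cleaned:  # skip files that are empty after cleaning
            rows.append({"source": path.stem, "task": cleaned})

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["source", "task"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Sorting the file names keeps the row order stable between runs, which makes the resulting CSV easier to diff.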
atlasunified/atlas-converse
This is a simplistic synthetic conversation set generator that employs the OpenAI GPT-3.5-turbo model to create detailed, topic-specific text content, which is then cleansed, reformatted, and converted into a JSON Lines (JSONL) format for easy use in data analysis, machine learning, or other natural language processing tasks.
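As a rough illustration of the cleanse-and-convert step, the sketch below normalizes whitespace in generated turns and writes one JSON object per line. The model call is omitted, and the `to_jsonl` name and the `role`/`content` record shape are assumptions rather than the repository's actual schema.

```python
import json


def to_jsonl(raw_turns):
    """Clean (role, text) pairs and serialize them as JSON Lines.

    Hypothetical helper: the record keys are an assumed schema.
    """
    lines = []
    for role, text in raw_turns:
        cleaned = " ".join(text.split())  # normalize whitespace
        if not cleaned:
            continue  # drop turns that are empty after cleaning
        record = {"role": role, "content": cleaned}
        # ensure_ascii=False keeps non-ASCII characters readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n" if lines else ""
```

Because each line is an independent JSON document, a consumer can stream the file and call `json.loads` on one line at a time.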
atlasunified/PDF-to-Image-Cluster
This project is designed to automate the process of downloading and processing large datasets from the web: specifically, it scrapes and downloads .snappy.parquet files, converts them to CSV, extracts URLs, downloads associated PDFs, performs OCR on the PDFs to extract text and bounding boxes, and finally organizes and archives the data.
atlasunified/atlas-mathematical-computations
This repository is dedicated to generating a variety of mathematical problems along with their solutions, spanning basic operations like addition, subtraction, and multiplication, to serve as a comprehensive dataset for educational purposes, algorithm testing, or machine learning model training.
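A generator of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the repository's implementation: `generate_problems`, the operand range, and the question/answer record format are all hypothetical.

```python
import operator
import random

# The three basic operations named in the description.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate_problems(count, low=1, high=20, seed=None):
    """Return `count` arithmetic problems with their solutions.

    Hypothetical helper: the operand range and record format are assumed.
    """
    rng = random.Random(seed)  # a seed makes the dataset reproducible
    problems = []
    for _ in range(count):
        symbol = rng.choice(sorted(OPERATIONS))
        a, b = rng.randint(low, high), rng.randint(low, high)
        problems.append(
            {"question": f"{a} {symbol} {b} = ?", "answer": OPERATIONS[symbol](a, b)}
        )
    return problems
```

Calling `generate_problems(1000, seed=42)` twice yields the same list, which helps when the dataset is used for algorithm testing.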
atlasunified/atlas-voice
This software is an intuitive and powerful tool that enables users to record, save, and play audio while also associating each audio with specific text and intelligently removing silent segments from the recordings.
atlasunified/FolderGen
An elegant VBA-based Excel document for hyperlink validity testing, link extraction from a cell, and data collection from a URL into a folder system. Also includes PDF-to-Word and PDF-to-text conversion.
atlasunified/atlas-storyteller
A simple Python script that generates short stories based on several themes. Uses GPT-3.5.
atlasunified/compressed-parquet-aud-url-extractor
Reads a compressed Parquet file, lets the user select rows from it, and extracts them into a .txt file. Also deduplicates the URLs.
atlasunified/compressed-parquet-img-url-extractor
Reads a compressed Parquet file, lets the user select rows from it, and extracts them into a .txt file. Also deduplicates the URLs.
atlasunified/compressed-parquet-vid-url-extractor
Reads a compressed Parquet file, lets the user select rows from it, and extracts them into a .txt file. Also deduplicates the URLs.
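The audio, image, and video extractors share the same select-then-deduplicate flow. The sketch below shows only that flow, under stated assumptions: the Parquet reading step is left out (it needs a third-party reader), rows are assumed to be already loaded as a list of dictionaries, and the `url` column name and both function names are hypothetical.

```python
def extract_unique_urls(rows, selected_indices, url_column="url"):
    """Collect URLs from the selected rows, dropping duplicates.

    Hypothetical helper: `rows` stands in for data already read from
    the Parquet file, and the column name is an assumption.
    """
    seen = set()
    urls = []
    for index in selected_indices:
        value = rows[index].get(url_column)
        if not value:
            continue  # skip missing or empty cells
        url = value.strip()
        if url and url not in seen:
            seen.add(url)  # first occurrence wins, so order is preserved
            urls.append(url)
    return urls


def write_urls(urls, path):
    """Write one URL per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(f"{url}\n" for url in urls)
```

Tracking seen URLs in a set while appending to a list keeps the original row order, which a plain `set(urls)` would not.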
atlasunified/font-to-image
Takes font files, extracts their metadata, generates images based on keywords, special characters, or letters, runs them through a CNN (ReLU or GELU code provided), and then offers a Gradio interface to test the results.
atlasunified/instruction-set-generator
A Gradio interface for entering improved instruction sets by hand; entries are written to a JSONL file in the order entered.
atlasunified/mathematics_dataset
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
atlasunified/url-section-text-file-generator
Extracts the HTML from a website URL into a text file, then splits that HTML by headers and extracts the text. Does not currently handle images.
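Splitting a page by headers can be illustrated with Python's standard `html.parser` module. This is a sketch under assumptions, not the repository's code: the fetch step is omitted (the HTML is passed in as a string), and `SectionSplitter` and `split_by_headings` are hypothetical names. Images carry no text, so they are ignored here as well.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}  # their contents are not page text


class SectionSplitter(HTMLParser):
    """Group visible text under the heading that precedes it."""

    def __init__(self):
        super().__init__()
        # The first section holds any text that appears before a heading.
        self.sections = [{"heading": "", "text": []}]
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.sections.append({"heading": "", "text": []})
            self._in_heading = True
        elif tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False
        elif tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self._skip_depth:
            return
        current = self.sections[-1]
        if self._in_heading:
            current["heading"] = (current["heading"] + " " + text).strip()
        else:
            current["text"].append(text)


def split_by_headings(html):
    """Return (heading, text) pairs in document order."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [
        (s["heading"], " ".join(s["text"]))
        for s in parser.sections
        if s["heading"] or s["text"]
    ]
```

Each returned pair can then be written to the text file as a heading line followed by its body text.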