Large Language Models (LLM). Browse the Wolfram directory and its associated URLs (directory and content pages) to build the category structure and good word embeddings. The goal is to generate enriched prompts for GPT, Wikipedia, arXiv, Google Scholar, Stack Exchange, or Google search. The focus is on one subdirectory: Probability & Statistics.
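As a rough illustration of the "category structure" and "enriched prompt" ideas, here is a minimal sketch. It is not the code used in the project: the record layout, the sample rows, and the helper names `build_category_tree` and `enrich_prompt` are all assumptions made up for this example, and the word-embedding step is left out entirely.

```python
from collections import defaultdict

# Hypothetical crawl output: (category path, page title) pairs.
# The real crawled data has its own format; these rows are invented for illustration.
records = [
    ("Probability & Statistics/Statistical Distributions", "Normal Distribution"),
    ("Probability & Statistics/Statistical Distributions", "Poisson Distribution"),
    ("Probability & Statistics/Random Walks", "Random Walk"),
]

def build_category_tree(records):
    """Map each category to the set of its direct children (subcategories or pages)."""
    tree = defaultdict(set)
    for path, title in records:
        parts = path.split("/")
        # Link each category to the subcategory that follows it in the path.
        for parent, child in zip(parts, parts[1:]):
            tree[parent].add(child)
        # Attach the page itself to the deepest category.
        tree[parts[-1]].add(title)
    return tree

def enrich_prompt(query, tree):
    """Append the categories whose children mention the query (case-insensitive)."""
    q = query.lower()
    hits = sorted(cat for cat, kids in tree.items()
                  if any(q in kid.lower() for kid in kids))
    if not hits:
        return query
    return f"{query} (context: {', '.join(hits)})"

tree = build_category_tree(records)
prompt = enrich_prompt("poisson", tree)
print(prompt)
```

The enriched string could then be passed to any of the targets listed above (GPT, a search engine, and so on). A real pipeline would likely rank categories rather than list every match.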
Documentation is in my project textbook, Projects4.pdf, here in this folder. I strongly encourage you to download the document and browse your local copy with Chrome, Edge, or another viewer. Unlike on GitHub, you will be able to click on all the links and use the internal navigation features. Look for projects related to NLP and LLM / xLLM. The best starting point is project 7.2.2. It's the core project on this topic, with references to all satellite projects.
The project textbook (with my solutions to all projects) is the core document needed to participate in the free course (deep tech dive) called GenAI Fellowship. For details about the fellowship, follow this link.
Note: An uncompressed version of crawl_final_stats.txt.gz is available on my Google Drive, here. This file contains all the crawled data needed as input to the Python scripts in the XLLM5 and XLLM6 folders.
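If you prefer to work with the compressed file directly, you can inspect it without unpacking it first. The sketch below assumes the archive holds plain UTF-8 text with one record per line, which is not confirmed here; `preview_lines` is a hypothetical helper name, not a function from the XLLM5 or XLLM6 scripts.

```python
import gzip

def preview_lines(path, n=5):
    """Return the first n lines of a gzip-compressed text file."""
    lines = []
    # "rt" opens the archive in text mode; undecodable bytes are replaced
    # rather than raising, since the exact encoding is an assumption.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

For example, `preview_lines("crawl_final_stats.txt.gz")` returns the first five lines, which is a quick way to check the record layout before running the scripts.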