The sec-parser project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. Semantic elements include section titles, paragraphs, and tables, each classified for easier data manipulation. Together they form a semantic tree that corresponds to the visual and informational structure of the document.
This tool is especially beneficial for Artificial Intelligence (AI), Machine Learning (ML), and Large Language Model (LLM) applications because it streamlines data pre-processing and feature extraction.
- Explore the Demo
- Read the Documentation
- Join the Discussions to get help, propose ideas, or chat with the community
- Become part of our Discord community
- Report bugs in Issues
- Stay updated and contribute to our project's direction in Announcements and Roadmap
- Learn How to Contribute
sec-parser is versatile and can be applied in various scenarios, including but not limited to:
- Financial Analysis: Extract financial data from 10-Q and 10-K filings for quantitative modeling.
- Risk Assessment: Evaluate risk factors or Management's Discussion and Analysis sections for qualitative analysis.
- Regulatory Compliance: Assist in automating compliance checks for the legal teams.
- Flexible Filtering: Easily filter SEC documents by sections and types, giving you precisely the data you need.
- Academic Research: Facilitate large-scale studies involving public financial disclosures, sentiment analysis, or corporate governance generalization.
- Analytics Ready: Integrate parsed data seamlessly into popular analytics tools for further analysis and visualization.
- Cutting-Edge AI for SEC EDGAR: Apply advanced AI techniques like MemWalker to navigate, extract, and transform complex information from SEC documents efficiently. Learn more in our blog post: Cutting-Edge AI for SEC EDGAR: Introducing MemWalker.
- AI Applications: Leverage parsed data for various AI tasks such as text summarization, sentiment analysis, and named entity recognition.
- Data Augmentation: Use authentic financial text to train and test machine learning models.
- Causal Analysis: Use parsed data to understand cause-effect relationships in financial data, beyond mere correlations.
- Predictive Modeling: Enhance predictive models by incorporating causal relationships, leading to more robust and reliable predictions.
- Decision Making: Aid decision-making processes by providing insights into the potential impact of different actions, based on causal relationships identified in the data.
- LLM Compatible: Use parsed data to facilitate complex NLU tasks with Large Language Models like ChatGPT, including question-answering, language translation, and information retrieval.
These use cases demonstrate the flexibility and power of sec-parser in handling both traditional data extraction tasks and more advanced AI-driven analysis.
Warning: This project, sec-parser, is an independent, open-source initiative and has no affiliation, endorsement, or verification by the United States Securities and Exchange Commission (SEC). It utilizes public APIs and data provided by the SEC solely for research, informational, and educational objectives. This tool is not intended for financial advisement or as a substitute for professional investment advice or compliance with securities regulations. The creators and maintainers make no warranties, expressed or implied, about the accuracy, completeness, or reliability of the data and analyses presented. Use this software at your own risk. For accurate and comprehensive financial analysis, consult with qualified financial professionals and comply with all relevant legal requirements. The project maintainers and contributors are not liable for any financial or legal consequences arising from the use of this tool.
This guide walks you through installing the sec-parser package and using it to extract the "Segment Operating Performance" section as a semantic tree from the latest Apple 10-Q filing.
First, install the sec-parser package using pip:
pip install sec-parser
To run the example code in this README, you'll also need the sec_downloader package:
pip install sec-downloader
Once you've installed the necessary packages, you can start by downloading the filing from the SEC EDGAR website. Here's how you can do it:
from sec_downloader import Downloader
# Initialize the downloader with your company name and email
dl = Downloader("MyCompanyName", "email@example.com")
# Download the latest 10-Q filing for Apple
html = dl.get_latest_html("10-Q", "AAPL")
Note: The company name and email address are used to form a user-agent string that adheres to SEC EDGAR's fair access policy for programmatic downloading. Source
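To make the note above concrete, here is a minimal sketch of how a company name and contact email could be combined into a User-Agent header. The helper `build_user_agent` is hypothetical and is not part of sec_downloader; the exact string that sec_downloader sends is an assumption here. The "name followed by contact email" shape follows the sample given in the SEC's fair access guidance.

```python
def build_user_agent(company_name: str, email: str) -> str:
    """Join a company name and a contact email into one User-Agent value.

    Hypothetical helper for illustration only; sec_downloader may format
    its user-agent string differently.
    """
    company_name = company_name.strip()
    email = email.strip()
    if not company_name or "@" not in email:
        raise ValueError("A company name and a contact email are both required")
    return f"{company_name} {email}"


# Headers that a plain HTTP client could send along with an EDGAR request
headers = {"User-Agent": build_user_agent("MyCompanyName", "email@example.com")}
print(headers)
```

Replace the placeholder name and email with your own details so the SEC can identify and contact the requester.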
Now, we can parse the filing into semantic elements and arrange them into a tree structure:
import sec_parser as sp
# Parse the HTML into a list of semantic elements
elements = sp.Edgar10QParser().parse(html)
# Construct a semantic tree to allow for easy filtering by section
tree = sp.TreeBuilder().build(elements)
# Find section "Segment Operating Performance"
section = [n for n in tree.nodes if n.text.startswith("Segment")][0]
# Preview the tree
print("\n".join(sp.render(section).split("\n")[:13]) + "...")
TitleElement: Segment Operating Performance
├── TextElement: The following table sho... (dollars in millions):
├── TableElement: Table with 7 rows, 40 numbers, and 414 characters.
├── TitleElement[L1]: Americas
│   └── TextElement: Americas net sales decr... net sales of Services.
├── TitleElement[L1]: Europe
│   └── TextElement: The weakness in foreign...er net sales of iPhone.
├── TitleElement[L1]: Greater China
│   └── TextElement: The weakness in the ren...er net sales of iPhone.
├── TitleElement[L1]: Japan
│   └── TextElement: The weakness in the yen..., Home and Accessories.
└── TitleElement[L1]: Rest of Asia Pacific
    ├── TextElement: The weakness in foreign...lower net sales of Mac....
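To give a feel for how a flat list of elements can turn into a tree like the one above, here is a simplified sketch in plain Python. It is not the sec-parser implementation: `Element`, `Node`, `build_tree`, and `render_lines` are hypothetical stand-ins, the sample texts are placeholders, and the only nesting rule assumed is that content goes under the most recent title and that a title closes any open title of the same or a deeper level. The library's actual nesting rules may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a parsed semantic element."""
    kind: str       # "title", "text", or "table"
    text: str
    level: int = 0  # only meaningful for titles; 0 is the outermost level


@dataclass
class Node:
    element: Element
    children: list[Node] = field(default_factory=list)


def build_tree(elements: list[Element]) -> list[Node]:
    """Nest each element under the closest enclosing title."""
    roots: list[Node] = []
    open_titles: list[Node] = []  # currently open titles, outermost first
    for element in elements:
        node = Node(element)
        if element.kind == "title":
            # A new title closes open titles of the same or a deeper level.
            while open_titles and open_titles[-1].element.level >= element.level:
                open_titles.pop()
        parent = open_titles[-1] if open_titles else None
        (parent.children if parent else roots).append(node)
        if element.kind == "title":
            open_titles.append(node)
    return roots


def render_lines(nodes: list[Node], depth: int = 0) -> list[str]:
    """Return one indented line per node, depth-first."""
    lines: list[str] = []
    for node in nodes:
        lines.append("  " * depth + f"{node.element.kind}: {node.element.text}")
        lines.extend(render_lines(node.children, depth + 1))
    return lines


# Placeholder elements, in document order
elements = [
    Element("title", "Segment Operating Performance", level=0),
    Element("text", "Placeholder introductory paragraph."),
    Element("table", "Placeholder table."),
    Element("title", "Americas", level=1),
    Element("text", "Placeholder paragraph about Americas."),
    Element("title", "Europe", level=1),
    Element("text", "Placeholder paragraph about Europe."),
]

roots = build_tree(elements)
print("\n".join(render_lines(roots)))
```

The stack of open titles is what lets "Europe" become a sibling of "Americas" rather than its child, mirroring the `[L1]` siblings in the rendered output.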
For more examples and advanced usage, refer to the User Guide, Developer Guide, and Documentation.
You've successfully parsed an SEC document into semantic elements and arranged them into a tree structure. To further analyze this data with analytics or AI, you can use any tool of your choice.
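As one possible way to hand the parsed structure to an analytics tool, the sketch below flattens a tree into rows and writes them as CSV. It uses a hypothetical `Node` stand-in with placeholder text rather than real sec-parser objects, so treat the field names (`kind`, `text`, `children`) and the `flatten` helper as assumptions to adapt to the actual node attributes.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a semantic tree node."""
    kind: str  # "title", "text", or "table"
    text: str
    children: list[Node] = field(default_factory=list)


def flatten(node: Node, path: tuple[str, ...] = ()):
    """Yield one row per node, recording the chain of enclosing titles."""
    yield {"section": " > ".join(path), "type": node.kind, "text": node.text}
    # Only titles extend the section path seen by their descendants.
    child_path = path + (node.text,) if node.kind == "title" else path
    for child in node.children:
        yield from flatten(child, child_path)


# Placeholder tree shaped like the example section
tree = Node("title", "Segment Operating Performance", [
    Node("text", "Placeholder introductory paragraph."),
    Node("title", "Americas", [
        Node("text", "Placeholder paragraph about Americas."),
    ]),
])

rows = list(flatten(tree))

# Write the rows as CSV into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["section", "type", "text"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the section path in its own column makes it straightforward to filter or group rows by section once the data is loaded into a spreadsheet or data frame.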
For a tailored experience, consider using our free and open-source library for AI-powered financial analysis:
pip install sec-ai
To ensure your code keeps working when we update sec-parser, avoid complex imports. Don't use import statements that reach deep into the package, like this:
from sec_parser.semantic_tree.internal_utils import SomeInternalClass
Here are the suggested ways to import modules from sec-parser:
- Standard Way: Use
import sec_parser as sp
This imports the main package as sp. You can then access its functionality using the sp. prefix.
- Package-Level Import: Use
from sec_parser import SomeClass
This allows you to use SomeClass directly, without any prefix.
- Submodule: Use
from sec_parser import semantic_tree
This imports the semantic_tree submodule, and you can access its classes and functions using the semantic_tree. prefix.
- Submodule-Level: Use
from sec_parser.semantic_tree import SomeClass
This imports the specific class SomeClass from the semantic_tree submodule.
Note: The main package sec_parser contains only the most common functionality. For specialized tasks, please use submodule or submodule-level imports.
For information about setting up the development environment, coding standards, and contribution workflows, please refer to our CONTRIBUTING.md guide.
This project is licensed under the MIT License - see the LICENSE file for details.