StackExchange and SecFilings both pulling StackExchange data?
emorisse commented
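Data/StackExchange/preprocessor.py and Data/SecFilings/preprocessor.py are almost identical: they differ only in some comments and the class name, so the SecFilings preprocessor appears to load StackExchange data as well. Diff provided below: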
--- Data/StackExchange/preprocessor.py 2024-07-04 12:42:46
+++ Data/SecFilings/preprocessor.py 2024-07-04 12:42:46
@@ -12,10 +12,32 @@
ROOTPATH = dirname(dirname(abspath(__file__)))
-# Please use first the HuggingFace script at https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences to get the data
+# SEC TEMPLATE
-class StackExchangeData:
+# [KEEP] 0.Business: Overview of the company's main operations, including its products or services.
+# [KEEP] 1.Risk Factors: Discussion of risks and challenges the company faces.
+# [REMOVE] 2.Unresolved Staff Comments: Comments by SEC staff on the company's previous filings that haven't been resolved.
+# [REMOVE] 3.Properties: Information about the company's physical properties (like real estate).
+# [REMOVE] 4.Legal Proceedings: Information on any significant legal actions involving the company.
+# [REMOVE] 5.Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities: Details about the company’s stock, including dividends, the number of shareholders, and any buyback programs.
+# [REMOVE] 6.Selected Financial Data: Summary of specific financial data for a five-year period.
+
+# [KEEP] 8.Management’s Discussion and Analysis of Financial Condition and Results of Operations (MD&A): A detailed analysis from management’s perspective on the company’s financials and operations.
+# [REMOVE] 9.Quantitative and Qualitative Disclosures About Market Risk: Information on market risk, such as foreign exchange risk, interest rate risk, etc.
+# [REMOVE] 1.Financial Statements and Supplementary Data: Complete financial statements including balance sheets, income statements, and cash flow statements.
+# [REMOVE] 11.Changes in and Disagreements with Accountants on Accounting and Financial Disclosure: If there have been changes or disagreements with accountants, this section provides details.
+# [REMOVE] 12.Directors, Executive Officers and Corporate Governance: Information about the company’s directors and high-level executives.
+# [REMOVE] 13.Executive Compensation: Detailed information about the compensation of top executives.
+# [REMOVE] 14.Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters: Details about the shares held by major shareholders and company executives.
+# [REMOVE] 15.Certain Relationships and Related Transactions, and Director Independence: Information about any transactions between the company and its directors or executives.
+# [REMOVE] 16.Principal Accountant Fees and Services: Fees and services provided by the company's accountants.
+# [REMOVE] 17.Exhibits, Financial Statement Schedules: Lists all the exhibits and financial statements schedules.
+# [REMOVE] 18.Form 10-K Summary: Summary of the key information from the 10-K (optional).
+# [REMOVE] 19. [OPTIONAl] CEO and CFO Certifications: As required by the Sarbanes-Oxley Act, certifications by the CEO and CFO regarding the accuracy of the financial statements.
+
+class ExchangeData:
+
def __init__(self,
n_samples: int,
max_char_length: int):
@@ -115,8 +137,7 @@
if __name__ == "__main__":
- stack_exchange_data = StackExchangeData(
- n_samples=400,
- max_char_length=1500)
+ stack_exchange_data = ExchangeData(n_samples=400,
+ max_char_length=1500)
stack_exchange_data.load_save_dataset()
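
For anyone who wants to re-check this, a comparison like the one above can be reproduced with Python's standard `difflib`. This is only a minimal sketch: the helper name `unified_diff_text` and the short stand-in sources are made up for illustration and are not the repository's code. In practice you would read the two `preprocessor.py` files from disk and pass their contents in.

```python
import difflib


def unified_diff_text(a_text: str, b_text: str,
                      a_name: str = "a", b_name: str = "b") -> str:
    """Return a unified diff between two source strings (hypothetical helper)."""
    diff = difflib.unified_diff(
        a_text.splitlines(keepends=True),
        b_text.splitlines(keepends=True),
        fromfile=a_name,
        tofile=b_name,
    )
    return "".join(diff)


# Tiny stand-ins for the two files (assumption: NOT the real file contents).
stack_src = (
    "class StackExchangeData:\n"
    "    def __init__(self, n_samples: int, max_char_length: int):\n"
    "        pass\n"
)
sec_src = stack_src.replace("StackExchangeData", "ExchangeData")

diff_text = unified_diff_text(
    stack_src, sec_src,
    "Data/StackExchange/preprocessor.py",
    "Data/SecFilings/preprocessor.py",
)

# Keep only the added/removed lines, skipping the ---/+++ file headers.
changed = [
    line for line in diff_text.splitlines()
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

If the only changed lines are comments and the class name, as in the diff above, the two preprocessors are functionally the same.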