DMQL Milestone 2 Report
I have read and understood the course academic integrity policy in the syllabus of the class.
Team members:
- Ashwin Ashok (aashok3@buffalo.edu)
- Sandeep Kunusoth (skunusot@buffalo.edu)
- Shreyas Sujjaluru Lakshminarasimhan (ssujjalu@buffalo.edu)
The Federal Election Commission (FEC) is responsible for enforcing the campaign finance laws in the United States. As part of this responsibility, the FEC collects and publishes data on campaign contributions and expenditure. This data is used by journalists, researchers, and the public to monitor the influence of money in politics. The goal of this project is to create a comprehensive database of campaign finance data for the 2022-2023 election cycle. This database should be easily searchable and accessible to the public, journalists, and researchers. This data is crucial for building a platform to display the campaign contributions and expenditures from past years. It will be much simpler to maintain this data in databases, which also makes it possible to obtain findings instantly. Overall, it seems that the data elements in dataset are all related to the complex web of money and politics in the United States. By analyzing data together, it may be possible to gain insights into how money influences elections and which individuals and organizations are most involved in the political process.
Link: Federal Election Commission Database
Why Databases and not EXCEL?
EXCEL:
- Large data is challenging to preserve due of size. Additionally, Excel has difficulty maintaining the large number of entities needed to establish links between the data.
- Data relationships are difficult to maintain and are not possible.
- It will not support the real time data.
DATABASE:
- When it comes to manipulating and analyzing data, databases are significantly more powerful and adaptable than Excel. Data can be stored in a variety of formats, including text, photos, audio, and video.
- It is flexible for the larger data.
- The relations between the tables are easy.
- The database facilitates real-time applications.
- Using a database, we may conduct more complicated actions like connecting two tables, doing computations, and bulk updating. They also enable increased security and scalability.
Target User:
Journalists, Researchers, Advocacy Groups, Voters, and General Public comprise the end users. Journalists often rely on campaign finance data to uncover potential conflicts of interest and to report on the influence of money in politics. They may use the database to search for contributions from specific individuals or organizations, or to compare the fundraising and spending patterns of different candidates. Researchers in political science, economics, and other fields may use the database to analyze the impact of money on political campaigns, to identify trends in campaign finance, or to explore the relationship between political donations and policy outcomes. Advocacy groups may use the database to monitor the fundraising and spending activities of candidates or political parties, to track the activities of specific donors or interest groups, or to identify potential avenues for advocacy and engagement. Voters may use the database to research the candidates running in their districts or states, to learn more about the sources of funding for their campaigns, and to assess the potential influence of money on their elected representatives. The database can be accessed by anyone who is interested in learning more about the role of money in politics during the 2022-2023 election cycle in the United States. The data can be used to increase transparency and accountability in the political process and to inform public discourse on the issue.
Administrator: Federal Election commission maintains a public database, called the Electronic Filing System (EFS), which contains financial information provided by political committees and candidates.
Entity Relation Diagram (ERD)
Before we start designing the E/R diagram, we need to understand the tables in the dataset and their relationships. Based on the table names provided, we can identify the following entities:
Entities:
- Candidates
- Committees
- Contributors
- Campaigns
- Contributions
- Expenditures
- Transactions
Now, let's consider the relationships between these entities. Here are some possible relationships that we can identify based on the dataset:
Relationships:
Candidates – Committees:- Many to One
CurrentCampaignsForHouseandSenate - Candidates:- One to One
ContributionsByIndividuals - Committees:- Many to Many
IndependentExpenditures - Candidates:- Many to One
Committees – IndependentExpenditures:- One to Many
Committees – OperatingExpenditures:- One to Many
Committees – PAC&PartySummary:- One to One
Committees – AnyTransactionFromOneCommmitteeToAnother:- Many to Many
Based on these relationships, we can draw an E/R diagram that looks something like this:
In this diagram, we have entities represented as rectangles. The relationships between these entities are represented as lines connecting them.
NOTE:
In all the further sections green indicates the addition of new column/table post normalization and red indicates removal of column/table post the normalization.
Database Implementation:
Database Schema, Attributes, Primary key and Foreign Key:
- candidate**
Description: The candidate contains information about individual candidates running for office. This will include data on the candidate’s name, party affiliation, campaign platform, and other relevant information. The committee relation contains the below attributes.
Attributes and Datatypes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CAND_ID | VARCHAR(9) | No | ID of the candidate. First character indicates office sought - H=House, S=Senate, P=Presidential. Characters 3 & 4 are the state abbreviation for Congressional candidates. Example value (H0AK00105) | |
CAND_NAME | VARCHAR(200) | Yes | '' | Name of candidate. Example Value (Martha Washington) |
CAND_PTY_AFFILIATION | VARCHAR(3) | Yes | '' | The political party affiliation reported by the candidate. Dem=Democrat Rep=Republican full list of party affiliations Example value (Dem) |
CAND_ELECTION_YR | Numeric(4) | No | Year of election. Example value: 2022 | |
CAND_OFFICE_ST | VARCHAR(2) | Yes | '' | Candidate state. House = state of race Example value: (VA) |
CAND_OFFICE | VARCHAR(1) | Yes | '' |
Candidate Office. H = House |
CAND_OFFICE_DISTRICT | VARCHAR(2) | Yes | '' |
Candidate district. Congressional district number Congressional at-large 00 Senate 00 Presidential 00. Example value: 01 |
CAND_ICI | VARCHAR(1) | Yes | '' |
Incumbent challenger status. C = Challenger Example value: I |
CAND_STATUS | VARCHAR(1) | Yes | '' |
Candidate status. C = Statutory candidate Example value: C |
CAND_PCC | VARCHAR(9) | Yes | '' |
Principal campaign committee. The ID assigned by the Federal Election Commission to the candidate's principal campaign committee for a given election cycle. Example value: C00100005 |
CAND_ST1 | VARCHAR(34) | Yes | '' |
Mailing address – street. Example value: 1001 George Washington Hwy |
CAND_ST2 | VARCHAR(34) | Yes | '' |
Mailing address – street2 Example value: Suite 100 |
CAND_CITY | VARCHAR(30) | Yes | '' |
Mailing address – city Example value: Alexandria |
CAND_ST | VARCHAR(2) | Yes | '' |
Mailing address – state Example value: VA |
CAND_ZIP | VARCHAR(9) | Yes | '' |
Mailing address - ZIP code Example value: 22201 |
Primary Key: (CAND_ID) In the case of the candidate table, the CAND_ID column is the most appropriate choice for the primary key because it uniquely identifies each candidate. This is evident from the fact that the CAND_ID column has a unique value for each row in the table. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to each candidate and is used to track the candidate's fundraising and spending activities. Additionally, foreign keys in other tables that reference the candidate table can use the CAND_ID column to establish the relationship between the tables.
Schema:
CREATE TABLE candidate (
CAND_ID VARCHAR(9) NOT NULL PRIMARY KEY,
CAND_NAME VARCHAR(200),
CAND_PTY_AFFILIATION VARCHAR(3),
CAND_ELECTION_YR NUMERIC(4),
CAND_OFFICE_ST VARCHAR(2),
CAND_OFFICE VARCHAR(1),
CAND_OFFICE_DISTRICT VARCHAR(2),
CAND_ICI VARCHAR(1),
CAND_STATUS VARCHAR(1),
CAND_PCC VARCHAR(9),
CAND_ST1 VARCHAR(34),
CAND_ST2 VARCHAR(34),
CAND_CITY VARCHAR(30),
CAND_ST VARCHAR(2),
CAND_ZIP VARCHAR(9)
);
- committee**
Description: The committee contains information about the various political committees involved in the election process. This includes data on the committee’s name, purpose, funding sources, and other relevant information. The committee relation contains the below attributes.
Attributes and Datatypes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CMTE_ID | VARCHAR(9) | No | Committee identification. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. Committee IDs are unique and an ID for a specific committee always remains the same. Example value: C00100005 | |
CMTE_NM | VARCHAR(200) | Yes | '' |
Committee name . Example value: Martha Washington for Congress |
TRES_NM | VARCHAR(90) | Yes | '' |
Treasurer's name. The officially registered treasurer for the committee. Example value: Alexander Hamilton |
CMTE_ST1 | VARCHAR(34) | Yes | '' | Mailing address – street.Example value: 1001 George Washington Hwy |
CMTE_ST2 | VARCHAR(34) | Yes | '' |
Mailing address – street2 Example value: Suite 100 |
CMTE_CITY | VARCHAR(30) | Yes | '' |
Mailing address – city Example value: Alexandria |
CMTE_ST | VARCHAR(2) | Yes | '' |
Mailing address – state Example value: VA |
CAND_ZIP | VARCHAR(9) | Yes | '' |
Mailing address - ZIP code Example value: 22201 |
CMTE_DSGN | VARCHAR(1) | Yes | '' |
Committee designation. A = Authorized by a candidate Example value: A |
CMTE_TP | VARCHAR(1) | Yes | '' |
Committee type. List of committee type codes Example value: H |
CMTE_PTY_AFFILIATION | VARCHAR(3) | Yes | '' |
Committee party. List of party codes Example value: NON |
CMTE_FILING_FREQ | VARCHAR(1) | Yes | '' |
Filing frequency. A = Administratively terminated Example value: Q |
ORG_TP | VARCHAR(1) | Yes | '' |
Interest group category. C = Corporation Example value: C |
CONNECTED_ORG_NM | VARCHAR(200) | Yes | '' |
Connected organization's name. Example value: ‘Widgets, Incorporated’ |
CAND_ID | VARCHAR(9) | Yes | '' |
Candidate identification. When a committee has a committee type designation of H, S, or P, the candidate's identification number will be entered in this field. Example value: H1VA01225 |
Primary Key: (CMTE_ID) Using CMTE_ID as the primary key ensures that each row in the table is uniquely identifiable by its committee ID. It also allows for efficient searching and retrieval of data related to a specific committee using the primary key index. Furthermore, since CMTE_ID is a unique identifier assigned by the Federal Election Commission (FEC), it is a reliable and stable identifier that is unlikely to change over time, which makes it a good candidate for the primary key.
Foreign Key: (CAND_ID) This relationship exists because a committee can be associated with a candidate. A committee may support a candidate. Therefore, it is appropriate to include a foreign key constraint in the committee table that references the candidate table's primary key. Additionally, this foreign key relationship allows for easy querying of data related to a candidate's committees and contributions to those committees.
Schema:
CREATE TABLE committee (
CMTE_ID VARCHAR(9) NOT NULL PRIMARY KEY,
CMTE_NM VARCHAR(200),
TRES_NM VARCHAR(90),
CMTE_ST1 VARCHAR(34),
CMTE_ST2 VARCHAR(34),
CMTE_CITY VARCHAR(30),
CMTE_ST VARCHAR(2),
CMTE_ZIP VARCHAR(9),
CMTE_DSGN VARCHAR(1),
CMTE_TP VARCHAR(1),
CMTE_PTY_AFFILIATION VARCHAR(3),
CMTE_FILING_FREQ VARCHAR(1),
ORG_TP VARCHAR(1),
CONNECTED_ORG_NM VARCHAR(200),
CAND_ID VARCHAR(9)
);
- Candidate- committee linkage**
Description: The candidate-committee linkage contains information linking the candidate's information to information about his or her committee**.** The candidate-committee linkage relation contains the bellow attributes.
Attributes and Datatypes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CAND_ID | VARCHAR(9) | No | Candidate identification. A 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. The candidate ID for a specific candidate remains the same across election cycles as long as the candidate is running for the same office. Example value: C00100005 | |
CAND_ELECTION_YR | Numeric(4) | No | Candidate election year. Example value: 2022 | |
FEC_ELECTION_YR | Numeric(4) | No | FEC election year. Active 2-year period. Example value: 2022 | |
CMTE_ID | VARCHAR(9) | No | Committee identification. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission. The committee ID for a specific committee always remains the same. Example value: C00100005 | |
CMTE_DSGN | VARCHAR(1) | Yes | '' |
Committee designation. A = Authorized by a candidate Example value: A |
CMTE_TP | VARCHAR(1) | Yes | '' |
Committee type. List of committee type codes Example value: H |
LINKAGE_ID | Numeric(12) | No | Linkage ID. Unique link ID Example value: 123456789012 |
Primary Key: (LINKAGE_ID) The candidate_committee table serves as an association table that links the candidate and committee tables, which already have their own primary keys (CAND_ID and CMTE_ID, respectively). Therefore, the primary key for the candidate_committee table should be a unique combination of CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID. For every unique combination of CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID there is a unique LINKAGE_ID which should be the primary key. The CAND_ELECTION_YR and FEC_ELECTION_YR columns represent the year of the candidate's election and the year of the corresponding FEC election respectively.
Foreign Key: (CAND_ID, CMTE_ID). The candidate_committee table contains a link between a candidate and a committee, indicating which candidate has received contributions from which committee. Therefore, the foreign key in this table should reference the primary keys of the candidate and committee tables, which are CAND_ID and CMTE_ID, respectively. These foreign keys ensure that the link between a candidate and a committee is valid and references existing records in the candidate and committee tables.
Schema:
CREATE TABLE candidate_committee (
CAND_ID VARCHAR(9),
CAND_ELECTION_YR Numeric(4),
FEC_ELECTION_YR Numeric(4),
CMTE_ID VARCHAR(9),
CMTE_TP VARCHAR(1),
CMTE_DSGN VARCHAR(1),
LINKAGE_ID Numeric(12),
PRIMARY KEY (LINKAGE_ID),
UNIQUE (CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR,CMTE_ID),
FOREIGN KEY (CAND_ID) REFERENCES candidate (CAND_ID),
FOREIGN KEY (CMTE_ID) REFERENCES committee (CMTE_ID)
);
- contributor**
Description: The contributor contains the information of the contributor**.** The contributor relation contains the bellow attributes.
Attributes and Datatypes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CONTRIBUTOR_ID | Varchar(9) | No | Contributor id | |
NAME | Varchar(200) | Yes | Contributor/Lender/Trader Name | |
CITY | Varchar(30) | Yes | City | |
STATE | Varchar(2) | Yes | State | |
ZIP_CODE | Varchar(9) | Yes | Zip Code | |
EMPLOYER | VARCHAR(38) | Yes | Employer | |
OCCUPATION | Varchar(38) | Yes | Occupation |
Primary Key: (CONTRIBUTOR_ID) Using CONTRIBUTOR_ID as the primary key ensures that each row in the table is uniquely identifiable by its contributor ID. It also allows for efficient searching and retrieval of data related to a specific contributor using the primary key index.
Schema:
CREATE TABLE contributor (
CONTRIBUTOR_ID character varying(9),
NAME character varying(200),
CITY character varying(2),
STATE character varying(2),
ZIP_CODE character varying(9),
EMPLOYER character varying(38),
OCCUPATION character varying(38)
)
- Contributionsbyindividuals
Description: The contributions by individuals relation contains information for contributions given by individuals.
Attributes & Datatyes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CMTE_ID | Varchar(9) | No | A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission | |
AMNDT_IND | Varchar(1) | Yes | Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report. | |
RPT_TP | Varchar(3) | Yes | Indicates the type of report filed | |
TRANSACTION_PGI | Varchar(5) | Yes | This code indicates the election for which the contribution was made. | |
IMAGE_NUM | Varchar(18) | Yes | 18-digit Image Number Format | |
TRANSACTION_TP | Varchar(3) | Yes | Transaction Type | |
ENTITY_TP | Varchar(3) | Yes | Entity Type | |
CONTRIBUTOR_ID | Varchar(9) | Yes | Contributor Id | |
NAME | Varchar(200) | Yes | Contributor/Lender/Trader Name | |
CITY | Varchar(30) | Yes | City | |
STATE | Varchar(2) | Yes | State | |
ZIP_CODE | Varchar(9) | Yes | Zip Code | |
EMPLOYER | VARCHAR(38) | Yes | Employer | |
OCCUPATION | Varchar(38) | Yes | Occupation | |
TRANSACTION_DT | Date | Yes | Transaction. Date | |
TRANSACTION_AMT | Numeric (14,2) | Yes | Transaction Amount | |
OTHER_ID | Varchar(9) | Yes | For contributions from candidates or other committees this column will contain that contributor's FEC ID. | |
TRAN_ID | Varchar(32) | Yes | Transaction ID | |
FILE_NUM | Numeric(22) | Yes | Unique report id | |
MEMO_CD | Varchar(1) | Yes | Memo code | |
MEMO_TEXT | Varchar(100) | Yes | A description of the activity. Memo Text is available on itemized amounts on Schedules A and B. | |
SUB_ID | Numeric(19) | No | FEC record number |
Primary Key: (SUB_ID) In the case of the contributionbyindividuals table, the SUB_ID is the primary key as it uniquely identifies each contribution made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the individual’s contributions to committees.
Foreign Key: (CMTE_ID) The committee’s identification number is used to link the contributions made to the committees
Schema:
CREATE TABLE contributionsbyindividuals (
CMTE_ID character varying(9) NOT NULL,
AMNDT_IND character varying(1),
RPT_TP character varying(3),
TRANSACTION_PGI character varying(5),
IMAGE_NUM character varying(18),
TRANSACTION_TP character varying(3),
ENTITY_TP character varying(3),
CONTRIBUTOR_ID character varying(9),
NAME character varying(200),
CITY character varying(2),
STATE character varying(2),
ZIP_CODE character varying(9),
EMPLOYER character varying(38),
OCCUPATION character varying(38),
TRANSACTION_DT date,
TRANSACTION_AMT numeric(14,2),
OTHER_ID character varying(9),
TRAN_ID character varying(32),
FILE_NUM numeric(22,0),
MEMO_CD character varying(1),
MEMO_TEXT character varying(100),
**
SUB_ID numeric(19,0) NOT NULL PRIMARY KEY,
FOREIGN KEY (CMTE_ID) REFERENCES Committee (CMTE_ID)
)
- anytransactionfromcommitteetoanother
Description: This relation contains each contribution or independent expenditure that one committee gives to another during the two-year election cycle, including PACs, Party committees, Candidate committees or other federal committees
Attributes & Datatyes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CMTE_ID | Varchar(9) | No | A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission | |
AMNDT_IND | Varchar(1) | Yes | Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report. | |
RPT_TP | Varchar(3) | Yes | Indicates the type of report filed | |
TRANSACTION_PGI | Varchar(5) | Yes | This code indicates the election for which the contribution was made. | |
IMAGE_NUM | Varchar(18) | Yes | 18-digit Image Number Format | |
TRANSACTION_TP | Varchar(3) | Yes | Transaction Type | |
ENTITY_TP | Varchar(3) | Yes | Entity Type | |
CONTRIBUTOR_ID | Varchar(3) | Yes | Contributor Id | |
NAME | Varchar(200) | Yes | Contributor/Lender/Trader Name | |
CITY | Varchar(30) | Yes | City | |
STATE | Varchar(2) | Yes | State | |
ZIP_CODE | Varchar(9) | Yes | Zip Code | |
EMPLOYER | VARCHAR(38) | Yes | Employer | |
OCCUPATION | Varchar(38) | Yes | Occupation | |
TRANSACTION_DT | Date | Yes | Transaction. Date | |
TRANSACTION_AMT | Numeric (14,2) | Yes | Transaction Amount | |
OTHER_ID | Varchar(9) | Yes | For contributions from candidates or other committees this column will contain that contributor's FEC ID. | |
TRAN_ID | Varchar(32) | Yes | Transaction ID | |
FILE_NUM | Numeric(22) | Yes | Unique report id | |
MEMO_CD | Varchar(1) | Yes | Memo code | |
MEMO_TEXT | Varchar(100) | Yes | A description of the activity. Memo Text is available on itemized amounts on Schedules A and B. | |
SUB_ID | Numeric(19) | No | FEC record number |
Primary Key: : (SUB_ID) In the case of the anytransactionfromonecommitteetoanother table, the SUB_ID is the primary key as it uniquely identifies each contribution made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the contributions to committees.
Foreign Key: (CMTE_ID) The committee’s identification number is used to link the contributions made from one committee to another committee
Schema:
CREATE TABLE anytransactionfromonecommitteetoanother (
CMTE_ID character varying(9) NOT NULL,
AMNDT_IND character varying(1),
RPT_TP character varying(3),
TRANSACTION_PGI character varying(5),
IMAGE_NUM character varying(18),
TRANSACTION_TP character varying(3),
ENTITY_TP character varying(3),
CONTRIBUTOR_ID character varying(9),
NAME character varying(200),
CITY character varying(2),
STATE character varying(2),
ZIP_CODE character varying(9),
EMPLOYER character varying(38),
OCCUPATION character varying(38),
TRANSACTION_DT date,
TRANSACTION_AMT numeric(14,2),
OTHER_ID character varying(9),
TRAN_ID character varying(32),
FILE_NUM numeric(22,0),
MEMO_CD character varying(1),
MEMO_TEXT character varying(100),
**
SUB_ID numeric(19,0) NOT NULL PRIMARY KEY,
FOREIGN KEY (CMTE_ID) REFERENCES Committee (CMTE_ID)
)
- currentcampaignforhouseandsenate
Description: The current campaigns for House and Senate relation contains summary financial information for each campaign committee. The relation has one record per House and Senate campaign committee and shows information about the candidate, total receipts, transfers received from authorized committees, total disbursements, transfers given to authorized committees, cash-on-hand totals, loans and debts, and other financial summary information.
Attributes & Datatyes:
Column Name | Column Type | Can be Null |
Default | Description |
---|---|---|---|---|
CAND_ID | Varchar(9) | No | Chandidate identication | |
CAND_NAME | Varchar(200) | Yes | Candidate name | |
CAND_ICI | Varchar(1) | Yes | Incumbent challenger status | |
PTY_CD | Varchar(1) | Yes | Party code | |
CAND_PTY_AFFILIATION | Varchar(3) | Yes | Party affiliation | |
TTL_RECEIPTS | Numeric(14,2) | Yes | Total receipts | |
TRANS_FROM_AUTH | Numeric(14,2) | Yes | Transfers from authorized committees | |
TTL_DISB | Numeric(14,2) | Yes | Total disbursements | |
TRANS_TO_AUTH | Numeric(14,2) | Yes | Transfers to authorized committees | |
COH_BOP | Numeric(14,2) | Yes | Beginning cash | |
COH_COP | Numeric(14,2) | Yes | Ending cash | |
CAND_CONTRIB | Numeric(14,2) | Yes | Contributions from candidate | |
CAND_LOANS | Numeric(14,2) | Yes | Loans from candidates | |
OTHER_LOANS | Numeric(14,2) | Yes | Other loans | |
CAND_LOAN_REPAY | Numeric(14,2) | Yes | Candidate loan repayments | |
OTHER_LOAN_REPAY | Numeric(14,2) | Yes | Other loan repayments | |
DEBTS_OWED_BY | Numeric(14,2) | Yes | Debts owed by | |
TTL_INDIV_CONTRIB | Numeric(14,2) | Yes | Total individuals contributions | |
CAND_OFFICE_ST | Varchar(2) | Yes | Candidate state | |
CAND_OFFICE_DISTRICT | Varchar(2) | Yes | Candidate district | |
SPEC_ELECTION | Varchar(1) | Yes | Special election status | |
PRIM_ELECTION | Varchar(1) | Yes | Primary election status | |
RUN_ELECTION | Varchar(1) | Yes | Runoff election status | |
GEN_ELECTION | Varchar(1) | Yes | General election status | |
GEN_ELECTION_PRECENT | Numeric(7,4) | Yes | General election percentage | |
OTHER_POL_CMTE_CONTRIB | Numeric(14,2) | Yes | Contributions from other political committees | |
POL_PTY_CONTRIB | Numeric(14,2) | Yes | Contributions from party committees | |
CVG_END_DT | Date | Yes | Coverage and date | |
INDIV_REFUNDS | Numeric(14,2) | Yes | Refunds to individuals | |
CMTE_REFUNDS | Numeric(14,2) | Yes | Refunds to committee |
Primary Key: (CAND_ID) This table contains one record which contains information about the candidate, total receipts, total disbursements and other details per house and senate campaign committee. So, here candidate id uniquely identifies each record in the relation.
Foreign Key: (CAND_ID) The candidate’s identification number is used to link the current-campaign-for-house-and-senate with the candidate and it’s committee.
Schema:
CREATE TABLE currentcampaignforhouseandsenate (
CAND_ID character varying(9) NOT NULL PRIMARY KEY,
CAND_NAME character varying(200),
CAND_ICI character varying(1),
PTY_CD character varying(1),
CAND_PTY_AFFILIATION character varying(3),
TTL_RECEIPTS numeric(14,2),
TRANS_FROM_AUTH numeric(14,2),
TTL_DISB numeric(14,2),
TRANS_TO_AUTH numeric(14,2),
COH_BOP numeric(14,2),
COH_COP numeric(14,2),
CAND_CONTRIB numeric(14,2),
CAND_LOANS numeric(14,2),
OTHER_LOANS numeric(14,2),
CAND_LOAN_REPAY numeric(14,2),
OTHER_LOAN_REPAY numeric(14,2),
DEBTS_OWED_BY numeric(14,2),
TTL_INDIV_CONTRIB numeric(14,2),
CAND_OFFICE_ST character varying(2),
CAND_OFFICE_DISTRICT character varying(2),
SPEC_ELECTION character varying(1),
PRIM_ELECTION character varying(1),
RUN_ELECTION character varying(1),
GEN_ELECTION character varying(1),
GEN_ELECTION_PRECENT numeric(7,4),
OTHER_POL_CMTE_CONTRIB numeric(14,2),
POL_PTY_CONTRIB numeric(14,2),
CVG_END_DT date,
INDIV_REFUNDS numeric(14,2),
CMTE_REFUNDS numeric(14,2),
FOREIGN KEY (CAND_ID) REFERENCES Candidate (CAND_ID)
)
Description: This table contains each contribution or independent expenditure made by a:
- PAC
- Party committee
- Candidate committee
- Other federal committee
Attributes and Datatypes:
Column name | Can be Null | Data type | Default | Description |
---|---|---|---|---|
CMTE_ID | N | VARCHAR2 (9) | A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission | |
AMNDT_IND | Y | VARCHAR2 (1) | Indicates if the report being filed is new (N), an amendment (A) to a previous report or a termination (T) report. | |
RPT_TP | Y | VARCHAR2 (3) | Indicates the type of report filed. List of report type codes | |
TRANSACTION_PGI | Y | VARCHAR2 (5) | This code indicates the election for which the contribution was made. EYYYY (election Primary, General, Other plus election year) | |
IMAGE_NUM | Y | VARCHAR2 (11) or VARCHAR2(18) |
11-digit image number format YYOORRRFFFF YY - scanning year OO - office (01 - House, 02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic) RRR - reel number FFFF- frame number 18-digit image number normat (June 29, 2015) YYYYMMDDSSPPPPPPPP YYYY - scanning year MM - scanning month DD - scanning day SS - source (02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic) PPPPPPPP - page (reset to zero every year on January 1) |
|
TRANSACTION_TP | Y | VARCHAR2 (3) | Transaction types 10J, 11J, 13, 15J, 15Z, 16C, 16F, 16G, 16R, 17R, 17Z, 18G, 18J, 18K, 18U, 19J, 20, 20C, 20F, 20G, 20R, 22H, 22Z, 23Y, 24A, 24C, 24E, 24F, 24G, 24H, 24K, 24N, 24P, 24R, 24U, 24Z and 29 are included in the OTH file. Beginning with 2016 transaction types 30F, 30G, 30J, 30K, 31F, 31G, 31J, 31K, 32F, 32G, 32J, 32K, 40, 40Z, 41, 41Z, 42 and 42Z are also included in the OTH file. For more information about transaction type codes see this list of transaction type codes |
|
ENTITY_TP | Y | VARCHAR2 (3) | ONLY VALID FOR ELECTRONIC FILINGS received after April 2002. CAN = Candidate CCM = Candidate Committee COM = Committee IND = Individual (a person) ORG = Organization (not a committee and not a person) PAC = Political Action Committee PTY = Party Organization |
|
CONTRIBTOR_ID | Y | VARCHAR2 (9) | Contributor Id | |
NAME | Y | VARCHAR2 (200) | Name | |
CITY | Y | VARCHAR2 (30) | City | |
STATE | Y | VARCHAR2 (2) | State as a part of address | |
ZIP_CODE | Y | VARCHAR2 (9) | ZIP_code as a part of address | |
EMPLOYER | Y | VARCHAR2 (38) | Employer details | |
OCCUPATION | Y | VARCHAR2 (38) | Occupation details | |
TRANSACTION_DT | Y | DATE | Date on which transaction happened | |
TRANSACTION_AMT | Y | NUMERIC(14,2) |
Amount of transaction | |
OTHER_ID | Y | VARCHAR2 (9) | For contributions from individuals this column is null. For contributions from candidates or other committees this column will contain that contributor's FEC ID. | |
CAND_ID | Y | VARCHAR2 (9) | A 9-character alpha-numeric code assigned to a candidate by the Federal Election Commission. The candidate ID for a specific candidate remains the same across election cycles as long as the candidate is running for the same office. | |
TRAN_ID | Y | VARCHAR2 (32) | ONLY VALID FOR ELECTRONIC FILINGS. A unique identifier associated with each itemization or transaction appearing in an FEC electronic file. A transaction ID is unique for a specific committee for a specific report. In other words, if committee, C1, files a Q3 New with transaction SA123 and then files 3 amendments to the Q3 transaction SA123 will be identified by transaction ID SA123 in all 4 filings. | |
FILE_NUM | Y | NUMERIC(14,2) |
Unique report id | |
MEMO_CD | Y | VARCHAR2 (1) | 'X' indicates that the amount is NOT to be included in the itemization total. | |
MEMO_TEXT | Y | VARCHAR2 (100) | A description of the activity. Memo text is available on itemized amounts on Schedules A and B. These transactions are included in the itemization total. | |
SUB_ID | N | NUMERIC(14,2) |
Unique row ID |
Primary Key: : (SUB_ID) In the case of the independentexpenditures table, the SUB_ID is the primary key as it uniquely identifies each expenditure made by the individuals. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record these expenditures.
Foreign Key: (CMTE_ID) The committee’s identification number is used to link the other tables.
Schema:
create table independentexpenditures(
CMTE_ID varchar(50),
AMNDT_IND varchar(10),
RPT_TP varchar(255),
TRANSACTION_PGI varchar(255),
TRANSACTION_TP varchar(255),
ENTITY_TP varchar(255),
CONTRIBUTOR_ID varchar(9)
NAME varchar(255),
CITY varchar(255),
STATE varchar(255),
ZIP_CODE varchar(100),
EMPLOYER varchar(100),
OCCUPATION varchar(100),
TRANSACTION_DT varchar(100),
TRANSACTION_AMT NUMERIC(255),
OTHER_ID NUMERIC(10),
TRAN_ID NUMERIC(10),
MEMO_CD varchar(50),
SUB_ID NUMERIC(50)) NOT NULL primary key,
FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID)
);
- operatingexpenditures
Description:
The file contains information about the committee making the disbursement, the report where the operating expenditure is disclosed, the entity receiving the disbursement, the disbursement’s date, amount, purpose, and additional information about the operating expenditure (if provided).The end-of-line (EOL) marker is line feed '\n' (LF, 0x0A, 10 in decimal).
Attributes and Datatypes:
Column name | Can be Null | Data type | Default | Description |
---|---|---|---|---|
CMTE_ID | N | VARCHAR2 (9) | Identification number of committee filing report. A 9-character alpha-numeric code assigned to a committee by the Federal Election Commission | |
AMNDT_IND | Y | VARCHAR2 (1) | Indicates if the report being filed is new (N), an amendment (A) to a previous report, or a termination (T) report. | |
RPT_YR | Y | NUMERIC(14,2) |
||
RPT_TP | Y | VARCHAR2 (3) | Indicates the type of report filed. List of report type codes | |
IMAGE_NUM | Y | VARCHAR2 (11) or VARCHAR2(18) | 11-digit Image Number Format YYOORRRFFFF YY - scanning year OO - office (01 - House, 02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic) RRR - reel number FFFF- frame number 18-digit Image Number Format (June 29, 2015) YYYYMMDDSSPPPPPPPP YYYY - scanning year MM - scanning month DD - scanning day SS - source (02 - Senate, 03 - FEC Paper, 90-99 - FEC Electronic) PPPPPPPP - page (reset to zero every year on January 1) |
|
LINE_NUM | Y | NUMERIC(14,2) |
Indicates FEC form line number | |
FORM_TP_CD | Y | VARCHAR2 (8) | Indicates FEC Form | |
SCHED_TP_CD | Y | VARCHAR2 (8) | Schedule B - Itemized disbursements | |
CONTRIBUTOR_ID | Y | VARCHAR2 (9) |
Contributor Id | |
NAME | Y | VARCHAR2 (200) | Name |
|
CITY | Y | VARCHAR2 (30) | City as a part of address |
|
STATE | Y | VARCHAR2 (2) | State as a part of address |
|
ZIP_CODE | Y | VARCHAR2 (9) | Zip code as a part of address | |
TRANSACTION_DT | Y | DATE |
|
|
TRANSACTION_AMT | Y |
|
Amount of transaction |
|
TRANSACTION_PGI | Y | VARCHAR2 (5) | ||
PURPOSE | Y | VARCHAR2 (100) | Purpose of transaction |
|
CATEGORY | Y | VARCHAR2 (3) | 001-012 and 101-107 | |
CATEGORY_DESC | Y | VARCHAR2 (40) | List of Disbursement Category Codes and their meaning | |
MEMO_CD | Y | VARCHAR2 (1) | 'X' indicates that the amount is NOT to be included in the itemization total. | |
MEMO_TEXT | Y | VARCHAR2 (100) | A description of the activity. Memo Text is available on itemized amounts on Schedule B. These transactions are included in the itemization total. | |
ENTITY_TP | Y | VARCHAR2 (3) | ONLY VALID FOR ELECTRONIC FILINGS received after April 2002. CAN = Candidate CCM = Candidate committee COM = Committee IND = Individual (a person) ORG = Organization (not a committee and not a person) PAC = Political action committee PTY = Party organization |
|
SUB_ID | N | NUMERIC(14,2) |
Unique row ID | |
FILE_NUM | Y | NUMERIC(14,2) |
Unique report id | |
TRAN_ID | VARCHAR2 (32) | ONLY VALID FOR ELECTRONIC FILINGS. A unique identifier associated with each itemization or transaction appearing in an FEC electronic file. A transaction ID is unique for a specific committee for a specific report. In other words, if committee, C1, files a Q3 New with transaction SA123 and then files 3 amendments to the Q3 transaction SA123 will be identified by transaction ID SA123 in all 4 filings. | ||
BACK_REF_TRAN_ID | Y | VARCHAR2 (32) | ONLY VALID FOR ELECTRONIC FILINGS. Used to associate one transaction with another transaction in the same report (using file number, transaction ID and back reference transaction ID). For example, a credit card payment and the subitemization of specific purchases. The back reference transaction ID of the specific purchases will equal the transaction ID of the payment to the credit card company. |
Primary Key: (SUB_ID) In the case of the operating expenditures table, the SUB_ID is the primary key as it uniquely identifies each expenditure made by the committee. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the expenditures of the committees.
Foreign Key: (CMTE_ID) The committee’s identification number is used to link the expenditures made from one committee to another committee.
Table Schema:
create table operatingexpenditures(
CMTE_ID varchar(50) ,
AMNDT_IND varchar(10),
RTP_YR NUMERIC(10),
RPT_TP varchar(255),
LINE_NUM NUMERIC(100),
SCHED_TP_CD NUMERIC(10),
CONTRIBUTOR_ID varchar(9),
NAME varchar(255),
CITY varchar(255),
STATE varchar(255),
ZIP_CODE varchar(100),
TRANSACTION_DT varchar(100),
TRANSACTION_AMT varchar(100),
TRANSACTION_PGI varchar(100),
PURPOSE varchar(100),
CATEGORY varchar(100),
CATEGORY_DESC varchar(100),
MEMO_CD varchar(100),
MEMO_TEXT varchar(100),
ENTITY_TP varchar(100),
SUB_ID varchar(100) NOT NULL primary key,
FILE_NUM varchar(100),
TRAN_ID varchar(100),
BACK_REF_TRAN_ID varchar(100),
FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID),
FOREIGN KEY (CONTRIBUTOR_ID) REFERENCES contributor (CONTRIBUTOR_ID)
);
- pacandpartysummary
Description:
The file has one record per PAC and Party committee and shows information about the committee, total receipts and disbursements, receipts and disbursements broken down by type, contributions to other committees, independent expenditures made by the committee, and other financial summary information. The end-of-line (EOL) marker is line feed '\n' (LF, 0x0A, 10 in decimal).
Attributes and Datatypes:
Column name | Null | Data type | Default | Description |
---|---|---|---|---|
CMTE_ID | N | VARCHAR2 (9) | ID of the committee | |
CMTE_NM | Y | VARCHAR2 (200) | Committee Name | |
CMTE_TP | Y | VARCHAR2 (1) | ||
CMTE_DSGN | Y | VARCHAR2 (1) | ||
CMTE_FILING_FREQ | Y | VARCHAR2 (1) | ||
TTL_RECEIPTS | Y | NUMERIC(14,2) | ||
TRANS_FROM_AFF | Y | NUMERIC(14,2) |
||
INDV_CONTRIB | Y | NUMERIC(14,2) |
||
OTHER_POL_CMTE_CONTRIB | Y | NUMERIC(14,2) |
||
CAND_CONTRIB | Y | NUMERIC(14,2) |
Not applicable | |
CAND_LOANS | Y | NUMERIC(14,2) |
Not applicable | |
TTL_LOANS_RECEIVED | Y | NUMERIC(14,2) |
||
TTL_DISB | Y | NUMERIC(14,2) |
||
TRANF_TO_AFF | Y | NUMERIC(14,2) |
||
INDV_REFUNDS | Y | NUMERIC(14,2) |
||
OTHER_POL_CMTE_REFUNDS | Y | NUMERIC(14,2) |
||
CAND_LOAN_REPAY | Y | NUMERIC(14,2) |
Not applicable | |
LOAN_REPAY | Y | NUMERIC(14,2) |
||
COH_BOP | Y | NUMERIC(14,2) |
||
COH_COP | Y | NUMERIC(14,2) |
||
DEBTS_OWED_BY | Y | NUMERIC(14,2) |
||
NONFED_TRANS_RECEIVED | Y | NUMERIC(14,2) |
||
CONTRIB_TO_OTHER_CMTE | Y | NUMERIC(14,2) |
||
IND_EXP | Y | NUMERIC(14,2) |
||
PTY_COORD_EXP | Y | NUMERIC(14,2) |
||
NONFED_SHARE_EXP | Y | NUMERIC(14,2) |
||
CVG_END_DT | Y | DATE(MM/DD/YYYY) | Through date |
Primary Key: (CMTE_ID) In the case of the pacandpartysummary table, the CMTE_ID is the primary key as it uniquely identifies each summary related to a committee. Moreover, it is a unique identifier assigned by the Federal Election Commission (FEC) to record the summary of these transactions.
Table Schema:
create table pacandpartysummary(
CMTE_ID varchar(9) NOT NULL primary key,
CMTE_NM varchar(200),
CMTE_TP varchar(1),
CMTE_DSGN varchar(1),
CMTE_FILING_FREQ varchar(1),
TTL_RECEIPTS NUMERIC(14,2),
TRANS_FROM_AFF NUMERIC(14,2),
INDV_CONTRIB NUMERIC(14,2),
OTHER_POL_CMTE_CONTRIB NUMERIC(14,2),
CAND_CONTRIB NUMERIC(14,2),
CAND_LOANS NUMERIC(14,2),
TTL_LOANS_RECEIVED NUMERIC(14,2),
TTL_DISB NUMERIC(14,2),
TRANF_TO_AFF NUMERIC(14,2),
INDV_REFUNDS NUMERIC(14,2),
OTHER_POL_CMTE_REFUNDS NUMERIC(14,2),
CAND_LOAN_REPAY NUMERIC(14,2),
LOAN_REPAY NUMERIC(14,2),
COH_BOP NUMERIC(14,2),
COH_COP NUMERIC(14,2),
DEBTS_OWED_BY NUMERIC(14,2),
NONFED_TRANS_RECEIVED NUMERIC(14,2),
CONTRIB_TO_OTHER_CMTE NUMERIC(14,2),
IND_EXP NUMERIC(14,2),
PTY_COORD_EXP NUMERIC(14,2),
NONFED_SHARE_EXP NUMERIC(14,2),
CVG_END_DT DATE,
FOREIGN KEY(CMTE_ID) REFERENCES COMMITTEE (CMTE_ID)
);
Note on Foreign keys: When the primary key (that the foreign key refer to) is deleted, the record with that foreign key is deleted (Cascade delete)
Database SQL Statements:
CREATE DATABASE federalelectioncommission;
DROP DATABASE federalelectioncommission;
Loading into Database:
Data for this dataset is present on federal election commission website in form of text files where each line is row separated by comma delimiter. We have written python script to load the data from these text files. Below snippet shows the code which connects to a PostgreSQL database and inserts data from text files into different tables in the database.
Insert statement for candidate:
INSERT INTO candidate (CAND_ID, CAND_NAME, CAND_PTY_AFFILIATION, CAND_ELECTION_YR, CAND_OFFICE_ST, CAND_OFFICE, CAND_OFFICE_DISTRICT, CAND_ICI, CAND_STATUS, CAND_PCC, CAND_ST1, CAND_ST2, CAND_CITY, CAND_ST, CAND_ZIP) VALUES ('C12345678', 'John Smith', 'REP', '2020', 'CA', 'H', '03', 'O', 'C', 'PCC123456', '123 Main Street', '', 'Los Angeles', 'CA', '90001');
Insert statement for committee:
INSERT INTO committee (CMTE_ID, CMTE_NM, TRES_NM, CMTE_ST1, CMTE_CITY, CMTE_ST, CMTE_ZIP, CMTE_DSGN, CMTE_TP, CMTE_PTY_AFFILIATION, CMTE_FILING_FREQ, ORG_TP, CONNECTED_ORG_NM, CAND_ID) VALUES
('C00000001', 'Friends of Jane Doe', 'John Smith', '123 Main St', 'Los Angeles', 'CA', '90001', 'A', 'P', 'DEM', 'Q', 'C', 'ABC Consulting', 'C12345678'),
('C00000002', 'Republicans for John Smith', 'Jane Doe', '456 Lincoln Blvd', 'San Francisco', 'CA', '94102', 'B', 'N', 'REP', 'M', 'P', NULL, 'C98765432'),
('C00000003', 'Independent Committee to Elect Jim Brown', 'Sarah Lee', '555 Independence Ave', 'Washington', 'DC', '20001', 'A', 'O', NULL, 'Q', 'M', 'XYZ Agency', 'C00098765');
Insert statement for candidate-committee:
INSERT INTO candidate_committee (CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID, CMTE_TP, CMTE_DSGN, LINKAGE_ID) VALUES ('C12345678', 2022, 2022, 'C00000001', 'P', 'D', 34567);
Insert statement for contributor:
INSERT INTO contributor (CONTRIBUTOR_ID, NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION) VALUES ('123456789', 'John Smith', 'NY', 'NY', '10001', 'ABC Inc.', 'Engineer');
Insert statement for contribution-by-individual:
INSERT INTO contributionsbyindividuals (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT, SUB_ID) VALUES ('C00000001', 'N', 'Q1', 'P', '12345678910111213', '15', 'IND', '123456789', '2022-01-01', '1000.00', NULL, 'ABCDE1234-2022-01-01-1', 1234567890, 'X', 'Donation for campaign expenses', 9876543210987654321);
Insert statement for any-transaction-from-one-committee-to-another: INSERT INTO anytransactionfromonecommitteetoanother (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI,IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID,TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT, SUB_ID) VALUES ('C00000001', 'N', 'Q1', 'P', '12345678910111213','15', 'COM', '123456789', '2022-01-01', 1000.00, NULL,'ABCDE1234-2022-01-01-1', 1234567890, 'X', 'Donation for campaign expenses', 9876543210987654321);
Insert statement for current-campaign-for-house-and-senate:
INSERT INTO currentcampaignforhouseandsenate (CAND_ID, PTY_CD, TTL_RECEIPTS, TRANS_FROM_AUTH, TTL_DISB, TRANS_TO_AUTH, COH_BOP, COH_COP, CAND_CONTRIB, CAND_LOANS, OTHER_LOANS, CAND_LOAN_REPAY, OTHER_LOAN_REPAY, DEBTS_OWED_BY, TTL_INDIV_CONTRIB, SPEC_ELECTION, PRIM_ELECTION, RUN_ELECTION, GEN_ELECTION, GEN_ELECTION_PRECENT, OTHER_POL_CMTE_CONTRIB, POL_PTY_CONTRIB, CVG_END_DT, INDIV_REFUNDS, CMTE_REFUNDS) VALUES ('C12345678', 'D', 100000.00, 5000.00, 80000.00, 6000.00, 20000.00, 18000.00, 10000.00, 5000.00, 0.00, 0.00, 0.00, 1000.00, 50000.00, 'N', 'Y', 'N','N', 0.00, 4000.00, 8000.00, '2022-11-03', 0.00, 0.00);
Insert statement for independentexpenditures:
INSERT INTO independentexpenditures (CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, MEMO_CD, SUB_ID) VALUES ('C00000001', 'N', 'M2', 'P', 'IE', 'ORG', '123456789', '2022-10-15', 5000.00, NULL, 987654321, NULL, 1234567890);
Insert statement for operatingexpenditures:
INSERT INTO operatingexpenditures (CMTE_ID, AMNDT_IND, RTP_YR, RPT_TP, LINE_NUM, SCHED_TP_CD, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, TRANSACTION_PGI, PURPOSE, CATEGORY, CATEGORY_DESC,MEMO_CD, MEMO_TEXT, ENTITY_TP, SUB_ID, FILE_NUM, TRAN_ID, BACK_REF_TRAN_ID) VALUES ('C00000001', 'N', 2022, 'M2', 1, 1, '123456789','2022-10-15', 500.00, 'G', 'Test Purpose', 'Test Category', 'Test Category Desc', NULL, NULL, 'ORG', '1234567890', '100', '987654321', NULL);
Insert statement for pacandpartysummary:
INSERT INTO pacandpartysummary (CMTE_ID, TTL_RECEIPTS, TRANS_FROM_AFF, INDV_CONTRIB, OTHER_POL_CMTE_CONTRIB, CAND_CONTRIB, CAND_LOANS, TTL_LOANS_RECEIVED, TTL_DISB, TRANF_TO_AFF, INDV_REFUNDS, OTHER_POL_CMTE_REFUNDS, CAND_LOAN_REPAY, LOAN_REPAY, COH_BOP, COH_COP, DEBTS_OWED_BY, NONFED_TRANS_RECEIVED, CONTRIB_TO_OTHER_CMTE, IND_EXP, PTY_COORD_EXP, NONFED_SHARE_EXP, CVG_END_DT) VALUES ('C12345', 100000, 20000, 50000, 30000, 10000, 2000, 5000, 80000, 3000, 2000, 500, 1000, 200, 300000, 200000, 10000, 4000, 50000, 2000, 1000, 500, '2022-01-01');
Normalization of the tables
Functional Dependencies:
candidates
CAND_ID -> CAND_NAME, CAND_PTY_AFFILIATION, CAND_ELECTION_YR, CAND_OFFICE_ST, CAND_OFFICE, CAND_OFFICE_DISTRICT, CAND_ICI, CAND_STATUS, CAND_PCC, CAND_ST1, CAND_ST2, CAND_CITY, CAND_ST, CAND_ZIP
committees
CTME_ID -> CMTE_NM, TRES_NM, CMTE_ST1, CMTE_ST2, CMTE_CITY, CMTE_ST, CMTE_ZIP, CMTE_DSGN, CMTE_TP, CMTE_PTY_AFFILIATION, CMTE_FILING_FREQ, ORG_TP, CONNECTED_ORG_NM, CAND_ID
candidate_committee
LINKAGE_ID -> CAND_ID, CAND_ELECTION_YR, FEC_ELECTION_YR, CMTE_ID, CMTE_TP, CMTE_DSGN
contributor
CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION
contributionsbyindividuals
SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT
CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION
anytransactionfromonecommitteetoanother
SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM, MEMO_CD, MEMO_TEXT
CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION
Currentcampaignforhouseandsenate
CAND_ID -> CAND_NAME, CAND_ICI, PTY_CD, CAND_PTY_AFFILIATION, TTL_RECEIPTS, TRANS_FROM_AUTH, TTL_DISB, TRANS_TO_AUTH, COH_BOP, COH_COP, CAND_CONTRIB, CAND_LOANS, OTHER_LOANS, CAND_LOAN_REPAY, OTHER_LOAN_REPAY, DEBTS_OWED_BY, TTL_INDIV_CONTRIB, CAND_OFFICE_ST, CAND_OFFICE_DISTRICT, SPEC_ELECTION, PREM_ELECTION, RUN_ELECTION, GEN_ELECTION, GEN_ELECTION_PERCENT, OTHER_POL_CMTE_CONTRIB, POL_PTY_CONTRIB, CVG_END_DT, INDIV_REFUNDS, CMTE_REFUNDS
Independentexpenditures
SUB_ID -> CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, TRANSACTION_TP, ENTITY_TP, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, MEMO_CD
CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION
Operatingexpenditures
SUB_ID -> CMTE_ID, AMNDT_IND, RTP_YR, RTP_TP, LINE_NUM, SCHED_TP_CD, CONTRIBUTOR_ID, TRANSACTION_DT, TRANSACTION_AMT, TRANSACTION_PGI, PURPOSE, CATEGORY, CATEGORY_DESC, MEMO_CD, MEMO_TEXT, ENTITY_TP, FILE_NUM, TRAN_ID, BACK_REF_TRAN_ID
CONTRIBUTOR_ID -> NAME, CITY, STATE, ZIP_CODE
pacandpartysummary
CMTE_ID -> CMTE_NM, CMTE_TP, CMTE_DESN, CMTE_FILING_FREQ, TTL_RECEIPTS, TRANS_FROM_AFF, INDIV_CONTRIB, OTHER_POL_CMTE_CONTRIB, CAND_CONTRIB, CAND_LOANS, TTL_LOANS_RECEIVED, TTL_DISB, TRANF_TO_AFF, INDV_REFUNDS, OTHER_POL_CMTE_REFUNDS, CAND_LOAN_REPAY, LOAN_REPAY, COH_BOP, COH_COP, DEBTS_OWED_BY, NONFAD_TRANS_RECEIVED, CONTRIB_TO_OTHER_CMTE, IND_EXP, PTY_COORD_EXP, NONFED_SHARE_EXP, CVG_END_DT
The functional dependencies of each of the tables are listed above. We observe the LHS of the functional dependencies is a super key of its relation. All the functional dependencies are also non-trivial dependencies. These conditions imply that the relations are in BCNF. But, found that the relations contain redundancy anomalies. The details of the contributor repeated in all the tables, so we ended up creating a new relation contributors to store these details and add the id the contributor in the respective tables.
Indexing of the tables
Since the dataset was large, while loading the data, we had to do modifications to the data(to fit into our tables) and also since we decomposed the tables, we carried out data file manipulation and then we loaded it into the database. After doing these things and loading the data to the database, while performing the queries, our join operations specifically was taking lot of time for execution, therefore we added indexing for candidate and committee tables since all other tables mostly have cand_id and cmte_id as the foreign key. After incorporating this we could see significant improvement in the performance of the database.
CREATE INDEX idx_candid ON candidate (CAND_ID);
CREATE INDEX idx_comteid ON committee (cmte_id)
Testing the database with SQL queries.
Queries for each inserting, deleting, and updating operation in your dataset and select queries of different types of statements
- Candidate
- Committe
- Candidate_committee
- Contributor
- Contributionbyindividuals
- Anytransactionfromonecommitteetoanother
- Currentcampaignforhouseandsenate
- Independent Expenditures
- Operating Expenditures
- Pacpartyandsummary
- Joining candidate_committee and committe
- Join on candidate_committee and anytransactionsfromonetoanother
- Join on contributor and independent expenditures
- Join on operating expenditures and contributor
Query execution analysis
Problematic queries were:
1.
SELECT c.CAND_NAME, com.CMTE_NM
FROM candidate c
JOIN candidate_committee cc ON cc.CAND_ID = c.CAND_ID
JOIN committee com ON com.CMTE_ID = cc.CMTE_ID
WHERE cc.CAND_ELECTION_YR = 2020
GROUP BY c.CAND_NAME, com.CMTE_NM
ORDER BY c.CAND_NAME ASC, com.CMTE_NM ASC;
Based on the execution plan, it seems that the most expensive operation in the query is the sequential scan of the "committee" table, which has a cost of 24.29.
To improve the performance of the query, we can consider creating an index on the "CMTE_ID" column of the "committee" table, which is used in the join condition with the "candidate_committee" table. This will allow for faster retrieval of the relevant rows from the "committee" table.
CREATE INDEX committee_cmte_id_idx ON committee (CMTE_ID);
2.
SELECT *
FROM operatingexpenditures
LEFT JOIN contributor
ON operatingexpenditures.CONTRIBUTOR_ID = contributor.CONTRIBUTOR_ID;
![Graphical user interface, text, application, email
Description automatically generated](images/Aspose.Words.2c68961b-8eb5-42d1-9c96-a925b9664449.009.png)
Based on the execution pla, it appears that the problematic query involves a left join between the operatingexpenditures and contributor tables on their respective contributor_id columns. The left join operation can be expensive and can consume a lot of resources, especially if the tables are large.
To optimize this query, consider the following options:
- Add an index to the contributor_id column on both tables to speed up the join operation. This can significantly reduce the amount of time required to perform the join by allowing the database to quickly locate matching rows.
- Rewrite the query to use a more efficient join algorithm. For example, a nested loop join may be more efficient than a hash join or a merge join for small tables.
- Use a subquery or a common table expression (CTE) to filter the rows before joining. This can reduce the size of the result set and improve the efficiency of the join operation.
3.
SELECT *
FROM committee
FULL OUTER JOIN candidate_committee
ON committee.CMTE_ID = candidate_committee.CMTE_ID;
![Graphical user interface, text, application
Description automatically generated](images/Aspose.Words.2c68961b-8eb5-42d1-9c96-a925b9664449.010.png)
Based on the execution plan, it appears that the problematic query involves a full join between the candidate_committee and committee tables on their respective cmte_id columns. The full join operation can be expensive and can consume a lot of resources, especially if the tables are large.
To optimize this query, consider the following options:
- Add an index to the cmte_id column on both tables to speed up the join operation. This can significantly reduce the amount of time required to perform the join by allowing the database to quickly locate matching rows.
- Rewrite the query to use a more efficient join algorithm. For example, a nested loop join may be more efficient than a hash join or a merge join for small tables.
- Use a subquery or a common table expression (CTE) to filter the rows before joining. This can reduce the size of the result set and improve the efficiency of the join operation.