/stockslexicon

Class-based stocks lexicon

Primary LanguagePython

Table of Contents

  1. Stocks Lexicon
  2. Installation
  3. Examples
  4. Details

Stocks Lexicon

The main idea for this project is to provide a way to generalize a company's name/ticker to its size (market cap) and industry category. This repository provides methods that map a stock's name/ticker to its market capitalization and its industry category. The market capitalization of a company is categorized depending on how big the company is. The market cap categories and industry categories are listed below:

Market Capitalization (in US$) Category
Less than $50 million Nano
Between $50 million and $300 million   Micro
Between $300 million and $2 billion Small
Between $2 billion and $10 billion Mid
Between $10 billion and $200 billion Large
Grater than $200 billion Mega

Industry Categories    
Automotive Consumer Durables   Food & Beverage
Material & Construction   Insurance Banking
Wholesale Metals & Mining Computer Software & Services
Chemicals Leisure Transportation
Telecommunications Media Electronics
Health Services Utilities Tobacco
Financial Services Diversified Services Internet
Retail Specialty Retail Conglomerates
Computer Hardware Aerospace/Defense Real Estate
Drugs Energy Consumer NonDurables
Manufacturing   Empty String (none available)

Installation

First, clone the repository to your project folder:

cd Project
git clone https://github.com/Salompas/stockslexicon.git
# creates a new folder named stockslexicon

The module defines a single class Stock that has methods to interact with the data. You can import the class directly:

from stockslexicon.stocks import Stocks
stocks = Stocks()                # instantiate the class

All methods of the class are documented and can be accessed via help(Stocks).

Examples

To recover the market size of a firm:

stocks.size('AAPL')             # market size in 2018 (default)
stocks.size('AAPL', '2007')       # market size in 2007

To recover the name of a stock given its ticker:

stocks('AAPL', 'name')          # recovers name of stock with ticker AAPL
stocks('AAPL', 'legal_name')    # legal name of stock with ticker AAPL

To recover the market capitalization in dollars:

stocks('AAPL', '2007')          # market cap in dollars on 2007
stocks('AAPL', '2018')          # market cap in dollars on 2018

To recover the company's industry category:

stocks.industry('AAPL')         # industry category for AAPL
stocks.industryFromName('Apple')  # industry category for Apple

To generalize a company's name or ticker:

stocks.generalizeTicker('GOOG', '2007')  # ('Large', 'Internet')
stocks.generalizeName('Alphabet', 2018)  # ('Mega', 'Internet')

Details

The market capitalization data covers (almost) all stocks listed in the United States from 2007 to 2018 (inclusive).

For a stock to be covered, it has to be available in 2018. If the company went bankrupt in 2017, then the company's stock will not show up in the data. If a stock that is available in 2018 (in October to be precise) and is not in the data, then it is most likely the result of a bug (please report it).

The market capitalization is computed at the end of each year, except in 2018 which computes the market capitalization in some date in October (this was when I first collected the data).

The market capitalization when not available for a company in some year is replaced by a zero (0).

There are two names available for each company, a common name (denoted by name) and its legal name (denoted by legal_name). To obtain the common name we take each legal name and remove the words: Inc, Corp and Group.

The industry categories are not always available (for about 150 stocks). This happens for exchange traded funds (ETFs), over-the-counter contracts (OTCs) and some other stocks. When a category is not available an empty string is returned instead.