
Package to parse and analyze trademark data from the United States Patent and Trademark Office

Primary language: Python. License: MIT.

USPTO - Trademark Parser

This application helps parse XML files from the USPTO trademark public data, which is available in bulk form. From the XML files, this package generates Python dictionaries that can be easily analyzed or exported as CSV files for use with other analytical tools. USPTO data is also viewable through a search interface on the Open Data site.

https://developer.uspto.gov/product/trademark
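
The XML-to-dictionary-to-CSV flow described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the inline sample, its element names (`case-file`, `serial-number`, `case-file-header`, `filing-date`, `location-date`), and the helpers `parse_case_files` and `to_csv` are assumptions chosen to resemble the column names shown later in this document.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample; real bulk files are far larger.
SAMPLE_XML = """
<trademark-applications-daily>
  <case-file>
    <serial-number>87252004</serial-number>
    <case-file-header>
      <filing-date>20161130</filing-date>
      <location-date>20161205</location-date>
    </case-file-header>
  </case-file>
  <case-file>
    <serial-number>87252005</serial-number>
    <case-file-header>
      <filing-date>20161130</filing-date>
      <location-date>20161205</location-date>
    </case-file-header>
  </case-file>
</trademark-applications-daily>
"""


def parse_case_files(xml_text):
    """Return {serial number: {field: value}} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = {}
    for case in root.iter("case-file"):
        serial = case.findtext("serial-number")
        header = case.find("case-file-header")
        records[serial] = {child.tag: child.text for child in header}
    return records


def to_csv(records):
    """Serialize the nested dictionary to CSV text, one row per serial number."""
    fields = ["serial-number"] + sorted({f for rec in records.values() for f in rec})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for serial, rec in records.items():
        writer.writerow({"serial-number": serial, **rec})
    return buffer.getvalue()


records = parse_case_files(SAMPLE_XML)
csv_text = to_csv(records)
```

The dictionary keyed by serial number is the intermediate form; the CSV text is what other analytical tools would consume.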

Installing the package

System requirements

  • Python 3

Python Hard Dependencies

  • xml
  • zipfile
  • gzip
  • bz2
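
All four modules listed above ship with the Python standard library, so they need no separate installation. As an illustration of how the compression modules can be combined, the sketch below picks a reader based on the file extension. `open_maybe_compressed` is a hypothetical helper, not a function of the uspto package, and whether the package selects its reader this way is an assumption. Zip archives are left out to keep the sketch short.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path):
    """Open a file for binary reading, choosing the reader by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


# Round-trip a tiny payload through each supported format.
payload = b"<root/>"
results = {}
with tempfile.TemporaryDirectory() as tmp:
    writers = {
        "sample.xml": open,
        "sample.xml.gz": gzip.open,
        "sample.xml.bz2": bz2.open,
    }
    for name, opener in writers.items():
        path = os.path.join(tmp, name)
        with opener(path, "wb") as handle:
            handle.write(payload)
        with open_maybe_compressed(path) as handle:
            results[name] = handle.read()
```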

To install the package, download the source files to your system, then run the following from the source directory:

python setup.py install

USPTO Notebook

With this notebook and the uspto package, you can parse the raw XML trademark data provided by the USPTO.

Loading packages

import pandas as pd
import uspto as pto

Open USPTO File

# Path to data
path = "data/apc161231-56_sample.xml"
data = pto.openUSPTO(path)

Get XML root

Getting the root might take a couple of minutes, depending on the size of the XML file and the amount of RAM on your machine.

data = pto.openUSPTO(path)
root = data.getroot()
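
When a file is too large to hold comfortably in memory, a streaming parse is one alternative to loading the whole tree. The sketch below uses `xml.etree.ElementTree.iterparse` from the standard library rather than the uspto package; the `case-file` and `serial-number` element names and the `iter_serial_numbers` helper are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_serial_numbers(source):
    """Yield serial numbers one case file at a time (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "case-file":
            yield elem.findtext("serial-number")
            # Discard the contents of the element that was just processed.
            elem.clear()


# Tiny in-memory stand-in for a bulk file.
sample = io.BytesIO(
    b"<root>"
    b"<case-file><serial-number>87252004</serial-number></case-file>"
    b"<case-file><serial-number>87252005</serial-number></case-file>"
    b"</root>"
)
serials = list(iter_serial_numbers(sample))
```

The trade-off is that a streamed file has no `root` object to hand to the extraction functions used in the rest of this notebook, so this approach only suits custom processing.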

File Description

With the pto.getDetails(root) function we can extract useful information about the XML file, including the number of trademark applications it contains.

details = pto.getDetails(root)
pd.DataFrame.from_dict(details,orient='index')
0
version-no 2.0
creation-datetime 201702250716
version-date 20041108
file-segment TRMK
action-key TX
case-files-vol 40382
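
The date fields come back as compact digit strings. Assuming `creation-datetime` follows a `YYYYMMDDHHMM` layout and `version-date` a `YYYYMMDD` layout (an inference from the values above, not something stated in this README), and assuming the values are returned as strings, they can be converted with the standard library:

```python
from datetime import datetime

# Values copied from the output above; the formats are assumed from their shape.
details = {
    "creation-datetime": "201702250716",
    "version-date": "20041108",
    "case-files-vol": "40382",
}

created = datetime.strptime(details["creation-datetime"], "%Y%m%d%H%M")
version_date = datetime.strptime(details["version-date"], "%Y%m%d").date()
case_file_count = int(details["case-files-vol"])
```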

Extracting and Creating tables

Case File Header

Extract the case file header data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

file_header = pto.getFileHeader(root)
table = pd.DataFrame.from_dict(file_header, orient='index')
table.head()
location-date use-application-currently-in amended-to-itu-application-in filing-basis-filed-as-44d-in collective-trademark-in section-8-accepted-in standard-characters-claimed-in drawing-3d-filed-in foreign-priority-in color-drawing-current-in ... filing-date attorney-name attorney-docket-number employee-name law-office-assigned-location-code published-for-opposition-date domestic-representative-name abandonment-date amend-to-register-date registration-date
87252004 20161205 T F F F F F F F T ... 20161130 NaN NaN NaN NaN NaN NaN NaN NaN NaN
87252005 20161205 F F F F F F F F F ... 20161130 Julie A. Hopkins 100859.1.7 NaN NaN NaN Julie A. Hopkins NaN NaN NaN
87252006 20161205 F F F F F T F F F ... 20161130 Paul R. Fransway 73285-2 NaN NaN NaN Paul R. Fransway NaN NaN NaN
87252007 20161205 T F F F F F F F T ... 20161130 Christopher J. Woods 1010933 NaN NaN NaN Christopher J. Woods NaN NaN NaN
87252008 20161205 F F F F F F F F F ... 20161130 Julie A. Hopkins 100859.1.7 NaN NaN NaN Julie A. Hopkins NaN NaN NaN

5 rows × 64 columns

table.to_csv("casefileHeader.csv")
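
Many header columns end in `-in` and hold `T` or `F`. Assuming those are true/false indicator flags (an inference from the output above), a small helper can turn them into booleans before further analysis. `normalize_flags` is a hypothetical name, and the sample row reuses a few values from the first row shown above.

```python
def normalize_flags(record):
    """Return a copy where 'T'/'F' values in '-in' columns become booleans."""
    flag_values = {"T": True, "F": False}
    return {
        key: flag_values.get(value, value) if key.endswith("-in") else value
        for key, value in record.items()
    }


row = {
    "location-date": "20161205",
    "use-application-currently-in": "T",
    "amended-to-itu-application-in": "F",
    "filing-date": "20161130",
}
normalized = normalize_flags(row)
```

Values other than `T` and `F`, such as missing entries, pass through unchanged.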

Case File Classification

Extract the case file classification data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

classifications = pto.getClassifications(root)
data = []
for k in classifications.keys():
    for d in classifications[k]:
        data.append(classifications[k][d])
table = pd.DataFrame(data)
table.head()
first-use-anywhere-date first-use-in-commerce-date international-code international-code-total-no primary-code serial-number status-code status-date us-code us-code-total-no
0 0 0 042 1 042 87326720 6 20170210 100,101 2
1 0 0 025 1 025 87331869 6 20170216 022,039 2
2 0 0 009 1 009 87326722 6 20170210 021,023,026,036,038 5
3 0 0 016 1 016 87326722 6 20170210 002,005,022,023,029,037,038,050 8
4 0 0 036 1 036 87326722 6 20170210 100,101,102 3
table.to_csv("classifications.csv")
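
The nested loop above implies a two-level dictionary: serial number on the outside, an entry index on the inside, and a flat record at the leaves. That shape is inferred from the loop rather than documented, so the sample below is hypothetical. Under that assumption, the same flattening can be written as a single comprehension:

```python
# Hypothetical sample mirroring the shape implied by the loop above:
# {serial number: {entry index: record}}
classifications = {
    "87326720": {
        0: {"serial-number": "87326720", "international-code": "042"},
    },
    "87326722": {
        0: {"serial-number": "87326722", "international-code": "009"},
        1: {"serial-number": "87326722", "international-code": "016"},
    },
}

# One flat list of records, in the same order the nested loop would produce.
rows = [
    record
    for entries in classifications.values()
    for record in entries.values()
]
```

The resulting `rows` list can be passed to `pd.DataFrame` exactly like the `data` list built by the loop.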

Case File Classification Codes

Extract the case file classification codes from the XML file. This table can also be obtained from the classification table. This function creates a dictionary that can be transformed into a table using pandas.

classification_codes = pto.getClassificationCodes(root)
data = []
for k in classification_codes.keys():
    for d in classification_codes[k]:
        data.append(classification_codes[k][d])
table = pd.DataFrame(data)
table.head()
international-code serial-number us-code
0 042 87326720 100,101
1 025 87331869 022,039
2 009 87326722 021,023,026,036,038
3 016 87326722 002,005,022,023,029,037,038,050
4 036 87326722 100,101,102
table.to_csv("classification_codes.csv")
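
The `us-code` column packs several codes into one comma-separated string. For analyses that need one row per code, the records can be expanded first. This is a sketch with a hypothetical helper, `explode_us_codes`, applied to two rows copied from the output above.

```python
def explode_us_codes(rows):
    """Yield one record per individual US code."""
    for row in rows:
        for code in row["us-code"].split(","):
            yield {**row, "us-code": code}


rows = [
    {"international-code": "042", "serial-number": "87326720", "us-code": "100,101"},
    {"international-code": "025", "serial-number": "87331869", "us-code": "022,039"},
]
exploded = list(explode_us_codes(rows))
```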

Case File Design Search

Extract the case file design search data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

design = pto.getDesignSearch(root)
data = []
for k in design.keys():
    for d in design[k]:
        data.append(design[k][d])
table = pd.DataFrame(data)
table.head()
code serial-number
0 031519 87326722
1 031524 87326722
2 031525 87326722
3 260121 87326722
4 021108 87277572
table.to_csv("designSearch.csv")
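
USPTO design search codes are commonly written as three two-digit groups (category, division, section), for example 03.15.19. Assuming the six-digit `code` values above follow that layout, they can be split as follows. `split_design_code` is a hypothetical helper, not part of the uspto package.

```python
def split_design_code(code):
    """Split a six-digit design search code into its assumed three parts."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    category, division, section = code[:2], code[2:4], code[4:]
    return {
        "category": category,
        "division": division,
        "section": section,
        "formatted": f"{category}.{division}.{section}",
    }


# First code from the output above.
parts = split_design_code("031519")
```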

Case File Owners

Extract the case file owners data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

owners = pto.getFileOwners(root)
data = []
for k in owners.keys():
    for d in owners[k]:
        data.append(owners[k][d])
table = pd.DataFrame(data)
table.head()
address-1 address-2 city composed-of-statement country dba-aka-text entity-statement entry-number legal-entity-type-code nationality other party-name party-type postcode serial-number state
0 637 W 58th St NaN Kansas City NaN NaN NaN NaN 1 16 {'state': 'MO'} NaN MSMJ 10 64113 87326720 MO
1 12243 Washington Ave NaN Blue Island NaN NaN NaN NaN 1 01 {'country': 'US'} NaN Greg English 10 60406 87331869 IL
2 5100 South I-35 Service Rd NaN Oklahoma City NaN NaN NaN chartered bank 1 99 {'state': 'OK'} NaN Frontier State Bank 10 73129 87326722 OK
3 P.O. Box 943 1621 East Electric Avenue McAlester NaN NaN NaN NaN 1 03 {'state': 'OK'} NaN Big V Feeds, Inc. 10 74502 87326723 OK
4 6900 Interbay Blvd NaN Tampa NaN NaN NaN NaN 1 16 {'state': 'FL'} NaN LJ Avalon LLC 10 33616 87320958 FL
table.to_csv("fileOwners.csv")
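
In the output above, the `nationality` column holds a small dictionary (for example `{'state': 'MO'}`), which does not export cleanly to CSV. One option is to promote its keys to separate columns before building the table. The helper name `flatten_nationality` and the `nationality-` column prefix are assumptions for illustration; the sample values are taken from the first two rows above.

```python
def flatten_nationality(owner):
    """Return a copy with the nested nationality dict promoted to flat keys."""
    flat = {key: value for key, value in owner.items() if key != "nationality"}
    nationality = owner.get("nationality") or {}
    for key, value in nationality.items():
        flat[f"nationality-{key}"] = value
    return flat


owners = [
    {"party-name": "MSMJ", "serial-number": "87326720", "nationality": {"state": "MO"}},
    {"party-name": "Greg English", "serial-number": "87331869", "nationality": {"country": "US"}},
]
flat_owners = [flatten_nationality(owner) for owner in owners]
```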

Case File Statements

Extract the case file statements data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

statements = pto.getFileStatements(root)
data = []
for k in statements.keys():
    for d in statements[k]:
        data.append(statements[k][d])
table = pd.DataFrame(data)
table.head()
serial-number text type-code
0 87326720 Inspecting buildings for the existence of mold GS0421
1 87331869 Athletic apparel, namely, headwear; headwear GS0251
2 87331869 MASTER KICK MAN PM0001
3 87326722 The color(s) blue, white, and grey is/are clai... CC0000
4 87326722 The mark consists of a white soaring eagle wit... DM0000
table.to_csv("fileStatements.csv")
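
A single application can carry several statements, each with its own `type-code`. To look them up per application, the flat rows can be regrouped by serial number. The sketch below uses the first three rows from the output above; it assumes each type code appears at most once per application, which this README does not confirm.

```python
from collections import defaultdict

statements = [
    {"serial-number": "87326720", "text": "Inspecting buildings for the existence of mold", "type-code": "GS0421"},
    {"serial-number": "87331869", "text": "Athletic apparel, namely, headwear; headwear", "type-code": "GS0251"},
    {"serial-number": "87331869", "text": "MASTER KICK MAN", "type-code": "PM0001"},
]

# {serial number: {type code: statement text}}
by_serial = defaultdict(dict)
for statement in statements:
    by_serial[statement["serial-number"]][statement["type-code"]] = statement["text"]
```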

Case File Foreign Applications

Extract the case file foreign applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

foreign = pto.getForeignApplications(root)
data = []
for k in foreign.keys():
    for d in foreign[k]:
        data.append(foreign[k][d])
table = pd.DataFrame(data)
table.head()
application-number country entry-number filing-date foreign-priority-claim-in other registration-date registration-expiration-date registration-number registration-renewal-date serial-number
0 569192 PT 1 20160812 T NaN NaN NaN NaN NaN 87330826
1 015719925 EM 1 20160803 T NaN NaN NaN NaN NaN 87322637
2 302016033472 DE 1 20161124 T NaN NaN NaN NaN NaN 87322641
3 016181281 EU 1 20161219 T NaN NaN NaN NaN NaN 87273490
4 1777139 AU 1 20160616 T NaN NaN NaN NaN NaN 87262553
table.to_csv("foreignApplications.csv")

Case File Prior Applications

Extract the case file prior applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

prior = pto.getPriorApplications(root)
data = []
for k in prior.keys():
    for d in prior[k]:
        data.append(prior[k][d])
table = pd.DataFrame(data)
table.head()
number other-related-in prior-registration-application relationship-type serial-number
0 3487431 F 2 0 87261195
1 4739670 F 2 0 87261195
2 1186117 F 3 0 87273474
3 3053476 F 3 0 87273474
4 4447492 F 3 0 87273474
table.to_csv("priorApplications.csv")

Case File Events

Extract the case file events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

events = pto.getFileEvent(root)
data = []
for k in events.keys():
    for d in events[k]:
        data.append(events[k][d])
table = pd.DataFrame(data)
table.head()
code date description-text number serial-number type
0 NWOS 20170210 NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I... 2 87326720 I
1 NWAP 20170210 NEW APPLICATION ENTERED IN TRAM 1 87326720 I
2 MPMK 20170217 NOTICE OF PSEUDO MARK E-MAILED 3 87331869 E
3 NWOS 20170216 NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I... 2 87331869 I
4 NWAP 20170214 NEW APPLICATION ENTERED IN TRAM 1 87331869 I
table.to_csv("fileEvent.csv")
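
The events in the output above are not in chronological order. Because the dates appear to be `YYYYMMDD` strings (an assumption based on the values shown), sorting them as text also sorts them by time. The sketch below orders the three events of application 87331869, using the event `number` as a tie-breaker.

```python
# Rows copied from the output above for one serial number.
events = [
    {"code": "MPMK", "date": "20170217", "number": "3", "serial-number": "87331869"},
    {"code": "NWOS", "date": "20170216", "number": "2", "serial-number": "87331869"},
    {"code": "NWAP", "date": "20170214", "number": "1", "serial-number": "87331869"},
]

# Oldest first; the event number breaks ties between same-day events.
timeline = sorted(events, key=lambda event: (event["date"], int(event["number"])))
codes_in_order = [event["code"] for event in timeline]
```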

Case File Correspondent

Extract the case file correspondent data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

correspondent = pto.getCorrespondent(root)
data = []
for k in correspondent.keys():
    data.append(correspondent[k])
table = pd.DataFrame(data)
table.head()
address-1 address-2 address-3 address-4 address-5 serial-number
0 MSMJ 637 W 58TH ST KANSAS CITY, MO 64113 NaN NaN 87326720
1 KELLY A. DONAHUE VERRILL DANA, LLP ONE PORTLAND SQUARE PORTLAND, ME 04112-0586 NaN 87325322
2 BARBOSA, JAIME 15921 SW 61 STREET DAVIE, FL 33331 NaN NaN 87326721
3 SCOTT NYMAN NYMAN IP LLC 20 NORTH WACKER DRIVE, SUITE 1200 CHICAGO, IL 60606 NaN 87331869
4 JASON GOLDSMITH GOLDSMITH ASSOCIATES, PLLC P.O. BOX 140091 P.O. BOX 140091 DALLAS, TX 75214 87326722
table.to_csv("correspondent.csv")

Case File Madrid Filing

Extract the case file Madrid filing data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

madrid_filing = pto.getMadridFiling(root)
data = []
for k in madrid_filing.keys():
    data.append(madrid_filing[k])
table = pd.DataFrame(data)
table.head()
entry-number international-registration-date international-registration-number international-renewal-date international-status-code international-status-date irregularity-reply-by-date madrid-history-events original-filing-date-uspto reference-number serial-number
0 1 NaN NaN NaN 403 20170213 NaN 3 20170210 A0064942 87322369
1 1 NaN NaN NaN 403 20170214 NaN 3 20170213 A0064960 87328683
2 1 NaN NaN NaN 403 20170213 NaN 3 20170210 A0064942 87322372
3 1 NaN NaN NaN 403 20170216 NaN 3 20170214 A0064995 87330276
4 1 NaN NaN NaN 403 20170213 NaN 3 20170210 A0064942 87322374
table.to_csv("madridFiling.csv")

Case File Madrid Events

Extract the case file Madrid events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

madrid_events = pto.getMadridEvents(root)
data = []
for k in madrid_events.keys():
    for d in madrid_events[k]:
        data.append(madrid_events[k][d])
table = pd.DataFrame(data)
table.head()
code date description-text entry-number serial-number
0 NEWAP 20170210 NEW APPLICATION FOR IR RECEIVED 1 87322369
1 MCERT 20170213 MANUALLY CERTIFIED 2 87322369
2 APPST 20170213 IR CERTIFIED AND SENT TO IB 3 87322369
3 NEWAP 20170213 NEW APPLICATION FOR IR RECEIVED 1 87328683
4 MCERT 20170214 MANUALLY CERTIFIED 2 87328683
table.to_csv("madridEvents.csv")

Tables

The following table schema diagram from 2015 is a good example of what you can expect to find in the USPTO trademark data.

[Figure: case files schema, high level, 2015]
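
Every table extracted in this notebook carries a `serial-number` column, which makes it a natural key for combining tables. The sketch below joins a few owner and classification rows copied from the outputs above. It assumes one owner per application for simplicity; the `entry-number` column in the owners table suggests that real data may list several.

```python
owners = [
    {"serial-number": "87326720", "party-name": "MSMJ"},
    {"serial-number": "87326722", "party-name": "Frontier State Bank"},
]
classifications = [
    {"serial-number": "87326720", "international-code": "042"},
    {"serial-number": "87326722", "international-code": "009"},
    {"serial-number": "87326722", "international-code": "016"},
    {"serial-number": "87326722", "international-code": "036"},
]

# Index owners by serial number, then attach the owner name to each class row.
owner_by_serial = {owner["serial-number"]: owner["party-name"] for owner in owners}
joined = [
    {**row, "party-name": owner_by_serial.get(row["serial-number"])}
    for row in classifications
]
```

With pandas tables, the same join would typically be expressed with `DataFrame.merge` on the `serial-number` column.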