USPTO - Trademark Parser

This package parses XML files from the USPTO trademark public data, which is available in bulk form. From the XML files it generates Python dictionaries that can be analyzed directly or exported as CSV files for use with other analytical tools. The USPTO data is also viewable through a search interface on the Open Data site.

https://developer.uspto.gov/product/trademark

Installing the package

System requirements

Python 3

Python Hard Dependencies

xml
zipfile
gzip
bz2

To install the package, go to the directory containing the source files on your system and run:

python setup.py install

USPTO Notebook

With this notebook and the uspto package you can parse the raw XML trademark data provided by the USPTO.

Loading packages

import pandas as pd
import uspto as pto

Open USPTO File

# Path to data
path = "data/apc161231-56_sample.xml"
data = pto.openUSPTO(path)
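
The internals of pto.openUSPTO are not shown in this README. As a rough sketch only, the following shows one way to open an XML file that may be plain, gzip, bz2, or zip compressed, using the standard-library modules listed under the dependencies. The helper name open_xml_tree and the rule of picking the opener from the file extension are assumptions, not the package's actual behavior.

```python
import bz2
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile


def open_xml_tree(path):
    """Parse an XML file that may be plain, gzip, bz2, or zip compressed.

    Hypothetical helper: it is not part of the uspto package.
    """
    lower = path.lower()
    if lower.endswith(".gz"):
        with gzip.open(path, "rb") as fh:
            return ET.parse(fh)
    if lower.endswith(".bz2"):
        with bz2.open(path, "rb") as fh:
            return ET.parse(fh)
    if lower.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Assumption: the archive holds a single XML member.
            first_member = archive.namelist()[0]
            with archive.open(first_member) as fh:
                return ET.parse(fh)
    return ET.parse(path)


# Self-check with a made-up document written to a temporary gzip file.
sample = b"<root><case-file><serial-number>1</serial-number></case-file></root>"
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.xml.gz")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(sample)
    tree = open_xml_tree(gz_path)

root_tag = tree.getroot().tag
print(root_tag)
```

The returned object is a standard ElementTree, so calling getroot() on it works the same way as in the next step of the notebook.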

Get XML root

Getting the root might take a couple of minutes, depending on the size of the XML file and the RAM of your machine.

data = pto.openUSPTO(path)
root = data.getroot()
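
If loading the whole tree is too slow or memory-hungry on your machine, streaming the file is a possible alternative. The sketch below uses the standard library's xml.etree.ElementTree.iterparse and is not what the uspto package does. The tag names case-file and serial-number, the sample document, and the helper name iter_serial_numbers are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature document; the tag names are assumptions.
SAMPLE = b"""<trademark-applications>
  <case-file><serial-number>87252004</serial-number></case-file>
  <case-file><serial-number>87252005</serial-number></case-file>
</trademark-applications>"""


def iter_serial_numbers(source, record_tag="case-file"):
    """Yield serial numbers one record at a time (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield elem.findtext("serial-number")
            # Discard the record's text and children once it has been read.
            elem.clear()


serials = list(iter_serial_numbers(io.BytesIO(SAMPLE)))
print(serials)
```

The trade-off is that a streamed file never yields a full root element, so the pto functions below, which take root as their argument, would not apply to this approach directly.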

File Description

With the pto.getDetails(root) function we can extract useful information about the XML file, including the number of trademark applications in the file.

details = pto.getDetails(root)
pd.DataFrame.from_dict(details, orient='index')

	0
version-no	2.0
creation-datetime	201702250716
version-date	20041108
file-segment	TRMK
action-key	TX
case-files-vol	40382
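
How pto.getDetails computes these values is not shown here. As a rough sketch, a similar summary could be built with the standard library, assuming the fields in the table appear as element names and each application is a case-file element. The sample document, its nesting, and the helper name summarize are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document; the layout is an assumption.
SAMPLE = """<trademark-applications-daily>
  <version>
    <version-no>2.0</version-no>
    <version-date>20041108</version-date>
  </version>
  <creation-datetime>201702250716</creation-datetime>
  <application-information>
    <file-segments>
      <file-segment>TRMK</file-segment>
      <action-keys>
        <action-key>TX</action-key>
        <case-file><serial-number>87252004</serial-number></case-file>
        <case-file><serial-number>87252005</serial-number></case-file>
      </action-keys>
    </file-segments>
  </application-information>
</trademark-applications-daily>"""

FIELDS = ("version-no", "creation-datetime", "version-date",
          "file-segment", "action-key")


def summarize(root):
    """Collect file-level fields and count case files (hypothetical helper)."""
    details = {name: root.findtext(f".//{name}") for name in FIELDS}
    details["case-files-vol"] = sum(1 for _ in root.iter("case-file"))
    return details


summary = summarize(ET.fromstring(SAMPLE))
print(summary)
```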

Extracting and Creating tables

Case File Header

Extract the case file header data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

file_header = pto.getFileHeader(root)

table = pd.DataFrame.from_dict(file_header, orient='index')
table.head()

	location-date	use-application-currently-in	amended-to-itu-application-in	filing-basis-filed-as-44d-in	collective-trademark-in	section-8-accepted-in	standard-characters-claimed-in	drawing-3d-filed-in	foreign-priority-in	color-drawing-current-in	...	filing-date	attorney-name	attorney-docket-number	employee-name	law-office-assigned-location-code	published-for-opposition-date	domestic-representative-name	abandonment-date	amend-to-register-date	registration-date
87252004	20161205	T	F	F	F	F	F	F	F	T	...	20161130	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
87252005	20161205	F	F	F	F	F	F	F	F	F	...	20161130	Julie A. Hopkins	100859.1.7	NaN	NaN	NaN	Julie A. Hopkins	NaN	NaN	NaN
87252006	20161205	F	F	F	F	F	T	F	F	F	...	20161130	Paul R. Fransway	73285-2	NaN	NaN	NaN	Paul R. Fransway	NaN	NaN	NaN
87252007	20161205	T	F	F	F	F	F	F	F	T	...	20161130	Christopher J. Woods	1010933	NaN	NaN	NaN	Christopher J. Woods	NaN	NaN	NaN
87252008	20161205	F	F	F	F	F	F	F	F	F	...	20161130	Julie A. Hopkins	100859.1.7	NaN	NaN	NaN	Julie A. Hopkins	NaN	NaN	NaN

5 rows × 64 columns

table.to_csv("casefileHeader.csv")
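
In the header table above, dates appear as YYYYMMDD values and the columns ending in -in hold T/F flags. The sketch below shows one way to convert a single record into Python dates and booleans before analysis. It assumes the values are strings and that every non-empty date is well-formed. The sample record and the helper name clean_header are hypothetical.

```python
from datetime import date, datetime

# Hypothetical record shaped like one row of the header table.
record = {
    "filing-date": "20161130",
    "location-date": "20161205",
    "use-application-currently-in": "T",
    "standard-characters-claimed-in": "F",
    "attorney-name": None,
}


def clean_header(rec):
    """Convert YYYYMMDD strings to dates and T/F flags to booleans."""
    cleaned = {}
    for key, value in rec.items():
        if value is None:
            cleaned[key] = None  # keep missing values as they are
        elif key.endswith("-date"):
            cleaned[key] = datetime.strptime(value, "%Y%m%d").date()
        elif key.endswith("-in"):
            cleaned[key] = value == "T"
        else:
            cleaned[key] = value
    return cleaned


cleaned = clean_header(record)
print(cleaned["filing-date"], cleaned["use-application-currently-in"])
```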

Case File Classification

Extract the case file classification data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

classifications = pto.getClassifications(root)

data = []
for k in classifications.keys():
    for d in classifications[k]:
        data.append(classifications[k][d])

table = pd.DataFrame(data)
table.head()

	international-code	international-code-total-no	primary-code	serial-number	status-code	status-date	us-code	us-code-total-no
0	042	1	042	87326720	6	20170210	100,101	2
1	025	1	025	87331869	6	20170216	022,039	2
2	009	1	009	87326722	6	20170210	021,023,026,036,038	5
3	016	1	016	87326722	6	20170210	002,005,022,023,029,037,038,050	8
4	036	1	036	87326722	6	20170210	100,101,102	3

table.to_csv("classifications.csv")
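
The loop above flattens a two-level dictionary (outer key, then inner key, then one record) into a list of rows. The same pattern recurs in most of the sections below, so it can be written once as a helper. This is a minimal sketch using only the standard library. The key types in the sample dictionary and the helper name flatten are assumptions.

```python
import csv
import io

# Hypothetical two-level dictionary shaped like the classification data.
nested = {
    "87326720": {
        0: {"serial-number": "87326720", "international-code": "042",
            "us-code": "100,101"},
    },
    "87326722": {
        0: {"serial-number": "87326722", "international-code": "009",
            "us-code": "021,023,026,036,038"},
        1: {"serial-number": "87326722", "international-code": "036",
            "us-code": "100,101,102"},
    },
}


def flatten(two_level):
    """Return every innermost record as one flat list of rows."""
    return [record
            for entries in two_level.values()
            for record in entries.values()]


rows = flatten(nested)

# Write the rows as CSV text without pandas; values containing commas are quoted.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting rows list can be passed to pd.DataFrame(rows) exactly as the data list is in the notebook cells.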

Case File Classification Codes

Extract the case file classification codes from the XML file. This table can also be obtained from the classification table. This function creates a dictionary that can be transformed into a table using pandas.

classification_codes = pto.getClassificationCodes(root)

data = []
for k in classification_codes.keys():
    for d in classification_codes[k]:
        data.append(classification_codes[k][d])

table = pd.DataFrame(data)
table.head()

	international-code	serial-number	us-code
0	042	87326720	100,101
1	025	87331869	022,039
2	009	87326722	021,023,026,036,038
3	016	87326722	002,005,022,023,029,037,038,050
4	036	87326722	100,101,102

table.to_csv("classification_codes.csv")
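
In the output above, the us-code column packs several codes into one comma-separated string. For counting or filtering by individual code, it can help to produce one row per code. The following is a small sketch of that step. The helper name explode_us_codes is hypothetical and the sample rows are copied from the table above.

```python
rows = [
    {"serial-number": "87326720", "international-code": "042",
     "us-code": "100,101"},
    {"serial-number": "87331869", "international-code": "025",
     "us-code": "022,039"},
]


def explode_us_codes(records):
    """Return one row per individual US code, copying the other fields."""
    exploded = []
    for rec in records:
        for code in rec["us-code"].split(","):
            exploded.append({**rec, "us-code": code.strip()})
    return exploded


long_rows = explode_us_codes(rows)
print([row["us-code"] for row in long_rows])
```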

Case File Design Search

Extract the case file Design Search data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

design = pto.getDesignSearch(root)

data = []
for k in design.keys():
    for d in design[k]:
        data.append(design[k][d])

table = pd.DataFrame(data)
table.head()

	code	serial-number
0	031519	87326722
1	031524	87326722
2	031525	87326722
3	260121	87326722
4	021108	87277572

table.to_csv("designSearch.csv")

Case File Owners

Extract the case file owners data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

owners = pto.getFileOwners(root)

data = []
for k in owners.keys():
    for d in owners[k]:
        data.append(owners[k][d])

table = pd.DataFrame(data)
table.head()

	address-1	address-2	city	composed-of-statement	country	dba-aka-text	entity-statement	entry-number	legal-entity-type-code	nationality	other	party-name	party-type	postcode	serial-number	state
0	637 W 58th St	NaN	Kansas City	NaN	NaN	NaN	NaN	1	16	{'state': 'MO'}	NaN	MSMJ	10	64113	87326720	MO
1	12243 Washington Ave	NaN	Blue Island	NaN	NaN	NaN	NaN	1	01	{'country': 'US'}	NaN	Greg English	10	60406	87331869	IL
2	5100 South I-35 Service Rd	NaN	Oklahoma City	NaN	NaN	NaN	chartered bank	1	99	{'state': 'OK'}	NaN	Frontier State Bank	10	73129	87326722	OK
3	P.O. Box 943	1621 East Electric Avenue	McAlester	NaN	NaN	NaN	NaN	1	03	{'state': 'OK'}	NaN	Big V Feeds, Inc.	10	74502	87326723	OK
4	6900 Interbay Blvd	NaN	Tampa	NaN	NaN	NaN	NaN	1	16	{'state': 'FL'}	NaN	LJ Avalon LLC	10	33616	87320958	FL

table.to_csv("fileOwners.csv")
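
In the owners table the nationality column holds a nested dictionary such as {'state': 'MO'}, which does not export cleanly to CSV. One option is to expand it into separate columns first. The sketch below assumes the nested value is a plain dict. The column names nationality-state and nationality-country and the helper name expand_nationality are hypothetical.

```python
owners = [
    {"party-name": "MSMJ", "serial-number": "87326720",
     "nationality": {"state": "MO"}},
    {"party-name": "Greg English", "serial-number": "87331869",
     "nationality": {"country": "US"}},
]


def expand_nationality(rec):
    """Replace the nested nationality dict with flat, prefixed columns."""
    flat = {k: v for k, v in rec.items() if k != "nationality"}
    for key, value in (rec.get("nationality") or {}).items():
        flat[f"nationality-{key}"] = value
    return flat


flat_owners = [expand_nationality(o) for o in owners]
print(flat_owners)
```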

Case File Statements

Extract the case file statements data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

statements = pto.getFileStatements(root)

data = []
for k in statements.keys():
    for d in statements[k]:
        data.append(statements[k][d])

table = pd.DataFrame(data)
table.head()

	serial-number	text	type-code
0	87326720	Inspecting buildings for the existence of mold	GS0421
1	87331869	Athletic apparel, namely, headwear; headwear	GS0251
2	87331869	MASTER KICK MAN	PM0001
3	87326722	The color(s) blue, white, and grey is/are clai...	CC0000
4	87326722	The mark consists of a white soaring eagle wit...	DM0000

table.to_csv("fileStatements.csv")
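
The type-code values in the sample rows start with a two-letter prefix (GS, PM, CC, DM), which suggests the statements can be grouped by kind. The sketch below groups records by that prefix without assuming what each prefix means. The helper name group_by_prefix is hypothetical and the sample rows are copied from the table above.

```python
from collections import defaultdict

statements = [
    {"serial-number": "87326720", "type-code": "GS0421",
     "text": "Inspecting buildings for the existence of mold"},
    {"serial-number": "87331869", "type-code": "GS0251",
     "text": "Athletic apparel, namely, headwear; headwear"},
    {"serial-number": "87331869", "type-code": "PM0001",
     "text": "MASTER KICK MAN"},
]


def group_by_prefix(records, width=2):
    """Group statement records by the leading characters of type-code."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["type-code"][:width]].append(rec)
    return dict(groups)


by_prefix = group_by_prefix(statements)
print({prefix: len(items) for prefix, items in by_prefix.items()})
```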

Case File Foreign Applications

Extract the case file Foreign Applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

foreign = pto.getForeignApplications(root)

data = []
for k in foreign.keys():
    for d in foreign[k]:
        data.append(foreign[k][d])

table = pd.DataFrame(data)
table.head()

	application-number	country	entry-number	filing-date	foreign-priority-claim-in	other	registration-date	registration-expiration-date	registration-number	registration-renewal-date	serial-number
0	569192	PT	1	20160812	T	NaN	NaN	NaN	NaN	NaN	87330826
1	015719925	EM	1	20160803	T	NaN	NaN	NaN	NaN	NaN	87322637
2	302016033472	DE	1	20161124	T	NaN	NaN	NaN	NaN	NaN	87322641
3	016181281	EU	1	20161219	T	NaN	NaN	NaN	NaN	NaN	87273490
4	1777139	AU	1	20160616	T	NaN	NaN	NaN	NaN	NaN	87262553

table.to_csv("foreignApplications.csv")

Case File Prior Applications

Extract the case file Prior Applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

prior = pto.getPriorApplications(root)

data = []
for k in prior.keys():
    for d in prior[k]:
        data.append(prior[k][d])

table = pd.DataFrame(data)
table.head()

	number	other-related-in	prior-registration-application	serial-number
0	3487431	F	2	87261195
1	4739670	F	2	87261195
2	1186117	F	3	87273474
3	3053476	F	3	87273474
4	4447492	F	3	87273474

table.to_csv("priorApplications.csv")

Case File Events

Extract the case file events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

events = pto.getFileEvent(root)

data = []
for k in events.keys():
    for d in events[k]:
        data.append(events[k][d])

table = pd.DataFrame(data)
table.head()

	code	date	description-text	number	serial-number	type
0	NWOS	20170210	NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I...	2	87326720	I
1	NWAP	20170210	NEW APPLICATION ENTERED IN TRAM	1	87326720	I
2	MPMK	20170217	NOTICE OF PSEUDO MARK E-MAILED	3	87331869	E
3	NWOS	20170216	NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I...	2	87331869	I
4	NWAP	20170214	NEW APPLICATION ENTERED IN TRAM	1	87331869	I

table.to_csv("fileEvent.csv")
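
A common follow-up on the events table is finding the most recent event for each application. The sketch below assumes that number is a per-application sequence number where a higher value means a later event. The sample rows above are consistent with that, but it is an assumption. The helper name latest_event is hypothetical.

```python
events = [
    {"serial-number": "87326720", "number": "2", "code": "NWOS",
     "date": "20170210"},
    {"serial-number": "87326720", "number": "1", "code": "NWAP",
     "date": "20170210"},
    {"serial-number": "87331869", "number": "3", "code": "MPMK",
     "date": "20170217"},
    {"serial-number": "87331869", "number": "2", "code": "NWOS",
     "date": "20170216"},
    {"serial-number": "87331869", "number": "1", "code": "NWAP",
     "date": "20170214"},
]


def latest_event(records):
    """Keep, per serial number, the event with the highest sequence number."""
    latest = {}
    for rec in records:
        serial = rec["serial-number"]
        current = latest.get(serial)
        if current is None or int(rec["number"]) > int(current["number"]):
            latest[serial] = rec
    return latest


latest = latest_event(events)
print({serial: rec["code"] for serial, rec in latest.items()})
```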

Case File Correspondent

Extract the case file correspondent data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

correspondent = pto.getCorrespondent(root)

data = []
for k in correspondent.keys():
    data.append(correspondent[k])

table = pd.DataFrame(data)
table.head()

	address-1	address-2	address-3	address-4	address-5	serial-number
0	MSMJ	637 W 58TH ST	KANSAS CITY, MO 64113	NaN	NaN	87326720
1	KELLY A. DONAHUE	VERRILL DANA, LLP	ONE PORTLAND SQUARE	PORTLAND, ME 04112-0586	NaN	87325322
2	BARBOSA, JAIME	15921 SW 61 STREET	DAVIE, FL 33331	NaN	NaN	87326721
3	SCOTT NYMAN	NYMAN IP LLC	20 NORTH WACKER DRIVE, SUITE 1200	CHICAGO, IL 60606	NaN	87331869
4	JASON GOLDSMITH	GOLDSMITH ASSOCIATES, PLLC	P.O. BOX 140091	P.O. BOX 140091	DALLAS, TX 75214	87326722

table.to_csv("correspondent.csv")

Case File Madrid Filing

Extract the case file Madrid Filing data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

madrid_filing = pto.getMadridFiling(root)

data = []
for k in madrid_filing.keys():
    data.append(madrid_filing[k])

table = pd.DataFrame(data)
table.head()

	entry-number	international-registration-date	international-registration-number	international-renewal-date	international-status-code	international-status-date	irregularity-reply-by-date	madrid-history-events	original-filing-date-uspto	reference-number	serial-number
0	1	NaN	NaN	NaN	403	20170213	NaN	3	20170210	A0064942	87322369
1	1	NaN	NaN	NaN	403	20170214	NaN	3	20170213	A0064960	87328683
2	1	NaN	NaN	NaN	403	20170213	NaN	3	20170210	A0064942	87322372
3	1	NaN	NaN	NaN	403	20170216	NaN	3	20170214	A0064995	87330276
4	1	NaN	NaN	NaN	403	20170213	NaN	3	20170210	A0064942	87322374

table.to_csv("madridFiling.csv")

Case File Madrid Events

Extract the case file Madrid Events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.

madrid_events = pto.getMadridEvents(root)

data = []
for k in madrid_events.keys():
    for d in madrid_events[k]:
        data.append(madrid_events[k][d])

table = pd.DataFrame(data)
table.head()

	code	date	description-text	entry-number	serial-number
0	NEWAP	20170210	NEW APPLICATION FOR IR RECEIVED	1	87322369
1	MCERT	20170213	MANUALLY CERTIFIED	2	87322369
2	APPST	20170213	IR CERTIFIED AND SENT TO IB	3	87322369
3	NEWAP	20170213	NEW APPLICATION FOR IR RECEIVED	1	87328683
4	MCERT	20170214	MANUALLY CERTIFIED	2	87328683

table.to_csv("madridEvents.csv")

Tables

The following table schema diagram from 2015 is a good example of what you can expect to find in the USPTO trademark data.

case files schema high level 2015
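
Every table extracted above carries a serial-number, either as a column or as the index, so the tables can be related to one another through it. The sketch below attaches header fields to owner rows with a dictionary lookup. The owner names are made up, and the helper name join_on_serial is hypothetical.

```python
# Header fields keyed by serial number, as in the header table's index.
headers = {
    "87252005": {"filing-date": "20161130"},
    "87252006": {"filing-date": "20161130"},
}

# Made-up owner rows; one application may have several owners.
owners = [
    {"serial-number": "87252005", "party-name": "Example Owner A"},
    {"serial-number": "87252006", "party-name": "Example Owner B"},
    {"serial-number": "87252006", "party-name": "Example Owner C"},
    {"serial-number": "87259999", "party-name": "Example Owner D"},
]


def join_on_serial(rows, header_by_serial):
    """Left-join rows to header fields using serial-number as the key."""
    joined = []
    for row in rows:
        header = header_by_serial.get(row["serial-number"], {})
        joined.append({**header, **row})
    return joined


joined = join_on_serial(owners, headers)
print(joined)
```

Rows whose serial number has no header entry are kept without the extra fields, which mirrors a left join. With pandas, the equivalent step would be a merge on the serial-number column.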

jlroo/uspto