A Python module for parsing, analyzing, and manipulating GEDCOM files.
GEDCOM files contain ancestry data. The parser currently supports the GEDCOM 5.5 format, which is detailed here.
For the latest changes, please have a look at the CHANGELOG.md file.
The current development process can be tracked in the develop branch.
The module can be installed via pip.
Run pip<version> install python-gedcom to install, or pip<version> install python-gedcom --upgrade to upgrade to the newest version uploaded to the PyPI repository.
If you want to use the latest pre-release of the python-gedcom package, simply append the --pre option to pip: pip<version> install python-gedcom --pre
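To confirm which version pip actually installed, the distribution metadata can be queried from Python. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); `installed_version` is a hypothetical helper name, not part of python-gedcom.

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_version(distribution="python-gedcom"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "python-gedcom is not installed")
```

On older interpreters, pip<version> show python-gedcom reports the same information from the command line.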
For more examples, please have a look at the test files found in the tests/ directory.
Once the package is installed, you can import the gedcom package and use it like so:
from gedcom.element.individual import IndividualElement
from gedcom.parser import Parser
# Path to your `.ged` file
file_path = ''
# Initialize the parser
gedcom_parser = Parser()
# Parse your file
gedcom_parser.parse_file(file_path)
root_child_elements = gedcom_parser.get_root_child_elements()
# Iterate through all root child elements
for element in root_child_elements:
    # Is the `element` an actual `IndividualElement`? (Allows usage of extra functions such as `surname_match` and `get_name`.)
    if isinstance(element, IndividualElement):
        # Get all individuals whose surname matches "Doe"
        if element.surname_match('Doe'):
            # Unpack the name tuple
            (first, last) = element.get_name()
            # Print the first and last name of the found individual
            print(first + " " + last)
Large sites like Ancestry and MyHeritage (among others) don't always produce perfectly formatted GEDCOM files. If you encounter parsing errors, consider disabling strict parsing, which is enabled by default:
from gedcom.parser import Parser
file_path = '' # Path to your `.ged` file
gedcom_parser = Parser()
gedcom_parser.parse_file(file_path, False) # Disable strict parsing
Disabling strict parsing will allow the parser to gracefully handle the following quirks:
- Multi-line fields that don't use CONC or CONT
- A last line that does not end in a CRLF (\r\n)
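One way to use this option is to try strict parsing first and fall back to non-strict parsing only when it fails. The sketch below is an illustration under stated assumptions: `parse_with_fallback` is a hypothetical helper, and because this document does not name the exception raised on a format violation, it catches `Exception` broadly.

```python
def parse_with_fallback(parser, file_path):
    """Parse strictly first; retry with strict parsing disabled on failure.

    `parser` is expected to expose `parse_file(file_path, strict)`, as listed
    for the `Parser` class. Returns True if strict parsing succeeded and
    False if the non-strict retry was needed.
    """
    try:
        parser.parse_file(file_path, True)
        return True
    except Exception:  # assumption: the concrete exception type is not documented here
        # Retry leniently; errors unrelated to formatting (for example a
        # missing file) will simply be raised again by this second call.
        parser.parse_file(file_path, False)
        return False
```

With the real package this would be called as parse_with_fallback(Parser(), file_path); the return value tells you whether the file was well-formed enough for strict mode.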
Note: At a later stage the documentation may be moved into individual, automatically generated wiki pages. (Makes things a little bit easier.)
The Parser class represents the actual parser. Use this class to parse a GEDCOM file.
Note: May be imported via from gedcom.parser import Parser.
Method | Parameters | Returns | Description |
---|---|---|---|
invalidate_cache | | | Empties the element list and dictionary to cause get_element_list() and get_element_dictionary() to return updated data |
get_element_list | | list of Element | Returns a list containing all elements from within the GEDCOM file |
get_element_dictionary | | dict of Element | Returns a dictionary containing all elements, identified by a pointer, from within the GEDCOM file |
get_root_element | | RootElement | Returns a virtual root element containing all logical records as children |
get_root_child_elements | | list of Element | Returns a list of logical records in the GEDCOM file |
parse_file | str file_path, bool strict | | Opens and parses a file, from the given file path, as GEDCOM 5.5 formatted data |
get_marriages | IndividualElement individual | list of tuple : (str date, str place) | Returns a list of marriages of an individual, each formatted as a tuple (str date, str place) |
get_marriage_years | IndividualElement individual | list of int | Returns a list of marriage years (as integers) for an individual |
marriage_year_match | IndividualElement individual, int year | bool | Checks if one of the marriage years of an individual matches the supplied year. Year is an integer. |
marriage_range_match | IndividualElement individual, int from_year, int to_year | bool | Checks if one of the marriage years of an individual is in a given range. Years are integers. |
get_families | IndividualElement individual, str family_type = gedcom.tags.GEDCOM_TAG_FAMILY_SPOUSE | list of FamilyElement | Returns family elements listed for an individual |
get_ancestors | IndividualElement individual, str ancestor_type = "ALL" | list of Element | Returns elements corresponding to ancestors of an individual |
get_parents | IndividualElement individual, str parent_type = "ALL" | list of IndividualElement | Returns elements corresponding to parents of an individual |
find_path_to_ancestor | IndividualElement descendant, IndividualElement ancestor, path = None | object | Returns the path from descendant to ancestor |
get_family_members | FamilyElement family, str members_type = FAMILY_MEMBERS_TYPE_ALL | list of IndividualElement | Returns a list of family members: individual, spouse, and children |
print_gedcom | | | Writes GEDCOM data to stdout |
save_gedcom | IO open_file | | Saves GEDCOM data to a file |
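Several of the parser methods above take an IndividualElement as input, so they combine naturally with the methods on the element itself. The following is a minimal sketch, not code from the package: `summarize_individual` is a hypothetical helper that relies only on `get_name`, `get_marriage_years`, and `get_parents` as listed in this document.

```python
def summarize_individual(parser, individual):
    """Collect a few facts about one individual into a plain dictionary.

    `parser` is expected to behave like `Parser` and `individual` like
    `IndividualElement`; only methods listed in this document are called.
    """
    given_name, surname = individual.get_name()
    parents = parser.get_parents(individual, "ALL")
    return {
        "name": (given_name + " " + surname).strip(),
        "marriage_years": sorted(parser.get_marriage_years(individual)),
        "parents": [" ".join(parent.get_name()).strip() for parent in parents],
    }
```

Called inside the loop from the first example, summarize_individual(gedcom_parser, element) would produce one dictionary per matching person.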
An element represents a line from within the parsed GEDCOM file.
May be imported via from gedcom.element.element import Element.
Method | Parameters | Returns | Description |
---|---|---|---|
get_level | | int | Returns the level of this element from within the GEDCOM file |
get_pointer | | str | Returns the pointer of this element from within the GEDCOM file |
get_tag | | str | Returns the tag of this element from within the GEDCOM file |
get_value | | str | Returns the value of this element from within the GEDCOM file |
set_value | str value | str | Sets the value of this element |
get_multi_line_value | | str | Returns the value of this element including concatenations or continuations |
set_multi_line_value | str value | str | Sets the value of this element, adding concatenation and continuation lines when necessary |
get_child_elements | | list of Element | Returns the direct child elements of this element |
new_child_element | str tag, str pointer = "", str value = "" | Element | Creates and returns a new child element of this element |
add_child_element | Element child | Element | Adds a child element to this element |
get_parent_element | | Element | Returns the parent element of this element |
set_parent_element | Element parent | | Adds a parent element to this element. There's usually no need to call this method manually, add_child_element() calls it automatically. |
get_individual | | str | DEPRECATED: As of version v1.0.0 use to_gedcom_string() method instead. |
to_gedcom_string | bool recursive = False | str | Formats this element and optionally all of its sub-elements into a GEDCOM-conformant string |
Casting an Element to a string will internally call the to_gedcom_string() method.
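To make the four fields behind get_level, get_pointer, get_tag, and get_value concrete, here is a standalone sketch of how a single GEDCOM 5.5 line breaks down into level, optional pointer, tag, and optional value. It is an illustration only and not the package's parser: `Line`, `parse_line`, and `format_line` are hypothetical names, and the regular expression is a simplification of the GEDCOM line grammar.

```python
import re
from collections import namedtuple

# Hypothetical container mirroring the four getters listed above
Line = namedtuple("Line", ["level", "pointer", "tag", "value"])

# level, optional @pointer@, tag, optional value (simplified grammar)
_LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?([A-Za-z0-9_]+)(?: (.*))?$")


def parse_line(text):
    """Split one GEDCOM line into its level, pointer, tag, and value."""
    match = _LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a GEDCOM line: %r" % text)
    level, pointer, tag, value = match.groups()
    # Missing pointer or value become empty strings
    return Line(int(level), pointer or "", tag, value or "")


def format_line(line):
    """Join the non-empty fields back into a single GEDCOM line."""
    parts = [str(line.level)]
    if line.pointer:
        parts.append(line.pointer)
    parts.append(line.tag)
    if line.value:
        parts.append(line.value)
    return " ".join(parts)
```

For example, parse_line("0 @I1@ INDI") yields level 0, pointer "@I1@", tag "INDI", and an empty value, while format_line reverses the split for a single line in the way to_gedcom_string() does for a whole element.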
May be imported via from gedcom.element.family import FamilyElement.
Method | Parameters | Returns | Description |
---|---|---|---|
is_family | | bool | Checks if this element is an actual family |
May be imported via from gedcom.element.file import FileElement.
Method | Parameters | Returns | Description |
---|---|---|---|
is_file | | bool | Checks if this element is an actual file |
Represents a person from within the parsed GEDCOM file.
May be imported via from gedcom.element.individual import IndividualElement.
Method | Parameters | Returns | Description |
---|---|---|---|
is_individual | | bool | Checks if this element is an actual individual |
is_deceased | | bool | Checks if this individual is deceased |
is_child | | bool | Checks if this element is a child of a family |
is_private | | bool | Checks if this individual is marked private |
get_name | | tuple : (str given_name, str surname) | Returns an individual's names as a tuple: (str given_name, str surname) |
surname_match | str surname_to_match | bool | Matches a string with the surname of an individual |
given_name_match | str given_name_to_match | bool | Matches a string with the given names of an individual |
get_gender | | str | Returns the gender of a person in string format |
get_birth_data | | tuple : (str date, str place, list sources) | Returns the birth data of a person formatted as a tuple: (str date, str place, list sources) |
get_birth_year | | int | Returns the birth year of a person in integer format |
get_death_data | | tuple : (str date, str place, list sources) | Returns the death data of a person formatted as a tuple: (str date, str place, list sources) |
get_death_year | | int | Returns the death year of a person in integer format |
get_burial_data | | tuple : (str date, str place, list sources) | Returns the burial data of a person formatted as a tuple: (str date, str place, list sources) |
get_census_data | | list of tuple : (str date, str place, list sources) | Returns a list of censuses of an individual formatted as tuples: (str date, str place, list sources) |
get_last_change_date | | str | Returns the date of when the person data was last changed formatted as a string |
get_occupation | | str | Returns the occupation of a person |
birth_year_match | int year | bool | Returns True if the given year matches the birth year of this person |
birth_range_match | int from_year, int to_year | bool | Checks if the birth year of an individual lies within the given range |
death_year_match | int year | bool | Returns True if the given year matches the death year of this person |
death_range_match | int from_year, int to_year | bool | Checks if the death year of an individual lies within the given range |
criteria_match | str criteria | bool | Checks if this individual matches all of the given criteria. Full format for criteria : surname=[name]:given_name=[given_name]:birth=[year]:birth_range=[from_year-to_year] |
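The criteria string passed to criteria_match is a colon-separated list of key=value pairs. To show how such a string decomposes, here is a standalone sketch; it is not the package's implementation, and `parse_criteria` and `parse_year_range` are hypothetical helpers written only for illustration.

```python
def parse_criteria(criteria):
    """Split a criteria string into a dictionary of key/value pairs."""
    parsed = {}
    for criterion in criteria.split(":"):
        if not criterion:
            continue  # tolerate a leading, trailing, or doubled colon
        key, separator, value = criterion.partition("=")
        if not separator:
            raise ValueError("criterion without '=': %r" % criterion)
        parsed[key] = value
    return parsed


def parse_year_range(value):
    """Turn 'from_year-to_year' into a tuple of two integers."""
    from_year, to_year = value.split("-")
    return int(from_year), int(to_year)
```

With the real package you pass the string directly, for example element.criteria_match('surname=Doe:birth_range=1900-1950'); the helpers above only illustrate the format.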
May be imported via from gedcom.element.object import ObjectElement.
Method | Parameters | Returns | Description |
---|---|---|---|
is_object | | bool | Checks if this element is an actual object |
Virtual GEDCOM root element containing all logical records as children.
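Because every record hangs off this root, the whole file can be traversed by following get_child_elements() recursively. A minimal sketch, assuming only that each element exposes get_child_elements() as listed for Element; `walk` is a hypothetical helper, not part of the package.

```python
def walk(element, depth=0):
    """Yield (depth, element) pairs for an element and all of its descendants."""
    yield depth, element
    for child in element.get_child_elements():
        # Nested loop instead of `yield from` to stay compatible with Python 2.7
        for pair in walk(child, depth + 1):
            yield pair
```

Starting from gedcom_parser.get_root_element(), the root itself is yielded at depth 0 and the logical records at depth 1, in depth-first order.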
I suggest using pyenv for local development.
- Run pip<version> install --no-cache-dir -r requirements.txt to install dependencies
- Run tests with tox (tox in your console)
  - For Python 2.7 run tox -e py27 (you need to have Python 2.7 installed)
  - For Python 3.4 run tox -e py34 (you need to have Python 3.4 installed)
  - For Python 3.5 run tox -e py35 (you need to have Python 3.5 installed)
  - For Python 3.6 run tox -e py36 (you need to have Python 3.6 installed)

- Run pip<version> install --no-cache-dir -r requirements.txt to install dependencies
- Run python<version> setup.py sdist bdist_wheel to generate distribution archives
- Run twine upload --repository-url https://test.pypi.org/legacy/ dist/* to upload the archives to the Test Python Package Index repository
When the package is ready to be published to the real Python Package Index, the repository-url is https://upload.pypi.org/legacy/.
Please have a look at the CHANGELOG.md file.
This module was originally based on a GEDCOM parser written by Daniel Zappala at Brigham Young University (Copyright (C) 2005), which was licensed under the GPL v2 and then continued by Mad Price Ball in 2012.
The project was taken over by Nicklas Reincke in 2018. Together with Damon Brodie, he made a lot of changes and optimized the parser.
Licensed under the GNU General Public License v2
Python GEDCOM Parser
Copyright (C) 2018 Damon Brodie (damon.brodie at gmail.com)
Copyright (C) 2018-2019 Nicklas Reincke (contact at reynke.com)
Copyright (C) 2016 Andreas Oberritter
Copyright (C) 2012 Madeleine Price Ball
Copyright (C) 2005 Daniel Zappala (zappala at cs.byu.edu)
Copyright (C) 2005 Brigham Young University
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.