openscriptures/HebrewLexicon

Complete text of BDB?

Opened this issue · 26 comments

Nice work! I would like to see the complete text of BDB, including the introduction. Is that something you would consider? Does the answer depend on who does the work?

We have the front matter. It just hasn't made it into the release. The
full text is pretty much beyond the capability of one individual, so it
does depend on who does the work. I had a quirky PHP app for editing
the lexicon, that Daniel and I used to get it into its present form.
I've made some progress in updating it to the current format and making
it somewhat more stable. Then so many other things came along that it
remains in limbo.

(In reply to biblicalhumanities, 12/3/2013 10:22 AM.)

Hi @DavidTroidl,

How much of the text of BDB is currently posted in BrownDriverBriggs.xml? Is there a rough estimate of how much remains a work in progress? How can people help with getting it completed?

thank you,
Razi

Hi,

Brown, Driver, Briggs is a huge work. We have all the entries
represented. Some of the shorter ones are complete. Most of the others
have the "most significant" information included. We don't really have
a user-friendly method of contributing, but anybody who wants to extend
the work is free to do so. It's really hard to say how much we have
completed. A very uneducated guess would be maybe 35%?

Peace,

David

(In reply to Razi Shaban, 2/26/2015 4:11 PM.)

Have you given any thought to scraping a website that has the BDB posted? e.g. http://biblehub.com/hebrew/776.htm

I'm not sure what the terms of use for the BDB are, but as the BDB is in the public domain, I don't see a reason why scraping the digital version there would not be allowed. The attribution given there is as follows:

"Brown-Driver-Briggs Hebrew and English Lexicon, Unabridged, Electronic Database.
Copyright © 2002, 2003, 2006 by Biblesoft, Inc.
All rights reserved. Used by permission. BibleSoft.com"

Judging by a quick look at that entry, their database is abridged. I would think that what we already have contains at least as much as that one and is unencumbered by their copyright assertions.

@DavidTroidl this is a wonderful resource! I stumbled across it looking for some lexical information that I was not able to get at through the Accordance UI, and was able to export exactly what I needed using a simple XML parser. I see that "all entries are represented" from your comments above, but I was just wondering if you know for sure if all stems are present for those entries?
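For anyone wanting to check coverage the same way, here is a minimal sketch of a tag survey with a streaming XML parser. It makes no claim about the real schema of BrownDriverBriggs.xml: the element names and namespace in `SAMPLE` are made up for illustration, and `tag_counts` is a hypothetical helper, not part of this repo.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def tag_counts(source):
    """Count how often each element name occurs in an XML file or stream."""
    counts = Counter()
    # iterparse streams the document, so a large lexicon need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[local_name(elem.tag)] += 1
        elem.clear()  # release the element's children once it has been counted
    return counts


# Invented sample; the real file's element names may differ.
SAMPLE = b"""<lexicon xmlns="http://example.org/ns">
  <entry id="x1"><w>word</w><sense n="1"/><sense n="2"/></entry>
  <entry id="x2"><w>word</w></entry>
</lexicon>"""

counts = tag_counts(io.BytesIO(SAMPLE))
```

Passing a path to the downloaded XML file instead of the in-memory sample gives a quick picture of which elements exist and how many entries carry them.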

I just came across an entry recently that seemed to need its senses
expanded. There may in fact be some verbs that don't have all their
stems represented. I have just uploaded the latest revision.

(In reply to Laney Stroup, 3/7/2016 2:14 AM.)

http://www.ericlevy.com/Revel/BDB/BDB/main.htm

This version of the BDB appears to be complete, although I have seen a few minor errors - numbering of senses being off, in particular. It looks to be parseable, with some effort.

Wow, that is an impressive piece of work, thanks for the link. I wonder if he would make his source files available.

From the looks of it, R. Eric Levy copied it from biblecentre.net, which is no longer online. I reached out to R. Levy, but haven't yet heard back. It's relatively easy to download the entire html of the website. Then it's just a small matter of parsing. :)

The base text is in the public domain, but some of the emendations here make me wonder if this was digitized from a newer version that someone may try to assert rights over. In any case, the core material is squarely in the public domain, and no one could protest if the core work of the BDB were parsed and redistributed from here.

Ah, well that is disappointing. I'm not terribly surprised, though.

Do we have any idea who the proper originator of the BDB data is? I'd love to have a conversation with them. Perhaps there's a way we can get it released into the commons legitimately.

Oooh, best not to mess with that.

Here's a gift!
https://github.com/jackweinbender/bdb_parse

https://liberalarts.utexas.edu/mes/news/article.php?id=6768
A team at UTexas Austin got an NEH grant to create an online lexicon based on the BDB. The grant wasn't renewed, but they got as far as digitizing the public domain BDB printing. I swapped emails with them, and their view is that since public money paid for the work, the resulting data is public property. They gave their blessing to carry the project forward in whatever ways we can.

It's a bit rough, the data - it needs to be converted from its current form into proper unicode. There's some node/js code that does some setup, but doesn't go so far as parsing the data.

Even so - this seems like a great bounty of data.

The key map for Bwhebb is at Bible Works Fonts. This should help in constructing a search and replace script for the Hebrew. The consonants appear in reverse order, but each is followed by its vowel: bybia' means אָבִיב
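A minimal sketch of that reordering, assuming each consonant is followed by the marks that belong to it. The two small tables below cover only the characters in the `bybia'` example and are inferred from it; a real script would fill them in from the full Bwhebb key map. The function name is hypothetical.

```python
# Partial tables inferred from the single example above (assumption, not the full key map).
CONSONANTS = {
    "a": "\u05d0",  # alef
    "b": "\u05d1",  # bet
    "y": "\u05d9",  # yod
}
VOWELS = {
    "i": "\u05b4",  # hiriq
    "'": "\u05b8",  # qamats
}


def bwhebb_to_unicode(text):
    """Convert a reversed legacy-font string to logical-order Unicode Hebrew."""
    clusters = []  # each cluster is one consonant plus the vowel marks after it
    for ch in text:
        if ch in CONSONANTS:
            clusters.append(CONSONANTS[ch])
        elif ch in VOWELS:
            if not clusters:
                raise ValueError("vowel mark before any consonant")
            clusters[-1] += VOWELS[ch]
        else:
            raise ValueError(f"no mapping for {ch!r}")
    # The legacy text stores the clusters right to left, so reverse them.
    return "".join(reversed(clusters))


word = bwhebb_to_unicode("bybia'")
```

Reversing whole clusters rather than single characters keeps each vowel attached to its consonant, which a plain string reversal would not do.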

There is a macro for Word 2003 that converts BibleWorks fonts to unicode. It's in the "OLE and DDE" section of the help file (towards the end: section 58 in BWks 9). It includes this guidance:

To implement them just copy the blue text below into the Word Macro editor. If you want to use a different Unicode font you will need to edit the font names in the calling routines below. In other words, change "Ezra SIL" and "Arial Unicode MS" to the names of the fonts you want to use. BibleWorks ships with "SBL Greek" and "SBL Hebrew", as well as "Ezra SIL".

I have put the macro itself in a Gist, if that helps. But anyone with BibleWorks (for many versions back) will have this already.

All,

A few things about this data.

  1. I wrote a crosswalk and converter for the legacy > Unicode conversion.

  2. There is one major issue with the Hebrew, namely, that all non-final Tsades without dagesh, for some reason, have been encoded as a het. I.e., there’s not a straightforward way of knowing whether any particular “het” should actually be a tsade. You may be able to infer them based on their position in the Lexicon (all the words that start with het, obviously, are together; root aleph-het would show up before aleph-tsade, if it even exists [in which case a dictionary of BH roots could be helpful]).
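The position-based inference in point 2 could be sketched roughly as follows. This assumes consonant-only root strings already converted to Unicode and listed in alphabetical order; the function names are hypothetical, and where both readings fit between the neighbouring roots the case stays ambiguous and needs a manual check or a root dictionary.

```python
from itertools import product

HET = "\u05d7"
TSADE = "\u05e6"
# Map final letter forms to regular forms so code-point order is alphabetical.
FINALS = str.maketrans(
    "\u05da\u05dd\u05df\u05e3\u05e5",
    "\u05db\u05de\u05e0\u05e4\u05e6",
)


def sort_key(word):
    """Alphabetical sort key for a consonant-only Hebrew string."""
    return word.translate(FINALS)


def variants(word):
    """Yield every reading in which a non-final het is kept or read as tsade."""
    options = []
    for i, ch in enumerate(word):
        if ch == HET and i < len(word) - 1:
            options.append((HET, TSADE))
        else:
            options.append((ch,))  # a word-final het is left alone
    for combo in product(*options):
        yield "".join(combo)


def plausible_readings(prev_word, word, next_word):
    """Keep only the readings that sort between the two neighbouring roots."""
    lo, hi = sort_key(prev_word), sort_key(next_word)
    return [v for v in variants(word) if lo <= sort_key(v) <= hi]


# alef-het-lamed sitting between alef-pe-samekh and alef-qof-dalet
readings = plausible_readings(
    "\u05d0\u05e4\u05e1", "\u05d0\u05d7\u05dc", "\u05d0\u05e7\u05d3"
)
```

In this example only the alef-tsade-lamed reading fits between its neighbours, so the het can be corrected with some confidence.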

Here’s the transcoder (it was private, sry).
https://github.com/jackweinbender/bdb_transcoder
I wrote it in Elixir, for a reason I don’t recall. I’ve stopped working on BDB stuff for the present while I finish my dissertation.

@jackweinbender This is great. Thank you.
I'd been working on a transcoder independently, over here - https://github.com/Sefaria/bdb_parse
Still have some dangling issues - could be that your work will help.

FWIW; the JSON file in the transcoder should be exhaustive.

Is there a plan to encode this as a TEI document? I’ve also got a simple digital site to display the BDB by page like (http://jastrow.semitics-archive.org), if I can find it. I’ve been playing with some computer vision stuff to split up the images into entries/paragraphs that might make transcription (or perhaps corrected OCR?) easier.

I’m going to try to keep up with these projects; I’d like to help. I was very disappointed when our NEH grant was not renewed. The BDB is such a fantastic work of scholarship, it is tragic that there isn’t a complete, open, digital edition of it yet.

@jackweinbender said:

I’ve also got a simple digital site to display the BDB by page like (http://jastrow.semitics-archive.org), if I can find it.

I hope you can find it! That would be valuable, although something like the GKC on Wikisource would be remarkable. But please ping me if you mount your digi-BDB! Thanks.

I will. I’m out of town this week, but I’ll post a link whenever I get it deployed.

I actually reimplemented my BDB site using the data from this repo's XML file, since the former iteration used the buggy one referenced above. Everything seems to still work, so... as promised:

http://bdb.semitics-archive.org/

It probably sucks on mobile, FWIW.