ZiegleIt

Ever wanted to be able to rip through books like your name was Alex Ziegler? Well, now you can ZiegleIt! With our app you can quickly generate a summary of any article on the web to expedite your learning. When you're short on time, ZiegleIt.

##MVP Our MVP will have the following features:

Scrape web URLs and copy content with Nokogiri
Take scraped content and generate a summary based on a fixed(?) compression ratio
Deliver content to a txt file
At this point in time we expect our algorithm to be optimized for Wikipedia articles

##Document Structure

Title (depth: 1)
- Chapter (2)
- Chapter (2)
- Chapter (2)
  - Section (3)
    - Sub Section (4)
      - Paragraph (5)
        
        Sentence (6)
        
        Word (6.content)
      - Paragraph
      - Paragraph
    - Sub Section
    - Sub Section
  - Section
  - Section
- Chapter

##Parsing Rules v1

There are a lot of blank elements (I'm guessing closing tags?) so first and foremost we need a guard clause that prevents nodes with blank inner_xml from making their way into the content:

if node.inner_xml != ""

Break when See also</span> is reached in the current Nokogiri node inner xml. This is checked by matching a RegExp:

break if node.inner_xml.match(/(See also<\/span>)/)

The current section's text is no longer meaningful if a </span> tag has been reached. This is checked via RegExp as well:

puts "Section: #{node.inner_xml.match(/[ \w]+(?=<\/span>)/)}"

The table of contents is ignored (we are building our own afterall). We achieve this by excluding any node that has an h2 parent with inner xml of Contents. We add it to our guard clause.

if (node.inner_xml != "") && (node.inner_xml != "Contents")

##Algorithm

##Next Steps

Start calculating some word scores

##Resources

pdwittig/ZiegleIt

ZiegleIt