
Dicionário Histórico Biográfico da Primeira República

This dictionary project has six directories:

Holds the original HTML file of each dictionary entry.
Holds the Markdown file converted from each HTML entry.
Holds the processed metadata extracted from the first paragraph of each dictionary entry. Each metadata file holds the following fields:
  • title
  • five fields describing the positions held, each stored as a list in which the i-th item belongs to the i-th position: cargoss (position), cargos_esp (details of the position), datas_ini (start date), datas_fim (end date), estados (state where the position was held).
  • autor (author of the entry)
Holds the reference text of each dictionary entry.
Holds the body text of each dictionary entry.
Holds the final documents.
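Because the five position fields are parallel lists, the i-th item of each list together describes one position. A minimal Python sketch of regrouping them into one record per position, assuming a metadata file has already been loaded into a dict keyed by the field names above; the function name `positions` and the sample values are hypothetical placeholders, not data from the dictionary:

```python
# Field names as listed for the metadata files.
POSITION_FIELDS = ("cargoss", "cargos_esp", "datas_ini", "datas_fim", "estados")


def positions(meta):
    """Regroup the parallel position lists into one dict per position.

    Hypothetical helper: assumes `meta` is a dict loaded from a metadata file.
    A missing field is read as an empty list, which then only passes the
    length check if every other position field is empty too.
    """
    columns = [meta.get(field, []) for field in POSITION_FIELDS]
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        # Lists of different lengths cannot be paired up item by item.
        raise ValueError(f"position lists have different lengths: {sorted(lengths)}")
    return [dict(zip(POSITION_FIELDS, values)) for values in zip(*columns)]


# Placeholder values for illustration only.
meta = {
    "title": "Nome Exemplo",
    "cargoss": ["cargo A", "cargo B"],
    "cargos_esp": ["detalhe A", "detalhe B"],
    "datas_ini": ["1890", "1895"],
    "datas_fim": ["1894", "1898"],
    "estados": ["MG", "SP"],
}
for p in positions(meta):
    print(p["cargoss"], p["datas_ini"], p["datas_fim"], p["estados"])
```

Checking the lengths first makes an inconsistently parsed entry fail loudly instead of being silently truncated by `zip`.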

Automatic File Correction (under construction)

The cl-yaml library has a bug: in flow style, strings are not double-quoted. Because of that, the code below is incomplete.

(ql:quickload :optima)
(ql:quickload :cl-ppcre)
(ql:quickload :cl-yaml)

(defpackage :test
  (:use :optima :cl-ppcre :cl-yaml :cl))

(in-package :test)

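;; Debug-print the current body line, advance the body-line counter and,
;; when FLAG is true, keep the line. Deliberately unhygienic: the expansion
;; refers to LINENO, LINE and LINES, which READ-FILE binds.
(defmacro consume (flag)
  `(progn
     (format t "DEBUG [~a] : ~a~%" lineno line)
     (incf lineno)
     (if ,flag
         (push line lines))))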

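;; Parse the YAML front matter collected in LINES and drop the position
;; fields; READ-FILE stores the positions under "cargos" instead, taken
;; from line 3 of the body text.
(defun make-head (lines)
  (let ((tb (cl-yaml:parse (format nil "~{~a~^~%~}" (reverse lines)))))
    (dolist (k '("cargoss" "datas_ini" "datas_fim" "estados" "cargos_esp") tb)
      (remhash k tb))))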

;; Read one Markdown entry. Returns (HEAD . BODY): HEAD is the hash table
;; parsed from the front matter, BODY the remaining text as one string.
(defun read-file (filename)
  (with-open-file (stream filename)
    (let ((state 'start)   ; start -> head (first "---") -> text (second "---")
          (lineno 0)       ; counts body lines only
          (head nil)
          (lines nil))
      (do ((line (read-line stream nil nil)
                 (read-line stream nil nil)))
          ((null line)
           (cons head (format nil "~{~a~^~%~}" (reverse lines))))
        (match (list line state lineno)
          ((list "---" 'start _)
           (push line lines)
           (setf state 'head))
          ((list "---" 'head _)
           (push line lines)
           (setf state 'text)
           (setf head (make-head lines))
           (setf lines nil))
          ((list _ 'head _)
           (push line lines))
          ;; Body line 1 should be the bold title (**Title**): check it
          ;; against the front matter and drop it.
          ((list _ 'text 1)
           (multiple-value-bind (m vals)
               (cl-ppcre:scan-to-strings "^\\*+([^\\*]*)\\*+$" (string-trim '(#\Space) line))
             (if m
                 (progn
                   (assert (equal (aref vals 0) (gethash "title" head)))
                   (consume nil))
                 (consume t))))
          ;; Body line 3 should list the positions as "\* a; b; c.": store
          ;; them under "cargos" and drop the line.
          ((list _ 'text 3)
           (multiple-value-bind (m vals)
               (cl-ppcre:scan-to-strings "^\\\\\\*[ ]*(.+)$" (string-trim '(#\Space #\.) line))
             (if m
                 (progn
                   (setf (gethash "cargos" head) (cl-ppcre:split ";[ ]*" (aref vals 0)))
                   (consume nil))
                 (consume t))))
          ((list _ 'text _)
           (consume t)))))))
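For experimenting outside Lisp, the same state machine can be sketched in Python. This is a hypothetical port, not the project's code: it keeps the front matter as raw lines instead of parsing it with a YAML library, and it returns the title found on body line 1 instead of asserting that it matches the front matter. The names `split_entry`, `TITLE_RE` and `POSITIONS_RE` are illustrative.

```python
import re

# Body line 1: the entry title in bold, e.g. "**Nome**".
TITLE_RE = re.compile(r"^\*+([^*]*)\*+$")
# Body line 3: positions after an escaped asterisk, e.g. "\* cargo A; cargo B."
POSITIONS_RE = re.compile(r"^\\\*[ ]*(.+)$")


def split_entry(lines):
    """Hypothetical Python port of read-file.

    Returns (head, title, cargos, body): the raw front-matter lines including
    both '---' delimiters, the title found on body line 1 (or None), the list
    of positions from body line 3, and the remaining body text.
    """
    state = "start"  # start -> head (first '---') -> text (second '---')
    head, kept = [], []
    title, cargos = None, []
    lineno = 0  # counts body lines only, starting at 0
    for line in lines:
        if state == "start":
            if line == "---":
                head.append(line)
                state = "head"
        elif state == "head":
            head.append(line)
            if line == "---":
                state = "text"
        else:
            m = None
            if lineno == 1:
                m = TITLE_RE.match(line.strip(" "))
                if m:
                    title = m.group(1)
            elif lineno == 3:
                m = POSITIONS_RE.match(line.strip(" ."))
                if m:
                    cargos = re.split(r";[ ]*", m.group(1))
            if m is None:
                kept.append(line)  # matched lines are dropped from the body
            lineno += 1
    return head, title, cargos, "\n".join(kept)
```

As in the Lisp version, a line that matches on body line 1 or 3 is dropped from the body, and every other body line is kept.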

Possible alternative with Python:

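import yaml
# Python 3; safe_load because PyYAML 6 requires an explicit Loader for yaml.load
with open("1.tmp", encoding="utf-8") as f:
    print(yaml.dump(yaml.safe_load(f), default_flow_style=False, allow_unicode=True))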
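If flow style itself is wanted, one workaround needs no YAML emitter at all: YAML 1.2 was designed so that JSON is (nearly) a subset of it, so Python's standard `json` module already writes flow-style values with every string double-quoted. A minimal sketch, assuming the front matter is a flat mapping whose keys are plain identifiers; `to_front_matter` is a hypothetical name:

```python
import json


def to_front_matter(meta):
    """Write `meta` as YAML front matter between '---' lines.

    Each value is emitted with json.dumps, so strings come out double-quoted
    and lists come out in flow style, e.g. ["a", "b"]. Keys are written
    unquoted, which assumes they are plain identifiers such as "title".
    """
    lines = ["---"]
    for key, value in meta.items():
        # ensure_ascii=False keeps accented characters instead of \u escapes
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines)


print(to_front_matter({"title": "São Paulo", "cargos": ["cargo A", "cargo B"]}))
```

This prints the two delimiters around `title: "São Paulo"` and `cargos: ["cargo A", "cargo B"]`, which a YAML reader parses back into the same strings and list.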

Or, using Haskell (a preliminary idea):

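import Data.Yaml
-- in GHCi; decodeFileEither reports the parse error instead of returning Nothing
decodeFileEither "text/1.tmp" :: IO (Either ParseException Value)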