tagextract

Python solution to extract tagged lines from text files with markdown or zim wiki syntax

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

What is this?

A simple experimental tool that extracts the lines tagged with an @tag from a text file.

The text file is expected to be written in either markdown or zim wiki syntax, although only a few syntax features are actually parsed (headings and link definitions so far).
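As a rough illustration of what recognising headings and link lines could look like, here is a minimal sketch. The regular expressions and the `classify` helper are assumptions made for this example, not the patterns tagextract actually uses; in particular, only inline markdown links/images and zim `{{...}}` / `[[...]]` lines are treated as links here.

```python
import re

# Hypothetical patterns, chosen for illustration only.
MD_HEADING = re.compile(r"^#{1,6}\s+\S")                   # "## heading 1.1"
MD_LINK = re.compile(r"^!?\[[^\]]*\]\([^)]*\)")            # "![img](path)" or "[text](url)"
ZIM_HEADING = re.compile(r"^={2,6}\s+.*\s+={2,6}\s*$")     # "===== Section ====="
ZIM_LINK = re.compile(r"^(\{\{.*\}\}|\[\[.*\]\])")         # "{{./figure.png}}" or "[[Page]]"


def classify(line, syntax="markdown"):
    """Return 'heading', 'link' or 'text' for a single line."""
    if syntax == "markdown":
        heading, link = MD_HEADING, MD_LINK
    else:
        heading, link = ZIM_HEADING, ZIM_LINK
    if heading.match(line):
        return "heading"
    if link.match(line):
        return "link"
    return "text"


for line in ["## heading 1.1", "![img](~/Pictures/figure.png)", "this is block 0"]:
    print(classify(line), "<-", line)
```

Both link patterns are anchored at the start of the line, so a link in the middle of a paragraph is treated as ordinary text.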

The extraction works by grouping lines according to their tab levels, so the text files must follow relatively strict indentation rules (much like a Python script).
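To make the idea of grouping by tab level concrete, here is a minimal sketch. It assumes that a block is a run of consecutive non-blank lines and that the tab level of a block is the number of leading tab characters on its first line; `tab_level` and `split_blocks` are hypothetical names, not functions from tagextract.

```python
from itertools import groupby


def tab_level(line):
    """Count the leading tab characters of a line."""
    return len(line) - len(line.lstrip("\t"))


def split_blocks(text):
    """Split text into blocks of consecutive non-blank lines.

    Returns a list of (tab_level, lines) pairs; the level of a block is
    taken from its first line (an assumption of this sketch).
    """
    blocks = []
    for is_blank, group in groupby(text.splitlines(), key=lambda l: not l.strip()):
        if is_blank:
            continue  # blank lines only separate blocks
        lines = list(group)
        blocks.append((tab_level(lines[0]), lines))
    return blocks


sample = "block 0\nblock 0\n\n\t- @tag1\n\n\tblock 1\n"
for level, lines in split_blocks(sample):
    print(level, lines)
```

Because the level is read from leading tabs only, a file indented with spaces would end up entirely at level 0 in this sketch, which is one reason strict indentation matters.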

The purpose of this tag-based retrieval is that you can take notes in markdown or zim and, as you go, add a tag underneath a block (or a number of blocks). You can then use this tool to retrieve all blocks that are tagged/decorated by a certain tag (e.g. @math). This is the within-note equivalent of the tag filtering in, for example, Evernote, zim (with a tag plugin) or some other tools. The concept is the same, except that it lets you resolve deeper into a single note.

The price of this deeper resolution is strict indentation. I've yet to find a cleverer way to relax this requirement.

Concept illustrated

Below is a sample dummy text in markdown:

# heading 1

this is block 0 this is block 0
this is block 0 this is block 0
this is block 0 this is block 0

## heading 1.1                                         ===
                                                        |
this is block 1 this is block 1 this is block 1         |
this is block 1 this is block 1 this is block 1         |
                                                        |
	- @tag1, @tag2                                 ===
                                                        
this is block 2 this is block 2 this is block 2
this is block 2 this is block 2 this is block 2
this is block 2 this is block 2 this is block 2

this is block 3 this is block 3 this is block 3
this is block 3 this is block 3 this is block 3
this is block 3 this is block 3 this is block 3

	- @tag2, @tag3

![img](~/Pictures/figure.png)

	this is block 4 this is block 4
	this is block 4 this is block 4
	this is block 4 this is block 4

		- @tag3; @tag4

		this is block 5 this is block 5
		this is block 5 this is block 5

			- @tag5
                                                       ===
	this is block 6 this is block 6                 |
	this is block 6 this is block 6                 |
	this is block 6 this is block 6                 |
                                                        |
		@tag1                                  ===

The output of extracting @tag1 is:

#  Summary of tag: @tag1 # 

## heading 1.1

this is block 1 this is block 1 this is block 1
this is block 1 this is block 1 this is block 1

	- @tag1, @tag2


	this is block 6 this is block 6
	this is block 6 this is block 6
	this is block 6 this is block 6

		@tag1

Those are the lines denoted by the markers on the right of the sample text.

The "territory" of a tag includes:

  • The line where @tag is found.
  • Searching upwards, the immediately preceding lines that belong to the same tag definition block, if your tags run over more than one line.
  • Searching upwards, the immediately preceding blocks that are one tab level higher (that is, one tab less indented), until hitting a header line or another tag definition line.
  • Searching downwards, the immediately following lines that belong to the same tag definition block, if your tags run over more than one line.
  • Searching downwards, all blocks with the same or lower tab levels (that is, indented at least as deeply), until hitting a header line or a block with a higher tab level.
  • Link definition lines (for instance, in note files generated by zim, an image link starts at the beginning of the line) are always included.
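The rules above can be sketched at block level as follows. This is one reading of the rules, checked only against the sample outputs in this README, and not the tool's actual implementation. It assumes that blocks are separated by blank lines, that a heading reached by the upward search is kept (as in the @tag1 output), that the upward search stops at the first block that is not exactly one level higher, and that link lines are only picked up during the downward search. All function names are hypothetical.

```python
import re
from itertools import groupby

TAG_RE = re.compile(r"@\w+")


def tab_level(line):
    """Count the leading tab characters of a line."""
    return len(line) - len(line.lstrip("\t"))


def parse_blocks(text):
    """Split text on blank lines into dicts with level, kind and lines."""
    blocks = []
    for blank, group in groupby(text.splitlines(), key=lambda l: not l.strip()):
        if blank:
            continue
        lines = list(group)
        first = lines[0].strip()
        if first.startswith("#"):
            kind = "heading"
        elif first.startswith("!["):
            kind = "link"
        elif TAG_RE.match(first.lstrip("- ")):
            kind = "tag"  # a block that starts with "@tag" or "- @tag"
        else:
            kind = "text"
        blocks.append({"level": tab_level(lines[0]), "kind": kind, "lines": lines})
    return blocks


def territory(blocks, tag):
    """Return the sorted indices of the blocks in the territory of `tag`."""
    keep = set()
    for i, blk in enumerate(blocks):
        if blk["kind"] != "tag" or tag not in TAG_RE.findall(" ".join(blk["lines"])):
            continue
        keep.add(i)
        # Upwards: blocks one tab level higher, until a heading or another tag block.
        for j in range(i - 1, -1, -1):
            up = blocks[j]
            if up["kind"] == "heading":
                keep.add(j)  # assumption: the heading that ends the search is kept
                break
            if up["kind"] == "tag" or up["level"] != blk["level"] - 1:
                break
            keep.add(j)
        # Downwards: same or deeper indentation, until a heading or a shallower block.
        for j in range(i + 1, len(blocks)):
            down = blocks[j]
            if down["kind"] == "link":
                keep.add(j)  # link lines are kept and do not end the search
                continue
            if down["kind"] == "heading" or down["level"] < blk["level"]:
                break
            keep.add(j)
    return sorted(keep)


# A shortened version of the sample text, one line per block.
sample = (
    "# heading 1\n\nblock 0\n\n"
    "## heading 1.1\n\nblock 1\n\n\t- @tag1, @tag2\n\n"
    "block 2\n\n\t- @tag2, @tag3\n\n"
    "![img](~/Pictures/figure.png)\n\n"
    "\tblock 4\n\n\t\t- @tag5\n\n"
    "\tblock 6\n\n\t\t@tag1\n"
)
blocks = parse_blocks(sample)
for idx in territory(blocks, "@tag1"):
    print("\n".join(blocks[idx]["lines"]))
```

On this shortened sample, @tag1 selects heading 1.1, block 1 with its tag line, and block 6 with its tag line, which mirrors the first output shown above.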

Following the above rules, filtering by @tag2 will give:

#  Summary of tag: @tag2 # 

## heading 1.1

this is block 1 this is block 1 this is block 1
this is block 1 this is block 1 this is block 1

	- @tag1, @tag2

this is block 2 this is block 2 this is block 2
this is block 2 this is block 2 this is block 2
this is block 2 this is block 2 this is block 2

this is block 3 this is block 3 this is block 3
this is block 3 this is block 3 this is block 3
this is block 3 this is block 3 this is block 3

	- @tag2, @tag3

![img](~/Pictures/figure.png)

	this is block 4 this is block 4
	this is block 4 this is block 4
	this is block 4 this is block 4

		- @tag3; @tag4

		this is block 5 this is block 5
		this is block 5 this is block 5

			- @tag5

	this is block 6 this is block 6
	this is block 6 this is block 6
	this is block 6 this is block 6

		@tag1

And the output of @tag5 is:

#  Summary of tag: @tag5 # 


		this is block 5 this is block 5
		this is block 5 this is block 5

			- @tag5

Usage

python tagextract.py -m|-z file tag [-o] outfile

where:

  • -m: file is in markdown.
  • -z: file is in zim.
  • file: input file path.
  • tag: tag to extract.
  • -o: optional output file name, defaults to "_tag-.txt".
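For readers who want to script around the tool, the command line described above could be mirrored with `argparse` roughly as follows. This is a sketch under assumptions, not the parser tagextract.py actually uses: `build_parser` and the `syntax`/`outfile` attribute names are hypothetical, `-o` is read here as taking the output file name as its value, and the default output name is deliberately left unset.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="tagextract.py")
    # Exactly one of -m / -z must be given.
    syntax = parser.add_mutually_exclusive_group(required=True)
    syntax.add_argument("-m", dest="syntax", action="store_const",
                        const="markdown", help="file is in markdown")
    syntax.add_argument("-z", dest="syntax", action="store_const",
                        const="zim", help="file is in zim")
    parser.add_argument("file", help="input file path")
    parser.add_argument("tag", help="tag to extract")
    # Assumption: -o takes the output file name; the default is not modelled here.
    parser.add_argument("-o", dest="outfile", default=None,
                        help="optional output file name")
    return parser


args = build_parser().parse_args(["-m", "notes.md", "@math"])
print(args.syntax, args.file, args.tag, args.outfile)
```

Using a mutually exclusive group makes the parser reject a call that passes both -m and -z, or neither.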