With doc-merger, you can compare two documents and, using document one as the base, overwrite its matching entries with the data from document two. The tool writes the merged result and separately lists the data that exists only in document two.
Suppose you have two text files, doc1.txt and doc2.txt. Both contain information about TV episodes, but neither is complete. You want to use doc2.txt to supplement and overwrite parts of doc1.txt, and to extract the data that exists only in doc2.txt.
The content of doc1.txt is as follows:
1;2023-01-01;45;The first episode
2;2023-01-08;45;The second episode
3;2023-01-15;45;The third episode
The content of doc2.txt is as follows:
2023-01-01;45;The first episode;This is the first episode of the show.
2023-01-08;45;The second episode;This is the second episode of the show.
2023-01-22;45;The special episode;This is the special episode of the show.
After you run the doc-merger.py script, a new file named result.txt is generated, which contains the merged data:
1;2023-01-01;45;The first episode;This is the first episode of the show.
2;2023-01-08;45;The second episode;This is the second episode of the show.
3;2023-01-15;45;The third episode;
In addition, the script generates a file named doc2_only.txt, which contains the data that exists only in doc2.txt:
2023-01-22;45;The special episode;This is the special episode of the show.
The script also prints summary statistics:
Merged: 2
Doc1 only: 1
Doc2 only: 1
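The merge behavior shown in the example above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in doc-merger.py: the function name `merge_docs` is hypothetical, and it assumes that rows are matched on the three fields both files share (date, duration, title), which the example is consistent with but does not state explicitly.

```python
def merge_docs(doc1_lines, doc2_lines):
    """Return (merged_rows, doc2_only_rows, stats) for two lists of text lines."""
    # Parse doc2 into (key, description) pairs, where key = (date, duration, title).
    # Assumption: these three shared fields identify the same episode in both files.
    doc2_records = []
    for line in doc2_lines:
        date, duration, title, description = line.split(";", 3)
        doc2_records.append(((date, duration, title), description))
    descriptions = dict(doc2_records)

    merged_rows = []
    matched_keys = set()
    merged_count = 0
    doc1_only_count = 0
    for line in doc1_lines:
        number, date, duration, title = line.split(";", 3)
        key = (date, duration, title)
        if key in descriptions:
            matched_keys.add(key)
            merged_count += 1
        else:
            doc1_only_count += 1
        # Rows without a match keep an empty description field (trailing ";").
        merged_rows.append(
            ";".join([number, date, duration, title, descriptions.get(key, "")])
        )

    # Rows from doc2 that no doc1 row matched, in their original order.
    doc2_only_rows = [
        ";".join(key + (description,))
        for key, description in doc2_records
        if key not in matched_keys
    ]
    stats = {
        "Merged": merged_count,
        "Doc1 only": doc1_only_count,
        "Doc2 only": len(doc2_only_rows),
    }
    return merged_rows, doc2_only_rows, stats


doc1 = [
    "1;2023-01-01;45;The first episode",
    "2;2023-01-08;45;The second episode",
    "3;2023-01-15;45;The third episode",
]
doc2 = [
    "2023-01-01;45;The first episode;This is the first episode of the show.",
    "2023-01-08;45;The second episode;This is the second episode of the show.",
    "2023-01-22;45;The special episode;This is the special episode of the show.",
]
merged, doc2_only, stats = merge_docs(doc1, doc2)
print("\n".join(merged))
print(stats)
```

Indexing doc2 in a dictionary keeps the lookup for each doc1 row constant-time. If the real script matches on a different key (for example, the date alone), only the `key` tuples would need to change.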
- Make sure you have Python 3.0 or higher installed on your system.
- Make sure that the data in the text files conforms to the format defined in the script. For example, each line in doc1.txt should contain four fields separated by semicolons, representing the episode number, date, duration, and title; each line in doc2.txt should contain four fields separated by semicolons, representing the date, duration, title, and description.
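The format requirement above can be checked before running the merge. The following is a minimal sketch; the helper `find_malformed_lines` is hypothetical and not part of doc-merger.py. It assumes blank lines are acceptable and should be skipped, and it is deliberately strict: a description that itself contains a semicolon would be reported as malformed.

```python
def find_malformed_lines(lines, expected_fields=4, separator=";"):
    """Return (line_number, text) pairs for lines with the wrong field count."""
    problems = []
    for line_number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # Assumption: blank lines are ignored rather than reported.
        if len(text.split(separator)) != expected_fields:
            problems.append((line_number, text))
    return problems


sample = [
    "1;2023-01-01;45;The first episode",
    "2;2023-01-08;45",  # only three fields: the title is missing
    "",
]
print(find_malformed_lines(sample))
```

The same function works for both files, since each is expected to have four fields per line.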
- Clone or download the repository to a directory on your computer.
- Modify the path in start.command (Mac) or start.bat (Win) to point to the directory where you store the doc-merger.py script.
- Save the text to be processed as doc1.txt and doc2.txt, and place both files in the same directory as the script.
- Double-click start.command or start.bat to run the doc-merger.py script.
- The results are written to result.txt and doc2_only.txt in the same directory.
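Because the steps above depend on the script and both input files sitting in one directory, a quick check of that directory can save a failed run. The sketch below is an assumption-laden illustration: the helper `missing_files` is hypothetical and is not something the repository is known to provide.

```python
import os

# File names taken from the steps above.
REQUIRED_FILES = ("doc-merger.py", "doc1.txt", "doc2.txt")


def missing_files(directory):
    """Return the required file names that are not present in `directory`."""
    return [
        name
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    # Check the current working directory; pass another path as needed.
    missing = missing_files(os.getcwd())
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All required files are present.")
```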