attardi/wikiextractor

Template not correctly expanded

dnk8n opened this issue · 2 comments

dnk8n commented

In the example below (paragraph 2, sentence 1), "its founding president was Luigi Vittorio Bertarelli." is not captured correctly. Instead, it is truncated to "its founding president was ."

Original article: https://en.wikipedia.org/wiki?curid=3917542&oldid=1034257382

WikiExtractor text output:

The Touring Club Italiano (TCI) (Italian Touring Club or Touring Club of Italy) is the major Italian national tourist organization.
The Touring Club Ciclistico Italiano (TCCI) was founded on 8 November 1894 by a group of bicyclists to promote the values of cycling and travel; its founding president was . It published its first maps in 1897. By 1899, it had 16,000 members. With the new century, it promoted tourism in all its forms – including auto tourism – and the appreciation of the natural and urban environments. Under fascism, starting in 1937, it was forced to Italianize its name to the Consociazione Turistica Italiana.
Through the years, it has produced a wide variety of maps, guidebooks, and more specialized studies, and is known for its high standard of cartography. Its detailed road maps of Italy are published at 1:200,000, one per region.
Publishing activity.
Its most prestigious guidebooks are the "Guide Rosse" (not to be confused with the Michelin Red Guides), which cover Italy in 23 highly detailed volumes printed on bible paper; the TCI also produces a wide variety of other guides to Italy. During the Fascist period, the red guides were also extended to cover Italian colonies and overseas territories.
Among many other publications the Touring Club Italiano, along with Club Alpino Italiano, published between 1908 and 2013 the "Guida dei Monti d'Italia" (in english "Guidebook to the Italian mountains"), a series of guidebooks covering all the mountain ranges of Italy.
The TCI also publishes translations of foreign guidebooks such as the French Guide Bleu.
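
To see how often this happens across extracted text, a quick check like the one below may help. It is only a rough sketch: it assumes a dropped template leaves a space directly before punctuation (as in "was ." above), and `find_suspect_gaps` is a hypothetical helper name, not part of wikiextractor.

```python
import re

# Heuristic (an assumption, not wikiextractor behaviour): an empty template
# expansion tends to leave a word, then whitespace, then punctuation.
GAP_BEFORE_PUNCT = re.compile(r"\w+\s+[.,;:]")


def find_suspect_gaps(text):
    """Return the 'word + space + punctuation' fragments found in text."""
    return [m.group(0) for m in GAP_BEFORE_PUNCT.finditer(text)]


sample = "its founding president was . It published its first maps in 1897."
print(find_suspect_gaps(sample))
```

This only flags candidates for manual review; it can also match text whose source really does have a space before punctuation.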

I have come across the same issue. It seems that any words tagged as links to pages that do not exist are skipped automatically. Is there any way to solve this and keep the words?

I'm having the same problem. I lost most of the page's context; only the first or second words appear, and the rest is gone. Is there a way to fix it?
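
One possible workaround, sketched under assumptions: if the missing text sits inside an interlanguage-link template such as `{{ill|Label|it}}` (which I assume is the case for the name dropped above, but have not confirmed), the raw wikitext could be pre-processed so that the template is replaced by its label before it reaches the extractor. `keep_ill_labels` is a hypothetical helper, not part of wikiextractor, and it does not handle nested templates.

```python
import re

# Assumption: the first positional argument of {{ill|...}} or
# {{interlanguage link|...}} is the label that should stay in the text.
ILL_TEMPLATE = re.compile(
    r"\{\{\s*(?:ill|interlanguage link)\s*\|\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)


def keep_ill_labels(wikitext):
    """Replace each interlanguage-link template with its label text."""
    return ILL_TEMPLATE.sub(lambda m: m.group(1), wikitext)


sample = "its founding president was {{ill|Luigi Vittorio Bertarelli|it}}."
print(keep_ill_labels(sample))
```

How to hook this in depends on how you feed the dump to the extractor; the sketch only covers the text substitution itself.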