delph-in/docs

Page not imported?


It looks like this page didn't get imported:
http://moin.delph-in.net/wiki/CambridgeSEM-I

It's world readable, so I wonder if the problem is that the page name is a bit odd (it has a hyphen) and, if so, whether there might be other pages that weren't imported.

@arademaker can you import it and also see if maybe there are others?

It also looks like links to the page will need to be updated. I discovered it was missing by looking here:

https://github.com/delph-in/docs/wiki/RmrsDiscussions

I have manually copied the page and converted it to markdown: https://github.com/delph-in/docs/wiki/CambridgeSEM-I. You are right, my code probably missed that page, either because of the hyphen-like character in the name or because MoinMoin had two very similar pages:

CambridgeSEM\(28\)2d\(29\)I/  CambridgeSEM\(2d\)I/

The first was deleted, so maybe I did something wrong! Thank you for opening the issue. I fixed the links in https://github.com/delph-in/docs/wiki/CambridgeSchedule and https://github.com/delph-in/docs/wiki/RmrsDiscussions (the latter still needs some editing to improve its formatting).

and also see if maybe there are others?

Not sure at this stage how I can check that. In the dump from MoinMoin I have 1266 pages:

% ls | wc -l
    1266

In the current wiki we have 1057 pages:

% ls | wc -l
    1057

But many MoinMoin pages were intentionally removed:

  1. personal pages
  2. system pages
  3. restricted pages that @oepen listed to me

We do have some weird names in the new wiki, but the content looks right:

% ls | rg "[^a-zA-Z0-9_.]"
CambridgeSEM-I.md
LtgOslo_Hank(c3b8).md
LtgOslo_Hank(c3b8)Retreat.md
MatrixDoc_Nominalized(20)Clauses.md
Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md
SideSt(c3b8)rrelse.md
Singapore(20)Top.md
TheAbbey_Chrysalis2014PpAttachment(5d).md
ToolsTop_converter(2e)html.md
Usability_ease(20)of(20)set(2d)up.md

In the MoinMoin dump, we have:

% ls | rg "[^a-zA-Z0-9_.]"
(28)c396(29)nskadeSidor
(28)c396(29)vergivnaSidor
(28)c398(29)nskedeSider
(c396)nskadeSidor
(c396)vergivnaSidor
(c398)nskedeSider
4(28)2d(29)16_Meeting_Notes
4(2d)16_Meeting_Notes
Aktuelle(28)c384(29)nderungen
Aktuelle(c384)nderungen
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar
Anv(c3a4)ndarInst(c3a4)llningar
CambridgeSEM(28)2d(29)I
CambridgeSEM(2d)I
ChangementsR(28)c3a9(29)cents
ChangementsR(c3a9)cents
ClarinoTop(2f)RelatedWork
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)TechnologySurvey
Climb(2f)GClimb
Climb(2f)GClimb(2f)German
DeepBank(2f)OneOne
DeepBank(2f)OneZero
DelphinTutorial(2f)Distributions
DelphinTutorial(2f)Formalisms
DelphinTutorial(2f)Grammars
DelphinTutorial(2f)Processing
ErgProcessing(2f)ExportExample
ErgProcessing(2f)SampleExport
ErgSemantics(2f)Apposition
ErgSemantics(2f)Basics
ErgSemantics(2f)Ccs
ErgSemantics(2f)Comparatives
ErgSemantics(2f)Compounding
ErgSemantics(2f)Conditionals
ErgSemantics(2f)ControlRelations
ErgSemantics(2f)Conventions
ErgSemantics(2f)Coordination
ErgSemantics(2f)Design
ErgSemantics(2f)Discovery
ErgSemantics(2f)Ellipsis
ErgSemantics(2f)Essence
ErgSemantics(2f)ForeignExpressions
ErgSemantics(2f)Fragments
ErgSemantics(2f)Fundamentals
ErgSemantics(2f)HowToCite
ErgSemantics(2f)IdentityCopulae
ErgSemantics(2f)Imperatives
ErgSemantics(2f)ImplicitLocatives
ErgSemantics(2f)ImplicitNominals
ErgSemantics(2f)ImplicitQuantifiers
ErgSemantics(2f)InstrumentalRelatives
ErgSemantics(2f)Interface
ErgSemantics(2f)Internals
ErgSemantics(2f)Inventory
ErgSemantics(2f)MeasurePhrases
ErgSemantics(2f)Nominalization
ErgSemantics(2f)NonAdverbialClausalModifiers
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Notes
ErgSemantics(2f)NumberSequences
ErgSemantics(2f)Parentheticals
ErgSemantics(2f)Partitives
ErgSemantics(2f)Predicates
ErgSemantics(2f)PropositionalArguments
ErgSemantics(2f)Quantification
ErgSemantics(2f)QuasiModalInfinitivals
ErgSemantics(2f)RelationalNouns
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)Template
ErgSemantics(2f)Terminology
ErgSemantics(2f)TimeExpressions
ErgSemantics(2f)ToDo
ErgSemantics(2f)Vocatives
ErgTokenization(2f)ComplexExample
EventStats(2f)HitCounts
EventStats(2f)UserAgents
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor
FeforPlenum(2f)Formalism
FeforPlenum(2f)LexicalAcquisitionImmatureGrammars
For(28)c3a6(29)ldrel(28)c3b8(29)seSider
For(c3a6)ldrel(c3b8)seSider
ItsdbTreebanking(2f)ItsdbAnnotation
ItsdbTreebanking(2f)ItsdbExporting
ItsdbTreebanking(2f)ItsdbModeling
ItsdbTreebanking(2f)ItsdbTrouble
ItsdbTreebanking(2f)ItsdbUpdating
ItsdbTsdb(2f)ProcessingRelations
JimWhite(2f)StarSemTokenTabulation
KyotoSchedule(2f)InterDelphinNotes
KyotoTop(2f)InterWiki
LapDevelopment(2f)Abel
LapDevelopment(2f)Accounting
LapDevelopment(2f)Annotations
LapDevelopment(2f)Blog
LapDevelopment(2f)DKPROCompilation
LapDevelopment(2f)Deployment
LapDevelopment(2f)DkPro
LapDevelopment(2f)Environment
LapDevelopment(2f)Giellatekno
LapDevelopment(2f)Hackathons
LapDevelopment(2f)Internals
LapDevelopment(2f)Library
LapDevelopment(2f)MongoDB
LapDevelopment(2f)Production
LapDevelopment(2f)Schedule
LapDevelopment(2f)ServerDeployment
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Status
LapDevelopment(2f)Tasks
LapDevelopment(2f)Tests
LapDevelopment(2f)ToE
LapDevelopment(2f)Tree
LogonInstallation(2f)CvsBasics
LogonInstallation(2f)InstallationBasics
LogonMrs(2f)InformationStructure
LogonMrs(2f)MessageRelations
LogonProcessing(2f)BatchGeneration
LogonProcessing(2f)BatchParsing
LogonProcessing(2f)BatchTranslation
LogonTest(2f)BenchmarkingSuite
LtgOslo(2f)BibTeX
LtgOslo(2f)Cristin
LtgOslo(2f)Delphin
LtgOslo(2f)EndreAalrust
LtgOslo(2f)Goals
LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)
LtgOslo(2f)LaTeX
LtgOslo(2f)Linux
LtgOslo(2f)MSc
LtgOslo(2f)MajaBuljan
LtgOslo(2f)MarteSvalastoga
LtgOslo(2f)Norsk
LtgOslo(2f)Oscarsborg
LtgOslo(2f)TechTalks
LtgOslo(2f)TechTalks16
LtgOslo(2f)TechTalksH2016
LtgOslo(2f)WorkDuties
MatrixDoc(2f)AdnominalPossession
MatrixDoc(2f)ArgumentOptionality
MatrixDoc(2f)Case
MatrixDoc(2f)ClausalComplements
MatrixDoc(2f)ClausalModifiers
MatrixDoc(2f)Coordination
MatrixDoc(2f)DirectInverse
MatrixDoc(2f)Evidentials
MatrixDoc(2f)Gender
MatrixDoc(2f)GeneralInfo
MatrixDoc(2f)ImportToolboxLexicon
MatrixDoc(2f)InformationStructure
MatrixDoc(2f)Lexicon
MatrixDoc(2f)Morphology
MatrixDoc(2f)NominalizedClauses
MatrixDoc(2f)Number
MatrixDoc(2f)OtherFeatures
MatrixDoc(2f)Person
MatrixDoc(2f)SententialNegation
MatrixDoc(2f)TenseAspectMood
MatrixDoc(2f)TestByGeneration
MatrixDoc(2f)TestSentences
MatrixDoc(2f)WhQ
MatrixDoc(2f)WordOrder
MatrixDoc(2f)YesNoQ
MoinMoin(2f)InstallDocs
MoinMoin(2f)InstallationsAnleitung
MoinMoin(2f)TextFormatting
MtJaen(2f)MtJaenTanaka
OpenissuesTop(2f)GrammarMatrixClitic
OpenissuesTop(2f)GrammarMatrixSerialVerbConstructions
OpenissuesTop(2f)GrammarMatrixTenseAspect
PageAl(28)c3a9(29)atoire
PageAl(c3a9)atoire
PagesAbandonn(28)c3a9(29)es
PagesAbandonn(c3a9)es
PagesSouhait(28)c3a9(29)es
PagesSouhait(c3a9)es
PhonologyTop(2f)FrenchPhonemes
PhonologyTop(2f)InterWiki
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur
Pr(c3a9)f(c3a9)rencesUtilisateur
S(28)c3b6(29)kSida
S(c3b6)kSida
SeitenGr(28)c3b6c39f(29)e
SeitenGr(c3b6c39f)e
Senaste(28)c384(29)ndringar
Senaste(c384)ndringar
SideSt(28)c3b8(29)rrelse
SideSt(c3b8)rrelse
Singapore(20)Top
Singapore(28)20(29)Top
SynSem(2f)Activities
SynSem(2f)Activities(2f)AnnotationConsistency
SynSem(2f)Activities(2f)ControlRaising
SynSem(2f)Activities(2f)Coordination
SynSem(2f)Activities(2f)DependentDimensions
SynSem(2f)Activities(2f)ExtrinsicParserEvaluation
SynSem(2f)Activities(2f)Gapping
SynSem(2f)Activities(2f)GramRel
SynSem(2f)Activities(2f)IdentitySyntax
SynSem(2f)Activities(2f)PcdrtEllipsis
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)10Oct2017
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)25Sept2017
SynSem(2f)Activities(2f)PolymorphicVariadicPredicates
SynSem(2f)Activities(2f)UdMeaningConstruction
SynSem(2f)Candidates
SynSem(2f)Impressions
SynSem(2f)Launch
SynSem(2f)LysebuResources
SynSem(2f)MeaningConstruction
SynSem(2f)MeaningRepresentation
SynSem(2f)Planning
SynSem(2f)Problems
SynSem(2f)Problems(2f)ERGQuantification
SynSem(2f)Problems(2f)ScopalNonScopal
SynSem(2f)Problems(2f)UDDeterminers
TheAbbey(2f)Chrysalis2014
TheAbbey(2f)Chrysalis2014Arity
TheAbbey(2f)Chrysalis2014BindingTheory
TheAbbey(2f)Chrysalis2014DeverbalNouns
TheAbbey(2f)Chrysalis2014Nominalization
TheAbbey(2f)Chrysalis2014OpenEndedPredicates
TheAbbey(2f)Chrysalis2014PossessiveIdioms
TheAbbey(2f)Chrysalis2014PpAttachment
TheAbbey(2f)Chrysalis2014ProperNouns
TheAbbey(2f)Chrysalis2014ProperNounsGeneration
TheAbbey(2f)Chrysalis2014SchrodingerMrs
TheAbbey(2f)Chrysalis2014Terminology
TheAbbey(2f)Chrysalis2014WhatsThePoint
Tilf(28)c3a6(29)ldigSide
Tilf(c3a6)ldigSide
ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html
Tu(28)e1baa5(29)nAnhL(28)c3aa(29)
Tu(e1baa5)nAnhL(c3aa)
TuanAnhLe(2f)GramEng4Dummies
WeSearch(2f)Adaptation
WeSearch(2f)Adaptation(2f)Background
WeSearch(2f)AnalysisCatalog
WeSearch(2f)Berlin
WeSearch(2f)Ccs
WeSearch(2f)CcsDayOne
WeSearch(2f)CcsDayThree
WeSearch(2f)CcsDayTwo
WeSearch(2f)ChartPruning
WeSearch(2f)DataCollection
WeSearch(2f)Demonstrator
WeSearch(2f)DescriptiveStatistics
WeSearch(2f)DesignPrinciples
WeSearch(2f)DocumentParsing
WeSearch(2f)FeforTopics
WeSearch(2f)Hank(28)c3b8(29)Schedule
WeSearch(2f)Hank(28)c3b8(29)TheRest
WeSearch(2f)Hank(c3b8)Schedule
WeSearch(2f)Hank(c3b8)TheRest
WeSearch(2f)ICONS
WeSearch(2f)Interface
WeSearch(2f)LexicalFiltering
WeSearch(2f)ParserAdaptation
WeSearch(2f)ParserEvaluation
WeSearch(2f)PestExamples
WeSearch(2f)QueryLanguage
WeSearch(2f)Rdf
WeSearch(2f)ReadingGroup
WeSearch(2f)RealisticTextParsing
WeSearch(2f)Resa
WeSearch(2f)ScopalArgCoord
WeSearch(2f)SentenceSegmentation
WeSearch(2f)StarSem
WeSearch(2f)StarSem(2f)MrsCrawling
WeSearch(2f)StarSem(2f)MrsCrawlingEvaluation
WeSearch(2f)StarSem(2f)MrsCrawlingOracle
WeSearch(2f)StarSem(2f)MrsReadingGroup
WeSearch(2f)StarSem(2f)UiO
WeSearch(2f)SuperTagging
WeSearch(2f)SuperTagging(2f)Setup
WeSearch(2f)Tokenization
WeSearch(2f)TripleStores
WeSearch(2f)UberTagging
WeSearch(2f)UnderspecifedAttachment
WeSearch(2f)UnderspecifiedPreds
WeSearch(2f)VariablePropertySharing
WikiSandL(28)c3a5(29)da
WikiSandL(c3a5)da
https(3a2f2f)students(2e)washington(2e)edu(2f)olzama(2f)ge
venue(28)2d(29)map(28)2e(29)png
venue(2d)map(2e)png

But some of these are garbage in MoinMoin: see the last two, whose content is an image. Many pages were imported correctly but renamed, e.g. from MatrixDoc(2f)Lexicon to https://github.com/delph-in/docs/wiki/MatrixDoc_Lexicon (because MoinMoin supported subpages). Many pages under the WeSearch prefix were protected and not imported.

One more case similar to the CambridgeSEM-I page:

ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html

I have just manually created https://github.com/delph-in/docs/wiki/ToolsTop_converter.

Pages

LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)

the first was deleted and the second is protected in MoinMoin, so I removed them from here:

0e28907d5461412626da14ea103680c31f7ea951 (HEAD -> master, origin/master) Destroyed LtgOslo_Hank(c3b8)Retreat (markdown)
:100644 000000 4df770ab 00000000 D      LtgOslo_Hank(c3b8)Retreat.md
63049d2ec3ed739c829debb7623cb210ca533027 Destroyed LtgOslo_Hank(c3b8) (markdown)
:100644 000000 4df770ab 00000000 D      LtgOslo_Hank(c3b8).md

Help needed! Can someone see any important page in the lists above that is not in the current wiki?

Pages

Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md

were duplicates (related to #25); I fixed the name and merged the contents into https://github.com/delph-in/docs/wiki/SaabruckenTop.

I think all those (2f)-style sequences appear where whatever converter you used tried to escape punctuation. They are hexadecimal values of ASCII characters (illustrated in Python, sorry, below):

>>> chr(int('2d', 16))  # convert base-16 int to character
'-'
>>> chr(int('2f', 16))
'/'

Although the one for SaarbrückenTop is strange:

>>> chr(int('c3bc', 16))
'쎼'
>>> hex(ord('ü'))  # going the other way
'0xfc'
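The "strange" SaarbrückenTop case is consistent with the escapes holding UTF-8 bytes rather than a code point: c3bc is exactly the two-byte UTF-8 encoding of ü (U+00FC). A quick check with only the standard library:

```python
# Treat the escaped hex digits as UTF-8 bytes, not as a single code point.
print(bytes.fromhex('c3bc').decode('utf-8'))      # ü
print('ü'.encode('utf-8').hex())                  # c3bc

# Groups holding several characters work the same way, e.g. SeitenGr(c3b6c39f)e
print(bytes.fromhex('c3b6c39f').decode('utf-8'))  # öß
```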

Then the CambridgeSEM\(28\)2d\(29\)I/ vs CambridgeSEM\(2d\)I/ thing is because those escapes were, themselves, escaped:

>>> chr(int('28', 16))
'('
>>> chr(int('29', 16))
')'
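Putting both observations together, a name from the dump could be decoded by replacing each parenthesized group with the text its UTF-8 bytes encode, and repeating while anything changes to undo the double escaping. This is a minimal sketch that assumes, based only on the names listed above, that every group is lowercase hex for complete UTF-8 characters; `unquote_once` and `unquote_fully` are hypothetical helper names, not MoinMoin functions.

```python
import re

# Assumption: every "(...)" escape holds lowercase hex digit pairs that form
# complete UTF-8 byte sequences, as the on-disk names above suggest.
ESCAPE = re.compile(r'\(((?:[0-9a-f]{2})+)\)')

def unquote_once(name: str) -> str:
    """Replace each (hex) group with the text its UTF-8 bytes encode."""
    return ESCAPE.sub(lambda m: bytes.fromhex(m.group(1)).decode('utf-8'), name)

def unquote_fully(name: str) -> str:
    """Repeat unquote_once until nothing changes, undoing escaped escapes."""
    while True:
        decoded = unquote_once(name)
        if decoded == name:
            return name
        name = decoded  # each pass shortens the string, so this terminates

for raw in ['CambridgeSEM(28)2d(29)I', 'CambridgeSEM(2d)I', 'SeitenGr(c3b6c39f)e']:
    print(raw, '->', unquote_once(raw), '->', unquote_fully(raw))
```

Both members of a pair such as CambridgeSEM(28)2d(29)I and CambridgeSEM(2d)I collapse to the same name, so grouping the dump by `unquote_fully` output would be one way to list such duplicates. It is only a heuristic: a real title that happened to contain something like (2d) would also be decoded, and a group that is not valid UTF-8 would raise `UnicodeDecodeError`.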

It looks like all the ones with only (2f) (/) used _ instead and are already imported. The ones with dashes (e.g. CambridgeSEM-I) are displayed in the browser with the dash as a space (see here). With this in mind, I whittled down your list a bit. I don't have the Moin dump, so I copied your file list above as moin.txt and then created two normalized lists of files like this:

$ cat moin.txt | sed -e 's/(2f)/_/g' -e 's/(2d)/-/g' -e 's/$/.md/' | sort > moin-norm.txt
$ ls | grep "[^a-zA-Z0-9.]" | sort > current.txt

Then I can find which ones are not already ported:

$ comm -2 -3 moin-norm.txt current.txt  # -2 drops lines only in current.txt, -3 drops common lines: show lines unique to moin-norm.txt

It produces the following list, which I have manually sorted and annotated:

# System pages (I'm just guessing for the non-English titles)
(28)c396(29)nskadeSidor.md
(28)c396(29)vergivnaSidor.md
(28)c398(29)nskedeSider.md
Aktuelle(28)c384(29)nderungen.md
Aktuelle(c384)nderungen.md
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar.md
Anv(c3a4)ndarInst(c3a4)llningar.md
(c396)nskadeSidor.md
(c396)vergivnaSidor.md
(c398)nskedeSider.md
ChangementsR(28)c3a9(29)cents.md
ChangementsR(c3a9)cents.md
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor.md
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor.md
For(28)c3a6(29)ldrel(28)c3b8(29)seSider.md
For(c3a6)ldrel(c3b8)seSider.md
MoinMoin_InstallationsAnleitung.md
MoinMoin_InstallDocs.md
MoinMoin_TextFormatting.md
PageAl(28)c3a9(29)atoire.md
PageAl(c3a9)atoire.md
PagesAbandonn(28)c3a9(29)es.md
PagesAbandonn(c3a9)es.md
PagesSouhait(28)c3a9(29)es.md
PagesSouhait(c3a9)es.md
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur.md
Pr(c3a9)f(c3a9)rencesUtilisateur.md
S(28)c3b6(29)kSida.md
S(c3b6)kSida.md
SeitenGr(28)c3b6c39f(29)e.md
SeitenGr(c3b6c39f)e.md
Senaste(28)c384(29)ndringar.md
Senaste(c384)ndringar.md
SideSt(28)c3b8(29)rrelse.md
Tilf(28)c3a6(29)ldigSide.md
Tilf(c3a6)ldigSide.md
WikiSandL(28)c3a5(29)da.md
WikiSandL(c3a5)da.md

# Personal pages or accidental (?) pages
https(3a2f2f)students(2e)washington(2e)edu_olzama_ge.md
LtgOslo_Cristin.md
Tu(28)e1baa5(29)nAnhL(28)c3aa(29).md
Tu(e1baa5)nAnhL(c3aa).md
venue(28)2d(29)map(28)2e(29)png.md
venue-map(2e)png.md
Singapore(28)20(29)Top.md  # see SingaporeTop

# Other duplicates from bad escaping
4(28)2d(29)16_Meeting_Notes.md
CambridgeSEM(28)2d(29)I.md
LtgOslo_Hank(28)c3b8(29).md
ToolsTop_converter(28)2e(29)html.md
WeSearch_Hank(28)c3b8(29)Schedule.md
WeSearch_Hank(28)c3b8(29)TheRest.md

# Potentially good pages; some already converted
4-16_Meeting_Notes.md
ClarinoTop_RelatedWork.md
ClarinoTop_RequirementsSurvey.md
ClarinoTop_TechnologySurvey.md
ErgProcessing_ExportExample.md
ErgSemantics_Fundamentals.md
ErgSemantics_NonScopalModifiers.md
ErgSemantics_RunOnConstruction.md
ItsdbTreebanking_ItsdbTrouble.md
KyotoTop_InterWiki.md
LapDevelopment_Abel.md
LapDevelopment_SeverDeployment.md
LapDevelopment_Tasks.md
LogonInstallation_CvsBasics.md
LogonInstallation_InstallationBasics.md
LogonMrs_InformationStructure.md
LogonMrs_MessageRelations.md
LtgOslo_Hank(c3b8).md  # LtgOslo/Hankø
MatrixDoc_WhQ.md
ToolsTop_converter(2e)html.md  # wiki actually had ".html" in the title; already imported as ToolsTop_converter
WeSearch_Berlin.md
WeSearch_Demonstrator.md
WeSearch_FeforTopics.md
WeSearch_Hank(c3b8)Schedule.md  # WeSearch/HankøSchedule
WeSearch_Hank(c3b8)TheRest.md  # WeSearch/HankøTheRest
WeSearch_Interface.md
WeSearch_PestExamples.md
WeSearch_RealisticTextParsing.md
WeSearch_StarSem_MrsCrawling.md
WeSearch_SuperTagging.md
WeSearch_Tokenization.md
WeSearch_TripleStores.md
WeSearch_UberTagging.md
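The normalize-and-compare step above can also be sketched in Python with a set difference, which, unlike `comm`, does not depend on both inputs being sorted. It applies only the same two substitutions as the sed command; `normalize`, `not_yet_ported` and `compare` are illustrative names, and moin.txt is assumed to hold one page name per line.

```python
from pathlib import Path

def normalize(moin_name: str) -> str:
    """Same rewrite as the sed command: (2f) -> _, (2d) -> -, append .md."""
    return moin_name.replace('(2f)', '_').replace('(2d)', '-') + '.md'

def not_yet_ported(moin_names, current_files):
    """Normalized MoinMoin names with no file of the same name in the wiki."""
    return sorted({normalize(n) for n in moin_names} - set(current_files))

def compare(moin_list: str = 'moin.txt', wiki_dir: str = '.'):
    """Read the MoinMoin page list and compare it with every *.md in wiki_dir."""
    moin_names = Path(moin_list).read_text(encoding='utf-8').split()
    current_files = [p.name for p in Path(wiki_dir).glob('*.md')]
    return not_yet_ported(moin_names, current_files)
```

Because it compares against every *.md file rather than only those with unusual characters, this version would also work if moin.txt contained the full page list instead of just the odd names.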

Thank you @goodmami! Yes, / was converted to _, and GitHub magically translates - to a space. The parentheses are ugly but do no harm. The reason for the duplicates (see #25) is still not clear to me, though. Some duplicates are already in the dump, so they are not an error in the migration. The encoding may have caused some errors in the migration, but we now have a list. I am attaching the list of all pages in the dump that I got from @oepen:

moin.txt

As you noticed, I have already fixed many of the cases above.


The case of ErgSemantics_NonScopalModifiers.md is interesting. It looks like an important page that we may have lost, but http://moin.delph-in.net/wiki/ErgSemantics/NonScopalModifiers was deleted in MoinMoin. Actually, ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers, which was later deleted:

See ErgSemantics\(2f\)NonScopalModifiers/edit-log:

1382547133360546	00000001	SAVENEW	ErgSemantics(2f)RelativeClauses	75.146.63.242	75-146-63-242-Washington.hfc.comcastbusiness.net	1101511421.47.55017		
1382549060521974	00000002	SAVE	ErgSemantics(2f)RelativeClauses	75.146.63.242	75-146-63-242-Washington.hfc.comcastbusiness.net	1101511421.47.55017		
1382549150509747	00000003	SAVE	ErgSemantics(2f)RelativeClauses	75.146.63.242	75-146-63-242-Washington.hfc.comcastbusiness.net	1101511421.47.55017		
1382602650699190	00000004	SAVE	ErgSemantics(2f)RelativeClauses	93.206.0.159	p5DCE009F.dip0.t-ipconnect.de	1098876287.95.17133		
1405018957939738	00000005	SAVE	ErgSemantics(2f)RelativeClauses	87.162.226.112	p57A2E270.dip0.t-ipconnect.de	1098876287.95.17133		
1415232274472863	00000006	SAVE	ErgSemantics(2f)RelativeClauses	75.146.63.242	75-146-63-242-Washington.hfc.comcastbusiness.net	1101511421.47.55017		
1433437370469623	00000007	SAVE	ErgSemantics(2f)RelativeClauses	75.146.63.242	75-146-63-242-Washington.hfc.comcastbusiness.net	1101511421.47.55017		
1450478308547806	00000008	SAVE	ErgSemantics(2f)RelativeClauses	174.21.159.201	174-21-159-201.tukw.qwest.net	1101511421.47.55017		A first attempt at talking about intersective modification as a `phenomenon'
1450478469201052	00000009	SAVE	ErgSemantics(2f)RelativeClauses	174.21.159.201	174-21-159-201.tukw.qwest.net	1101511421.47.55017		Noting references I haven't yet looked through
1450733435563410	00000010	SAVE/RENAME	ErgSemantics(2f)NonScopalModifiers	193.157.186.127	1x-193-157-186-127.uio.no	1098876287.95.17133	ErgSemantics/RelativeClauses	per ESD decision
1450734224521613	00000011	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.159.201	174-21-159-201.tukw.qwest.net	1101511421.47.55017		
1453307751475650	00000012	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		Typographic conventions
1453307888518481	00000013	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		
1453308154249605	00000014	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		Revise to fully embrace ‘non-scopal modifiers’ as the phenomenon name
1453308395419146	00000015	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		Further edits based on notes from last ESD meeting
1453308624544911	00000016	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		And one last thing
1453318301298743	00000017	SAVE	ErgSemantics(2f)NonScopalModifiers	174.21.160.48	174-21-160-48.tukw.qwest.net	1101511421.47.55017		
1453822209279284	00000018	SAVE	ErgSemantics(2f)NonScopalModifiers	193.157.184.226	1x-193-157-184-226.uio.no	1098876287.95.17133		per request by emily

That last comment looks like one by @oepen; at a guess, we decided to delete the page or merge its content elsewhere.
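Logs like the one above are easier to read once the first field is turned into a date; it appears to be microseconds since the Unix epoch. The sketch below splits a line on tabs and names the nine columns in the order they appear above, which as far as I can tell is MoinMoin 1.x's edit-log layout (timestamp, revision, action, page name, address, hostname, user id, extra, comment). `EditLogEntry` and `parse_edit_log_line` are my own hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple

class EditLogEntry(NamedTuple):
    when: datetime   # field 1: microseconds since the Unix epoch, as UTC
    rev: str         # field 2, e.g. "00000010"
    action: str      # field 3: SAVENEW, SAVE, SAVE/RENAME, ...
    pagename: str    # field 4, still escaped, e.g. ErgSemantics(2f)NonScopalModifiers
    extra: str       # field 8: for SAVE/RENAME, the old page name
    comment: str     # field 9: free-text edit comment (often empty)

def parse_edit_log_line(line: str) -> EditLogEntry:
    # Split on tabs only, so empty fields keep their positions.
    fields = line.rstrip('\n').split('\t')
    fields += [''] * (9 - len(fields))  # pad missing trailing fields
    usecs, rev, action, pagename, _addr, _host, _userid, extra, comment = fields[:9]
    when = datetime.fromtimestamp(int(usecs) / 1_000_000, tz=timezone.utc)
    return EditLogEntry(when, rev, action, pagename, extra, comment)
```

Applied to the log above, the SAVE/RENAME line (revision 00000010) falls on 21 December 2015 (UTC) and the last save (revision 00000018) on 26 January 2016.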

One more odd page is 4-16_Meeting_Notes.md. In the original dump I have 4\(2d\)16_Meeting_Notes/, but during the migration I had to run a local MoinMoin instance in Docker as the endpoint for another script that fetched the contents and produced the markdown files for this new wiki. It looks like this new instance created the empty 4\(28\)2d\(29\)16_Meeting_Notes/ file.

This new file is not a big problem: it is empty, and even if it generates an empty page here, we can easily delete it. The original page 4\(2d\)16_Meeting_Notes/ was deleted in MoinMoin: the current version is 00000004, but the last revision with content is 00000003. But the log says nothing about a deletion:

1177188397000000	00000001	SAVENEW	4(2d)16_Meeting_Notes	71.35.116.39	71-35-116-39.tukw.qwest.net	1176767071.26.36927		
1177188634000000	00000002	SAVE	4(2d)16_Meeting_Notes	71.35.116.39	71-35-116-39.tukw.qwest.net	1176767071.26.36927		
1177188730000000	00000003	SAVE	4(2d)16_Meeting_Notes	71.35.116.39	71-35-116-39.tukw.qwest.net	1176767071.26.36927		
1281376152000000	00000004	SAVE	4(2d)16_Meeting_Notes	84.208.94.211	cm-84.208.94.211.getinternet.no	1098876287.95.17133		

So, to me, nothing is wrong here: the page does not show up in http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=fullsearch&context=180&value=notes&titlesearch=Titles, one more clue that it was deleted. The content of revision 00000003 looks like a draft anyway:

== slot/proto-morpheme whatchamacallit ==

'''Content'''
   * morphosyntactic categories
      * portmanteau
      * range of values
      * unhandled "dummy"

'''Context'''
   * order
   * dependencies
      * category missing (e.g. don't mark person on infinitives)
      * dependent choices (e.g. neg gets different mood)
   * optionality
      * easy
      * multipaths
   * iterability
      * *fix

But then, in the current wiki, I found https://github.com/delph-in/docs/wiki/notes. The name is not very informative, and it looks like a duplicate of https://github.com/delph-in/docs/wiki/OsloScopalNonScopal, although the two are not identical. So I looked in git whatchanged and found:

887179315996ba05848a18c2b35506eee8c4f61b Rough notes, speakers are encouraged to read & edit.
:000000 100644 00000000 dc378ff3 A      notes.md

and the same message in the history at http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=info. So notes was the initial name of OsloScopalNonScopal. It looks like, when retrieving page histories, the migration process had trouble with pages that had been renamed, as we can see in the git log:

ar@tenis docs.wiki % git log --format='%H %an %s' -- notes.md
887179315996ba05848a18c2b35506eee8c4f61b EmilyBender Rough notes, speakers are encouraged to read & edit.

ar@tenis docs.wiki % git log --format='%H %an %s' -- OsloScopalNonScopal.md
c4abe4a1952ce48117e77d6a5b8dfadc2ca02f96 EmilyBender Adding notes from SIG
51056273c26c3393d306220ced63070810ee8b7e GlennSlayden Add/Update OsloScopalNonScopal.md
683a77b7585d8b4cf6e3917100b4dcc1d4d796d6 StephanOepen Add/Update OsloScopalNonScopal.md
ef4ab6cb5451ab4fbf23bc0b9fed6f9599c23241 StephanOepen per request by emily

To preserve the history of changes, the migration process created notes.md. But when the page was renamed, instead of deleting notes.md, the migration just created a new file with the new name. I have deleted notes.md now.

I realize it was a mistake on my side not to detect all these details during the migration, and I am sorry for that. But no content was lost: I have the dump, and MoinMoin is still running in read-only mode. I still believe that for the majority of pages the final result is fine. So maybe we just need to be aware of these problems and fix them as we find them?

The migration is such a huge job, @arademaker ! Thank you for taking it on.

I think that notes.md file was indeed spurious, and I see that OsloScopalNonScopal has survived the transition. It's too bad that the 'delete' actions aren't apparent (at least as far as I can tell) in the migrated data.

I deleted notes.md just now, locally:

% pwd
/Users/ar/hpsg/documentation/docs.wiki
% git whatchanged
8a29e9c8da582a0f71793895e11b0b2eaafaf545 (HEAD -> master, origin/master) deleted file that was renamed. See #18
:100644 000000 dc378ff3 00000000 D      notes.md
...

The good news is that we do have a way to find all pages in MoinMoin that were renamed:

find . -name edit-log | xargs awk -F'\t' '$3 ~ /RENAME/ {print FILENAME,$2,"new: " $4,"old: " $8}'
  • ./ErgSemantics(2f)QuasiModalInfinitivals/edit-log 00000011 new: ErgSemantics(2f)QuasiModalInfinitivals old: ErgSemantics/IsTo
  • ./SynSem(2f)Activities(2f)PolymorphicVariadicPredicates/edit-log 00000038 new: SynSem(2f)Activities(2f)PolymorphicVariadicPredicates old: SynSem/PolymorphicVariadicPredicates
  • ./SingaporeTeachingWithLKB/edit-log 00000003 new: SingaporeTeachingWithLKB old: TeachingWithLKB
  • ./ErgSemantics(2f)Basics/edit-log 00000004 new: ErgSemantics(2f)Basics old: ErgSemantics/Basic
  • ./LtgOslo(2f)TechTalks/edit-log 00000023 new: LtgOslo(2f)TechTalks old: LtgTalks
  • ./ErgSemantics(2f)Predicates/edit-log 00000003 new: ErgSemantics(2f)Predicates old: LapDevelopment/Predicates
  • ./WeSearch(2f)PestExamples/edit-log 00000002 new: WeSearch(2f)PestExamples old: WeSearch/PESTExamples
  • ./MatrixDoc(2f)NominalizedClauses/edit-log 00000003 new: MatrixDoc(2f)NominalizedClauses old: MatrixDoc/Nominalized
  • ./DeepBank(2f)OneZero/edit-log 00000007 new: DeepBank(2f)OneZero old: DeepBank/Inventory
  • ./WeSearch(2f)Hank(c3b8)Schedule/edit-log 00000011 new: WeSearch(2f)Hank(c3b8)Schedule old: WeSearch/MrsStandardization
  • ./LtgOslo(2f)Delphin/edit-log 00000005 new: LtgOslo(2f)Delphin old: LtgOslo/DelphinUpdates
  • ./DelphinTutorial(2f)Formalisms/edit-log 00000017 new: DelphinTutorial(2f)Formalisms old: DelphinTutorial/Formalism
  • ./WeSearch(2f)Adaptation/edit-log 00000006 new: WeSearch(2f)Adaptation old: WeSearch/parser_adaptation
  • ./SynSem(2f)Candidates/edit-log 00000018 new: SynSem(2f)Candidates old: SynSem/CasCandidates
  • ./SaarlandTop/edit-log 00000002 new: Saarbr(c3bc)ckenTop old: SaabrückenTop
  • ./SaarlandTop/edit-log 00000008 new: SaarlandTop old: SaarbrückenTop
  • ./SynSem(2f)Activities(2f)DependentDimensions/edit-log 00000010 new: SynSem(2f)Activities(2f)DependentDimensions old: SynSem/DependentDimensions
  • ./DelphinTutorial(2f)Distributions/edit-log 00000004 new: DelphinTutorial(2f)Distributions old: DelphinTutorial/Distribution
  • ./LtgOslo(2f)Hank(c3b8)/edit-log 00000020 new: LtgOslo(2f)Hank(c3b8) old: LtgOslo/HankøRetreat
  • ./SofiaIcons/edit-log 00000002 new: SofiaIcons old: SofiaICONS
  • ./ErgSemantics(2f)ImplicitLocatives/edit-log 00000004 new: ErgSemantics(2f)ImplicitLocatives old: ErgSemantics/NpAdverbials
  • ./SynSem(2f)MeaningConstruction/edit-log 00000005 new: SynSem(2f)MeaningConstruction old: SynSem/Gabelshus
  • ./OsloScopalNonScopal/edit-log 00000003 new: OsloScopalNonScopal old: notes
  • ./DelphinApplications/edit-log 00000006 new: DelphinApplications old: DelpinApplications
  • ./LapDevelopment(2f)Deployment/edit-log 00000007 new: LapDevelopment(2f)Local old: LapDevelopment/LocalGalaxy
  • ./LapDevelopment(2f)Deployment/edit-log 00000027 new: LapDevelopment(2f)Deployment old: LapDevelopment/Local
  • ./EdsTop/edit-log 00000031 new: EdsTop old: RmrsEds
  • ./RestfulTop/edit-log 00000002 new: RestfulTop old: DelphinRestApi
  • ./SynSem(2f)Activities(2f)UdMeaningConstruction/edit-log 00000002 new: SynSem(2f)Activities(2f)UdMeaningConstruction old: SynSem/Activities/UniversalMeaningConstruction
  • ./SingaporeMRSMatrixTestsuites/edit-log 00000004 new: SingaporeMRSMatrixTestsuites old: MRSMatrixTestsuites
  • ./PaperCuts/edit-log 00000002 new: PaperCuts old: Papercuts
  • ./TomarParseRanking/edit-log 00000004 new: TomarParseRanking old: TomarScheduleParseRanking
  • ./WeSearch(2f)StarSem/edit-log 00000007 new: WeSearch(2f)StarSem old: StarSEM
  • ./SingaporeTemporalPronouns/edit-log 00000002 new: SingaporeTemporalPronouns old: TemporalPros
  • ./SynSem(2f)Activities(2f)ControlRaising/edit-log 00000011 new: SynSem(2f)Activities(2f)ControlRaising old: Activities/ControlRaising
  • ./WeSearch(2f)ParserAdaptation/edit-log 00000011 new: WeSearch(2f)ParserAdaptation old: WeSearch/DomainAdaptation
  • ./ErgSemantics(2f)NonScopalModifiers/edit-log 00000010 new: ErgSemantics(2f)NonScopalModifiers old: ErgSemantics/RelativeClauses
  • ./SaarlandUseability/edit-log 00000004 new: SaarlandUseability old: Usability/ease
  • ./LapDevelopment(2f)Library/edit-log 00000003 new: LapDevelopment(2f)Library old: LapDevelopment/Store
  • ./ErgSemantics(2f)ImplicitNominals/edit-log 00000006 new: ErgSemantics(2f)ImplicitNominals old: ErgSemantics/OneAnaphora
  • ./ErgSemantics(2f)PropositionalArguments/edit-log 00000004 new: ErgSemantics(2f)PropositionalArguments old: ErgSemantics/ClausalComplements
  • ./ErgSemantics(2f)Ccs/edit-log 00000010 new: ErgSemantics(2f)Ccs old: ErgSemantics/CcsGuidedTour
  • ./TheAbbey(2f)Chrysalis2014PpAttachment/edit-log 00000003 new: TheAbbey(2f)Chrysalis2014PpAttachment old: TheAbbey/Chrysalis2014PpAttachment]
  • ./ErgSemantics(2f)ImplicitQuantifiers/edit-log 00000005 new: ErgSemantics(2f)ImplicitQuantifiers old: ErgSemantics/ImplicitQuantification
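A Python version of the same search is sketched below. It splits each line on tabs rather than on runs of whitespace, so an empty column (for example an empty user id) cannot shift the old page name into a different position; the indices correspond to the awk fields $2, $3, $4 and $8. `find_renames` is a hypothetical helper, and `pages_dir` is assumed to be the dump's data/pages directory.

```python
from pathlib import Path

def find_renames(pages_dir: str):
    """Yield (page_dir, rev, new_name, old_name) for each RENAME in the edit-logs."""
    for log in sorted(Path(pages_dir).glob('*/edit-log')):
        text = log.read_text(encoding='utf-8', errors='replace')
        for line in text.splitlines():
            fields = line.split('\t')
            # 0-based indices matching awk's $2 (rev), $3 (action), $4 (new), $8 (old)
            if len(fields) >= 8 and 'RENAME' in fields[2]:
                yield log.parent.name, fields[1], fields[3], fields[7]
```

For example, `for row in find_renames('dump/ltg/moin/delphin/data/pages'): print(*row)` should list the same renames as the awk output, assuming the dump layout shown for LkbLexDb in this thread.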

For

./SynSem(2f)Activities(2f)DependentDimensions/edit-log 00000010 new: SynSem(2f)Activities(2f)DependentDimensions old: SynSem/DependentDimensions

[screenshot]

I just deleted the second one in the screenshot above, the old page that was renamed.

Is it possible to tell which pages were deleted during the MoinMoin days, though?

Hmm, yes. MoinMoin represents a deletion by incrementing the revision number without creating an actual revision file in the revisions subdirectory. Each page is stored as:

% tree MatrixDocTop
MatrixDocTop
├── cache
│   └── pagelinks
├── current
├── edit-log
└── revisions
    ├── 00000001
    ├── 00000002
    ├── 00000003
    ├── 00000004
    ├── 00000005
    ├── 00000006
    ├── 00000007
    ├── 00000008
     ....

ar@tenis pages % cat MatrixTop/current
00000042

So if a page is deleted, the content of the current file will be a number that does not correspond to any file in the revisions subfolder. See http://moinmo.in/HelpOnPageDeletion
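That rule translates into a short check: read each page's current file and see whether a file with that name exists under revisions/. This is a sketch of the rule as described here and in HelpOnPageDeletion, not MoinMoin's own code; `deleted_pages` is a hypothetical name and `pages_dir` is assumed to be the dump's data/pages directory.

```python
from pathlib import Path

def deleted_pages(pages_dir: str):
    """Names of page directories whose 'current' revision has no revision file."""
    deleted = []
    for page in sorted(Path(pages_dir).iterdir()):
        current = page / 'current'
        if not current.is_file():
            continue  # not a page directory
        rev = current.read_text(encoding='utf-8').strip()  # e.g. "00000042"
        if not (page / 'revisions' / rev).is_file():
            deleted.append(page.name)
    return deleted
```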

So the pages DELETED in MoinMoin are listed below (renamed pages are not included):

venue(2d)map(2e)png
SuquamishCommunityHouse
StandingTop
StandingGroup
ShortCLIMB
PgAccess
PetEvolution
PestTop
ParisCards
ParallelCorp
MWEs_and_Idiomatic_Expressions
LogonMrs(2f)MessageRelations
LogonMrs(2f)InformationStructure
LogonInstallation(2f)InstallationBasics
LogonInstallation(2f)CvsBasics
LkbSmaf
LkbLexDbPsqlInitialize
LkbLexDbInitialize
LkbDownload
LicensingChoices
LexDbPgAccess
LexDB_Internals
LapDevelopment(2f)Tasks
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Abel
KyotoFutureSummitSuggestions
ItsdbTreebanking(2f)ItsdbTrouble
Initialize_LexDB
ErgSemanticsTemplate
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Fundamentals
ErgProcessing(2f)ExportExample
Deepbank
ClarinoTop
ClarinoTop(2f)TechnologySurvey
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)RelatedWork
BarcelonaWishlist
4(2d)16_Meeting_Notes

I see ErgSemantics(2f)NonScopalModifiers there, confirming our decision to delete it in the GitHub wiki.

Ah, I now see your point, @emilymbender. My #18 (comment) was wrong (I have just edited it). The page ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers, and the latter was later deleted.

The page http://moin.delph-in.net/wiki/LkbLexDb

last edited 2011-10-08 21:12:12 by localhost

But the page https://github.com/delph-in/docs/wiki/LkbLexDB

StephanOepen edited this page on Jan 13, 2009

This is very weird, since the page in this wiki is older than the page in the original, frozen MoinMoin installation. The contents differ, too. In the dump, the current file points to revision 00000009, but the last revision of this page is 00000035:

% cat dump/ltg/moin/delphin/data/pages/LkbLexDb/current
00000009
% ls dump/ltg/moin/delphin/data/pages/LkbLexDb/revisions
00000001	00000005	00000009	00000013	00000017	00000021	00000025	00000029	00000033
00000002	00000006	00000010	00000014	00000018	00000022	00000026	00000030	00000034
00000003	00000007	00000011	00000015	00000019	00000023	00000027	00000031	00000035
00000004	00000008	00000012	00000016	00000020	00000024	00000028	00000032
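To see whether LkbLexDb is an isolated case, one could scan the dump for pages whose current file points at an existing revision that is older than the newest file in revisions/. The sketch below only detects that symptom; it says nothing about why the pointer is behind. `stale_current` is a hypothetical helper, and `pages_dir` is assumed to be the dump's data/pages directory.

```python
from pathlib import Path

def stale_current(pages_dir: str):
    """(page, current_rev, newest_rev) where 'current' lags the newest revision file."""
    stale = []
    for page in sorted(Path(pages_dir).iterdir()):
        current, revisions = page / 'current', page / 'revisions'
        if not current.is_file() or not revisions.is_dir():
            continue  # not a page directory
        revs = [int(p.name) for p in revisions.iterdir() if p.name.isdigit()]
        if not revs:
            continue
        cur = int(current.read_text(encoding='utf-8').strip())
        if cur < max(revs):
            stale.append((page.name, cur, max(revs)))
    return stale
```

Deleted pages, whose current number is higher than any revision file, are not reported by this check.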