szcf-weiya/ESL-CN

url or identifier for disqus thread

szcf-weiya opened this issue · 9 comments

I just realized that I forgot to set `this.page.url` and `this.page.identifier` after switching to Disqus for mainland China, so Disqus creates a new thread for every new page URL, e.g., treating

https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html

and

https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html?from=singlemessage&isappinstalled=0

as different pages, even though they are the same page.

Without manually specifying the url or identifier, Disqus takes `window.location.href` as the url (https://help.disqus.com/en/articles/1717084-javascript-configuration-variables). In that case, if there are comments under the first URL, they will not appear under the second one, and each URL keeps its own comments. That is too bad!!

In Disqus for mainland China, if the identifier is empty, it is set to the url:
https://github.com/fooleap/disqus-php-api/blob/88dd45a767b9205f9a599ee8de9148aba2af8944/src/iDisqus.js#L280
and if no existing thread is found, it requests the creation of a new thread, as in a run on localhost.
So the key point is to assign a unique url to each page without violating the current setting, i.e., without using different identifiers.
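A minimal sketch of the idea, on made-up URLs with a hypothetical helper name (`canon`): strip the query string so that the `?from=...` variants of a page all map to one canonical URL, which could then be used as the unique url.

```shell
# Hypothetical sketch: derive one canonical URL for Disqus by dropping
# the query string, so "?from=..." variants share the same thread.
canon() {
  # ${1%%\?*} removes everything from the first '?' onwards
  printf '%s\n' "${1%%\?*}"
}

canon "https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html?from=singlemessage&isappinstalled=0"
```

Both the plain URL and the `?from=singlemessage&isappinstalled=0` variant come out identical, so Disqus would look up one thread instead of two.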

The moderation panel logs also support the above guess: some new threads were created even though corresponding threads already existed.

Another related problem is that the latest comments in the sidebar have not been updated. I found that some links are outdated, i.e., they point to different URLs and may belong to different threads. Following https://mycyberuniverse.com/how-delete-discussion-threads-incorrect-url-disqus.html, I tried to correct several outdated links via URL mapping (the maps are attached) to see if the sidebar updates more frequently.

urlmaps-2020-02-27.txt

It works immediately!!

replace `//` with `/`

Some historical links contain a double slash. First, grep such records:

```shell
grep -E "z//" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" | sed "s/\r//" > raw_double_slash.txt
```

where `sed` removes the `^M` (carriage return) at the end of each line; refer to "Text file with ^M on each line" and "How to remove CTRL-M (^M) characters from a file in Linux". Then convert them and write the result in the `old_url, new_url` format required by Disqus (note that the fixed column needs the `\r` stripped as well):

```shell
grep -E "z//" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" | sed -e "s/\r//" -e "s/z\/\//z\//" > fixed_double_slash.txt
paste -d',' raw_double_slash.txt fixed_double_slash.txt > double_slash_maps.txt
```

double_slash_maps.txt
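As a self-contained sanity check of the pipeline above, here is the same sequence run on a one-line fake CSV (the sample file name and URL are made up for illustration):

```shell
# Fake sample standing in for the Disqus links CSV export (CRLF line ending).
printf 'https://esl.hohoweiya.xyz//01/index.html\r\n' > sample_links.csv

# old URLs, with the trailing ^M stripped
grep -E "z//" sample_links.csv | sed "s/\r//" > raw_double_slash.txt
# new URLs, with the double slash after "xyz" collapsed
grep -E "z//" sample_links.csv | sed -e "s/\r//" -e "s/z\/\//z\//" > fixed_double_slash.txt

# old_url,new_url map for Disqus
paste -d',' raw_double_slash.txt fixed_double_slash.txt
# -> https://esl.hohoweiya.xyz//01/index.html,https://esl.hohoweiya.xyz/01/index.html
```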

end with `/index.html`, not `/`

extract the records:

```shell
sed -n "s/\/\r/\//p" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" > no_index.txt
```

then modify them:

```shell
sed -n "s/\/\r/\/index.html/p" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" > fix_no_index_maps.txt
```

and then write them as maps:

```shell
paste -d ',' no_index.txt fix_no_index_maps.txt > no_index_maps.txt
```

one more step is to replace `%20` with `-`:

```shell
sed -i "s/\%20/-/g" no_index_maps.txt
```

no_index_maps.txt
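A self-contained demo of this step on made-up sample data (file name and URL are illustrative):

```shell
# Fake sample: a link that ends with "/" instead of "/index.html" (CRLF ending).
printf 'https://esl.hohoweiya.xyz/02/\r\n' > sample_links.csv

# sed -n ... p prints only the lines where the trailing "/\r" matched,
# so records that already end in index.html are skipped automatically
sed -n "s/\/\r/\//p" sample_links.csv > no_index.txt
sed -n "s/\/\r/\/index.html/p" sample_links.csv > fix_no_index_maps.txt

paste -d',' no_index.txt fix_no_index_maps.txt
# -> https://esl.hohoweiya.xyz/02/,https://esl.hohoweiya.xyz/02/index.html
```

The `-n` + `p` combination doubles as the record filter here, which is why no separate `grep` is needed.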

replace encoded spaces (`%20`) with `-`

extract the records:

```shell
sed -n "/\%20/p" "esl-hohoweiya-xyz-2020-02-27T07:42:29.756937-links.csv" | sed "s/\r//" > space_delim.txt
```

then

```shell
sed 's/\%20/-/g' space_delim.txt > fix_space_delim.txt
```

and some particular fixes:

```shell
sed -i 's/,//g' fix_space_delim.txt
sed -i 's/\%2C//g' fix_space_delim.txt
sed -i 's/\/$/\/index.html/' fix_space_delim.txt
```

Note that the last substitution only works when the line does not end with `\r`, i.e., the earlier `sed "s/\r//"` is necessary; otherwise the `$` anchor would not sit right after the trailing `/`.

space_delim_maps.txt
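The `%20` replacement can likewise be checked on a made-up one-line sample (file name and URL are illustrative):

```shell
# Fake sample: a link containing an encoded space "%20" (CRLF ending).
printf 'https://esl.hohoweiya.xyz/03%%20Overview/index.html\r\n' > sample_links.csv

# old URLs, ^M stripped
sed -n "/%20/p" sample_links.csv | sed "s/\r//" > space_delim.txt
# new URLs, every %20 turned into -
sed 's/%20/-/g' space_delim.txt > fix_space_delim.txt

paste -d',' space_delim.txt fix_space_delim.txt
# -> https://esl.hohoweiya.xyz/03%20Overview/index.html,https://esl.hohoweiya.xyz/03-Overview/index.html
```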

some special fixes

```shell
sed -n '/?from/p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > specials.txt
sed -n 's/?from.*\r//p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" > fix_specials.txt
paste -d',' specials.txt fix_specials.txt > special_maps.txt
```

and

```shell
sed -n '/\%2520/p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > specials1.txt
sed -n 's/\%2520/-/gp' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > fix_specials1.txt
sed -i 's/,//g' fix_specials1.txt
paste -d',' specials1.txt fix_specials1.txt > special_maps1.txt
```

then combine them all with `cat`:
sepcial_maps_all.txt
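The `?from` mapping can also be verified end-to-end on a made-up sample line (file name and URL are illustrative):

```shell
# Fake sample: a link with a tracking query string appended (CRLF ending).
printf 'https://esl.hohoweiya.xyz/04/index.html?from=singlemessage\r\n' > sample_links.csv

# old URLs (query string kept), ^M stripped
sed -n '/?from/p' sample_links.csv | sed 's/\r//' > specials.txt
# new URLs: the substitution consumes "?from...\r" in one go,
# so no separate ^M strip is needed here
sed -n 's/?from.*\r//p' sample_links.csv > fix_specials.txt

paste -d',' specials.txt fix_specials.txt
# -> https://esl.hohoweiya.xyz/04/index.html?from=singlemessage,https://esl.hohoweiya.xyz/04/index.html
```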

Now the URLs are indeed updated (checked against the URL list), but the exported comments seem unchanged, and the Edit Discussions panel has not changed either. Maybe it needs 24 hours, as the guide says.

I guess the migration only takes effect once the pages are visited: I export the comments every day and found that some links are indeed fixed, but some still remain.