MoritzStefaner/ach-ingen-zell

General discussion

(I hope this is a good place to post this)

Based on the very interesting code, I got curious and had to do some experiments...

FWIW, I tried to improve the performance (initial load) a bit, using the current structure of the app (that is, same components, same order, just slightly different data flow), hoping to speed things up.

Find my results here:
https://github.com/bgrsquared/ach-ingen-zell/tree/performance/regexp (performance/regexp branch!)
(live online at: http://bgrsquared.com/AIZ/)
(this is experimental, code is not linted, etc.)

What did I change?

  • using a RegExp (instead of lodash) to match strings (more flexibility for a possible upcoming interactive version; see the sketch after this list)
  • fewer (nested) loops (by creating an empty hexbin and then overwriting it with the filtered results)
  • removing business logic from the plotting function (though now it lives in main.jsx, not much better, eh?)
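
Roughly what I had in mind, as a simplified sketch (not the actual code from the branch; `allHexBins`, `hexIdFor` and the data fields are placeholders I made up for illustration):

```js
// Sketch: one pass over the loaded places, matching a suffix with a RegExp
// instead of lodash, and filling a pre-built ("empty") hexbin structure.
// allHexBins and hexIdFor are hypothetical helpers, not the repo's API.
const suffix = new RegExp('ingen$', 'i');

// start from an empty result: every bin present, count 0
const bins = new Map(allHexBins.map(b => [b.id, Object.assign({}, b, { count: 0 })]));

places.forEach(place => {
  if (!suffix.test(place.name)) return;           // skip non-matching names early
  const bin = bins.get(hexIdFor(place.x, place.y));
  if (bin) bin.count += 1;                        // overwrite counts in place, no nested loop
});
```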

--> Performance is slightly better (about 20-30% after data load, I'd guess) but still slow. I think this is mostly because React is tracking all the circles, and well, there are many. In other words, I guess this is probably not the best use case for React (a static app with lots of elements). Correct me if I am wrong (which I very probably am!).
(I didn't try rendering the thing with good old d3 directly; did anybody do that?)

--> did I miss anything? I'd be very curious to learn!

I am planning to write something a bit more interactive based on @MoritzStefaner's great example, using redux (business logic/data), smart/dumb components, and live filters. If there's interest, I can give a heads-up when I have some results...

Thanks, very interesting! I was thinking about a few of the optimizations, too. I like how you did the RegEx, simple and clean; I will test if it saves any time. I stayed away from precomputing the data, mostly for code cleanliness/elegance reasons, but yes, one pass through the data should be enough. I don't think React is slower than d3 on a single render; in fact it could be faster (virtual DOM…). I will keep this repo pretty much as is for now (maybe with a few punctual improvements). It could be worthwhile to test if polygons are actually faster than circles (maybe the rendering is actually the bottleneck?). Anyways, thanks for investigating and keep me updated!

Ok, cool! I thought so, so I kept my changes in my fork :) I did try rects instead of circles; not much to gain there (if anything).

I don't think the virtual DOM is of much help here, as you just add the circles once (so you have vDOM setup plus the plain DOM actions). I might investigate this aspect a little further...

And I totally agree on the elegance aspect!

Anyway, great approach, interesting "read"! (& please let me know if you find out a performance bottleneck, too!)

Very nice visualization, thanks!
FWIW, I'd expect that using -grün would yield a more interesting pattern than, say, -graben.

In our neck of the woods, -ham is very common. Maybe you could add that as well?

Nice post! Do you think there might be a temporal element in the name patterns too?

Extremely cool! But I wonder why -ow also includes -au while -au doesn't include -ow. Just as a remark ;-)

@silverpool yep, fixed it, thanks :)
@rosuda definitely! It's a mixture of geographic features, but also linguistic, cultural, temporal, … influences. That's what makes it so interesting :)
@cmarqu added -grün for you :)
(screenshot: hexbin map with -grün added, 2016-01-05)

@MoritzStefaner Awesome, thanks!

@MoritzStefaner @cmarqu a question regarding the Grün example: I noticed that you are now using RegExp (yay, goodbye lodash!) to match the strings, but case-sensitively. Thus you don't match e.g. "Grün" itself, with a capital G (there is such a town).

I guess case-insensitive matching makes more sense. I can push a PR if you want me to!
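
For illustration, the difference is just the i flag on the RegExp (a tiny sketch, not the repo's actual code):

```js
// case-sensitive: misses the town "Grün" itself
new RegExp('grün$').test('Grün');       // false

// case-insensitive: also matches "Grün" with a capital G
new RegExp('grün$', 'i').test('Grün');  // true
```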

Yes, good point… PR welcome!

— M

Also, I have now added title elements, to get some basic tooltips.
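
Roughly along these lines (a sketch only; the actual component and prop names in the repo differ):

```jsx
import React from 'react';

// Sketch: a nested <title> element inside an SVG circle gives a native
// browser tooltip on hover. PlaceDot and its props are illustrative names.
const PlaceDot = ({ x, y, name }) => (
  <circle cx={x} cy={y} r={2}>
    <title>{name}</title>
  </circle>
);
```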

PR sent, #6

As mentioned yesterday, I built a slightly more interactive version based on this great repo. Obviously, it's a different use case...

So for those interested:

Check out the current live example here: http://bgrsquared.com/townsGermany/ (still "experimental")
-> note: this is not optimized (yet) for mobile; it works, but you'll want to check it out on a desktop/laptop.
(I hope it's more or less self-explanatory, see the well known examples at the bottom)

  • redux based data handling
  • smart/dumb component structure
  • live filters
  • Prefixes, suffixes, "infixes" (-> now you can e.g. browse all towns with a "y" or that have "Berg" anywhere in the name, etc.)
  • etc.

Todos:

  • clean up
  • performance (immutablejs, shouldComponentUpdate; see the sketch after this list)
  • Layout...
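
For the performance item above, the rough idea is something like this (a sketch, assuming each hex/marker gets its own component; the names are placeholders, not my actual code):

```jsx
import React from 'react';

// Sketch: skip re-rendering a cell unless its data actually changed.
// With immutable data (immutablejs), a reference check is enough.
class HexCell extends React.Component {
  shouldComponentUpdate(nextProps) {
    return nextProps.cell !== this.props.cell;   // cheap reference comparison
  }
  render() {
    const { cell } = this.props;
    return <circle cx={cell.x} cy={cell.y} r={cell.r} />;
  }
}
```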

--> Feedback very welcome!
--> Source will follow soon (need to do some clean up first :) )

@MoritzStefaner: please let me know/contact me if that is ok with you, as I cannot find a license here :)

Nice work! Looks promising! Totally OK for me if you keep the reference to my project. Thanks for carrying this further!

Hi, first of all: wow. The coolest stuff is sometimes the simplest. But one remark (following up on your own remarks): aren't all "-bach" names also "-ach" names? Therefore, aren't the two charts almost the same, since the intersection is quite dominant? I think a "-ach" minus "-bach" chart would be interesting, too. Kind regards

Yes, good point, this is exactly where a linguistically more sophisticated approach would be great.

Well, I'm not a linguist, just an interested beginner. Therefore, the next comment isn't sophisticated, either :-) But as I understand it, "-ach" is a thousands-of-years-old, early Germanic word, typically attached to extremely old settlements, while "-bach" is a rather modern word (12th century), typically attached to more recent settlements. Stripping one from the other would give you the opportunity to illustrate settlement history.

Hi, yes, I can see how that would make sense. It could be achieved fairly easily by using regular expressions instead of lists of suffixes. Maybe worth looking into.

/.*[^b]ach$/i might work :)
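
Applied to the name list, that could look like this (a sketch; townNames is just a placeholder for whatever array the names end up in):

```js
// "-ach" but not "-bach": the [^b] class requires a non-b character before "ach"
const achNotBach = /[^b]ach$/i;
const matches = townNames.filter(name => achNotBach.test(name));
```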

Finally, I sorta finished a first version of the "interactive" tool I was mentioning above:

Live:
http://bgrsquared.com/placeNames/

Source:
https://github.com/bgrsquared/placeNames

I hope it's stable, I wrote it in quite a hurry...

(added a few more features, such as Switzerland (yay!), live RegExp editing, etc.)

Feedback very appreciated!

Thanks again @MoritzStefaner for the great idea & source!

Nice work, great to see this grow!

This is very nice! I like the aesthetic of the hexagonal binning. I tried something similar with a few others, but for UK place names here (repo). It was fun, but the web implementation is a little rough and only seems to work well in Chrome (I don't know JS well enough).

What was your experience with the Geonames dataset? I had some trouble with certain large cities only being entered as counties; I think I ended up manually inputting the largest 100-200 UK cities into my dataset.

Thanks, I should check that.

— M

Nice. Personally, I always enjoy seeing geo data visualized so neatly. Good work!

Would be cool if you could add ‘-fehn’, ‘-moor’ and probably also ‘-deich’ (‘-diek’).

@MoritzStefaner Nice project! There are some locations in it which aren't municipalities or "stand-alone" towns, like "Benninghauser Heide". There's also a mixture with other toponyms. Maybe you should switch to OpenStreetMap as a data source for towns, or even include other toponyms - but where to draw the line?

@tobwen : Interesting input! I tried to use OSM as a source in my fork of this nice project.

Currently, it's still experimental as I didn't really find much time to double-check the data, but you can get a glimpse here: http://bgrsquared.com/placeNamesExp/ (as opposed to http://bgrsquared.com/placeNames )

I am no expert on OSM, so I used Overpass Turbo to get the nodes, see here: https://gist.github.com/chroth7/43ca48597a3a28ef3dbe

I set up three different data sets: a large one (including all "places") and two filtered ones.
What do you think about this approach?
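
For illustration, the general shape of such an Overpass request looks roughly like this (a simplified sketch, not the exact queries from the gist; the place-tag filter is an assumption):

```js
// Sketch: ask the Overpass API for place nodes inside Germany,
// then keep only the name and coordinates of each node.
const query = `
  [out:json][timeout:180];
  area["ISO3166-1"="DE"][admin_level=2]->.de;
  node["place"~"^(city|town|village)$"](area.de);
  out body;
`;

fetch('https://overpass-api.de/api/interpreter', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: 'data=' + encodeURIComponent(query),
})
  .then(res => res.json())
  .then(osm => osm.elements.map(el => ({ name: el.tags.name, lat: el.lat, lon: el.lon })));
```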

@chroth7 Thanks for the link. Actually, there's no German extract of the OSM data, so it's hard for me to have a look. Maybe you could create one? I'll have a look at your Overpass queries in the next days. Maybe we could start a discussion about it on the German OSM mailing list or even the forum. There are many Overpass gurus there who are into borders and related stuff.

Maybe we should / maybe you could also include data that is "non-free" but under an open license. The German Federal Agency for Cartography and Geodesy (Bundesamt für Kartographie und Geodäsie) provides all the geonames for use in such applications (they've published a CSV file besides other spatial data formats):
http://www.geodatenzentrum.de/geodaten/gdz_rahmen.gdz_div?gdz_spr=eng&gdz_akt_zeile=5&gdz_anz_zeile=1&gdz_unt_zeile=20&gdz_user_id=0

Also, affixes in the name might be a problem with the current approach. Some municipal areas near rivers carry the river in their name, like (constructed example): Beispieldorf am Rhein. So "Rhein" will distort the result, since the interesting term is in the first part of the string. Splitting on spaces and other separators into arrays like ['Beispieldorf', 'am', 'Rhein'] might be an approach to find the suffix -dorf in the first or any other part: for each element of the array, run the check. Here's a better example from real life: Zehnhausen bei Rennerod. Rennerod is a bigger town near Zehnhausen, but Zehnhausen is the more important part for our analysis. So we need to do better preprocessing of the data.
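
In code, the token-wise check could look roughly like this (a sketch; the splitting rule and the helper name are just examples):

```js
// Sketch: split a place name on spaces/hyphens and test the suffix against
// every token, so "Zehnhausen bei Rennerod" is still found via "Zehnhausen".
const hasSuffix = (placeName, suffix) => {
  const tokens = placeName.split(/[\s-]+/);        // ["Zehnhausen", "bei", "Rennerod"]
  const re = new RegExp(suffix + '$', 'i');
  return tokens.some(token => re.test(token));
};

hasSuffix('Zehnhausen bei Rennerod', 'hausen');    // true
hasSuffix('Beispieldorf am Rhein', 'dorf');        // true
```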

I've developed a tool to analyse street names and their parts. I'm using it for an intelligent geocoder and to find clusters all over Germany. Since I developed it for a corporation, I didn't publish a paper. I could modify the script and run it on German settlement names. That way, we'll also get a more complete list of available endings. The solution of @chroth7 is better of course, since the user can enter any ending. But there might also be users who are interested in the "Top 50" (I think we've already covered the Top 20).

@tobwen Sounds interesting! Using the Overpass queries, I was trying to get some sort of extract, on various levels. But I'd be happy to discuss whether my approach is sane on a dedicated mailing list; which one do you suggest?

Please note that I am Swiss, so I don't have that many insights into German towns, hamlets, isolated_places, ... :)

@chroth7 Trust me, better to use the forum. The mailing list takes getting used to. Can you read German? Then you could have a look into TagWatch (statistics on what mappers actually do) and into the wiki (some kind of documentation):
http://wiki.openstreetmap.org/wiki/DE:Key:place
http://wiki.openstreetmap.org/wiki/DE:Grenze

@tobwen Thanks for the input - I am a bit busy these days, sorry for the slow replies. In order not to "spam" this thread, can we maybe continue this discussion on Twitter or somewhere? I guess your know-how could surely help me! (and if @MoritzStefaner is interested, we can give him updates if we get some good data... hopefully)

Good idea - feel free to start a separate issue here on GitHub for this…