wakatime/wakatime-cli

Language detection for files with no extension

IgnisDa opened this issue · 6 comments

In the current version of wakatime-cli, files that do not have an extension are tagged as "Unknown language". But these files often contain a shebang (e.g. #!/usr/bin/env bash or #!/usr/bin/python) that can be used to detect the language of the file.

Would you consider a feature where the CLI additionally checks the shebang to determine the language, and tags the file as "Unknown language" only if nothing is detected?
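
To make the request concrete, here is a minimal Go sketch of shebang-based detection. It is only an illustration and not how wakatime-cli or Pygments implement it: the function name `languageFromShebang` and the small `interpreterLanguages` table are hypothetical, and versioned interpreter names such as `python3.11` are not handled.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// interpreterLanguages is a hypothetical, deliberately small table that maps
// interpreter names to language names. A real table would be much larger.
var interpreterLanguages = map[string]string{
	"bash":    "Bash",
	"sh":      "Shell",
	"python":  "Python",
	"python3": "Python",
	"node":    "JavaScript",
	"ruby":    "Ruby",
	"perl":    "Perl",
}

// languageFromShebang inspects the first line of content and returns the
// language implied by its shebang. The boolean is false when there is no
// shebang or the interpreter is not in the table.
func languageFromShebang(content string) (string, bool) {
	// Only the first line can hold a shebang.
	line := content
	if i := strings.IndexByte(content, '\n'); i >= 0 {
		line = content[:i]
	}
	if !strings.HasPrefix(line, "#!") {
		return "", false
	}

	// Split "#!/usr/bin/env bash" into ["/usr/bin/env", "bash"].
	fields := strings.Fields(strings.TrimPrefix(line, "#!"))
	if len(fields) == 0 {
		return "", false
	}

	interpreter := path.Base(fields[0])
	if interpreter == "env" {
		// With env, the interpreter is the first argument that is not a
		// flag (for example "-S"). Variable assignments are not handled.
		interpreter = ""
		for _, f := range fields[1:] {
			if !strings.HasPrefix(f, "-") {
				interpreter = path.Base(f)
				break
			}
		}
	}

	lang, ok := interpreterLanguages[interpreter]
	return lang, ok
}

func main() {
	samples := []string{
		"#!/usr/bin/env bash\necho hi\n",
		"#!/usr/bin/python\nprint('hi')\n",
		"no shebang here\n",
	}
	for _, s := range samples {
		lang, ok := languageFromShebang(s)
		if !ok {
			lang = "Unknown language"
		}
		fmt.Println(lang)
	}
}
```

The caller falls back to "Unknown language" only when the lookup fails, which matches the behaviour asked for above.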

Yes, we used to do that with the legacy Python wakatime-cli via the Pygments library:

https://github.com/wakatime/legacy-python-cli/blob/e8deb156f1c2d26e5cf874da97f7b4354b3f5d20/wakatime/packages/py27/pygments/util.py#L125

We can add that to wakatime-cli too.

@alanhamlett Can I make a PR for this?

Sure, go ahead. If it's your first time contributing, please read our guidelines.

muety commented

In addition, it might be helpful to have a way of doing such mappings "manually", either in ~/.wakatime.cfg or .wakatime-project, for files that don't even have a shebang. For example, all files matching a configured regex could get mapped to a specific language. Could also be used to "override" the standard language detection (for whatever reason you might want to do that...).

We already have a [projectmap] section for regex patterns, so we could add a [language_map] section. The regex should run against the full file path, though a pattern can of course be written to match only the file name or extension.
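
As a rough sketch of how such a mapping could behave, the Go snippet below matches regex patterns against the full file path and returns the language of the first rule that matches. Everything here is an assumption for illustration: the `languageRule` type, the `compileRules` and `languageForPath` helpers, and the example patterns are hypothetical, and reading the proposed [language_map] section from the config file is left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// languageRule pairs a compiled pattern with the language it maps to.
type languageRule struct {
	pattern  *regexp.Regexp
	language string
}

// compileRules turns ordered (pattern, language) pairs into rules. A slice
// is used instead of a map so that the first matching rule wins predictably.
func compileRules(entries [][2]string) ([]languageRule, error) {
	rules := make([]languageRule, 0, len(entries))
	for _, e := range entries {
		re, err := regexp.Compile(e[0])
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", e[0], err)
		}
		rules = append(rules, languageRule{pattern: re, language: e[1]})
	}
	return rules, nil
}

// languageForPath runs every rule against the full file path and returns
// the language of the first match. The boolean is false when nothing matches.
func languageForPath(rules []languageRule, fullPath string) (string, bool) {
	for _, r := range rules {
		if r.pattern.MatchString(fullPath) {
			return r.language, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical entries, as they might appear in a [language_map] section.
	rules, err := compileRules([][2]string{
		{`/scripts/[^/]+$`, "Bash"},      // any file directly under scripts/
		{`(^|/)Jenkinsfile$`, "Groovy"},  // matches on the file name only
	})
	if err != nil {
		panic(err)
	}

	for _, p := range []string{
		"/home/me/proj/scripts/deploy",
		"/home/me/proj/Jenkinsfile",
		"/home/me/proj/notes",
	} {
		lang, ok := languageForPath(rules, p)
		fmt.Println(p, lang, ok)
	}
}
```

Because the pattern sees the full path, a rule can target a directory, a file name, or an extension, and running these rules before the standard detection would also give the "override" behaviour mentioned earlier.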

Maybe relying on Chroma would do the dirty work for us. Chroma has a function that analyzes the content of a file and can detect the language even when there's no filename. I've already started working on a possible solution.