Auto language detector!
Closed this issue · 9 comments
We can have a function that takes the [FILE] name and returns the programming language that the script file is written in, based on the following LANGUAGES_TYPES:
LANGUAGES_TYPES = {
'bash': ['.sh', ..],
'c': ['.c', ..],
...
}
We remove the default value for --language and check for a user-supplied value. If there is no value for this field, detect_language(type) will be called.
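A minimal sketch of what such a function could look like, assuming it receives the file name rather than a pre-extracted type. The mapping below is deliberately tiny: the `.sh` and `.c` entries come from the proposal above, the `python` entry and the `EXTENSION_TO_LANGUAGE` helper are hypothetical additions for illustration, and the real list is left elided in the issue.

```python
from pathlib import Path
from typing import Optional

# Illustrative mapping only; the issue does not spell out the full list.
LANGUAGES_TYPES = {
    "bash": [".sh"],
    "c": [".c"],
    "python": [".py"],  # hypothetical entry, not from the original proposal
}

# Invert the mapping once so each lookup is a single dictionary access.
EXTENSION_TO_LANGUAGE = {
    extension: language
    for language, extensions in LANGUAGES_TYPES.items()
    for extension in extensions
}


def detect_language(file_name: str) -> Optional[str]:
    """Return the language mapped to file_name's extension, or None if unknown."""
    suffix = Path(file_name).suffix.lower()  # "" when there is no extension
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

Returning `None` for an unknown or missing extension leaves it to the caller to decide whether to fall back to a default language or report an error.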
This would be so cool. I also recommend using this package rather than relying on the file extension.
@Farhaduneci, check out the issues. It looks like some people are having issues installing that package!
Hey @lnxpy, check this out:
>>> from pygments.lexers import guess_lexer, guess_lexer_for_filename
>>> guess_lexer('#!/usr/bin/python\nprint "Hello World!"')
<pygments.lexers.PythonLexer>
>>> guess_lexer_for_filename('test.py', 'print "Hello World!"')
<pygments.lexers.PythonLexer>
I got it from here. I think this is a great way of guessing the language of the code, and it even opens up the possibility of showing the code with syntax highlighting via rich.
This is a list of all available lexers in the pygments library.
@SepehrRasouli That's awesome, but we don't have pygments as a dependency, and since that package's main functionality is code highlighting, using it for this purpose might not be a good solution.
@lnxpy, your reasoning for not using this library is correct, but it's still the best choice. The library @Farhaduneci introduced has problems, and dictionaries might not be efficient and could get too big. Do you still think we should not use this? It also creates the potential of showing the code to the user with highlighting.
@SepehrRasouli I prefer the old-school way: detecting based on the file type.
@lnxpy Then we should either gather the file extensions of many programming languages or cover only the important and popular ones.
Which one do you think we should do? And can I work on this?
Hi @lnxpy, based on what we talked about last week, I want to work on this issue using the method you described.
We remove the default value for --language and check for any user-made values. If there is no value for this field, detect_language(type) will be called.
Can you please elaborate a little bit more? I can't understand what you mean by this.
@SepehrRasouli, there is a positional argument called args.file that takes the file. You have access to the file name, so you can extract the type (the file extension) using os.path or pathlib. Then you can set a value for args.language.
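A rough sketch of that flow, assuming the CLI is built with argparse. `build_parser`, `resolve_language`, and the two-entry `EXTENSION_TO_LANGUAGE` mapping are hypothetical names used only for illustration; the project's actual parser setup may differ.

```python
import argparse
from pathlib import Path

# Hypothetical mapping for illustration only.
EXTENSION_TO_LANGUAGE = {".sh": "bash", ".c": "c"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("file")  # positional argument, exposed as args.file
    # No real default: None means the user did not pass --language.
    parser.add_argument("--language", default=None)
    return parser


def resolve_language(argv):
    """Parse argv and fill in args.language from the file extension if unset."""
    args = build_parser().parse_args(argv)
    if args.language is None:
        # pathlib gives the extension directly;
        # os.path.splitext(args.file)[1] would work as well.
        suffix = Path(args.file).suffix.lower()
        args.language = EXTENSION_TO_LANGUAGE.get(suffix)
    return args
```

With this arrangement an explicit `--language` always wins, and detection only runs when the flag is omitted.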