Dates that are incomplete still return a valid date but not as expected.
Closed this issue · 4 comments
- If I parse "tell bob on Fri 24th" given that today is Saturday 11th March, next Friday is 17th and Friday week is the 24th... Sherlock assumes I meant next Friday and returns Friday 17th.
- Additionally, if I parse "tell bob on Monday 24th" given that it's 11th March (as above), then the next Monday 24th is in April.
- If I mistakenly give a date that can't exist, like Wednesday 12 March 2017, it will return the correct day for 11th March (Saturday 11th March), but my mistake may have been the date rather than the day.
To further the robustness of the script, you could consider including an array of 'Did you mean...' options for possible correct dates.
For example:
Given any two/three (of four) parts to a date there would be a number of possible options returned.
"Wednesday 12th March" could return the options: "Sunday 12th March 2017"; "Wednesday 15th March 2017" and "Wednesday 12th April".
If my string is "tell bob on Wed 12th" the options would be "Sunday 12th"; "Wednesday 15th".
If my string is "tell bob on Wed 12th next year" or "tell bob on Wed 12th 2018" the options would be "Wed 12th September"; "Wed 12th December".
Developers could then present the options to their users as a popup selector.
I really do think you are on to an incredible and increasingly relevant idea with Sherlock/Watson and would love it to be made even more robust.
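A minimal sketch of how such 'Did you mean...' options might be generated, assuming the weekday, day of month, month and year have already been extracted from the string. The `didYouMean` function and its parameters are hypothetical and not part of Sherlock. It tries three corrections in turn: keep the date and fix the weekday, keep the weekday and month and move to the closest matching day, and keep the weekday and day number and move to the next month where they agree.

```javascript
// Hypothetical helper, not part of Sherlock.
// weekday: 0 = Sunday ... 6 = Saturday (as in Date.prototype.getDay)
// month: zero-based (0 = January), as in the Date constructor
function didYouMean(weekday, day, month, year, maxMonths = 24) {
  const asWritten = new Date(year, month, day);
  // Date silently rolls over impossible days (31 April becomes 1 May); give up on those.
  if (asWritten.getDate() !== day) return [];
  // Nothing to correct if the parts already agree.
  if (asWritten.getDay() === weekday) return [asWritten];

  // 1. Keep day, month and year; assume the weekday was wrong.
  const options = [asWritten];

  // 2. Keep weekday and month; move to the closest day with that weekday.
  const forward = (weekday - asWritten.getDay() + 7) % 7; // 1..6 days ahead
  const nearby = [forward, forward - 7]
    .map((offset) => new Date(year, month, day + offset))
    .filter((d) => d.getMonth() === month)
    .sort((a, b) => Math.abs(a - asWritten) - Math.abs(b - asWritten));
  if (nearby.length > 0) options.push(nearby[0]);

  // 3. Keep weekday and day number; move to the next month where they agree.
  for (let i = 1; i <= maxMonths; i++) {
    const later = new Date(year, month + i, day);
    if (later.getDate() === day && later.getDay() === weekday) {
      options.push(later);
      break;
    }
  }
  return options;
}

// "Wednesday 12th March" in 2017
for (const option of didYouMean(3, 12, 2, 2017)) {
  console.log(option.toDateString());
}
// Sun Mar 12 2017
// Wed Mar 15 2017
// Wed Apr 12 2017
```

The `maxMonths` bound is an arbitrary cap on the search, not a tuned value.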
Thanks for the script Neil.
The reason you're seeing that bug is because Sherlock treats "24th" as an ambiguous date. Absent any other information, it'll use that number to mean the day of the month, but it's not explicit, so Sherlock prefers relying on other info. In this case, it sees "Fri" and uses that. For example, if you wrote "I placed 24th on fri", Sherlock would do the right thing. As I'm sure you discovered, if you drop "fri" from your string, Sherlock will return March 24th. It would be great to combine the 2 tokens and look ahead to see if Friday the 24th is a valid date, but that could result in a non-trivial performance hit to cover all cases. Sadly, English is an ambiguous language.
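To illustrate the look-ahead idea, here is a rough sketch of checking whether a weekday and a day number can be combined into one date. It is not how Sherlock works internally; `nextWeekdayOnDay` is a hypothetical helper, and the month-by-month scan shows the kind of extra work the look-ahead would add to each parse.

```javascript
// Hypothetical helper, not part of Sherlock: find the first date on or after
// `from` that falls on `dayOfMonth` and on `weekday`
// (0 = Sunday ... 6 = Saturday, as in Date.prototype.getDay).
function nextWeekdayOnDay(weekday, dayOfMonth, from, maxMonths = 48) {
  // Strip the time of day so comparisons are date-only.
  const start = new Date(from.getFullYear(), from.getMonth(), from.getDate());
  for (let i = 0; i < maxMonths; i++) {
    const candidate = new Date(start.getFullYear(), start.getMonth() + i, dayOfMonth);
    // Skip months that do not have this day (Date would roll over to the next month).
    if (candidate.getDate() !== dayOfMonth) continue;
    // Skip a day in the current month that has already passed.
    if (candidate < start) continue;
    if (candidate.getDay() === weekday) return candidate;
  }
  return null; // no match within the search window
}

const today = new Date(2017, 2, 11); // Saturday 11th March 2017
console.log(nextWeekdayOnDay(5, 24, today).toDateString()); // Fri Mar 24 2017
console.log(nextWeekdayOnDay(1, 24, today).toDateString()); // Mon Apr 24 2017
```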
Similarly for case 3, Sherlock is using the strongest signal. Just like "fri" is a better signal than just "24th," "12 March 2017" is a better signal than "Wednesday" so it uses that.
Returning an array of possibilities when given bad user input is a great idea, but not one I have time to tackle any time soon. I think it's doable, but my concern would be the performance implications of finding all possibilities and returning a confidence score for each possibility.
You can implement that extra validation fairly trivially via Watson by preprocessing the string for any numeric matches (the following regex should work: `/((?:[1-2]\d|3[0-1]|0?[1-9])(?:st|nd|rd|th)?)/`), and then in postprocessing, check if that number substring is in `Sherlocked.title`; if not, it was used for date matching and you can change the `validated` property accordingly.
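A rough sketch of the two steps described above, written as standalone functions. The names `findDayNumbers` and `consumedAsDate` are hypothetical, and the title strings are made-up sample values. Wiring these into Watson's preprocessing and postprocessing hooks is left out, since their exact signatures are not shown in this thread. The regex is the one suggested above, written with a single-backslash `\d` and a global flag added so that every match is collected.

```javascript
// The suggested regex, with the g flag added to find every match.
const DAY_NUMBER = /((?:[1-2]\d|3[0-1]|0?[1-9])(?:st|nd|rd|th)?)/g;

// Preprocessing step: collect day-like numeric tokens from the raw input.
function findDayNumbers(input) {
  return input.match(DAY_NUMBER) || [];
}

// Postprocessing step: a token that no longer appears in the parsed title
// was presumably consumed by the date matching.
function consumedAsDate(tokens, title) {
  return tokens.filter((token) => !title.includes(token));
}

const tokens = findDayNumbers("tell bob on Fri 24th");
console.log(tokens);                                  // [ '24th' ]
console.log(consumedAsDate(tokens, "tell bob"));      // [ '24th' ]
console.log(consumedAsDate(tokens, "I placed 24th")); // []
```

Note that the regex has no word boundaries, so a year such as 2017 is picked up as "20" and "17"; a stricter pattern may be needed for strings that contain years.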
Recurring dates is something I'd love to add but it's a pretty major feature to do correctly and I unfortunately don't have time to tackle that right now.