Enhancements Issues while using FindandTransformDates
Closed this issue · 12 comments
The function FindandTransformDates doesn't consider
-
single digits for day or month in date e.g. 1/12/2017 or 25/1/2017. Is it because of using the standard R date formats. This becomes a problem while reading dates from text files
-
first letter of month in lowercase e.g. "01/jan/2017", "01/feb/2017", "01/mar/2017", again I think same reason.
Also any reason why it reads the dates as character instead of as factors, because if it reads as factors then aggregation by dates becomes easier post reading them which is not so easy if read as character
Hello,
Thanks for the remarks, i guess they could be implemented quite easly.
Could you provide more information on your last point concerning factor and characters? I'm not sure to get it
Thanks
Hi,
Both points 1 and 2 have been integrated in #27
Concerning your last point, i guess you wanted to be able to transform factors: it as been intergated. If it's not that, please specify your request.
Modifications will be available in next CRAN release.
Thanks for your remarks,
I close for now.
Hello,
Sorry to reopen this again, to continue on the same thread. Had few more Qs around the update
We are using dataPreparation is 0.3.2 and have created a dummy file with all possible date formats. Have attached the file for your reference only change i our file are in csv format rather than the attached xls format. Below are our observations
- Column 2 & 3 are in same format but column 2 has been converted to date format successively and column 3 didn’t
- Column 4 has been converted to date format but “24-Jan-2016” has been converted to “2024-01-16”
- Column 5 & 7 has not been converted. Difference between these two is first letter is caps lock in column 5.
- Column 8 has combination of \ and -
Could you pls revert with your findings as well. Thanks
Sorry it doesn't work yet.
I will look into it.
I reopen
Hi,
I started to look into it.
First of all, the dates_test.xlsx file you provided doesn't math with the description you gave, I see:
date1 | date2 | date3 | date4 | date5 | date6 | date7 | date8 |
---|---|---|---|---|---|---|---|
14/01/2016 | 01/01/2016 | 01/01/2016 | 24-janv-16 | Jan 12,2016 | 01/05/2016 | jan-12,2016 | 01/01/2016 |
14/02/2016 | 01/02/2016 | 01/02/2016 | 24-févr-16 | Feb 15,2016 | 01/08/2016 | feb-12,2016 | 18/01/2016 |
14/03/2016 | 01/03/2016 | 10/03/2016 | 24-mars-16 | Mar 20,2016 | 1/16/2016 | mar-12,2016 | 23/01/2016 |
14/04/2016 | 01/04/2016 | 14/04/2016 | 24-avr-16 | Apr 13,2016 | 1/24/2016 | apr-17,2016 | 02/01/2016 |
14/05/2016 | 01/05/2016 | 17/05/2016 | 24-mai-16 | May 14,2016 | 02/02/2016 | may-18,2016 | 14/02/2016 |
14/06/2016 | 01/06/2016 | 18/06/2016 | 24-juin-16 | Jun 12,2016 | 2/15/2016 | jun-18,2016 | 03/01/2016 |
14/07/2016 | 01/07/2016 | 23/07/2016 | 24-juil-16 | Jul 23,2016 | 2/28/2016 | jul-21,2016 | 17/03/2016 |
14/08/2016 | 01/08/2016 | 23/08/2016 | 24-août-16 | Aug 18,2016 | 03/01/2016 | aug-10,2016 | 23/03/2016 |
14/09/2016 | 01/09/2016 | 24/09/2016 | 24-sept-16 | Sep 19,2016 | 3/19/2016 | sep-5,2016 | 28/03/2016 |
14/10/2016 | 01/10/2016 | 11/10/2016 | 24-oct-16 | Oct 11,2016 | 10/25/2016 | oct-19,2016 | 03/03/2016 |
14/11/2016 | 01/11/2016 | 23/11/2016 | 24-nov-16 | Nov 21,2016 | 10/30/2016 | nov-5,2016 | 04/03/2016 |
14/12/2016 | 01/12/2016 | 19/12/2016 | 24-déc-16 | Dec 29,2016 | 12/25/2016 | dec-1-2016 | 14/04/2016 |
14/01/2017 | 01/01/2017 | 12/01/2017 | 24-janv-17 | Jan 12,2017 | 01/10/2017 | jan-5-2017 | 22/05/2016 |
14/02/2017 | 01/02/2017 | 23/02/2017 | 24-févr-17 | Feb 15,2017 | 1/18/2017 | feb-28-2017 | 24/06/2016 |
Also to make your examples reproductible could you provide me the results of:
Sys.getlocale("LC_TIME")
Sys.getenv("LC_TIME")
I can start to answer by:
- Regarding the mixing of separator (aka not always the same separator in the column), it is not a case I implemented because I haven't met this example. Could you tell me where you encountered this types of problems?
- Same remark on date6: you have some times 0 and some other times not 0, it is not an harmonious format, why would it happen?
- Regarding the different types of separator in a data (aka: Jan-15, 2016) it is not a case I handled yet, but it would be a nice to have. I implemented it in v0.3.3 cf PR #32.
Sys.getlocale("LC_TIME")
[1] "English_United States.1252"
Sys.getenv("LC_TIME")
[1] ""
• Regarding the mixing of separator (aka not always the same separator in the column), it is not a case I implemented because I haven't met this example. Could you tell me where you encountered this types of problems?
Column8 is the only column where both separators / and - are used. E.g. 2/1/2016 and 17-03-2016.
• Same remark on date6: you have some times 0 and some other times not 0, it is not an harmonious format, why would it happen?
In my csv file Column 1 is considering all double digits even if we pass single digits but column 6 will consider single as single and double as double. So here in my system format is harmonious.
Ok, it seems that I encounter some issues reading your .xlsx
Could you provide it as a csv file? (save as csv (macintosh)) in excel.
Thanks
github is not allowing to share csv file, can you share an email id where I can send. Thanks
Hi,
here it is:
Thanks,
So I kept investigeting:
- Date1: Is detected
- Date2: Is detected
- Date3: Format is not harmonious: some time sthere is the 0 for month and some times not... In which case do you encounter that? (NB: I did had a piece of code to detect presence/absence of 0 for days and month)
- Date4: Has an ambiguous format: It is not possible to guess if 24 is a day or a year... To solve this issue, either provide the format by hand, or provide more examples.
- Date5: has a "vicious" white space here: "Jul 23,2016 ". I had a piece of code to supress leading and tailing white spaces: will be detected in V0.3.3
- Date6: Is detected
- Date7: Format is not harmonious: at the beginning format is Month-day,year at the end it is Month-day-year
- Date8: Format is not harmonious: it is sometimes day/month/year and sometimes day-month-year. You should consider changoing all "/" into "-" before performing any casting algorithm.
Concerning not harmonious format, to you have examples of when it would happen?
Hope I answered your questions.
Thanks.
- Date 3 -Strangely the excel in our place is misbehaving and converting few months to single and retaining others as double digit. I agree this format is very unusual
- Date 4 - I agree, we knew it to be dd-Mon-YY and it would be tough for the function to know
- Date 5 - Thanks for taking care of leading and trailing whitespaces
- Date 7 - Only concern there was lower and upper case which we tried on 0.3.2 and corrected the format and made it harmonious like Month-day,year but have month in lower case. 0.3.2 doesnt detect as date. Pls confirm
- Date 8 - I agree, we will change to uniform separators before running the function
Concerning date7: lower case is handled in V0.3.2.
But there is also a mix in separators: at the begining MMM-dd,YYYY and at the end MMM-dd-YYYY
Anyway, thanks for those remarks, it helped to handle new cases:
- Lower case
- Missing 0
- Different format in same dates
- Leading and trailing white spaces