Insufficient data: No recent data available
jcheger opened this issue · 9 comments
I installed the plugin on 2 sites. One works as expected, but the second one is stuck.
- the plugin was installed about 6 months ago (should be long enough)
- Training data statistics: So far the app has captured 30013533 logins (including client connections), of which 37 are distinct (IP, UID) tuples.
- php -f occ suspiciouslogin:train => Not enough data, try again later (Insufficient data: No recent data available)
Any help on how to get out of this would be welcome. Is there any file or DB table to delete?
Could you try again? Do you have both ipv4 and ipv6 data? What version of the app do you use?
Nextcloud 16.0.5
Suspicious Login 1.0.0
IPv4 only
Still the same result: Not enough data, try again later (Insufficient data: No recent data available)
That is strange. Could you run an SQL query to count the number of rows in oc_login_address_aggregated that have a first_seen larger than the Unix timestamp from a week ago?
The only case where you might not have new IPs for the last week is when your IPs never change. But that seems unlikely.
MariaDB [nextcloud]> SELECT id,seen,
-> DATE_FORMAT(FROM_UNIXTIME(first_seen),'%Y-%m-%dT%TZ') as first_seen,
-> DATE_FORMAT(FROM_UNIXTIME(last_seen),'%Y-%m-%dT%TZ') as last_seen
-> FROM oc_login_address_aggregated
-> WHERE first_seen>DATE_SUB(NOW(), INTERVAL 1 WEEK);
Empty set, 44 warnings (0.00 sec)
I don't know what the records in this table mean. However, I logged out and back in with a web browser, and restarted the client on a machine, and nothing changed in this table (not even the last_seen column).
FYI, I use TOTP myself, but I also have a Synology that syncs over WebDAV. One of my colleagues also syncs his Synology, but I'm not sure he uses the client. Users are also authenticated via LDAP (Active Directory).
In case you doubt my query, here is the full content of the table:
MariaDB [nextcloud]> SELECT id,seen,
-> DATE_FORMAT(FROM_UNIXTIME(first_seen),'%Y-%m-%dT%TZ') as first_seen,
-> DATE_FORMAT(FROM_UNIXTIME(last_seen),'%Y-%m-%dT%TZ') as last_seen
-> FROM oc_login_address_aggregated;
+----------+----------+----------------------+----------------------+
| id | seen | first_seen | last_seen |
+----------+----------+----------------------+----------------------+
| 1 | 29307778 | 2019-06-04T22:34:36Z | 2019-10-09T06:41:36Z |
| 648 | 30123 | 2019-06-04T22:37:11Z | 2019-10-09T02:55:32Z |
| 215970 | 18 | 2019-06-05T15:43:41Z | 2019-09-27T15:13:51Z |
| 461456 | 3 | 2019-06-06T12:37:05Z | 2019-06-06T13:38:26Z |
| 564536 | 4 | 2019-06-07T21:49:51Z | 2019-06-07T21:57:41Z |
| 1537240 | 4 | 2019-06-11T11:59:12Z | 2019-06-11T11:59:13Z |
| 2160305 | 4 | 2019-06-14T09:52:23Z | 2019-06-14T10:40:49Z |
| 4678419 | 10 | 2019-06-23T19:45:52Z | 2019-06-25T19:41:16Z |
| 4884910 | 532 | 2019-06-24T10:17:24Z | 2019-10-08T08:59:55Z |
| 6286938 | 22 | 2019-06-28T14:21:34Z | 2019-07-06T13:25:16Z |
| 6664333 | 1317 | 2019-06-29T17:52:47Z | 2019-06-29T19:29:01Z |
| 6932598 | 26 | 2019-06-30T12:06:36Z | 2019-06-30T12:06:55Z |
| 8461734 | 104 | 2019-07-12T10:14:57Z | 2019-10-07T19:57:56Z |
| 9462170 | 2 | 2019-07-15T15:37:51Z | 2019-07-15T15:37:51Z |
| 9491559 | 2 | 2019-07-30T16:57:41Z | 2019-07-30T16:57:41Z |
| 9865499 | 2 | 2019-07-31T17:33:21Z | 2019-07-31T17:33:21Z |
| 12189113 | 3 | 2019-08-07T16:30:22Z | 2019-09-03T11:16:40Z |
| 12433925 | 4 | 2019-08-08T09:38:24Z | 2019-09-03T15:31:29Z |
| 13613275 | 2 | 2019-08-12T10:10:24Z | 2019-08-12T10:10:24Z |
| 13982567 | 3 | 2019-08-13T10:14:15Z | 2019-08-13T15:43:00Z |
| 14338698 | 3 | 2019-08-14T10:04:00Z | 2019-09-05T19:11:05Z |
| 14446679 | 2 | 2019-08-14T22:08:39Z | 2019-08-14T22:08:39Z |
| 14491331 | 2 | 2019-08-18T18:51:33Z | 2019-08-18T18:51:33Z |
| 14775786 | 2 | 2019-08-19T17:08:13Z | 2019-08-19T17:08:13Z |
| 15064891 | 3 | 2019-08-20T13:36:23Z | 2019-08-20T13:43:03Z |
| 15105664 | 6 | 2019-08-20T16:16:07Z | 2019-08-26T17:42:29Z |
| 17149344 | 2 | 2019-08-26T11:37:13Z | 2019-08-26T11:37:13Z |
| 17244033 | 2 | 2019-08-26T17:50:48Z | 2019-08-26T17:50:48Z |
| 18222581 | 7 | 2019-08-29T13:04:13Z | 2019-09-23T10:13:30Z |
| 19597374 | 2 | 2019-09-02T10:14:29Z | 2019-09-02T10:14:29Z |
| 19996955 | 4 | 2019-09-06T09:05:14Z | 2019-09-10T08:24:08Z |
| 20025304 | 79 | 2019-09-06T15:17:35Z | 2019-10-09T03:28:32Z |
| 20057593 | 2 | 2019-09-06T22:13:21Z | 2019-09-06T22:13:21Z |
| 20561952 | 3 | 2019-09-12T13:20:51Z | 2019-09-13T12:54:36Z |
| 20659650 | 3 | 2019-09-13T10:53:53Z | 2019-09-13T11:02:03Z |
| 21006513 | 2 | 2019-09-16T14:00:32Z | 2019-09-16T14:00:32Z |
| 21118706 | 5 | 2019-09-17T13:34:24Z | 2019-09-18T13:55:06Z |
| 22025968 | 2 | 2019-09-25T13:52:14Z | 2019-09-25T13:52:14Z |
| 22028864 | 2 | 2019-09-25T14:31:02Z | 2019-09-25T14:31:02Z |
| 22129515 | 2 | 2019-09-26T14:26:17Z | 2019-09-26T14:26:17Z |
| 22190039 | 5 | 2019-09-27T07:34:53Z | 2019-09-27T23:42:03Z |
| 22203054 | 2 | 2019-09-27T10:43:51Z | 2019-09-27T10:43:51Z |
| 22571308 | 2 | 2019-10-01T14:33:50Z | 2019-10-01T14:33:50Z |
| 22596178 | 2 | 2019-10-01T22:13:25Z | 2019-10-01T22:13:25Z |
+----------+----------+----------------------+----------------------+
44 rows in set (0.00 sec)
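A side note on the filtered query above: first_seen holds an integer Unix timestamp, while DATE_SUB(NOW(), INTERVAL 1 WEEK) yields a DATETIME, so the WHERE clause compares mixed types. That is a plausible source of the 44 warnings, although it has not been confirmed here. Below is a minimal Python sketch of an integer-only comparison. The helper names are hypothetical, and the reference time is an assumption: it is the newest last_seen value in the dump, read as UTC.

```python
from datetime import datetime, timedelta, timezone


def one_week_cutoff(now: datetime) -> int:
    """Unix timestamp (seconds) for exactly one week before `now`."""
    return int((now - timedelta(weeks=1)).timestamp())


def count_query(cutoff: int) -> str:
    """Build a count query that compares integers with integers."""
    return (
        "SELECT COUNT(*) FROM oc_login_address_aggregated "
        f"WHERE first_seen > {cutoff};"
    )


# Assumption: "now" is the newest last_seen value in the dump, read as UTC.
now = datetime(2019, 10, 9, 6, 41, 36, tzinfo=timezone.utc)
# Newest first_seen value in the dump (row id 22596178), also read as UTC.
newest_first_seen = int(
    datetime(2019, 10, 1, 22, 13, 25, tzinfo=timezone.utc).timestamp()
)

cutoff = one_week_cutoff(now)
print(count_query(cutoff))
# False: even the newest first_seen is older than the one-week cutoff.
print(newest_first_seen > cutoff)
```

Under these assumptions the corrected comparison still matches no rows, so the empty result is consistent with the data even though the original query mixed types.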
> I don't know what the records in this table mean. However, I logged out and back in with a web browser, and restarted the client on a machine, and nothing changed in this table (not even the last_seen column).
The login data is not fed directly into that table. It first goes into oc_login_address, and a background job updates oc_login_address_aggregated asynchronously.
> In case you doubt my query, here is the full content of the table:
That is indeed strange. Do you use some sort of proxy in front of Nextcloud? Does Nextcloud even see the client IPs?
> I don't know what the records in this table mean
It's basically a compressed version of oc_login_address, in which every login is stored as a row. The aggregated data uses a counter to group identical (uid, ip) tuples. The timestamps show when a (uid, ip) tuple was first and last used. In your case this compressed 30M entries into <50 rows ;)
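The compression described here can be sketched as a group-by over (uid, ip). This is an illustration only, not the app's background job: the function name, the record layout, and the sample logins are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Aggregate(NamedTuple):
    seen: int        # number of logins for this (uid, ip) pair
    first_seen: int  # earliest login timestamp (Unix seconds)
    last_seen: int   # latest login timestamp (Unix seconds)


def aggregate(
    logins: Iterable[Tuple[str, str, int]]
) -> Dict[Tuple[str, str], Aggregate]:
    """Collapse raw (uid, ip, timestamp) logins into one row per (uid, ip)."""
    result: Dict[Tuple[str, str], Aggregate] = {}
    for uid, ip, ts in logins:
        key = (uid, ip)
        prev = result.get(key)
        if prev is None:
            result[key] = Aggregate(1, ts, ts)
        else:
            result[key] = Aggregate(
                prev.seen + 1,
                min(prev.first_seen, ts),
                max(prev.last_seen, ts),
            )
    return result


# Hypothetical sample: four raw logins from two (uid, ip) pairs.
sample = [
    ("alice", "192.0.2.1", 100),
    ("alice", "192.0.2.1", 300),
    ("bob", "192.0.2.2", 200),
    ("alice", "192.0.2.1", 200),
]
rows = aggregate(sample)
print(len(rows))  # 2
print(rows[("alice", "192.0.2.1")])
```

A client that polls constantly from one address only ever increments the counter of a single row, which matches the dump above: one row accounts for roughly 29 million of the 30 million logins.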
This instance of Nextcloud is the only one I have without a reverse proxy. Instead, I have 1:1 NAT configured on a pfSense (meaning there is a dedicated IP address for this service, which is also used for outgoing traffic).
The 50 rows are not much of a surprise. We are only a few users, usually connecting from the same IP addresses.
The problem here is that the current logic tries to split the collected data into two sets: training data and validation data. Validation data consists of the IPs that were first seen within the last week. The idea behind this is to give a metric of how well the model reacts to historically new data. If your IPs hardly ever change, there won't be anything new recently.
This is a conceptual problem. I'm not sure if this is solvable easily.
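To make the failure mode concrete, here is a minimal sketch of such a split. It only illustrates the idea described above and is not the app's code: the class and function names are hypothetical, and the timestamps are made up.

```python
from typing import List, NamedTuple, Tuple

WEEK_SECONDS = 7 * 24 * 60 * 60


class AggregatedLogin(NamedTuple):
    uid: str
    ip: str
    first_seen: int  # Unix seconds
    last_seen: int   # Unix seconds


class InsufficientData(Exception):
    """Raised when one of the two sets would be empty (hypothetical)."""


def split_by_first_seen(
    rows: List[AggregatedLogin], now: int
) -> Tuple[List[AggregatedLogin], List[AggregatedLogin]]:
    """Rows first seen within the last week validate; older rows train."""
    cutoff = now - WEEK_SECONDS
    training = [r for r in rows if r.first_seen <= cutoff]
    validation = [r for r in rows if r.first_seen > cutoff]
    if not validation:
        raise InsufficientData("No recent data available")
    return training, validation


now = 10_000_000  # made-up "current" time
# A stable install: the same address has been in daily use for months.
stable = [AggregatedLogin("alice", "192.0.2.1", 1_000, now - 60)]

try:
    split_by_first_seen(stable, now)
except InsufficientData as exc:
    print(f"Insufficient data: {exc}")

# One address that first appeared an hour ago is enough for the split.
with_new_ip = stable + [AggregatedLogin("alice", "198.51.100.7", now - 3_600, now - 60)]
training, validation = split_by_first_seen(with_new_ip, now)
print(len(training), len(validation))  # 1 1
```

Note that last_seen plays no role in the split: an address that is used every day still counts as old data if it first appeared more than a week ago.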
So basically, you're saying that this app is of no use when the instance is safe and only used by a few users?
What if there is one big attacker in these early stages of the nextcloud instance?
Honestly, I believe hackers have better things to do than target ultra-small teams, so if this add-on is not useful in that particular case, I'd rather disable it to avoid warnings in the log section.
It keeps telling me that the models are not present (Could not predict suspiciousness: No models found) or that there is not enough data.