dwyl/hits

Review hits changeset and insert

SimonLab opened this issue · 14 comments

While searching why the hits count doesn't increase (see: #131 (comment))
I noticed the insert function seems incorrect.
The attrs values should be passed as the second argument to the changeset however the pipe in the insert places it as first argument:

hits/lib/hits/hit.ex

Lines 14 to 26 in e046a91

def changeset(hit, attrs) do
hit
|> cast(attrs, [])
|> validate_required([])
end
def insert(attrs) do
attrs |> changeset(%{}) |> Hits.Repo.insert()
# see: github.com/dwyl/hits/issues/71
attrs.repo_id
|> get_aggregate_query()
|> Repo.aggregate(:count, :id)
end

I think this should instead be:

  def insert(attrs) do
    changeset(%{}, attrs) |> Hits.Repo.insert()
    # see: github.com/dwyl/hits/issues/71
    attrs.repo_id
    |> get_aggregate_query()
    |> Repo.aggregate(:count, :id)
  end

The changeset doesn't cast or required any values so I'm not sure if it returns an empty Hit or if it returns all the values, I'll need to test it manually to see what is saved in the database.

@nelsonic I'm not sure if there is an easy way to look at the database with fly.io?

Yeah, the weird thing is that this works on localhost. Putting my distributed systems / high availability hat on, I think this request is hitting the “read replica” … Fly.io forces us to have two instances of PostgreSQL. I think this request might read from the Read instance and thus does not increment. 💭

I've checked also on localhost and everything seems to work properly.
The insert functions don't really use the changeset at the moment and insert the new Hit passed as attributes.
I don't know how fly.io works with Postgres yet but I will read about it and try to understand what causes the issue.

@SimonLab are you adding anything else to the PR #137 ? 💭
And do we need to do a mix ecto.migrate for this to work or can we just ship it? :shipit:

I still want to have a bit more look at the schemas, I think it's possible to create a "hit" entry without having to create all the other steps manually first:

def insert_hit(conn, username, repository) do
useragent = Hits.get_user_agent_string(conn)
# remote_ip comes in as a Tuple {192, 168, 1, 42} >> 192.168.1.42 (dot quad)
ip = Enum.join(Tuple.to_list(conn.remote_ip), ".")
# TODO: perform IP Geolocation lookup here so we can insert lat/lon for map!
# insert the useragent:
useragent_id = Useragent.insert(%Useragent{name: useragent, ip: ip})
# insert the user:
user_id = User.insert(%User{name: username})
# strip ".svg" from repo name and insert:
repository = repository |> String.split(".svg") |> List.first()
repository_attrs = %Repository{name: repository, user_id: user_id}
repository_id = Repository.insert(repository_attrs)
# insert the hit record:
hit_attrs = %Hit{repo_id: repository_id, useragent_id: useragent_id}
count = Hit.insert(hit_attrs)
# Send hit to connected clients via channel github.com/dwyl/hits/issues/79
HitsWeb.Endpoint.broadcast("hit:lobby", "hit", %{
"user" => username,
"repo" => repository,
"count" => count
})
# return the count for the badge:
count
end

@SimonLab we kinda want this data to be stored though … we’re you thinking we would store the data in an ETS Cache and simply increment the counter and return that. And then once the response is sent the cache would be cleared in the background? 💭

I still want to save all the data in Postgres, I want to try to use one hit changeset to manage all the inserts. I think Ecto can decide if an associated items should be created or just retrieved from the database, instead writing these steps manually in the code.
I need to have a better look at the Ecto documentation and doing some tests with all the associations.

@SimonLab been thinking that maybe we should restore the "old" behaviour for hits i.e. increment the counter/badge on each page view. But then add the functionality to show only unique views for people who want it via URL query param e.g: &show=unique

@nelsonic I've created this PR which add the show=unique query parameter, let me know if this work 👍

@SimonLab your change appears to have deployed successfully to Fly.io: https://hits.dwyl.com/ 👍🏻

A quick overview of associations in Ecto:

image

I've added two new migrations with 462ff16 which defined unique_index on the useragents and users table. This allow us to check at a database level that we don't have any duplicates created (name + ip must be different for entries in useragents and name are all different on users).
Now we can simplify the insert functions:

hits/lib/hits/user.ex

Lines 30 to 39 in eddc463

case Hits.Repo.get_by(__MODULE__, name: attrs.name) do
# User not found, insert!
nil ->
{:ok, user} = attrs |> changeset(%{}) |> Hits.Repo.insert()
user.id
user ->
user.id
end

case Hits.Repo.get_by(__MODULE__, name: attrs.name, ip: attrs.ip) do
# Agent not found, insert!
nil ->
{:ok, useragent} = attrs |> changeset(%{}) |> Hits.Repo.insert()
useragent.id
useragent ->
useragent.id
end

and use instead the upsert feature by using the on_conflict option

I'm also looking at adding a unique_index on the repository table, we want the name and the user_id to be unique in this case.

These migrations might failed when we deploy if there are already some duplicates in the database, so we might need to do some cleaning if necessary.

@SimonLab saw your additions on the branch/PR. Look good. 👍
Only thing I would say is: try and document the changes in the README.md so that other people understand them. 💭

done with #137