zotero/dataserver

self hosting documentation

Opened this issue · 17 comments

I found no issue dealing with "self hosting" and I think this process should be documented.

According to this post on the zotero forum, "Zotero makes the code available but doesn't provide any support for self-hosted servers."

This is something I'd like to dig into, so consider this a "tracking issue", not a "feature request".

Have you figured much out on this end? I'd love to self-host my Zotero server so I can sync large documents to it. I am not going to pay $10/month because I have a couple dozen large docs...

I also have multiple devices (Work & Personal) that I want to sync between, so I have one document repository for them.

I did not look into it. For me it's not a money problem but a philosophical one. I won't engage in a project which does not fully comply with the free software idea. It's not the same if the Zotero project wants to let people set up their own servers (1) or if they prefer to keep them in their ecosystem (2).
I will only consider a paid plan if option 2 is valid.

I'm not sure what you think the "free software idea" is, but the idea that there's some universally accepted definition of it that we're in violation of is absurd. Nearly everything we produce is open source, including the dataserver, which is AGPL-licensed. You are perfectly free to run your own version, and various labs and companies do so. We just don't currently have the time or resources to provide any support for doing so.

This isn't a documentation issue. It's a technical and resource one. Creating a version that's easy to self-host would just be a completely different project from running a hosted service for millions of users on a sprawling AWS infrastructure with countless services, databases, database shards, caches, search clusters, Lambdas, etc., and all the processes for deploying, debugging, upgrading, and monitoring it, not to mention managing compatibility across different versions of Zotero clients, would be completely different. We still do hope to make a more easily self-hostable version at some point, but for now our priority is maintaining the service for Zotero users.

I understand the point. Everything falls back to a resource allocation problem. That's why I wish funding could be directed towards certain features. And that's why this issue can remain open until someone allocates his time to fix it.

Just to be totally clear, though, this isn't something that just needs time allocated once and then would be fixed. It would need to be a permanent, ongoing effort to keep a self-hostable version current. We try to document necessary changes in commit messages, with scripts for DB upgrade steps, notes about new dependencies, etc., but any sort of update to a self-hosted version would just require a totally different process from what we do internally.

It'd be relatively feasible to build a container-based version of the server environment (we've experimented with that in the past), but once there are users and existing data, keeping it working and current and compatible with Zotero clients is a much more difficult proposition.

It seems that what Zotero needs is a stable, well-documented API for the data-server. Honestly, in order to do good development, this API is probably already documented somewhere. Unless I'm mistaken, however, this API doesn't appear to be publicly available (the APIs published here are only for read-only access to data on the server). Correction: the API does support read/write operations.

With the full API publicly available, then anyone can choose to implement a data-server however they wish, so long as it complies with the API.
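As a rough illustration of that idea, here is a minimal Python sketch of a write request against the documented Zotero Web API (version 3), with the base URL left as a parameter so it could in principle point at any server that implements the same API. The request is only built, not sent. The function name, the `zotero.example.org` host, the user ID, and the API key are placeholders assumed for the example, not anything taken from the dataserver repo.

```python
import json
import urllib.request

# Public Zotero Web API base URL; a self-hosted server would have its own.
DEFAULT_BASE_URL = "https://api.zotero.org"


def build_create_items_request(user_id, api_key, items, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request that creates items in a user library.

    Hypothetical helper for illustration; only the URL layout and header
    names follow the public Web API documentation.
    """
    url = f"{base_url.rstrip('/')}/users/{user_id}/items"
    body = json.dumps(items).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Zotero-API-Version": "3",
            "Zotero-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


# Placeholder values: the host, user ID, and key are assumptions.
request = build_create_items_request(
    user_id=12345,
    api_key="YOUR_API_KEY",
    items=[{"itemType": "book", "title": "Example Title"}],
    base_url="https://zotero.example.org",
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`. Whether a given self-hosted server accepts it depends on how completely that server implements the API.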

@edgimar:

the APIs published here are only for read-only access to data on the server

Given that there are sections for “Write Requests”, “File Uploads”, and “Syncing”, I’m not sure what makes you think that.

@dstillman, good question! 🙂 I didn't look at it closely enough - sorry for the confusion. It might be helpful to at least add a readme to this repo indicating that the server implements that API (and linking to it).

It'd be relatively feasible to build a container-based version of the server environment (we've experimented with that in the past), but once there are users and existing data, keeping it working and current and compatible with Zotero clients is a much more difficult proposition.

This is the key realisation people need to make. Unless you have highly competent internal SRE/DevOps resources, running a proper container-based infrastructure can be highly challenging. Any script-kiddie can spin up a container on Docker, but running and seamlessly upgrading complicated systems with 10s or even 100s of services on container orchestrators that are serving thousands/millions of users in a reliable manner is why good SREs get as much as good devs at Google & Co.

Just discovered this project and was wondering about self-hosting sync as well. Disappointed that I would have to rely on third-party servers. I will stick with Joplin. You should look at how they sync, where you can choose from several synchronization targets.

wanstr commented

I'm new to Zotero so any help is greatly appreciated. Now the issue I'm facing:

  1. I have a huge amount of references I would like to add
  2. I also want to attach the pdf files
  3. I prefer not to store the files with Zotero

So it seems with 1 and 2 I should use the web API, but then with 3 my only option is to use my own WebDAV. The web API doesn't seem to be able to upload to WebDAV. Solution?

@wanstr: Please post all questions to the Zotero Forums.

Just to be totally clear, though, this isn't something that just needs time allocated once and then would be fixed. It would need to be a permanent, ongoing effort to keep a self-hostable version current. We try to document necessary changes in commit messages, with scripts for DB upgrade steps, notes about new dependencies, etc., but any sort of update to a self-hosted version would just require a totally different process from what we do internally.

It'd be relatively feasible to build a container-based version of the server environment (we've experimented with that in the past), but once there are users and existing data, keeping it working and current and compatible with Zotero clients is a much more difficult proposition.

I think the best solution here is to leave people to figure out how to host the 'datacenter' (it's open source). Plus, uniuuu seems to keep an up-to-date repo here, with the hope of merging it into the main zotprime.

The first step is to document how to set your own sync server in the app and what needs to be changed there (but this isn't the repo for that).

Curious what the state is. I work with organizations that have sensitive content and not always an internet connection. Zotero looks interesting, but having a self-hostable server would be nice. Sysadmin/devops skills aren't much of an issue, but even they need documentation.

Honestly, the main problem for me has not been getting a server running (I mentioned that repo above, which uses Docker).

For me, the problem is getting the program to use the server. While you can compile the program yourself and use it that way, I find that burdensome for deployment (for my personal use, at least, as I won't get updates and it's not in repos).

Has anyone tested the following project?

https://github.com/linuxserver/docker-zotero

It seems to be designed for it.
But I didn't find any mention of how to change the Zotero client.

Unless I am mistaken, docker-zotero is for hosting the web client of Zotero. It is not relevant to self-hosting the dataserver, methinks, since it presumably has the same issue as the desktop client, where you need to compile it yourself to point it to the right server. ZotPrime V2 is probably the closest we have right now.