Challenge to design and implement a search system in Python

Apache License 2.0

Zeta Alpha Challenge

At Zeta Alpha, we're building the next-generation neural discovery platform. Our platform allows you to find documents, stay up to date on relevant developments, and organize your knowledge discovery work. Our search system is the part that ingests, understands, and retrieves documents from multiple sources.

Assignment: Implement a search system MVP

In this challenge you will be creating an MVP of the search system. You are provided with a source folder containing multiple PDF documents. Your objective is to implement a solution capable of:

  1. Ingesting, parsing, and storing the provided documents.
  2. Retrieving a list of documents by matching the title, authors, and/or content.
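
To make the two objectives above concrete, here is a minimal sketch of the storage and retrieval side. It is not a reference solution: the `Document` and `SearchIndex` names, the per-field inverted index, and the simple TF-IDF style scoring are all assumptions chosen for illustration. Text extraction from the PDFs is deliberately left out; a PDF library (for example pypdf) or a pre-built parsing service could supply the `title`, `authors`, and `content` fields.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical parsed document; fields mirror the brief (title, authors, content)."""
    doc_id: str
    title: str
    authors: list[str]
    content: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class SearchIndex:
    """Toy in-memory inverted index with one posting table per field."""

    FIELDS = ("title", "authors", "content")

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}
        # field -> term -> {doc_id: term frequency}
        self.postings = {f: defaultdict(Counter) for f in self.FIELDS}

    def add(self, doc: Document) -> None:
        # Note: adding the same doc_id twice would double-count its terms.
        self.docs[doc.doc_id] = doc
        texts = {
            "title": doc.title,
            "authors": " ".join(doc.authors),
            "content": doc.content,
        }
        for field_name, text in texts.items():
            for term in tokenize(text):
                self.postings[field_name][term][doc.doc_id] += 1

    def search(self, query: str, fields=FIELDS, limit: int = 10) -> list[tuple[str, float]]:
        """Return (doc_id, score) pairs, best first, matching any query term."""
        scores = Counter()
        n_docs = len(self.docs)
        for field_name in fields:
            for term in tokenize(query):
                matches = self.postings[field_name].get(term)
                if not matches:
                    continue
                # Rarer terms get a higher weight (smoothed inverse document frequency).
                idf = math.log(1 + n_docs / len(matches))
                for doc_id, tf in matches.items():
                    scores[doc_id] += tf * idf
        return scores.most_common(limit)


index = SearchIndex()
index.add(Document("a", "Neural Search", ["Ada Lovelace"], "dense retrieval with transformers"))
index.add(Document("b", "Graph Databases", ["Alan Turing"], "storing graphs efficiently"))
title_hits = index.search("search", fields=("title",))
author_hits = index.search("turing", fields=("authors",))
```

Restricting `fields` covers the "title, authors, and/or content" requirement; a real submission would more likely delegate indexing and ranking to an existing search engine and persist the documents rather than keep them in memory.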

You can find the source documents here.

Note: You may not have the knowledge to make an exceptional implementation of all parts of this system. Therefore, we recommend that you shine in your areas of expertise and pick pre-built solutions for the rest.

Deliverables

The result should be in the form of functional code along with documentation for running and using it.

[Optional] Bonus points

A typical search system includes the following stages:

  • An offline stage that periodically polls multiple data sources (external APIs, websites, etc.), processes them, and populates a search database.
  • A retrieval stage that allows users to search, rank, and sort the documents from the database.
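
As an illustration of the offline stage, the sketch below polls a set of sources and stores only documents that are new or whose content has changed, using a content hash. Everything here is an assumption made for the example: the `RawDocument` and `Ingestor` names, the idea of modelling a source as a zero-argument callable, and the in-memory `_seen` table (a real system would persist this state and run `poll_once` on a schedule).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RawDocument:
    """Hypothetical unit fetched from a source before any parsing."""
    source: str
    uri: str
    payload: bytes


# A source is modelled as any callable that yields its current documents.
Source = Callable[[], Iterable[RawDocument]]


class Ingestor:
    """Polls sources and forwards new or changed documents to a store callback."""

    def __init__(self, sources: dict[str, Source], store: Callable[[RawDocument], None]) -> None:
        self.sources = sources
        self.store = store
        self._seen: dict[str, str] = {}  # uri -> SHA-256 of the last stored payload

    def poll_once(self) -> int:
        """Run one polling pass and return how many documents were stored."""
        stored = 0
        for fetch in self.sources.values():
            for raw in fetch():
                digest = hashlib.sha256(raw.payload).hexdigest()
                if self._seen.get(raw.uri) == digest:
                    continue  # unchanged since the last pass
                self.store(raw)
                self._seen[raw.uri] = digest
                stored += 1
        return stored


# Usage with a fake, mutable source standing in for an external API or folder.
folder = [
    RawDocument("local", "papers/one.pdf", b"first version"),
    RawDocument("local", "papers/two.pdf", b"another paper"),
]
database: list[RawDocument] = []
ingestor = Ingestor({"local": lambda: list(folder)}, database.append)

first_pass = ingestor.poll_once()
second_pass = ingestor.poll_once()
folder[0] = RawDocument("local", "papers/one.pdf", b"second version")
third_pass = ingestor.poll_once()
```

Hashing the payload keeps repeated polls cheap and idempotent, which matters once the same source is polled periodically.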

Some of the challenges we face when designing a search system revolve around providing a good experience for both our internal and external users. For example:

  1. Allowing data and content managers to easily manage data sources, as well as individual documents within the sources.
  2. Allowing NLP/ML engineers to easily experiment and develop processing workflows.
  3. Allowing search engineers to tweak and develop the retrieval process.
  4. Allowing end-users to interact with the platform in a fast, reliable, and secure way.

How would you address (some of) these challenges when designing a search system?
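
One possible direction for the second challenge, shown only as a sketch: treat each processing step as a small function over a record, and build workflows by composing steps, so that swapping or reordering steps does not require touching the ingestion or retrieval code. The names `Record`, `Step`, `build_pipeline`, and the two example steps are hypothetical.

```python
from typing import Callable

# A record is a plain dictionary so that steps stay easy to write and test.
Record = dict[str, object]
Step = Callable[[Record], Record]


def build_pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    def run(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record
    return run


def normalize_title(record: Record) -> Record:
    """Example step: collapse whitespace and lowercase the title."""
    title = " ".join(str(record["title"]).split()).lower()
    return {**record, "title": title}


def count_words(record: Record) -> Record:
    """Example step: attach a simple word count of the content."""
    return {**record, "n_words": len(str(record["content"]).split())}


pipeline = build_pipeline(normalize_title, count_words)
processed = pipeline({"title": "  Neural   Search ", "content": "dense retrieval with transformers"})
```

Because each step returns a new record instead of mutating its input, individual steps can be unit-tested in isolation and alternative workflows can be compared side by side.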

Tips and remarks

  • Don't hesitate to contact us with any clarification questions.
  • We expect the solution to take about half a day to build.
  • The coding assignment should be written in Python.
  • You can choose any existing libraries and packages that you deem necessary.
  • Expect to be questioned about your technology and architectural decisions, as well as implementation details.
  • Work in an agile way. You might not be able to completely solve everything in the assignment, so pick wisely.