/startup-search

Primary LanguagePythonGNU Affero General Public License v3.0AGPL-3.0

๐Ÿ’ก Run the neural search over start-ups ๐Ÿฅฐ

*A TABLE OF CONTENTS *

๐ŸŽฌ Overview

Having recently visited Web Summit, I have realized how many brilliant ideas there are in the world and how much the proper search over them is necessary. However, simple Google search over start-up description using partial descriptions of start-up specializations might be not very successful. That's where neural search might be handy.

About this app:
Learnings 1. Doing first neural search and even NLP-related solution
2. After FineTuner event applying better and faster tuning strategies
Dataset used Link to Start-Up dataset
Model used Link to Sentence-Transformers model

๐Ÿ Build the app with Python

These instructions explain how to build the app and deploy it with Python.

๐Ÿ—๏ธ Requirements

It is recommended to consider several points before diving in using the Startup Search app such as following:

  1. You have a working Python 3.8 environment.
  2. We recommend creating a new Python virtual environment to have a clean installation of Jina and prevent dependency conflicts.
  3. You have at least 2GB of free space on your hard drive.

๐Ÿ‘พ Step 1. Clone the repo and install Jina

Begin by cloning the repo, so you can get the required files and datasets. (If you already have this repository on your machine make sure to fetch the most recent version)

git clone https://github.com/EliaLesyk/startup-search.git

And enter the correct folder:

cd startup-search

In your terminal, you should now be located in you the startup-search folder. Let's install Jina and the other required Python libraries. For further information on installing Jina check out our documentation.

pip install -r requirements.txt

๐Ÿ“ฅ Step 2. Download your data to search (Optional)

You can use bash script for getting the data. There are some default settings set to ensure that the file will be downloaded from the link and converted to the necessary format.

Bash script:

  • Run sh get_data.sh.

๐Ÿƒ Step 3. Index your data

In this step, we will index our data. For this step you can use this command:

python app.py --index

It is also possible to run indexing over the small part to come to the result faster:

python app.py --index --n 10

If you see the following output, it means your data has been correctly indexed.

Flow@5162[S]:flow is closed and all resources are released, current build level is 0

๐Ÿ”Ž Step 4: Query your data

Next, we will deploy our query Flow. We're doing it in the interface from Streamlit that will be automatically opened in your default browser.

  1. Open a new terminal without closing the one with Indexing.
  2. Run the query Flow in your terminal like this:
streamlit run run.py

๐Ÿš€ Step 5: Check the results

Now you should see the web interface where you can:

  • type the keywords that will be used for neaural search over the descriptions of start-ups
  • give feedback to the system with "Mark as Relevant" and "Mark as Irrelevant", thus, changing the parameters of the system and receiving more relevant results.

๐Ÿ“– References

Embracing the power of open source, I have used the following websites to build the neural search over start-ups:

  • Source of code inspiration - the showcase solution with Jina for the search over scientific articles from arXiv.
  • Source of data - the article describing the usage of neural search for quering start-up data.
  • Source of model - the model called SPECTER: Document-level Representation Learning using Citation-informed Transformers.

๐Ÿ”ฎ About Me

Official Photo

I'm currently Master Student in Friedrich-Alexander University of Erlangen-Nuremberg. My Master Programme is International Information Systems with majors in Business Analytics, Machine Learning and Business Intelligence.

My work is in the area of digital health for radiology at Siemens Healthineers. We're building the validated prototypes helping doctors cure cancer.

The ongoing research I undertake is in the area of speech emotion recognition applying deep learning and transfer knowledge methods for children and adult speech data.


โญ๏ธ Next steps

  1. The next version of the app might include the filter over cities where start-ups are located.
  2. Dealing with Docker on MacOS with M1 Chip is still the problematic issue. It needs more time to be solved to be deployed for this app.

๐Ÿ‘ฉโ€๐Ÿ‘ฉโ€๐Ÿ‘งโ€๐Ÿ‘ฆ Jina Community

  • Slack channel - a communication platform for developers to discuss Jina.
  • LinkedIn - get to know Jina AI as a company and find job opportunities.
  • Twitter Follow - follow and interact with Jina using hashtag #JinaSearch.
  • Company - know more about the company and how it's fully committed to open-source!

๐Ÿฆ„ License

Copyright (c) 2021 Jina AI Limited. All rights reserved.

Jina is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.