Jina

Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data


Jina is a neural search framework that empowers anyone to build state-of-the-art (SOTA), scalable deep learning search applications in minutes.

🌌 All data types - Scalable indexing, querying, and understanding of any data: video, image, long/short text, music, source code, PDF, etc.

⏱️ Save time - Jina is the design pattern of neural search systems, taking you from zero to a production-ready system in minutes.

🌩️ Fast & cloud-native - Distributed architecture from day one, scalable and cloud-native by design: enjoy containerization, streaming, parallelization, sharding, async scheduling, and HTTP/gRPC/WebSocket protocols.

🍱 Own your stack - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls of fragmented, multi-vendor, generic legacy tools.

Install

  • via PyPI: pip install jina
  • via Conda: conda install jina -c conda-forge
  • via Docker: docker run jinaai/jina:latest
  • More install options

Documentation

Run Quick Demo

Build Your First Jina App

Document, Executor, and Flow are three fundamental concepts in Jina.

Leveraging these three components, let's build an app that finds the lines of a code snippet that are most similar to a query.
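To make the roles of the three concepts concrete before touching the real API, here is a minimal in-process sketch in plain Python. `ToyDocument`, `ToyExecutor` and `ToyFlow` are hypothetical stand-ins, not Jina classes: a document carries data plus fields filled in later, an executor routes endpoints such as `/index` to handler functions (a `default` handler playing the role of a bare `@requests`), and a flow passes each request through its executors in the order they were added. It deliberately ignores everything that makes the real Flow useful, such as networking, containerization and parallelism.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a Document: raw content plus fields filled in later."""
    text: str
    embedding: Optional[List[float]] = None
    matches: List["ToyDocument"] = field(default_factory=list)


Handler = Callable[[List[ToyDocument]], None]


class ToyExecutor:
    """Hypothetical stand-in for an Executor: routes an endpoint to a handler."""

    def __init__(self, handlers: Dict[str, Handler], default: Optional[Handler] = None):
        self.handlers = handlers  # endpoint -> handler, like @requests(on='/index')
        self.default = default    # used for any other endpoint, like a bare @requests

    def handle(self, endpoint: str, docs: List[ToyDocument]) -> None:
        fn = self.handlers.get(endpoint, self.default)
        if fn is not None:  # an executor with no matching handler leaves docs untouched
            fn(docs)


class ToyFlow:
    """Hypothetical stand-in for a Flow: a linear pipeline of executors."""

    def __init__(self) -> None:
        self.executors: List[ToyExecutor] = []

    def add(self, executor: ToyExecutor) -> "ToyFlow":
        self.executors.append(executor)
        return self  # returning self allows ToyFlow().add(...).add(...) chaining

    def post(self, endpoint: str, docs: List[ToyDocument]) -> List[ToyDocument]:
        for executor in self.executors:  # each executor sees the docs in order
            executor.handle(endpoint, docs)
        return docs


# Usage: an upper-casing step that runs on every endpoint, then a store that only reacts to /index.
store: List[ToyDocument] = []


def shout(docs: List[ToyDocument]) -> None:
    for d in docs:
        d.text = d.text.upper()


flow = ToyFlow().add(ToyExecutor({}, default=shout)).add(ToyExecutor({"/index": store.extend}))
flow.post("/index", [ToyDocument("hello")])
flow.post("/search", [ToyDocument("query")])  # upper-cased, but not stored
print([d.text for d in store])  # ['HELLO']
```

The same split appears in the real example: `CharEmbed` has a bare `@requests` so it embeds documents on both `/index` and `/search`, while `Indexer` binds separate methods to each endpoint.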

💡 Preliminaries: character embedding, pooling, Euclidean distance
📗 Read our docs for details
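The preliminaries fit in a few lines of standard-library Python. The sketch below mirrors what `CharEmbed` does with NumPy in the example that follows: each character with code 32 to 127 gets its own one-hot slot, everything else falls into a shared `UNK` slot, and mean-pooling the one-hot vectors of a line yields its character-frequency histogram. Two lines are then compared by Euclidean distance. The helper names are illustrative; unlike the NumPy version, which yields NaN for an empty string, this one returns an all-zero vector.

```python
import math
from typing import List

OFFSET = 32             # ASCII code of the space character, the first printable one
DIM = 127 - OFFSET + 1  # 96 slots; the last slot doubles as the `UNK` bucket


def char_index(c: str) -> int:
    """One-hot slot of a character; anything outside codes 32..127 goes to `UNK`."""
    code = ord(c)
    return code - OFFSET if OFFSET <= code <= 127 else DIM - 1


def embed(text: str) -> List[float]:
    """Mean of the one-hot vectors of all characters, i.e. a character-frequency histogram."""
    vec = [0.0] * DIM
    for c in text:
        vec[char_index(c)] += 1.0
    if not text:  # avoid dividing by zero on an empty line
        return vec
    return [v / len(text) for v in vec]


def euclidean(a: List[float], b: List[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(round(euclidean(embed("a"), embed("b")), 3))  # 1.414
print(euclidean(embed("abc"), embed("cab")))        # 0.0: pooling ignores character order
print(round(euclidean(embed("request(on=something)"), embed("@requests(on='/index')")), 6))  # 0.168526
```

The last line reproduces the top score printed by the client further down, which is a quick way to confirm that the embedding and distance behave as described.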

1️⃣ Copy-paste the minimal example below and run it:

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # ASCII 32 is the space, the first printable character
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim)  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        docs.match(self._docs, metric='euclidean')

f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow with 2 parallel CharEmbed, though not strictly needed
with f:
    with open(__file__) as fp:
        f.post('/index', (Document(text=t.strip()) for t in fp if t.strip()))  # index all lines of _this_ file
    f.block()  # block and keep listening for requests
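Conceptually, the `/search` handler scores each query against every stored document and keeps the closest ones as matches. The sketch below spells out that brute-force k-nearest-neighbour step in plain Python; it illustrates the idea and is not the implementation of `DocumentArray.match`. The `match` function and its `limit` parameter are this sketch's own, hypothetical names.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query: Sequence[float],
          index: List[Tuple[str, Sequence[float]]],
          limit: int = 3) -> List[Tuple[float, str]]:
    """Brute-force k-NN: score every (text, embedding) pair, keep the `limit` closest."""
    scored = [(euclidean(query, emb), text) for text, emb in index]
    # nsmallest with a key behaves like sorted(...)[:limit], so ties keep insertion order
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[0])


index = [("origin", [0.0, 0.0]), ("far", [3.0, 4.0]), ("near", [1.0, 0.0])]
print(match([0.0, 0.0], index, limit=2))  # [(0.0, 'origin'), (1.0, 'near')]
```

Brute force costs one distance computation per stored document per query, which is fine for the few dozen lines indexed in this demo; much larger indexes usually call for an approximate nearest-neighbour index instead.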

2️⃣ Open http://localhost:12345/docs (an extended Swagger UI) in your browser, click the /search tab and enter:

{"data": [{"text": "@requests(on=something)"}]}

This asks for the lines of the code snippet above that are most similar to @requests(on=something). Now click the Execute button!
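The Swagger UI just sends an HTTP POST to the gateway. The sketch below builds an equivalent request with Python's standard library, assuming the gateway serves the endpoint at `http://localhost:12345/search` and accepts the same JSON body shown above; `build_search_request` is a hypothetical helper name, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_search_request(text: str, host: str = "localhost", port: int = 12345) -> urllib.request.Request:
    """Build (but do not send) a POST to the gateway's /search endpoint.

    Assumption: the HTTP gateway accepts the body the Swagger UI sends,
    i.e. {"data": [{"text": ...}]}, at the path /search.
    """
    payload = {"data": [{"text": text}]}
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it while the Flow from the first step is running:
# with urllib.request.urlopen(build_search_request("@requests(on=something)")) as resp:
#     print(json.loads(resp.read()))
```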

3️⃣ Not a GUI fan? Let's do it in Python then! Keep the above server running and start a simple client:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # callback invoked when the request is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print the top-3 matches
        print(f'[{idx}]{d.scores["euclidean"].value:.6f}: "{d.text}"')


c = Client(protocol='http', port=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

This prints the following results:

Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.218218: "from jina import Document, DocumentArray, Executor, Flow, requests"

😔 Doesn't work? Our bad! Please report it here.

Support

Join Us

Jina is backed by Jina AI. We are actively hiring AI engineers and solution engineers to build the next neural search ecosystem in open source.

Contributing

We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.

All Contributors

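Overview

  • Name and owner: jina-ai/jina
  • Primary language: Python
  • Languages: Python (5 languages in total)
  • Platforms: Docker, Linux, Mac
  • License: Apache License 2.0
  • Releases: 399
  • Latest release: v3.25.1 (released 2024-04-10 14:41:54)
  • First release: v0.0.5
  • Created: 2020-02-13 17:04:44
  • Last pushed: 2024-04-19 12:21:16
  • Last commit: 2024-04-10 23:04:20
  • Stars: 20.1k
  • Watchers: 209
  • Forks: 2.2k
  • Commits: 8.5k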
  • Issues: 1936
  • Open issues: 11
  • Pull requests: 3466
  • Open pull requests: 10
  • Closed pull requests: 696