Today, we are excited to announce that Literal AI is going into public beta. Literal AI is the multimodal LLM observability and evaluation solution for Developers and Product Managers. My co-founder Willy and I started working on Literal AI last year. Since then, we have assembled a team of top-notch developers, grown to hundreds of daily active users, and onboarded our first enterprise clients.
From Chainlit to Literal AI
Chainlit, the open-source Python library for building conversational AI applications, now has over 5,000 GitHub stars and a strong community on Discord. The framework is used by thousands of developers and by companies such as Microsoft, Google, and BackMarket.
Chainlit was used early on to share RAG applications and collect human feedback. After hundreds of user calls about how developers and teams iterate, we decided to double down on the LLMOps features Chainlit already offered and to build Literal AI in collaboration with design partners.
Literal AI
Literal AI enables ambitious engineering and product teams to ship multimodal LLM applications quickly and reliably. It combines LLM observability, LLM evaluation, prompt engineering, and LLM monitoring to natively support a tight development iteration loop. In Literal AI, all of these features are connected by design.
LLM observability: Tracing of LLM calls, in-context debugging with session visualization, and advanced data management let developers and product managers iterate and debug issues faster, leverage logs to fine-tune smaller models, increase performance, and reduce costs.
LLM evaluation: Create Datasets and combine offline evals, online evals, and A/B testing to compare LLMs, prompts, and agent configurations. Literal AI is compatible with open-source evaluation frameworks such as RAGAS.
Prompt engineering & management: Literal AI enables seamless collaboration between developers and product teams on prompts, with prompt versioning, a Prompt API, and a versatile Prompt Playground to manage, optimize, and integrate prompts into applications.
LLM monitoring: Track and visualize performance metrics such as latency and token counts, set automated alert rules, and collect human feedback through the APIs for comprehensive product and user insights.
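To make this iteration loop concrete, here is a minimal sketch of decorator-based tracing with the Python SDK. The names used (LiteralClient, the step decorator, the thread context manager) are assumptions for illustration, not a verbatim API reference; check the documentation for the exact interface.

```python
from literalai import LiteralClient  # assumed client name from the Python SDK

client = LiteralClient()  # assumed to read LITERAL_API_KEY from the environment

# Observability: decorate an application step so its inputs and outputs are traced.
@client.step(type="retrieval")  # decorator and step type are illustrative assumptions
def retrieve(query: str) -> list[str]:
    # Your vector store lookup would go here.
    return ["chunk about Literal AI", "chunk about Chainlit"]

# Group related steps into a Thread, the native unit for conversational applications.
with client.thread(name="demo-thread"):  # context-manager form is an assumption
    retrieve("What is Literal AI?")
```

From the logged steps, you can then collect human feedback, build Datasets, and monitor latency and token counts, as described above.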
What sets Literal AI apart?
We built Literal AI to account for use cases ranging from simple LLM calls – extraction, summarization, generation – to complex agentic applications.
Our pillars:
Simplicity: You only need two lines of code to get started; from there, you can use the SDKs and decorators to add LLM observability and tracing (see the sketch after this list).
Collaborative: Developers iterate on and evaluate LLM systems, while Product Managers and SMEs iterate on prompts, annotate, and create datasets directly from Literal AI. PMs also monitor application performance.
Multimodal: Like Chainlit, Literal AI natively supports multimodal use cases.
Conversational: Literal AI has a native data model for RAG and conversational applications with Threads and Messages.
Integrations: Literal AI's Python and TypeScript SDKs support a wide range of integrations: OpenAI, LangChain, LlamaIndex, Vercel AI SDK, and more.
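As a rough illustration of the two-line setup mentioned above, here is what instrumenting the OpenAI integration could look like with the Python SDK; the client and helper names (LiteralClient, instrument_openai) are assumptions for illustration rather than a definitive reference.

```python
from literalai import LiteralClient
from openai import OpenAI

literal_client = LiteralClient()    # assumed to pick up LITERAL_API_KEY from the environment
literal_client.instrument_openai()  # assumed helper that auto-traces OpenAI calls

openai_client = OpenAI()

# Once instrumented, each completion is logged with its prompt, latency, and token counts.
response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize Literal AI in one sentence."}],
)
print(response.choices[0].message.content)
```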
Looking ahead, we believe most software experiences will involve one or many multimodal AI agents. As more LLM API providers support multimodal use cases, we look forward to real-time agentic experiences.
How to get started?
We're excited to put Literal AI in the hands of more users and to get their feedback. Sign up here to get access to Literal AI, and contact us for an access token for the Docker image.