Hey there! LangChain is an open-source Python framework that lets you build powerful applications by connecting large language models like GPT-3 with external data sources and computations.
Created by Harrison Chase and first released in October 2022, LangChain gives you a way to overcome some key limitations of vanilla language models like ChatGPT.
Now, ChatGPT can provide impressively human-like responses in conversation. But it has some major weaknesses:
Its knowledge is limited to what was available on the internet up to 2021. Anything after that is a blank.
It cannot provide truly factual, reliable answers requiring real-world expertise. Ask it about medicine, law, or engineering and you'll get responses that sound convincing but lack depth.
Each response is independent and isolated. ChatGPT has no memory or state carrying over between interactions.
This is where LangChain comes in. LangChain bridges these gaps by letting you integrate large language models with databases, APIs, web scraping results, and more.
This combination of capabilities allows LangChain apps to provide informative, trustworthy answers across many real-world domains.
Let's explore some of the key benefits LangChain offers and how you can get started using it!
Why LangChain is a game-changer
There are several compelling advantages LangChain provides over vanilla LLMs:
🤖 Integration with LLMs from multiple providers
LangChain gives you a standard interface to connect with LLMs from top providers like Anthropic, Cohere, Google, Hugging Face, and OpenAI.
This makes it easy to leverage different models in your app for different use cases. For example, you could use:
- Anthropic's Claude for certain types of queries where accuracy is critical
- OpenAI's GPT-3 for more conversational interactions
- Google's PaLM for general natural-language tasks
- An open model like BigScience's BLOOM, served via Hugging Face, for multilingual tasks
With LangChain, you can easily swap models to fit your needs.
📜 Flexible prompt programming
LangChain provides tools to help you construct prompts appropriately, manage long-running conversations, incorporate chat history into prompts, and supply external context to influence model behavior.
This advanced prompt engineering unlocks more controlled, optimized use of LLMs.
🧠 Stateful sessions with memory
LangChain includes memory components that allow your app to maintain state across conversations. It can remember facts, keep track of entities mentioned, and recall past interactions.
This kind of memory is essential for coherent chatbot experiences that mimic human conversation flow.
🕵🏻 Modular orchestration
You can build agents with LangChain that analyze the user input and dynamically call different tools as needed: LLMs for generation, databases for retrieval, APIs for world knowledge, etc.
This flexible orchestration allows your app to combine the best capabilities according to the context.
As you can see, LangChain opens up much more advanced LLM use cases compared to vanilla ChatGPT.
Adoption and growth
Since its launch in October 2022, LangChain has seen impressive growth:
- 2,400+ stars on GitHub
- 650+ contributors
- Average of 200+ new commits per month
This makes it one of the fastest growing open-source AI projects out there.
The project has also attracted serious backing: its creators formed a company around LangChain and raised venture funding to accelerate development.
With that investment and a robust community, expect LangChain to keep evolving rapidly in the years ahead.
The project's stated goal is to empower developers to create the next generation of AI assistants that don't just converse, but actually help – with knowledge seeking, task automation, data retrieval, and more.
Planned enhancements to LangChain as a platform for LLM app development include:
- Broaden model integrations beyond LLMs to image models like DALL-E
- Streamline injecting domain knowledge into apps
- Expand tooling for LLM coordination
- Grow data integration capabilities
- Simplify prompt programming
With this roadmap, LangChain is positioned to become the leading open-source solution for building capable, trustworthy AI apps.
Getting started using LangChain
Ready to start building with LangChain? Here's a quick guide to get up and running:
1. Install LangChain
First, install the LangChain library using either:
pip install langchain
conda install langchain -c conda-forge
This will install LangChain and all its dependencies.
2. Connect your LLM
Next, you need to connect LangChain to the large language model you want to use.
For example, to use OpenAI's GPT-3 API, first install the OpenAI SDK from your shell:

pip install openai

Then set your API key in Python before creating any models:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
You can follow similar steps to connect Cohere, Anthropic, Google, etc.
3. Import LangChain and build your first chain
With your LLM connected, you can import LangChain and start building:

from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Create a simple question-answering chain and start using it!
chain = LLMChain(llm=OpenAI(temperature=0),
                 prompt=PromptTemplate.from_template("Answer briefly: {question}"))
print(chain.run(question="What is LangChain?"))
You'll find examples for:
- Building a conversational chatbot
- Creating a text completion model
- Generating text embeddings
- Integrating external data
- Constructing prompts
- Orchestrating multiple AI tools
4. Join the community
As you build with LangChain, tap into the active open-source community:
- Discord – Get help debugging issues and discuss use cases
- Forums – Ask questions and find answers from other users
- GitHub – Follow development, request features, and contribute!
Integrating real-world data
While large language models have impressive capabilities, combining them with real-world data is where LangChain truly shines.
There are two primary ways LangChain ingests external data:
Document loaders allow you to load various data sources into a common Document format that LangChain can use:
- Text files
- CSV/TSV datasets
- JSON records
- Web page scrape results
- Database query results
- and more…
For example, you could load:
- Product manuals as text documents
- Customer account details from a CSV file
- Wikipedia JSON dumps
- Academic papers scraped as PDFs
This data can then be indexed for efficient retrieval.
Indexers take the document loaders and build vector databases that can be quickly queried to find relevant documents.
Popular choices include:
- FAISS – Meta's open-source similarity search library
- Pinecone – Vector database for production use
- Milvus – Open-source vector search engine
Once loaded and indexed, you can query your documents directly from LangChain agents.
This allows your agents to provide informative answers even for narrow domains using your custom data!
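Conceptually, the flow looks like this plain-Python sketch; the keyword-overlap `retrieve` function is a toy stand-in for a real vector-store retriever:

```python
import re

def tokens(text: str) -> set:
    # Lowercase word tokens, ignoring punctuation
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs: list) -> str:
    # Toy retriever: pick the document sharing the most words with
    # the query; a real app would use embedding similarity instead.
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d)))

docs = [
    "Resetting: hold the power button for ten seconds.",
    "Shipping: orders arrive within five business days.",
]

question = "How do I reset the device by holding a button?"
context = retrieve(question, docs)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The key idea is the same either way: retrieval narrows your corpus down to relevant passages, and the prompt grounds the model's answer in them.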
Web scraping to feed LangChain
One powerful way to supply LangChain with domain-specific data is by web scraping.
Tools like Apify make it easy to build web scrapers to extract clean, structured data from websites.
For example, you could:
- Scrape product pages to load a product database
- Crawl documentation sites to index help articles
- Build price comparison scrapers to track ecommerce sites
- Scrape niche websites related to your field to create a custom search engine
LangChain's Apify integration lets you run this web scraping directly from your code.
from langchain.document_loaders.base import Document
from langchain.indexes import VectorstoreIndexCreator
from langchain.utilities import ApifyWrapper

# Scrape a site with Apify (reads APIFY_API_TOKEN from the environment)
apify = ApifyWrapper()

# Load the scrape results into LangChain Documents
loader = apify.call_actor(
    actor_id="apify/website-content-crawler",
    run_input={"startUrls": [{"url": "https://www.example.com"}]},
    dataset_mapping_function=lambda item: Document(
        page_content=item["text"] or "",
        metadata={"source": item["url"]},
    ),
)

# Index the documents
index = VectorstoreIndexCreator().from_loaders([loader])

# Query the index!
result = index.query("What is on example.com?")
This allows you to leverage web data directly within a LangChain agent.
Apify provides a wide range of scrapers to handle different sources:
- Web Scraper – Scrape any site with a headless browser
- Cheerio Scraper – Fast HTML scraping without a browser
- Puppeteer Scraper – Full browser automation for JS-heavy pages
- Website Content Crawler – Crawl a whole site and extract clean text
You can integrate data from any of these tools into LangChain in just a few lines of code!
Building advanced LangChain agents
By combining large language models with external data and modular logic, LangChain enables you to create remarkably advanced applications.
Let's walk through some examples of complex agents you could build:
Smart question answering bot
Build an agent that:
- Uses a semantic parser to analyze questions and determine intent
- Checks a SQL database for direct answers first
- Falls back to querying indexed documents if no database match
- Uses a conversational LLM when no documents are relevant
- Swaps between Claude and GPT-3 depending on question type
- Maintains conversational state using LangChain memory
This allows your bot to provide accurate, trustworthy answers across a wide range of topics!
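The fallback routing above can be sketched in plain Python; the stub functions are hypothetical stand-ins for the real SQL lookup, document index, and LLM call:

```python
# Stub functions stand in for the real SQL lookup, document index,
# and LLM; each returns None when it has no answer.
def sql_lookup(question):
    return None  # pretend: no direct database match

def doc_search(question):
    return None  # pretend: no relevant indexed document

def llm_answer(question):
    return f"(LLM fallback) Reasoning about: {question}"

def answer(question: str) -> str:
    # Try the most trustworthy source first, then fall back in order
    for source in (sql_lookup, doc_search, llm_answer):
        result = source(question)
        if result is not None:
            return result
    return "Sorry, I don't know."

print(answer("What is our refund policy?"))
```

Ordering the sources from most to least authoritative is what keeps the bot's answers trustworthy while still covering open-ended questions.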
Ecommerce support agent
Create an agent that:
- Loads customer account data like order history from a CSV
- Integrates product documentation web scrape results
- Queries a Pinecone index to find relevant help articles
- Uses a medically fine-tuned LLM for health/diet questions
- Orchestrates Anthropic, Cohere, and GPT-3 models
- Remembers customer details and past interactions
You now have a capable ecommerce support chatbot!
Expert researcher assistant
Build an agent that:
- Integrates field-specific publications scraped from databases
- Summarizes lengthy documents using AI extractors
- Answers factual queries with indexed research papers
- Generates text explanations of complex concepts
- Compares findings from multiple papers
- Cites sources and includes references
This allows you to create an AI research partner tailored to your domain!
As you can see, LangChain enables incredibly advanced applications by orchestrating LLMs, data sources, and logic modules.
The possibilities are truly endless for what you can build by providing the right design and data to LangChain.
Helpful LangChain resources
As you learn and explore building with LangChain, check out these handy resources:
The official docs cover all LangChain concepts and capabilities. Focus on:
- Agents – Building conversational agents
- LLMs – Using models like GPT-3
- Memory – Remembering facts
- Prompt Engineering – Optimizing prompts
- Data Integration – Ingesting documents
The Jupyter notebook tutorials provide examples for key concepts like building chatbots, integrating LLMs, and querying data.
This YouTube playlist contains video walkthroughs covering LangChain basics.
Join the Discord to engage with other users, get help, and share ideas!
So there you have it – everything you need to get started with LangChain and build the next generation of AI assistants! By providing the right data and design, you can overcome the limits of today's LLMs.
I'm excited to see what you build. Go unleash your creativity with LangChain!