Overview
A LangChain document loader uses the LangChain framework to simplify the ingestion of data into an LLM (large language model). The langchain-stack-overflow-for-teams document loader allows you to load your Teams Basic, Business, or Enterprise content into a variety of LLMs and frameworks that support the LangChain format. Learn more about the LangChain framework.
NOTE: The code and examples provided are intended for demonstration purposes only. While we strive to ensure accuracy and clarity, the code is provided “as-is” without guarantees, warranties, or support of any kind. For any production use, we recommend reviewing the code thoroughly and adapting it to meet your specific requirements and environment. Use of this material is at your own discretion and risk.
Installation
The Stack Overflow for Teams LangChain document loader is a Python module you can access here: langchain-stack-overflow-for-teams (StackOverflowTeamsApiV3Loader). To install the module, use a Python package management solution such as pip or uv. For example:
pip install langchain-stack-overflow-for-teams
uv add langchain-stack-overflow-for-teams
Supported parameters
The StackOverflowTeamsApiV3Loader supports a number of parameters to allow flexibility in retrieving data from your site.
| Parameter | Required | Description |
|-----------|----------|-------------|
| `access_token` | Yes | The API v3 access token for your Stack Overflow for Teams site. |
| `endpoint` | Teams Enterprise: Yes; Teams Basic, Business: No | Your site's API v3 endpoint. This defaults to api.stackoverflowteams.com for Teams Basic and Business. Teams Enterprise users must set this parameter (for example: [your_site].stackenterprise.co/api). |
| `team` | Teams Basic, Business: Yes; Teams Enterprise: No | The team to retrieve data from. This is required for Teams Basic and Business. Set this for Teams Enterprise to retrieve data from a private team (optional). |
| `content_type` | Yes | The content type to retrieve: `questions` (and corresponding answers) or `articles`. |
| `date_from` | No | The data retrieval start date in ISO 8601 format (YYYY-MM-DDThh:mm:ssZ), useful for returning only the newest data. |
| `sort` | No | The attribute to sort results by: "activity", "creation", or "score". |
| `order` | No | The order to sort results by: "asc" or "desc". |
| `is_answered` | No | Return answered questions that have at least one positively-scored answer (doesn't apply to the `articles` content type): "true" or "false". |
| `has_accepted_answer` | No | Return questions with accepted answers only (doesn't apply to the `articles` content type): "true" or "false". |
| | No | The number of retries for the request (default: 3). |
| | No | The timeout for the request in seconds (default: 30). |
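The `date_from` parameter expects a UTC timestamp in the YYYY-MM-DDThh:mm:ssZ layout shown above. A minimal sketch of building one with Python's standard library (the `iso8601_utc` helper is illustrative, not part of the loader):

```python
from datetime import datetime, timezone

def iso8601_utc(dt: datetime) -> str:
    # Convert any aware datetime to UTC and render it in the
    # YYYY-MM-DDThh:mm:ssZ layout the date_from parameter expects.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# For example, retrieve content created or updated since May 1, 2021:
print(iso8601_utc(datetime(2021, 5, 1, tzinfo=timezone.utc)))
# 2021-05-01T00:00:00Z
```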
Authentication
The LangChain loader authenticates using an API v3 auth token previously created in Stack Overflow for Teams. The process for creating a token differs for Stack Overflow for Teams Basic, Business, and Enterprise sites.
Stack Overflow for Teams Basic and Business use a personal access token (PAT). Learn more about PAT authentication in this Stack Overflow for Teams Basic and Business API v3 Overview article.
Stack Overflow for Teams Enterprise uses a service key access token. Learn more about service keys and access tokens in this Stack Overflow for Teams Enterprise API v3 Overview article.
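Whichever token type you use, avoid hard-coding it in source files. The examples below read the token from an environment variable; a minimal sketch of that pattern (the variable name `SO_API_TOKEN` is just the convention this article's examples use):

```python
import os

# For demonstration only: in practice, export the variable in your shell
# (for example: export SO_API_TOKEN=...) rather than setting it in code.
os.environ.setdefault("SO_API_TOKEN", "demo-token")

# The loader examples below read the token the same way:
token = os.environ.get("SO_API_TOKEN")
print(f"Token loaded: {bool(token)}")
```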
Examples
These examples demonstrate how to use the library to authenticate into and retrieve data from a Stack Overflow for Teams site. The last example demonstrates a fully functional implementation of the LangChain loader and LanceDB vector store, one of many possible methods to ingest your data into an LLM.
Get articles from Teams Basic or Business
```python
import os

from langchain_stack_overflow_for_teams import StackOverflowTeamsApiV3Loader

loader = StackOverflowTeamsApiV3Loader(
    access_token=os.environ.get("SO_PAT"),
    team="my team",
    content_type="articles",
)
docs = loader.load()
```
Get questions with accepted answers from Teams Basic or Business
```python
import os

from langchain_stack_overflow_for_teams import StackOverflowTeamsApiV3Loader

loader = StackOverflowTeamsApiV3Loader(
    access_token=os.environ.get("SO_PAT"),
    team="my team",
    content_type="questions",
    has_accepted_answer="true",
)
docs = loader.load()
```
Get articles from Teams Enterprise
Unlike the previous examples that use the default Teams Basic and Business endpoint, the following Teams Enterprise examples specify the site URL.
```python
import os

from langchain_stack_overflow_for_teams import StackOverflowTeamsApiV3Loader

loader = StackOverflowTeamsApiV3Loader(
    endpoint="[your_site].stackenterprise.co/api",
    access_token=os.environ.get("SO_API_TOKEN"),
    content_type="articles",
)
docs = loader.load()
```
Get articles from a private team in Teams Enterprise
```python
import os

from langchain_stack_overflow_for_teams import StackOverflowTeamsApiV3Loader

loader = StackOverflowTeamsApiV3Loader(
    endpoint="[your_site].stackenterprise.co/api",
    access_token=os.environ.get("SO_API_TOKEN"),
    team="my team",
    content_type="articles",
)
docs = loader.load()
```
Full example
This example retrieves content from a Stack Overflow for Teams Enterprise site and loads it into a LanceDB vector store for access by an LLM-based system.
This example uses most of the available parameters to retrieve questions:
- with at least one positively-scored answer
- with at least one accepted answer
- from a private team
- since a specified date
- sorted by activity, descending
```python
"""
This script demonstrates use of the LangChain add_documents model to naively
load all documents every time (easy, but not efficient).
"""

import os

from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_community.vectorstores import LanceDB
from langchain_text_splitters import HTMLSemanticPreservingSplitter
from langchain_stack_overflow_for_teams import StackOverflowTeamsApiV3Loader
from langchain.tools.retriever import create_retriever_tool
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import AgentExecutor, create_tool_calling_agent


def main():
    load_dotenv()

    # initialize the LLM using gpt-4o-mini from Azure OpenAI
    llm = AzureChatOpenAI(
        azure_deployment="gpt-4o-mini",
        api_version="2025-01-01-preview",
    )

    # initialize the embeddings using Azure OpenAI
    embeddings = AzureOpenAIEmbeddings()

    # initialize the LanceDB vector store
    db = LanceDB(
        table_name="docs",
        uri="./db/lancedb",
        embedding=embeddings,
    )

    # load documents using our StackOverflowTeamsApiV3Loader document loader
    loader = StackOverflowTeamsApiV3Loader(
        endpoint="[your_site].stackenterprise.co/api",
        access_token=os.environ.get("SO_API_TOKEN"),
        team="[your_team]",
        date_from="2021-05-01T00:00:00Z",
        sort="activity",
        order="desc",
        content_type="questions",
        is_answered="true",
        has_accepted_answer="true",
    )
    docs = loader.load()
    print(f"Loaded {len(docs)} documents")

    # chunk the documents using the HTMLSemanticPreservingSplitter
    print("Chunking documents...")
    documents = HTMLSemanticPreservingSplitter(
        headers_to_split_on=[("h1", "Header 1"), ("h2", "Header 2")],
        max_chunk_size=1000,
        chunk_overlap=200,
        preserve_parent_metadata=True,
    ).transform_documents(docs)
    print(f"Chunked {len(documents)} documents.")
    if len(documents) > 0:
        print(documents[0])

    # load the embeddings into our LanceDB vector store
    db.add_documents(documents)

    # build a retriever tool from the LanceDB vector store for use by the LLM
    retriever = db.as_retriever()
    retriever_tool = create_retriever_tool(
        retriever=retriever,
        name="StackOverflowTeamsRetriever",
        description="Retrieves Stack Overflow for Teams questions and answers",
    )

    # set up the prompt template for the agent
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant that answers questions based on Stack Overflow for Teams data."),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ]
    )

    # create the agent with the LLM and the retriever tool
    tools = [retriever_tool]
    agent = create_tool_calling_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools)

    # run the agent with an example question that could be answered by the
    # Stack Overflow for Teams data
    result = agent_executor.invoke(
        {"input": "what is the SLA for Stack Overflow for Teams Enterprise?"}
    )
    print(result)


if __name__ == "__main__":
    main()
```
Special thanks to Stack Overflow's Ray Terrill for the content of this article.
Need help? Submit an issue or question through our support portal.