When using AzureOpenAIEmbeddings, I hit a network error: Arguments: (ConnectionError(ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))),). Let's see what happened.
Problem description
Here is the code:
import os
import httpx
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
http_client = httpx.Client(verify=False)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

def get_azure_emb():
    return AzureOpenAIEmbeddings(
        azure_deployment=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        openai_api_version=os.getenv("OPENAI_API_VERSION"),
        azure_ad_token_provider=token_provider,
        http_client=http_client,
    )

embedding = get_azure_emb()
r = embedding.embed_query("this is a test text")
print(r)
Here is the error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1060, in _validate_conn
    conn.connect()
  File "/opt/conda/lib/python3.10/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/opt/conda/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/conda/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/conda/lib/python3.10/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/opt/conda/lib/python3.10/ssl.py", line 1375, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer
Resolution
Make tiktoken usable offline in your runtime environment, i.e. pre-download its encoding files so it never has to fetch them at run time: https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer
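A minimal sketch of that offline setup, assuming the approach from the linked answer: tiktoken caches each downloaded encoding file under a name equal to the SHA-1 hex digest of its download URL, inside the directory given by the TIKTOKEN_CACHE_DIR environment variable. The cl100k_base URL below is the one tiktoken uses as far as I know; the helper names and any paths are hypothetical.

```python
import hashlib
import os
import shutil

# Assumption: the URL tiktoken downloads cl100k_base from.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def tiktoken_cache_name(blob_url: str) -> str:
    """File name tiktoken uses for a cached download: SHA-1 hex digest of the URL."""
    return hashlib.sha1(blob_url.encode()).hexdigest()


def install_offline_encoding(downloaded_file: str, cache_dir: str,
                             blob_url: str = CL100K_URL) -> str:
    """Hypothetical helper: copy an encoding file fetched on a connected machine
    into cache_dir under the name tiktoken looks for, and return the new path."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, tiktoken_cache_name(blob_url))
    shutil.copyfile(downloaded_file, target)  # copy unmodified, byte for byte
    return target
```

On the offline machine you would call something like install_offline_encoding("cl100k_base.tiktoken", "/opt/tiktoken_cache") and then set TIKTOKEN_CACHE_DIR=/opt/tiktoken_cache (in the shell or via os.environ) before the first embed_query call.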
Root cause analysis
Inspired by this GitHub comment, I found the root cause: the code runs on a machine without access to the public internet.
AzureOpenAIEmbeddings uses the tiktoken library to implement a feature called check_embedding_ctx_length:
Whether to check the token length of inputs and automatically split inputs longer than embedding_ctx_length.
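To make "split inputs longer than embedding_ctx_length" concrete, here is a simplified sketch that works on token IDs (the kind of list a tiktoken encoding's encode() returns). It is not LangChain's code: the function name is hypothetical, and the default of 8191 is my understanding of langchain_openai's embedding_ctx_length default.

```python
from typing import List


def split_tokens(token_ids: List[int], ctx_length: int = 8191) -> List[List[int]]:
    """Split one tokenized input into consecutive chunks of at most ctx_length tokens.

    Hypothetical helper illustrating check_embedding_ctx_length; the real
    implementation also batches the chunks and merges their embeddings afterwards.
    """
    if ctx_length <= 0:
        raise ValueError("ctx_length must be positive")
    return [token_ids[i:i + ctx_length] for i in range(0, len(token_ids), ctx_length)]


# Example: 10 tokens with a (tiny) context length of 4.
print(split_tokens(list(range(10)), ctx_length=4))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

The relevant point for this bug: producing token_ids requires a loaded tiktoken encoding, so this step runs before any request is sent to Azure.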
This feature is enabled by default. Following the call chain embed_query -> embed_documents -> _get_len_safe_embeddings -> self._tokenize, the code eventually runs:
try:
    encoding = tiktoken.encoding_for_model(model_name)
except KeyError:
    encoding = tiktoken.get_encoding("cl100k_base")
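This snippet throws ConnectionResetError: [Errno 104] Connection reset by peer when the machine can't reach the public internet. The first time tiktoken loads an encoding, it downloads the encoding file (via requests, hence the urllib3 frames in the traceback) and caches it locally; without network access, that download fails.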
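Why doesn't the except KeyError fallback help? It only handles unknown model names; a connection error raised while downloading the encoding file is not a KeyError, so it propagates straight up to embed_query. The sketch below reproduces that control flow with hypothetical stub functions standing in for tiktoken, so it needs neither tiktoken nor a network.

```python
def stub_get_encoding(name: str):
    """Stand-in for tiktoken.get_encoding on a machine with no internet access.

    The real function downloads the encoding file on first use; here we simply
    raise the same error as in the traceback.
    """
    raise ConnectionResetError(104, "Connection reset by peer")


def stub_encoding_for_model(model_name: str):
    """Stand-in for tiktoken.encoding_for_model: unknown models raise KeyError,
    known ones resolve to an encoding name and load it (which needs the network)."""
    known = {"text-embedding-ada-002": "cl100k_base"}  # illustrative subset
    if model_name not in known:
        raise KeyError(model_name)
    return stub_get_encoding(known[model_name])


def load_encoding(model_name: str):
    # Same shape as the try/except in LangChain's _tokenize.
    try:
        return stub_encoding_for_model(model_name)
    except KeyError:
        return stub_get_encoding("cl100k_base")


for model in ("text-embedding-ada-002", "some-unknown-model"):
    try:
        load_encoding(model)
    except ConnectionResetError as exc:
        print(model, "->", type(exc).__name__)
```

Both a known and an unknown model name end in ConnectionResetError: the known one fails in the first branch, the unknown one fails inside the fallback.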
So once tiktoken is set up for offline use (with its encoding files cached locally), the error disappears.
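To confirm the offline setup before creating the embeddings client, a small preflight check can look for the cached file where tiktoken would. This sketch assumes tiktoken's cache lookup order as I understand it (TIKTOKEN_CACHE_DIR, then DATA_GYM_CACHE_DIR, then a data-gym-cache folder under the system temp directory, with an empty value disabling the cache); the function names are hypothetical.

```python
import hashlib
import os
import tempfile

# Assumption: the URL tiktoken downloads cl100k_base from.
CL100K_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def tiktoken_cache_dir() -> str:
    """Directory tiktoken reads cached encodings from (assumed lookup order)."""
    if "TIKTOKEN_CACHE_DIR" in os.environ:
        return os.environ["TIKTOKEN_CACHE_DIR"]
    if "DATA_GYM_CACHE_DIR" in os.environ:
        return os.environ["DATA_GYM_CACHE_DIR"]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")


def encoding_cached(blob_url: str = CL100K_URL) -> bool:
    """True if the encoding file is already cached, i.e. no download is needed."""
    cache_dir = tiktoken_cache_dir()
    if cache_dir == "":  # an empty value disables tiktoken's cache entirely
        return False
    cache_key = hashlib.sha1(blob_url.encode()).hexdigest()
    return os.path.isfile(os.path.join(cache_dir, cache_key))
```

For example, calling `if not encoding_cached(): raise RuntimeError("tiktoken cache missing")` before get_azure_emb() turns the confusing connection reset into an explicit error message.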