mcs07/PubChemPy

SSLCertVerificationError while running the simplest of PubChemPy examples

aromring opened this issue · 15 comments

The following example from PCP documentation:
import pubchempy as pcp
c = pcp.Compound.from_cid(5090)
print(c.molecular_formula)
results in
`SSLCertVerificationError Traceback (most recent call last)
File c:\Python312\Lib\urllib\request.py:1344, in AbstractHTTPHandler.do_open(self, http_class, req, **http_conn_args)
1343 try:
-> 1344 h.request(req.get_method(), req.selector, req.data, headers,
1345 encode_chunked=req.has_header('Transfer-encoding'))
1346 except OSError as err: # timeout error

File c:\Python312\Lib\http\client.py:1336, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1335 """Send a complete request to the server."""
-> 1336 self._send_request(method, url, body, headers, encode_chunked)

File c:\Python312\Lib\http\client.py:1382, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1381 body = _encode(body, 'body')
-> 1382 self.endheaders(body, encode_chunked=encode_chunked)

File c:\Python312\Lib\http\client.py:1331, in HTTPConnection.endheaders(self, message_body, encode_chunked)
1330 raise CannotSendHeader()
-> 1331 self._send_output(message_body, encode_chunked=encode_chunked)

File c:\Python312\Lib\http\client.py:1091, in HTTPConnection._send_output(self, message_body, encode_chunked)
1090 del self._buffer[:]
-> 1091 self.send(msg)
1093 if message_body is not None:
1094
...
-> 1347 raise URLError(err)
1348 r = h.getresponse()
1349 except:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)>`
I can browse to https://pubchem.ncbi.nlm.nih.gov/ without a problem, and a search for CID 5090 returns rofecoxib.
How can I fix this?
BTW, I am on Windows 10, not macOS.

I am having the same issue, did you find a solution?

No. And authors don't bother to answer, either.

It might not be applicable, but have you tried an older Python, such as 3.10 or below?

@aromring Aiming to replicate your findings (Python 3.12.4 on Windows 10, pubchempy installed in a virtual environment), I started with the example from the project's landing page. Initially, I mistyped the CID in question, and that compound was resolved; after correcting the typo, I got a URLError much like the one you report.

However, the story doesn't end here. Your entry 5090 resolved fine. And CID 1423 (the one which failed during the first attempts) then resolved as well. See this log:

C:\Users\win10>cd Desktop

C:\Users\win10\Desktop>mkdir test_pubchempy_Windows

C:\Users\win10\Desktop>cd test_pubchempy_Windows

C:\Users\win10\Desktop\test_pubchempy_Windows>python -m venv support

C:\Users\win10\Desktop\test_pubchempy_Windows>support\Scripts\activate.bat

(support) C:\Users\win10\Desktop\test_pubchempy_Windows>pip install pubchempy
Collecting pubchempy
  Downloading PubChemPy-1.0.4.tar.gz (29 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pubchempy
  Building wheel for pubchempy (pyproject.toml) ... done
  Created wheel for pubchempy: filename=PubChemPy-1.0.4-py3-none-any.whl size=13842 sha256=fa42a3a0010017a821184291a0b105baabb61ee06d98b77463825988a939cc55
  Stored in directory: c:\users\win10\appdata\local\pip\cache\wheels\78\0f\d0\080f82ce0d7fdc771401b6acac304bd2ee77d67dee34737bd6
Successfully built pubchempy
Installing collected packages: pubchempy
Successfully installed pubchempy-1.0.4

[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: python.exe -m pip install --upgrade pip

(support) C:\Users\win10\Desktop\test_pubchempy_Windows>python
Python 3.12.4 (tags/v3.12.4:8e8a4ba, Jun  6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> # example from the landing page
>>>
>>> from pubchempy import get_compounds, Compound
>>> comp = Compound.from_cid(1432)
>>> print(comp.isomeric_smiles)
CN(C)C(CSC)C1=C2C(=C3C(=N2)C=CC(=C3Br)O)C=CN1
>>>
>>> # sorry, typo
>>> comp = Compound.from_cid(1423)
Traceback (most recent call last):
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1344, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1336, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1382, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1331, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1091, in _send_output
    self.send(msg)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1035, in send
    self.connect()
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1470, in connect
    super().connect()
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1001, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 829, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 964, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\win10\Desktop\test_pubchempy_Windows\support\Lib\site-packages\pubchempy.py", line 726, in from_cid
    record = json.loads(request(cid, **kwargs).read().decode())['PC_Compounds'][0]
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\Desktop\test_pubchempy_Windows\support\Lib\site-packages\pubchempy.py", line 271, in request
    response = urlopen(apiurl, postdata)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1392, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1347, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
>>> comp = Compound.from_cid(1423)
Traceback (most recent call last):
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1344, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1336, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1382, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1331, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1091, in _send_output
    self.send(msg)
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1035, in send
    self.connect()
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1470, in connect
    super().connect()
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 1001, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 829, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 964, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\win10\Desktop\test_pubchempy_Windows\support\Lib\site-packages\pubchempy.py", line 726, in from_cid
    record = json.loads(request(cid, **kwargs).read().decode())['PC_Compounds'][0]
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\Desktop\test_pubchempy_Windows\support\Lib\site-packages\pubchempy.py", line 271, in request
    response = urlopen(apiurl, postdata)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 215, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 515, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 532, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 492, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1392, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\win10\AppData\Local\Programs\Python\Python312\Lib\urllib\request.py", line 1347, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
>>> # but the other worked?
>>>
>>> comp = Compound.from_cid(1432)
>>> print(comp.isomeric_smiles)
CN(C)C(CSC)C1=C2C(=C3C(=N2)C=CC(=C3Br)O)C=CN1
>>>
>>> # attempt the example of issue 89
>>> comp = Compound.from_cid(5090)
>>> print(comp.isomeric_smiles)
CS(=O)(=O)C1=CC=C(C=C1)C2=C(C(=O)OC2)C3=CC=CC=C3
>>>
>>> # now with the syntax as filed in issue 89
>>> import pubchempy as pcp
>>> c = pcp.Compound.from_cid(5090)
>>> print(c.molecular_formula)
C17H14O4S
>>>
>>> # try again compound 1423 from the landing page
>>> c = pcp.Compound.from_cid(1423)
>>> print(c.molecular_formula)
C19H37NO2

Out of curiosity, I repeated the test on Linux Debian 13/trixie; record 5090 was resolved successfully on the first attempt:

debian:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux trixie/sid
Release:	n/a
Codename:	trixie
debian:~$ cd Desktop
debian:~/Desktop$ mkdir test_pubchempy_Debian
debian:~/Desktop$ python -m venv sup
debian:~/Desktop$ source ./sup/bin/activate
(sup) debian:~/Desktop$ python --version
Python 3.12.4
(sup) debian:~/Desktop$ pip install pubchempy
Collecting pubchempy
  Using cached PubChemPy-1.0.4-py3-none-any.whl
Installing collected packages: pubchempy
Successfully installed pubchempy-1.0.4
(sup) debian:~/Desktop$ python
Python 3.12.4 (main, Jul 15 2024, 12:17:32) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> # test example from the landing page
>>> from pubchempy import get_compounds, Compound
>>> comp = Compound.from_cid(1423)
>>> print(comp.isomeric_smiles)
CCCCCCCNC1CCCC1CCCCCCC(=O)O
>>> comps = get_compounds('Aspirin', 'name')
>>> print(comps[0].xlogp)
1.2
>>> 
>>> # now testing the compound of issue 89
>>> comp = Compound.from_cid(5090)
>>> print(comp.isomeric_smiles)
CS(=O)(=O)C1=CC=C(C=C1)C2=C(C(=O)OC2)C3=CC=CC=C3
>>> 
>>> # test of compound of issue 89, syntax as filed in issue
>>> import pubchempy as pcp
>>> c = pcp.Compound.from_cid(5090)
>>> print(c.molecular_formula)
C17H14O4S
>>> exit()

Conceptually, it may be worthwhile to let pubchempy i) attempt once to connect to the NIH servers and, in case this fails, ii) repeat the attempt a couple of times (similar to --tries in wget/wget2). Perhaps this is already (implicitly?) implemented in the source code; because the project's update cycle and uptake of PRs are slow, I don't recall this off the top of my head. In any case, it is sensible to wrap the processing of a list of requests to an external database (where connections can fail) in a try/except clause.

(Aspects of parallelization on the local side, and a potential throttle to prevent DDoS-like scenarios on the remote/server side, are not considered here.)
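
The retry idea could be sketched as follows. This is a minimal illustration using only the standard library, not a description of how pubchempy works; the names fetch_with_retries, tries, delay and opener are hypothetical. It assumes that a failed name lookup may be transient and worth repeating, whereas an HTTP error reply or a certificate verification failure is not.

```python
import ssl
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, tries=3, delay=1.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, retrying transient network errors.

    `tries`, `delay` and `opener` are illustrative parameters, not part of
    pubchempy. `opener` only exists so the function can be tested offline.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The server answered (e.g. 404 PUGREST.NotFound); do not retry.
            raise
        except urllib.error.URLError as err:
            if isinstance(err.reason, ssl.SSLCertVerificationError):
                # A certificate problem stays until the trust store is fixed.
                raise
            last_error = err
            if attempt < tries:
                time.sleep(delay * attempt)  # wait a little longer each time
    raise last_error
```

A call such as fetch_with_retries("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/5090/property/MolecularFormula/TXT") would then try up to three times before giving up.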

wow, that is a very interesting find, @nbehrnd !! Thank you so much for your work!

    It might not be applicable, but have you tried an older Python, such as 3.10 or below?

No.


Hi @nbehrnd, thank you for spending time on this problem! Yours is an interesting find, indeed. I tried your slightly different way of getting SMILES, directly in Python, and the results are enclosed below. I am getting a different error message (CERTIFICATE_VERIFY_FAILED) from yours (getaddrinfo). Unfortunately, it is permanent: it does not matter how many times I repeat the same command. :(

`Python 3.12.5 (tags/v3.12.5:ff3bc82, Aug 6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

from pubchempy import get_compounds, Compound
comp = Compound.from_cid(1432)
Traceback (most recent call last):
File "C:\Python312\Lib\urllib\request.py", line 1344, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Python312\Lib\http\client.py", line 1336, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1382, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1331, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1091, in _send_output
self.send(msg)
File "C:\Python312\Lib\http\client.py", line 1035, in send
self.connect()
File "C:\Python312\Lib\http\client.py", line 1477, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\ssl.py", line 455, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\ssl.py", line 1042, in _create
self.do_handshake()
File "C:\Python312\Lib\ssl.py", line 1320, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rfraczkiewicz\AppData\Roaming\Python\Python312\site-packages\pubchempy.py", line 726, in from_cid
record = json.loads(request(cid, **kwargs).read().decode())['PC_Compounds'][0]
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rfraczkiewicz\AppData\Roaming\Python\Python312\site-packages\pubchempy.py", line 271, in request
response = urlopen(apiurl, postdata)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 215, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 515, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 532, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 492, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 1392, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 1347, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)>

`

@aromring I don't know whether running pubchempy in a virtual environment that also has pip-system-certs installed (suggested in an answer I found on Stack Overflow, posted June 2022) will remove this error.

You mention that accessing the PubChem data with Python/pubchempy fails (as it did for me in one case) while accessing the data with a web browser works. To narrow down whether the obstacle is in pubchempy or elsewhere, can you test the two Python scripts attached below? Download them, remove the .txt file extension (added only so they could be attached here), and launch them from the command line.

  1. The Python script in main.py.txt is a verbatim copy of a trinket from the IUPAC-InChI documentation published in 2019. It should accept simple chemical names like acetone, benzene, pyridine, etc. when its "GUI" prompts you. (It will fail for names of more than one word; e.g., sodium chloride is not a valid input here.) Among the output options is the Hill formula.

  2. The format for sending a request to PubChem is public, so the syntax in the script can be edited a little. The Python script in main_edit.py.txt now requests the CID instead.

I agree, they may appear toy-like, and in performance they are a step back from programmatic queries of multiple CIDs at once. This is not a fix in/around pubchempy, only a (hopefully temporary) bypass. But the scripts are intentionally simple (e.g., no PubChem session key for a performance better than 5 requests/s max, the 30 s timeout; only functions provided by Python's standard library, etc.).

main.py.txt
main_edit.py.txt

@khoivan88 pubchempy as designed by @mcs07 offers to check its functions with pytest. I just ran it (see the log attached below) on the project's state as left in April 2017. A few failures appear to affect the interaction with NIH. For instance, line 212 reads

E       urllib.error.HTTPError: HTTP Error 404: PUGREST.NotFound

Because you (@khoivan88) worked on/around openenventory, could you have a look at the log? Do these errors relate to, or might they cause, the problems reported by @aromring?

2024-09-05_pubchempy_pytest.log

Hi @nbehrnd, Thank you again for looking into it. I've run both scripts and got the same result in each case: SSL: CERTIFICATE_VERIFY_FAILED. I do have pip-system-certs installed, so this is not the issue.

`Enter a chemical name: acetone


Select the value below to retrieve


           INCHI[0]
        INCHIKEY[1]

MOLECULAR FORMULA[2]
SMILES[3]
MOLECULAR WEIGHT[4]


Enter a number choice? 4
Traceback (most recent call last):
File "C:\Python312\Lib\urllib\request.py", line 1344, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "C:\Python312\Lib\http\client.py", line 1336, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1382, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1331, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Python312\Lib\http\client.py", line 1091, in _send_output
self.send(msg)
File "C:\Python312\Lib\http\client.py", line 1035, in send
self.connect()
File "C:\Python312\Lib\http\client.py", line 1477, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\ssl.py", line 455, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\ssl.py", line 1042, in _create
self.do_handshake()
File "C:\Python312\Lib\ssl.py", line 1320, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:\Name_to_SMILES\main.py", line 52, in <module>
main()
File "G:\Name_to_SMILES\main.py", line 49, in main
choices()
File "G:\Name_to_SMILES\main.py", line 21, in choices
choiceID()
File "G:\Name_to_SMILES\main.py", line 35, in choiceID
html = urllib.request.urlopen(string1 + string2 + string3 + selChoice + string4).read()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 215, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 515, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 532, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 492, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 1392, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python312\Lib\urllib\request.py", line 1347, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)>
`

@aromring Do you also use Python 3.12.4? Your log suggests 3.12.X (e.g., 312 in C:\Python312\Lib\urllib) with an unknown X (python --version would tell), so at least something released since October 2023. What is the source of the Python you use: is it Python from python.org, or (for instance) the portable WinPython? Is there maybe some hurdle on the Windows computer you use, or in the network the computer is connected to (just guessing), which is in the way?
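
To answer these questions in one go, a short standard-library script could report the interpreter and the TLS settings it would use. This is only a sketch; the function name describe_environment is made up for illustration.

```python
import ssl
import sys


def describe_environment():
    """Collect facts about the interpreter and its TLS configuration."""
    paths = ssl.get_default_verify_paths()
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,  # CA bundle file OpenSSL would use, if it exists
        "capath": paths.capath,  # directory of CA certificates, if it exists
        "cafile_env": paths.openssl_cafile_env,  # env variable naming a CA file
        "capath_env": paths.openssl_capath_env,  # env variable naming a CA directory
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:12} {value}")
```

On Windows, cafile and capath are often None; Python's default SSL context additionally loads CA certificates from the Windows system stores there.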

Next attempt: start a virtual environment of Python, amend it with urllib3 via

pip install urllib3

After a successful installation of this one (i.e. in addition to Python's standard library), launch this "script"

import urllib3

resp = urllib3.request("GET", "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2010/property/MolecularFormula/TXT")
print(str(resp.data))
print(str(resp.data)[2:-3])  # after trimming

On my side, this yields

b'C22H28N6O14P2\n'
C22H28N6O14P2

-- first line is the raw result, the second after trimming the string a little.

  1. How many CIDs are to be processed/resolved? Provided one can split the list into smaller chunks of comma-separated sub-lists, one can engage a trinket, too. One can copy and paste the self-contained snippet
import urllib.request
from time import sleep  # allows to later limit the rate of requests

list_of_cid = [1020, 1234, 5678]

for entry in list_of_cid:
    string1 = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/"
    
    string3 = "/property/MolecularFormula/TXT"
    query = "".join([string1, str(entry), string3])

    try:
        reply_by_nih = urllib.request.urlopen(query).read()
        formula = reply_by_nih.decode("UTF-8").strip()
        print(f"{entry}\t{formula}")
    except Exception as e:
        print(e)
    
    sleep(0.4)  # i.e. a delay of 0.4 s between each request to NIH    

into a trinket and press the play button to obtain a table with one column for the CID sent and one for the molecular formula received.

Only what follows list_of_cid in the brackets would require an adjustment to the CIDs of interest. (Agreed, this doodle lumps all kinds of possible errors into a very general try/except clause. But instead of stopping the list altogether at an exception, the other 99 CIDs of the list are still sent to NIH.)
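
If the general except clause turns out to be too coarse, the errors seen in this thread could be told apart along these lines. This is a sketch only; the helper name classify_error and its labels are hypothetical.

```python
import socket
import ssl
import urllib.error


def classify_error(err):
    """Map an exception raised by urllib.request.urlopen to a short label."""
    if isinstance(err, urllib.error.HTTPError):
        return f"http {err.code}"  # the server replied, e.g. with 404
    if isinstance(err, urllib.error.URLError):
        reason = err.reason
        if isinstance(reason, ssl.SSLCertVerificationError):
            return "certificate"  # trust store problem, repeating will not help
        if isinstance(reason, socket.gaierror):
            return "dns"  # name lookup failed, possibly transient
        return "network"
    return "other"
```

In the loop above, print(e) could then become print(f"{entry}\t{classify_error(e)}"), so that each failed CID is listed together with the kind of failure.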

Edit: in case the approach with urllib3 works, I attach a script which wraps this for the CLI. With a working connection "to the outside" it runs, e.g.,

$ python multiple02.py 123 456
123	C5H12N4O
456	C3H4N2O4

multiple02.py.txt

Hi @nbehrnd, I owe you a beer. :) I use Python 3.12.5 downloaded from python.org. There are no hurdles for my directly connected computer in terms of accessing the Internet.
urllib3 was already installed. The first script results in the already predictable outcome:

WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /rest/pug/compound/cid/2010/property/MolecularFormula/TXT
WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /rest/pug/compound/cid/2010/property/MolecularFormula/TXT
WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)'))': /rest/pug/compound/cid/2010/property/MolecularFormula/TXT

SSLCertVerificationError Traceback (most recent call last)
File ~\AppData\Roaming\Python\Python312\site-packages\urllib3\connectionpool.py:467, in HTTPConnectionPool._make_request(self, conn, method, url, body, headers, retries, timeout, chunked, response_conn, preload_content, decode_content, enforce_content_length)
466 try:
--> 467 self._validate_conn(conn)
468 except (SocketTimeout, BaseSSLError) as e:

File ~\AppData\Roaming\Python\Python312\site-packages\urllib3\connectionpool.py:1092, in HTTPSConnectionPool._validate_conn(self, conn)
1091 if conn.is_closed:
-> 1092 conn.connect()
1094 if not conn.is_verified:

File ~\AppData\Roaming\Python\Python312\site-packages\urllib3\connection.py:642, in HTTPSConnection.connect(self)
634 warnings.warn(
635 (
636 f"System time is way off (before {RECENT_DATE}). This will probably "
(...)
639 SystemTimeWarning,
640 )
--> 642 sock_and_verified = _ssl_wrap_socket_and_match_hostname(
643 sock=sock,
644 cert_reqs=self.cert_reqs,
645 ssl_version=self.ssl_version,
646 ssl_minimum_version=self.ssl_minimum_version,
647 ssl_maximum_version=self.ssl_maximum_version,
...
--> 515 raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
517 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)
519 return new_retry

MaxRetryError: HTTPSConnectionPool(host='pubchem.ncbi.nlm.nih.gov', port=443): Max retries exceeded with url: /rest/pug/compound/cid/2010/property/MolecularFormula/TXT (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate (_ssl.c:1000)')))

Hence, I tried the following:

import urllib3

resp = urllib3.request("GET", "https://github.com")
print(str(resp.data))
print(str(resp.data)[2:-3])  # after trimming

with success.

Hopefully, this proves the absence of generic problems with accessing the web from the Python environment.

Equally predictably, all of your other scripts can't get past the "request()" line due to the dreaded CERTIFICATE_VERIFY_FAILED.

Trinket fails for a different reason.
What the heck? It worked for you!

Never mind, trinket works if I use python3. Duh...
There are no mysterious errors with net connectivity, methinks. It's just some nasty problem at PubChem...

Finally! I've found a solution thanks to https://stackoverflow.com/questions/30405867/how-to-get-python-requests-to-trust-a-self-signed-ssl-certificate
Follow the instructions there on how to obtain PubChem's PEM chain file. Save it, then prepend the following lines to your Python script:

import os
os.environ["REQUESTS_CA_BUNDLE"] = 'path/to/corporate/cert.pem'
os.environ["SSL_CERT_FILE"] = 'path/to/corporate/cert.pem'

The question remains: why PubChemPy just can't get the certificate??
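
For requests made directly with urllib (as in the small scripts earlier in this thread), an alternative to the environment variables is to hand the saved PEM file to an explicit SSL context. The sketch below assumes such a PEM chain file exists at a path you supply; the function names make_context and fetch are made up. It does not change pubchempy itself, which, according to the tracebacks above, calls urlopen(apiurl, postdata) without a context argument.

```python
import ssl
import urllib.request


def make_context(cafile=None):
    """Return an SSL context that verifies servers against `cafile`.

    With cafile=None the system's default trust store is used instead.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, cafile=None):
    """Fetch `url` as text, trusting the CA certificates in `cafile`."""
    context = make_context(cafile)
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode("utf-8").strip()
```

Certificate and hostname verification stay switched on here; only the set of trusted CA certificates changes.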

@aromring Nice to read that a contact to NIH was established.

My points now are:

  1. The answer which works for you starts with

    If you're behind a corporate network firewall like I was [...]

    Doesn't this contradict, at least a little, your statement that

    There are no hurdles for my directly connected computer in terms of accessing the Internet.

  2. Is your question of

    The question remains: why PubChemPy just can't get the certificate??

    a feature suggestion to either a) add such a certificate somewhere into the source / project here on GitHub, such that an installation of pubchempy (at least from GitHub)* would already carry it in its backpack and thus be "ready-to-go", or

    b) let PubChemPy collect such a certificate once per session (in line with the answer by foggy on the same Stack Overflow page you refer to), to then be able to get in touch with NIH and exchange CID numbers for SMILES strings, molecular formulae, etc.?

macOS and Linux Debian, though both Unix-like, still differ enough that I don't find certificates in /etc/mkcert/ (answer by Fabien Snauwert), but I do find some in /etc/ssl/certs. Variant a) would likely be more portable, especially to allow Windows users to continue to use pubchempy.

* And what about the instance on PyPI which, like the root repository of the project here, fully belongs to @mcs07 to accept PRs and publish new/updated versions ... (</comment on>: possibly a CID-based request to NIH can now report additional types of data than those implemented in 2017. </comment off>.)