02A Chemical Structure Inputs for PUG-REST
Author
Joshua Schrier
Title
02A Chemical Structure Inputs for PUG-REST
Description
Use SMILES and InChI strings to specify the input compound for a PUG-REST request.
Category
Educational Materials
Keywords
cheminformatics, Chemoinformatics, chemical information, PubChem, quantitative structure, property relationships, QSPR, machine learning, computer-aided drug design, chemistry
URL
http://www.notebookarchive.org/2020-10-ebntr4w/
DOI
https://notebookarchive.org/2020-10-ebntr4w
Date Added
2020-10-31
Date Last Modified
2020-10-31
File Size
168.86 kilobytes
Supplements
Rights
CC BY-NC-SA 4.0



Chemical Structure Inputs for PUG-REST
Objectives
◼
Use SMILES and InChI strings to specify the input compound for a PUG-REST request.
◼
Use a structure-data (SD) file to specify the input compound for a PUG-REST request.
◼
Learn to submit a PUG-REST request using the HTTP-POST method.
You can use a chemical structure as an input for a PUG-REST request. PUG-REST accepts some popular chemical structure line notations such as SMILES and InChI strings. It is also possible to use a Structure-Data File (SDF) as a structure input.
To learn how to specify the structure input in a PUG-REST request, one needs to know that there are two methods by which data are transferred between clients (users) and servers (PubChem) through PUG-REST. Discussing these methods in detail is beyond the scope of this material; it is enough to know three things:
◼
When you make a PUG-REST request by typing the request URL in the address bar of your web browser (such as Google Chrome, Safari, MS Internet Explorer), the HTTP GET method is used.
◼
The HTTP GET method transfers information encoded in a single-line URL.
◼
Some chemical structure inputs are not appropriate to encode in a single-line URL (because they may contain special characters not compatible with the URL syntax, span multiple lines, or be too long), and the HTTP POST method needs to be used in such cases.
For more information on HTTP GET and POST, read the following documents.
◼
◼
Using the HTTP GET method.
Structure encoded in the URL path.
In some cases, you can encode a chemical structure in the PUG-REST request URL path as in the following example:
In[]:=
prolog="https://pubchem.ncbi.nlm.nih.gov/rest/pug";
smiles1="CC(C)CC1=CC=C(C=C1)C(C)C(=O)O";
url=URLBuild[{prolog,"compound/smiles",smiles1,"cids/txt"}]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/CC%28C%29CC1%3DCC%3DC%28C%3DC1%29C%28C%29C%28%3DO%29O/cids/txt
Observe how URLBuild has converted special characters like parentheses and equals signs into their encoded versions (prefixed by the % symbol). This request URL returns ibuprofen (CID 3672):
In[]:=
URLExecute[url]
Out[]=
3672
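The percent-encoding seen in the URL above can also be inspected on its own. The following is a minimal sketch that applies the built-in URLEncode function to the same ibuprofen SMILES string. It runs offline and is only meant to illustrate the character substitutions; it is not a statement about how URLBuild works internally.

```wolfram
(* The same ibuprofen SMILES string used above *)
smiles1 = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O";

(* Percent-encode the characters that have a special meaning in a URL *)
encoded = URLEncode[smiles1]

(* Compare the lengths before and after: each encoded character becomes three *)
{StringLength[smiles1], StringLength[encoded]}
```

Comparing the value of encoded with the path segment in the URL printed above shows which characters were replaced.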
In contrast, now try to run the following (and expect an error):
In[]:=
smiles2="CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O"; (* Mathematica interprets the backslash as an escape character, so "\\" encodes a single backslash *)
url=URLBuild[{prolog,"compound/smiles",smiles2,"cids/txt"}]
res=URLExecute[url]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/CC1%3DC%28%5BC%40%40%5D%28SC1%3DO%29%28C%29/C%3DC%28%5CC%29/C%3DC%29O/cids/txt
Out[]=
Status: 400
Code: PUGREST.BadRequest
Message: Unable to standardize the given structure - perhaps some special characters need to be escaped or data packed in a MIME form?
Detail: error:
Detail: status: 400
Detail: output: Caught ncbi::CException: Standardization failed
Detail: Output Log:
Detail: Record 1: Warning: Cactvs Ensemble cannot be created from input string
Detail: Record 1: Error: Unable to convert input into a compound object
Detail:
Detail:
Note that the SMILES string in the above example contains special characters, in this case forward slashes (“/”), which are also used as separators in the URL path. These special characters conflict with the PUG-REST request URL syntax, causing an error when used in the PUG-REST request URL. The “Message” suggests the source of this error, and its solution.
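A quick check before building a request can show whether a structure string is likely to run into this problem. The sketch below is illustrative only: the helper name containsSlashQ is hypothetical (it is not part of the original notebook), and it tests only for the forward slash discussed above.

```wolfram
(* Hypothetical helper: True if the string contains a forward slash,
   which would be confused with a URL path separator *)
containsSlashQ[s_String] := StringContainsQ[s, "/"]

smiles1 = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O";
smiles2 = "CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O";

(* Apply the check to both example strings *)
containsSlashQ /@ {smiles1, smiles2}
```

For these two inputs the check gives {False, True}, matching the observation that only the second request failed.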
Structure encoded as a URL argument
To circumvent the issue mentioned above, the SMILES string may be encoded as a URL argument (an optional parameter that follows the "?" character). To illustrate more clearly how this construction is performed, we will form the URL by explicit concatenation of strings before retrieving the CID:
In[]:=
url2=prolog<>"/compound/smiles/cids/txt?smiles="<>smiles2
res2=URLExecute[url2]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt?smiles=CC1=C([C@@](SC1=O)(C)/C=C(\C)/C=C)O
Out[]=
135403829
Structure passed as a parameter
URLBuild facilitates constructing structured queries provided as a list of parameter -> value rule pairs:
In[]:=
url3=URLBuild[{prolog,"/compound/smiles/cids/txt"}, (* list of path specifiers *)
  {"smiles"->smiles2} (* parameter name "smiles" and the value it should have *)
]
res3=URLExecute[url3]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt?smiles=CC1%3DC%28%5BC%40%40%5D%28SC1%3DO%29%28C%29%2FC%3DC%28%5CC%29%2FC%3DC%29O
Out[]=
135403829
In[]:=
url3b=URLBuild[{prolog,"/compound/smiles/cids/txt"}]
res3b=URLExecute[url3b,{"smiles"->smiles2}]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt
Out[]=
135403829
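The query-argument pattern can be wrapped in a small reusable function. The sketch below is illustrative only: the name smilesToCIDs is hypothetical (not part of the original notebook), and it passes "Text" as the format argument so that the raw response is returned as a string instead of being interpreted. It requires network access to PubChem.

```wolfram
prolog = "https://pubchem.ncbi.nlm.nih.gov/rest/pug";

(* Hypothetical helper: look up the CID(s) for a SMILES string
   passed as a URL argument rather than in the URL path *)
smilesToCIDs[smiles_String] := URLExecute[
  URLBuild[{prolog, "compound/smiles/cids/txt"}],
  {"smiles" -> smiles},
  "Text"]

(* Same SMILES string as smiles2 above *)
smilesToCIDs["CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O"]
```

Keeping the structure out of the path means the same function can be used whether or not the string contains slashes.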
Look closely at the first two URLs used to perform the request:
In[]:=
url2
url3
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt?smiles=CC1=C([C@@](SC1=O)(C)/C=C(\C)/C=C)O
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt?smiles=CC1%3DC%28%5BC%40%40%5D%28SC1%3DO%29%28C%29%2FC%3DC%28%5CC%29%2FC%3DC%29O
From these two URLs, we can see two important things:
◼
When the structure is passed as a rule pair to URLBuild (i.e., “url3”), the structure is automatically encoded as a URL argument (after the “?” mark).
◼
URLBuild converts the special characters in the SMILES string according to the URL encoding rules: https://www.w3schools.com/tags/ref_urlencode.asp. For example, the equals sign “=” changes into “%3D”, “(” into “%28”, “/” into “%2F”, etc.
However, the fact that the same result is returned indicates that the two approaches to constructing the HTTP GET request are essentially the same.
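The equivalence of the two query strings can be checked offline. The following minimal sketch copies the two values of the smiles argument from the URLs printed above and uses the built-in URLDecode function to undo the percent-encoding.

```wolfram
(* The value of the "smiles" argument in url2 (unencoded) *)
plain = "CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O";

(* The value of the "smiles" argument in url3 (percent-encoded) *)
encoded = "CC1%3DC%28%5BC%40%40%5D%28SC1%3DO%29%28C%29%2FC%3DC%28%5CC%29%2FC%3DC%29O";

(* Decoding the encoded form should reproduce the plain SMILES string *)
URLDecode[encoded] === plain
```

The comparison should evaluate to True, which is consistent with both requests returning the same CID.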
Exercises
Exercise 1a Retrieve (in the CSV (comma-separated values) format) the hydrogen bond donor and acceptor counts, TPSA, and XLogP of the chemical represented by the SMILES string: “C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl”. When you construct a PUG-REST URL for this request, encode the structure in the URL path. Warning: As noted in Assignment 1, Exercise 3b, the non-letter characters in isomeric SMILES strings can be mistaken for other types of data, which can result in an error message. To handle this correctly, specify an explicit format interpretation using the URLExecute[url, params, format] form. In this case, use either URLExecute[url, {}, "Text"] or URLExecute[url, {}, "CSV"].
In[]:=
(* Write your code in this cell *)
Exercise 1b Get the CID corresponding to the following InChI string, using the HTTP GET method. Pay attention to the case sensitivity of the URL parameter part after the “?” mark.
In[]:=
inchi="InChI=1S/C17H14O4S/c1-22(19,20)14-9-7-12(8-10-14)15-11-21-17(18)16(15)13-5-3-2-4-6-13/h2-10H,11H2,1H3";
In[]:=
(* Write your code in this cell *)
Using the HTTP POST method
Comparison of HTTP POST and GET
All three examples above use the HTTP GET method, although you would not necessarily know this because Mathematica tries to hide irrelevant details from the user. Alternatively, one can use the HTTP POST method by specifying it explicitly. For example, the following returns the same result as the last two HTTP GET examples:
In[]:=
url=prolog<>"/compound/smiles/cids/txt"
URLExecute[url,{"smiles"->smiles2,"RequestMethod"->"POST"}]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/cids/txt
Out[]=
135403829
Alternatively, we can get more fine-grained control by constructing an HTTPRequest. The parameters are set by constructing an association with the key “Query” whose value is a list of rules defining each parameter. By default, HTTPRequest performs a GET operation:
In[]:=
req=HTTPRequest[url,<|"Query"->{"smiles"->smiles2}|>]
Out[]=
HTTPRequest
|
Note that HTTPRequest merely constructs the request; to execute it on the server, it must be provided to URLExecute:
In[]:=
URLExecute[req]
Out[]=
135403829
Alternatively, there is a more fine-grained function, URLRead, that allows you to learn more about the response (particularly information about error messages and other limits), but it is not necessary for our purposes yet:
In[]:=
URLRead[req]
Out[]=
HTTPResponse
|
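The HTTPResponse object returned by URLRead can be queried for individual properties. The sketch below rebuilds the same GET request so that it is self-contained, then reads the "StatusCode" and "Body" properties of the response. It requires network access, and the variable name response is only illustrative.

```wolfram
prolog = "https://pubchem.ncbi.nlm.nih.gov/rest/pug";
smiles2 = "CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O";
req = HTTPRequest[prolog <> "/compound/smiles/cids/txt",
   <|"Query" -> {"smiles" -> smiles2}|>];

(* URLRead returns an HTTPResponse object rather than the imported result *)
response = URLRead[req];

(* Numeric HTTP status code, e.g. 200 for success or 400 for a bad request *)
response["StatusCode"]

(* Raw text of the response body *)
response["Body"]
```

Checking the status code before using the body is one way to distinguish a CID from an error report like the one shown earlier.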
We can change this request into a POST operation by adding the Method key to the association:
In[]:=
req=HTTPRequest[url,<|"Method"->"POST","Query"->{"smiles"->smiles2}|>]
response=URLExecute[req]
Out[]=
HTTPRequest
|
Out[]=
135403829
Observe how the URL listed in the HTTPRequest object contains the URL-encoded SMILES string at the end.
POST operations also allow us to send data in a request body, in addition to sending the data as query parameters in the URL. This is especially useful for sending longer data with more complicated structure. To send this, we instead change the “Query” key to “Body” when constructing the HTTPRequest:
In[]:=
req=HTTPRequest[url,<|"Method"->"POST","Body"->{"smiles"->smiles2}|>]
response=URLExecute[req]
Out[]=
HTTPRequest
|
Out[]=
135403829
Observe how the URL shown does not contain the SMILES string, and there is now a ContentType setting. The result returned is the same.
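To make the role of the request body and content type more concrete, the same POST request can be written with the body spelled out as a string. This is a sketch under stated assumptions: it assumes that PubChem accepts a body of the form smiles=... with the content type application/x-www-form-urlencoded, and it is not a claim about exactly what HTTPRequest generates from a list of rules. It requires network access.

```wolfram
prolog = "https://pubchem.ncbi.nlm.nih.gov/rest/pug";
smiles2 = "CC1=C([C@@](SC1=O)(C)/C=C(\\C)/C=C)O";
url = prolog <> "/compound/smiles/cids/txt";

(* Assumption: a form-encoded body "smiles=<encoded SMILES>" is accepted *)
req = HTTPRequest[url, <|
    "Method" -> "POST",
    "ContentType" -> "application/x-www-form-urlencoded",
    "Body" -> "smiles=" <> URLEncode[smiles2]
    |>];

URLExecute[req]
```

Writing the body by hand is rarely necessary, but it shows that the SMILES string travels in the body of the request and not in the URL.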
HTTP POST for multi-line structure input
The HTTP POST method should be used if the input molecular structure for a PUG-REST request spans multiple lines (e.g., when it is stored in the structure-data file (SDF) format). An SDF file contains the structure information of a molecule in a multi-line format, along with other data. Here we’ll download an example SDF file for this course from the web using the Import function. (Import also works for files on your local computer by providing a file path.)
In[]:=
Import["https://chem.libretexts.org/@api/deki/files/231990/lecture02_ex2b_compound1.sdf?revision=1"]
Out[]=
Molecule
|
Observe that Mathematica detects that the file is an SDF and interprets it as a Molecule. Although this can be quite useful in many cases, we cannot send a Molecule entity to the REST API. Instead, we want to import the contents of the SDF file as uninterpreted “Text” for use with the REST API:
In[]:=
mysdf=Import["https://chem.libretexts.org/@api/deki/files/231990/lecture02_ex2b_compound1.sdf?revision=1","Text"]
Out[]=
126941 -OEChem-08171915162D 55 57 0 1 0 0 0 0 0999 V2000 15.5875 -0.2771 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 12.1196 -1.2637 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 14.7273 1.2262 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 15.5759 -3.2771 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8439 -3.2704 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 8.6671 1.7496 0.0000 N 0 0 3 0 0 0 0 0 0 0 0 0 12.9914 0.2329 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 6.0290 1.2424 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 6.0290 3.3117 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 4.2690 3.2771 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 3.4030 1.7771 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 4.2690 0.2771 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 2.5369 3.2771 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 13.8555 -0.2704 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 13.8516 -1.2704 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 9.5312 1.2462 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 14.7157 -1.7738 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.7991 1.2529 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.2593 0.2396 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 12.1234 -0.2638 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 9.5273 0.2463 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10.3991 1.7429 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 14.7234 0.2262 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10.3914 -0.2571 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.2632 1.2396 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.9350 1.7563 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.6709 2.7496 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 14.7118 -2.7738 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.9350 2.7979 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.1350 1.7771 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.1350 2.7771 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.2690 1.2771 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.4030 2.7771 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.8579 0.3496 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 13.2414 -1.1604 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 13.6373 -1.8522 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 15.3259 -1.8838 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 14.9300 -1.1920 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 12.9938 0.8529 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 7.3988 0.7795 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 8.1958 0.7764 0.0000 H 0 0 
0 0 0 0 0 0 0 0 0 0 8.9892 -0.0617 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 10.4015 2.3629 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 10.3890 -0.8771 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 11.8013 1.5475 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 9.2909 2.7472 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 8.6733 3.3696 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 8.0509 2.7520 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 7.4708 3.1100 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 16.1256 0.0308 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 15.5735 -3.8971 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 3.7320 -0.0329 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 4.8059 -0.0329 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.5369 3.8971 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.0000 2.9671 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 1 23 1 0 0 0 0 1 50 1 0 0 0 0 2 20 2 0 0 0 0 3 23 2 0 0 0 0 4 28 1 0 0 0 0 4 51 1 0 0 0 0 5 28 2 0 0 0 0 6 16 1 0 0 0 0 6 18 1 0 0 0 0 6 27 1 0 0 0 0 14 7 1 6 0 0 0 7 20 1 0 0 0 0 7 39 1 0 0 0 0 8 26 1 0 0 0 0 8 30 2 0 0 0 0 9 29 1 0 0 0 0 9 31 2 0 0 0 0 10 31 1 0 0 0 0 10 33 2 0 0 0 0 11 32 2 0 0 0 0 11 33 1 0 0 0 0 12 32 1 0 0 0 0 12 52 1 0 0 0 0 12 53 1 0 0 0 0 13 33 1 0 0 0 0 13 54 1 0 0 0 0 13 55 1 0 0 0 0 14 15 1 0 0 0 0 14 23 1 0 0 0 0 14 34 1 0 0 0 0 15 17 1 0 0 0 0 15 35 1 0 0 0 0 15 36 1 0 0 0 0 16 21 2 0 0 0 0 16 22 1 0 0 0 0 17 28 1 0 0 0 0 17 37 1 0 0 0 0 17 38 1 0 0 0 0 18 26 1 0 0 0 0 18 40 1 0 0 0 0 18 41 1 0 0 0 0 19 20 1 0 0 0 0 19 24 2 0 0 0 0 19 25 1 0 0 0 0 21 24 1 0 0 0 0 21 42 1 0 0 0 0 22 25 2 0 0 0 0 22 43 1 0 0 0 0 24 44 1 0 0 0 0 25 45 1 0 0 0 0 26 29 2 0 0 0 0 27 46 1 0 0 0 0 27 47 1 0 0 0 0 27 48 1 0 0 0 0 29 49 1 0 0 0 0 30 31 1 0 0 0 0 30 32 1 0 0 0 0M END> <PUBCHEM_COMPOUND_CID>126941> <PUBCHEM_COMPOUND_CANONICALIZED>1> <PUBCHEM_CACTVS_COMPLEXITY>704> <PUBCHEM_CACTVS_HBOND_ACCEPTOR>12> <PUBCHEM_CACTVS_HBOND_DONOR>5> <PUBCHEM_CACTVS_ROTATABLE_BOND>9> <PUBCHEM_CACTVS_SUBSKEYS>AAADceB7+AAAAAAAAAAAAAAAAAAAAAAAAAA8WIAAAAAAAACx/AAAHgAQCAAADCjBnwQ/+L/IEgCoAzf3fACCgC01EqAJ2KG4dNiKaHLA3fGUZQhslgLYyae8rwCeCAAAAAAAAAAQAAAAAAAAAAAAAAAAAA==> 
<PUBCHEM_IUPAC_OPENEYE_NAME>(2S)-2-[[4-[(2,4-diaminopteridin-6-yl)methyl-methyl-amino]benzoyl]amino]pentanedioic acid> <PUBCHEM_IUPAC_CAS_NAME>(2S)-2-[[[4-[(2,4-diamino-6-pteridinyl)methyl-methylamino]phenyl]-oxomethyl]amino]pentanedioic acid> <PUBCHEM_IUPAC_NAME_MARKUP>(2<I>S</I>)-2-[[4-[(2,4-diaminopteridin-6-yl)methyl-methylamino]benzoyl]amino]pentanedioic acid> <PUBCHEM_IUPAC_NAME>(2S)-2-[[4-[(2,4-diaminopteridin-6-yl)methyl-methylamino]benzoyl]amino]pentanedioic acid> <PUBCHEM_IUPAC_SYSTEMATIC_NAME>(2S)-2-[[4-[[2,4-bis(azanyl)pteridin-6-yl]methyl-methyl-amino]phenyl]carbonylamino]pentanedioic acid> <PUBCHEM_IUPAC_TRADITIONAL_NAME>(2S)-2-[[4-[(2,4-diaminopteridin-6-yl)methyl-methyl-amino]benzoyl]amino]glutaric acid> <PUBCHEM_IUPAC_INCHI>InChI=1S/C20H22N8O5/c1-28(9-11-8-23-17-15(24-11)16(21)26-20(22)27-17)12-4-2-10(3-5-12)18(31)25-13(19(32)33)6-7-14(29)30/h2-5,8,13H,6-7,9H2,1H3,(H,25,31)(H,29,30)(H,32,33)(H4,21,22,23,26,27)/t13-/m0/s1> <PUBCHEM_IUPAC_INCHIKEY>FBOZXECLQNJBKD-ZDUSSCGKSA-N> <PUBCHEM_XLOGP3>-1.8> <PUBCHEM_EXACT_MASS>454.171316> <PUBCHEM_MOLECULAR_FORMULA>C20H22N8O5> <PUBCHEM_MOLECULAR_WEIGHT>454.4> <PUBCHEM_OPENEYE_CAN_SMILES>CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O> <PUBCHEM_OPENEYE_ISO_SMILES>CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)N[C@@H](CCC(=O)O)C(=O)O> <PUBCHEM_CACTVS_TPSA>211> <PUBCHEM_MONOISOTOPIC_WEIGHT>454.171316> <PUBCHEM_TOTAL_CHARGE>0> <PUBCHEM_HEAVY_ATOM_COUNT>33> <PUBCHEM_ATOM_DEF_STEREO_COUNT>1> <PUBCHEM_ATOM_UDEF_STEREO_COUNT>0> <PUBCHEM_BOND_DEF_STEREO_COUNT>0> <PUBCHEM_BOND_UDEF_STEREO_COUNT>0> <PUBCHEM_ISOTOPIC_ATOM_COUNT>0> <PUBCHEM_COMPONENT_COUNT>1> <PUBCHEM_CACTVS_TAUTO_COUNT>-1> <PUBCHEM_COORDINATE_TYPE>15255> <PUBCHEM_BONDANNOTATIONS>10 31 810 33 811 32 811 33 816 21 816 22 819 24 819 25 821 24 822 25 826 29 830 31 830 32 814 7 68 26 88 30 89 29 89 31 8$$$$
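Because the imported text is long, it can help to confirm that it really is multi-line and to look at only its first few lines. The sketch below repeats the Import call so that it is self-contained and splits the text at newline characters; the variable name lines is only illustrative. It requires network access.

```wolfram
mysdf = Import[
   "https://chem.libretexts.org/@api/deki/files/231990/lecture02_ex2b_compound1.sdf?revision=1",
   "Text"];

(* Split the text into individual lines *)
lines = StringSplit[mysdf, "\n"];

(* Number of lines in the SDF record *)
Length[lines]

(* Preview at most the first four lines *)
Take[lines, UpTo[4]]
```

A line count well above one is the reason this input cannot be placed in a single-line URL.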
This multi-line SDF data is used as an input for a PUG-REST request through the HTTP POST:
In[]:=
url=URLBuild[{prolog,"/compound/sdf/cids/txt"}]
req=HTTPRequest[url,<|"Method"->"POST","Body"->{"sdf"->mysdf}|>]
URLExecute[req]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/sdf/cids/txt
Out[]=
HTTPRequest
|
Out[]=
126941
HTTP POST for SDF file input
One may want to use the structure stored in a file as the input for a PUG-REST request. As above, we will read in a file from the course website. (Alternatively, a local file can be read by specifying the path.)
In[]:=
mysdf=Import["https://chem.libretexts.org/@api/deki/files/231989/Structure2D_CID_5288826.sdf?revision=1","Text"]
Out[]=
5288826 -OEChem-08171913162D 40 44 0 1 0 0 0 0 0999 V2000 2.2314 0.0528 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 2.0000 -2.4021 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 2.0000 2.4021 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.1607 -0.9511 0.0000 N 0 0 3 0 0 0 0 0 0 0 0 0 3.6897 -0.4755 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 4.5133 -0.9511 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 5.3370 -0.4755 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0 2.8660 -0.9511 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 4.2392 0.2219 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.6897 0.4755 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.3370 0.4755 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.5918 0.2219 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.5133 0.9511 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.8660 -1.9022 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0 4.5133 -1.9022 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.8660 0.9511 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.6897 -2.3777 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 6.8418 -1.6832 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 4.5133 1.9022 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2.8660 1.9022 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 3.6897 2.3777 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.0597 -1.6022 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.6284 -1.2740 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.0496 -1.1875 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 4.3760 0.8266 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 3.6795 0.4887 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.9476 0.3679 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.5490 1.0581 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 6.1840 0.4057 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.4989 0.8349 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.8660 -2.5222 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.0503 -2.2122 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 3.6897 -2.9977 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 6.3879 -2.1055 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 7.2641 -2.1371 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 7.2957 -1.2609 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 5.0503 2.2122 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 3.6897 2.9977 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.0000 -3.0222 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 2.0000 3.0222 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0 1 8 1 0 0 0 0 1 16 1 0 0 0 0 14 2 1 6 0 0 
0 2 39 1 0 0 0 0 3 20 1 0 0 0 0 3 40 1 0 0 0 0 4 7 1 0 0 0 0 4 12 1 0 0 0 0 4 18 1 0 0 0 0 5 6 1 0 0 0 0 5 8 1 0 0 0 0 5 9 1 1 0 0 0 5 10 1 0 0 0 0 6 7 1 0 0 0 0 6 15 1 0 0 0 0 6 22 1 1 0 0 0 7 11 1 0 0 0 0 7 23 1 6 0 0 0 8 14 1 0 0 0 0 8 24 1 1 0 0 0 9 12 1 0 0 0 0 9 25 1 0 0 0 0 9 26 1 0 0 0 0 10 13 2 0 0 0 0 10 16 1 0 0 0 0 11 13 1 0 0 0 0 11 27 1 0 0 0 0 11 28 1 0 0 0 0 12 29 1 0 0 0 0 12 30 1 0 0 0 0 13 19 1 0 0 0 0 14 17 1 0 0 0 0 14 31 1 0 0 0 0 15 17 2 0 0 0 0 15 32 1 0 0 0 0 16 20 2 0 0 0 0 17 33 1 0 0 0 0 18 34 1 0 0 0 0 18 35 1 0 0 0 0 18 36 1 0 0 0 0 19 21 2 0 0 0 0 19 37 1 0 0 0 0 20 21 1 0 0 0 0 21 38 1 0 0 0 0M END> <PUBCHEM_COMPOUND_CID>5288826> <PUBCHEM_COMPOUND_CANONICALIZED>1> <PUBCHEM_CACTVS_COMPLEXITY>494> <PUBCHEM_CACTVS_HBOND_ACCEPTOR>4> <PUBCHEM_CACTVS_HBOND_DONOR>2> <PUBCHEM_CACTVS_ROTATABLE_BOND>0> <PUBCHEM_CACTVS_SUBSKEYS>AAADceB6MAAAAAAAAAAAAAAAAAAAASAAAAA8YIEAAAAWAEjBAAAAHgAACAAADzzhmAYyBoMABgCAAiBCAAACCAAgIAAIiAAOiIgNNiKGsRuGeCOkwBGLuAew8PcPoAABAAAYQADQAAaAADSAAAAAAAAAAA==> <PUBCHEM_IUPAC_OPENEYE_NAME>(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> <PUBCHEM_IUPAC_CAS_NAME>(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> <PUBCHEM_IUPAC_NAME_MARKUP>(4<I>R</I>,4<I>a</I><I>R</I>,7<I>S</I>,7<I>a</I><I>R</I>,12<I>b</I><I>S</I>)-3-methyl-2,4,4<I>a</I>,7,7<I>a</I>,13-hexahydro-1<I>H</I>-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> <PUBCHEM_IUPAC_NAME>(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> <PUBCHEM_IUPAC_SYSTEMATIC_NAME>(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> <PUBCHEM_IUPAC_TRADITIONAL_NAME>(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diol> 
<PUBCHEM_IUPAC_INCHI>InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1> <PUBCHEM_IUPAC_INCHIKEY>BQJCRHHNABKAKU-KBQPJGBKSA-N> <PUBCHEM_XLOGP3>0.8> <PUBCHEM_EXACT_MASS>285.136493> <PUBCHEM_MOLECULAR_FORMULA>C17H19NO3> <PUBCHEM_MOLECULAR_WEIGHT>285.34> <PUBCHEM_OPENEYE_CAN_SMILES>CN1CCC23C4C1CC5=C2C(=C(C=C5)O)OC3C(C=C4)O> <PUBCHEM_OPENEYE_ISO_SMILES>CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O> <PUBCHEM_CACTVS_TPSA>52.9> <PUBCHEM_MONOISOTOPIC_WEIGHT>285.136493> <PUBCHEM_TOTAL_CHARGE>0> <PUBCHEM_HEAVY_ATOM_COUNT>21> <PUBCHEM_ATOM_DEF_STEREO_COUNT>5> <PUBCHEM_ATOM_UDEF_STEREO_COUNT>0> <PUBCHEM_BOND_DEF_STEREO_COUNT>0> <PUBCHEM_BOND_UDEF_STEREO_COUNT>0> <PUBCHEM_ISOTOPIC_ATOM_COUNT>0> <PUBCHEM_COMPONENT_COUNT>1> <PUBCHEM_CACTVS_TAUTO_COUNT>-1> <PUBCHEM_COORDINATE_TYPE>15255> <PUBCHEM_BONDANNOTATIONS>10 13 810 16 813 19 816 20 819 21 814 2 620 21 85 9 56 22 57 23 68 24 5$$$$
Now the structure stored in the variable “mysdf” can be used in a PUG-REST request through HTTP-POST. For example, the code cell below shows how to retrieve various names (also called “synonyms”) of the input structure:
In[]:=
url=URLBuild[{prolog,"/compound/sdf/synonyms/txt"}]
req=HTTPRequest[url,<|"Method"->"POST","Body"->{"sdf"->mysdf}|>]
res=URLExecute[req]
Out[]=
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/sdf/synonyms/txt
Out[]=
HTTPRequest
|
Out[]=
morphineMorphiaMorphinumMorphiumMorphin(-)-MorphineMorphinaDepoDurDuromorphMeconiumMorphinismMoscontinOspalivinaMS Continl-Morphine57-27-2DulcontinMorfinaRoxanolMORPHINE SULFATEInfumorphNepentheDreamerMorphoAvinzaHocusKadianUnkieCube juiceHard stuffOramorph SRStatex SRMs EmmaMorphin [German]Morfina [Italian]DuramorphMorphina [Italian]M-EslonMorphine [BAN]CCRIS 5762HSDB 2134(5R,6S,9R,13S,14R)-4,5-Epoxy-N-methyl-7-morphinen-3,6-diolUNII-76I7G6D29CCHEBI:17303CHEMBL70EINECS 200-320-2(5alpha,6alpha)-17-methyl-7,8-didehydro-4,5-epoxymorphinan-3,6-diol4,5alpha-Epoxy-17-methyl-7-morphinen-3,6alpha-diol7,8-Didehydro-4,5-epoxy-17-methyl-morphinan-3,6-diol(7R,7AS,12BS)-3-METHYL-2,3,4,4A,7,7A-HEXAHYDRO-1H-4,12-METHANO[1]BENZOFURO[3,2-E]ISOQUINOLINE-7,9-DIOLDEA No. 9300Morphine Anhydrate76I7G6D29CMorphine (BAN)RMS(5alpha,6alpha)-Didehydro-4,5-epoxy-17-methylmorphinan-3,6-diolMorphinan-3,6-alpha-diol, 7,8-didehydro-4,5-alpha-epoxy-17-methyl-Morphinan-3,6-diol, 7,8-didehydro-4,5-epoxy-17-methyl-, (5alpha,6alpha)-9H-9,9c-Iminoethanophenanthro(4,5-bcd)furan-3,5-diol, 4a,5,7a,8-tetrahydro-12-methyl-methyl[?]diolAguettantDinamorfSevredolDimorf(5alpha,6alpha)-7,8-Didehydro-4,5-epoxy-17-methylmorphinan-3,6-diol(4R,4aR,7S,7aR,12bS)-3-methyl-2,4,4a,7,7a,13-hexahydro-1H-4,12-methanobenzofuro[3,2-e]isoquinoline-7,9-diolD-(-)-MorphineDolcontinOramorph(Morphine)Anhydrous morphineSubstitol (TN)Morphine (MOR)MOR(-)-(etorphine)(-)Morphine sulfateMorfina Dosa (TN)SDZ202-250NSC11441SDZ 202-250Epitope ID:116646Morphinan-3,6-diol, 7,8-didehydro-4,5-epoxy-17-methyl- (5alpha,6alpha)-SCHEMBL2997BIDD:GT0147GTPL1627IDS-NM-009DTXSID9023336Morphine 0.1 mg/ml in MethanolMorphine 1.0 mg/ml in MethanolN02AA01ZINC3812983BDBM50000092AKOS015966554DB00295LS-91748MOIC01516D08233Q812257,8-Didehydro-4,5-epoxy-17-methylmorphinan-3,6-diolUNII-1M5VY6ITRT component 
BQJCRHHNABKAKU-KBQPJGBKSA-N17-methyl-7,8-didehydro-4,5alpha-epoxymorphinan-3,6alpha-diol7,8-Didehydro-4,5-epoxy-17-methylmorphinan-3,6-diol(morphine)(5alpha,6beta)-17-methyl-7,8-didehydro-4,5-epoxymorphinan-3,6-diol3-(4-Hydroxy-phenyl)-1-propyl-piperidine-3-carboxylic acid ethyl ester6-tert-Butyl-3-methyl-1,2,3,4,5,6-hexahydro-2,6-methano-benzo[d]azocine(-)(5.alpha.,6.alpha.)-7,8-Didehydro-4,5-epoxy-17-methylmorphinan-3,6-diolMorphinan-3,6-diol, 7,8-didehydro-4,5-epoxy-17-methyl- (5..alpha.,6.alpha.)-Morphine solution, 1.0 mg/mL in methanol, ampule of 1 mL, certified reference material(1S,5R,13R,14S)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol(1S,5R,13R,14S,17R)-4-methyl-12-oxa-4-azapentacyclo[9.6.1.0^{1,13}.0^{5,17}.0^{7,18}]octadeca-7(18),8,10,15-tetraene-10,14-diol(1S,5R,13R,14S,17R)-4-methyl-12-oxa-4-azapentacyclo[9.6.1.0^{1,13}.0^{5,17}.0^{7,18}]octadeca-7,9,11(18),15-tetraene-10,14-diol(morphine) 4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol2-{4-[2,4-diamino-6-pteridinylmethyl(methyl)amino]phenylcarboxamido}pentanedioic acid(morphine)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol ; HydroChloride4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol ;sulphate salt(morphine)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol((Morphine))4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol(morphine 
sulfate)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol(morphine)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol(Morphine)(HCl)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol,sulfate(Morphinesulfate)4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diolMorphine4-methyl-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol4-methyl-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol (morphine)4-methyl-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol(Morphine)6,11-Dimethyl-3-(3-methyl-but-2-enyl)-1,2,3,4,5,6-hexahydro-2,6-methano-benzo[d]azocin-8-ol(Morphine)9H-9,9c-Iminoethanophenanthro(4,5-bcd)furan-3,5-diol, 4alpha,5,7alpha,8-tetrahydro-12-methyl-MORPHINE; (5A,6A)-7,8-DIDEHYDRO-4,5-EPOXY-17-METHYLMORPHINIAN-3,6-DIOL; MORPHIUM; MORPHIA; DOLCONTIN; DUROMORPH; MORPHINA; NEPENTHEMorphine;4-methyl-(1S,5R,13R,14S,17R)-12-oxa-4-azapentacyclo[9.6.1.01,13.05,17.07,18]octadeca-7(18),8,10,15-tetraene-10,14-diol
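The synonyms come back as one block of text, which is easier to work with as a list. The sketch below repeats the request so that it is self-contained, reads the raw response body with URLRead, and splits it at newline characters. The variable name synonyms is only illustrative, and it assumes the txt output places one synonym per line. It requires network access.

```wolfram
prolog = "https://pubchem.ncbi.nlm.nih.gov/rest/pug";
mysdf = Import[
   "https://chem.libretexts.org/@api/deki/files/231989/Structure2D_CID_5288826.sdf?revision=1",
   "Text"];
req = HTTPRequest[prolog <> "/compound/sdf/synonyms/txt",
   <|"Method" -> "POST", "Body" -> {"sdf" -> mysdf}|>];

(* Read the raw response body as a string *)
body = URLRead[req]["Body"];

(* Assumption: one synonym per line in the txt output *)
synonyms = StringSplit[body, "\n"];

(* How many synonyms were returned, and a preview of at most the first five *)
Length[synonyms]
Take[synonyms, UpTo[5]]
```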
Exercise:
Exercise 2a Retrieve (in the CSV format) the XLogP, molecular weight, hydrogen bond donor count, hydrogen bond acceptor count, and TPSA of the compounds contained in the five SDF files below, which can be downloaded from the Chapter 2 Assignments page.
◼
Write a function to construct the complete URL for each of the file names stored in the ‘files’ list below. For example, the first one should be: https://chem.libretexts.org/@api/deki/files/231990/lecture02_ex2b_compound1.sdf?revision=1
◼
◼
Write a function to perform the query.
◼
◼
Better: Use the MapBatched function introduced in Assignment 1 to insert a Pause of 0.2 seconds between each (one) entry. You will need to refer to the MapBatched documentation online to set the action and batch size rather than use the defaults.
In[]:=
files={"lecture02_ex2b_compound1.sdf","lecture02_ex2b_compound2.sdf","lecture02_ex2b_compound3.sdf","lecture02_ex2b_compound4.sdf","lecture02_ex2b_compound5.sdf"}
Out[]=
{lecture02_ex2b_compound1.sdf,lecture02_ex2b_compound2.sdf,lecture02_ex2b_compound3.sdf,lecture02_ex2b_compound4.sdf,lecture02_ex2b_compound5.sdf}
In[]:=
(* Write your code in this cell. *)
Attributions
Adapted from the corresponding OLCC 2019 Python Assignment:
https://chem.libretexts.org/Courses/Intercollegiate_Courses/Cheminformatics_OLCC_(2019)/2._Representing_Small_Molecules_on_Computers/2.7%3A_Python_Assignment/Python_Assignment_2A


Cite this as: Joshua Schrier, "02A Chemical Structure Inputs for PUG-REST" from the Notebook Archive (2020), https://notebookarchive.org/2020-10-ebntr4w


