Registration Dossier

Administrative data

water solubility
Type of information:
(Q)SAR
Adequacy of study:
key study
Reliability:
2 (reliable with restrictions)
Rationale for reliability incl. deficiencies:
results derived from a valid (Q)SAR model and falling into its applicability domain, with adequate and reliable documentation / justification
Justification for type of information:
The program WSKOWWIN v1.42 estimates the water solubility (WSol) of an organic compound using the compound's log octanol-water partition coefficient (log Kow). WSKOWWIN requires only a chemical structure to estimate WSol; structures are entered as SMILES (Simplified Molecular Input Line Entry System) notations.

2. MODEL (incl. version number)
WSKOWWIN v1.42

CAS: 97-67-6


- Defined endpoint:
Water solubility of organic compounds at 25°C

- Unambiguous algorithm:
A complete description of the estimation methodology used by WSKOWWIN is available in two documents prepared for the U.S. Environmental Protection Agency, Office of Pollution Prevention and Toxics (Meylan and Howard, 1994a,b).
A journal article describing the methodology is also available (Meylan et al., 1996). The WSKOWWIN program estimates the water solubility of an organic compound using the compound's log octanol-water partition coefficient (log Kow).

Data Collection
A database of more than 8400 compounds with reliably measured log Kow values had already been compiled from available sources. Most experimental values were taken from the "star-list" compilation of Hansch and Leo (1985), which had already been critically evaluated (see also Hansch et al., 1995), or from an extensive compilation by Sangster (1993) that includes many "recommended" values based on critical evaluation. Other log Kow values were taken from sources located through the Environmental Fate Data Base (EFDB) system (Howard et al., 1982, 1986). A few values were taken from Section 4a, 8d and 8e submissions to the U.S. EPA under the Toxic Substances Control Act (see
Water solubilities were collected from the AQUASOL dATAbASE™ of the University of Arizona (Yalkowsky and Dannenfelser, 1990), Syracuse Research Corporation's PHYSPROP© Database (SRC, 1994), and sources located through the Environmental Fate Data Base (EFDB) system (Howard et al., 1982, 1986). Water solubilities were primarily constrained to the 20-25 °C temperature range, with 25 °C preferred.
Melting points were collected from sources such as AQUASOL dATAbASE™, PHYSPROP© and EFDB, as well as the Handbook of Chemistry and Physics (Lide, 1990) and the Aldrich Catalog (Aldrich, 1992).
Regression & Results
A dataset of 1450 compounds (941 solids, 509 liquids) with reliably measured water solubility, log Kow and melting point was used as the training set for developing the new water solubility estimation algorithms. Standard linear regressions were used to fit water solubility (as log S) against log Kow, melting point and molecular weight.
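A standard multi-linear regression of this kind can be sketched with ordinary least squares. The sketch below is illustrative only: the (log Kow, MW) pairs are synthetic, not training-set data, and the responses are generated without noise from the MW-only coefficients quoted later in this section, so a correct fit should simply recover those coefficients. The helper names `solve_linear_system` and `fit_ols` are hypothetical, and plain Python is used instead of a numerical library to keep the example self-contained.

```python
"""Ordinary least-squares fit of log S on descriptors (illustrative sketch)."""


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot into position `col`.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Fit y = c0 + c1*x1 + ... by solving the normal equations (X^T X) c = X^T y."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(x, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic (log Kow, MW) pairs: NOT training-set data.
descriptors = [(-1.0, 60.0), (0.5, 120.0), (2.0, 150.0), (3.5, 250.0), (5.0, 300.0), (1.0, 400.0)]
# Responses generated from the published MW-only coefficients, without noise.
log_s = [0.796 - 0.854 * kow - 0.00728 * mw for kow, mw in descriptors]

coef = fit_ols(descriptors, log_s)
print([round(c, 5) for c in coef])
```

With real data the fit would of course not be exact; the noise-free setup only checks that the regression machinery recovers known coefficients.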

Residual errors from the initial regression fit were examined for compounds that share common structural features and show relatively consistent errors. On that basis, 12 compound classes were initially identified and added to the regression, giving a multi-linear regression on log Kow, melting point and/or molecular weight plus 12 correction factors. Each applicable correction factor is counted at most once per structure, no matter how many times the fragment occurs; for example, the nitro factor in 1,4-dinitrobenzene is counted just once. Because a compound either contains a correction factor or does not, the matrix for the multi-linear regression contains either a 0 or a 1 for each correction factor. Appendix E describes the correction factors and coefficients used by WSKOWWIN.
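The once-per-structure rule means raw fragment counts are collapsed to 0/1 indicators before entering the regression matrix. A minimal sketch of that step follows. Only "nitro" is named in the text; the other class names are hypothetical placeholders, and the actual 12 classes and their coefficients (Appendix E) are not reproduced here.

```python
"""Binary correction-factor indicators (illustrative sketch)."""

# Hypothetical subset of correction classes; only "nitro" is named in the source.
CORRECTION_CLASSES = ["nitro", "hypothetical_class_b", "hypothetical_class_c"]


def indicator_row(fragment_counts):
    """Map raw fragment counts to 0/1 indicators, one per correction class.

    A class contributes 1 if its fragment occurs at least once, regardless of
    how many times it occurs, and 0 otherwise.
    """
    return [1 if fragment_counts.get(name, 0) > 0 else 0 for name in CORRECTION_CLASSES]


# 1,4-dinitrobenzene has two nitro groups, but the nitro factor is counted once.
print(indicator_row({"nitro": 2}))  # [1, 0, 0]
```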
WSKOWWIN estimates water solubility for any compound with one of two equations: equations 19 and 20 of Meylan and Howard (1994a), which correspond to equations 11 and 12 of the journal article (Meylan et al., 1996). The equations are:
log S (mol/L) = 0.796 - 0.854 log Kow - 0.00728 MW + ΣCorrections
log S (mol/L) = 0.693 - 0.96 log Kow - 0.0092(Tm-25) - 0.00314 MW + ΣCorrections

where MW is the molecular weight and Tm is the melting point (MP) in °C (used only for solids). The summation of corrections (ΣCorrections) is applied as described in Appendix E. When a measured MP is available, the second equation (with Tm) is used; otherwise, the first equation (MW only) is used.
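The selection logic and the two equations can be sketched as below, together with the conversion from log S (mol/L) to mg/L, since the result for this substance is reported in mg/L. This is a sketch, not WSKOWWIN itself: the function names are hypothetical, ΣCorrections must be supplied by the caller because the Appendix E values are not given here, and setting the melting-point term to zero for liquids (MP ≤ 25 °C) is our reading of "used only for solids".

```python
"""The two log S equations quoted above (illustrative sketch)."""

from typing import Optional


def log_s_mol_per_l(log_kow: float, mw: float, melting_point_c: Optional[float] = None,
                    corrections: float = 0.0) -> float:
    """Return estimated log S (mol/L).

    Uses the melting-point equation when a measured melting point is supplied,
    otherwise the MW-only equation. `corrections` is the pre-computed
    ΣCorrections; its values (Appendix E) are not reproduced here.
    """
    if melting_point_c is None:
        return 0.796 - 0.854 * log_kow - 0.00728 * mw + corrections
    # Assumption: the (Tm - 25) term applies only to solids, so it is zero for liquids.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.693 - 0.96 * log_kow - 0.0092 * mp_term - 0.00314 * mw + corrections


def mol_per_l_to_mg_per_l(log_s: float, mw: float) -> float:
    """Convert log S (mol/L) to mg/L: S[mg/L] = 10**log S * MW[g/mol] * 1000."""
    return 10 ** log_s * mw * 1000.0


# L-malic acid inputs from this dossier: MW 134.09, predicted log Kow -1.68.
# ΣCorrections is left at 0 because the applicable correction values are not given here.
log_s = log_s_mol_per_l(log_kow=-1.68, mw=134.09)
print(round(log_s, 4), f"{mol_per_l_to_mg_per_l(log_s, 134.09):.3g} mg/L")
```

Even without corrections, the MW-only estimate for L-malic acid is about 2.4 × 10⁶ mg/L, i.e. already above the 1 000 000 mg/L reported in the results; the sketch does not attempt to reproduce how WSKOWWIN reports such high values.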

- Defined domain of applicability:
Currently there is no universally accepted definition of model domain. However, users may wish to consider the possibility that water solubility estimates are less accurate for compounds outside the MW range, water solubility range and log Kow range of the training set compounds. It is also possible that a compound may have a functional group(s) or other structural features not represented in the training set, and for which no correction factor was developed. These points should be taken into consideration when interpreting model results.

Range of water solubilities in the Training set:
Minimum = 4 × 10⁻⁷ mg/L (octachlorodibenzo-p-dioxin)
Maximum = completely soluble (various)

Range of Molecular Weights in the Training set:
Minimum = 27.03 (hydrocyanic acid)
Maximum = 627.62 (hexabromobiphenyl)

Range of Log Kow values in the Training set:
Minimum = -3.89 (aspartic acid)
Maximum = 8.27 (decachlorobiphenyl)

- Appropriate measures of goodness-of-fit and robustness and predictivity:
The regression equations used by the WSKOWWIN program were trained with a dataset of 1450 compounds.

Training set statistics:
N = 1450 compounds
squared correlation coefficient (r²) = 0.970
standard deviation = 0.409
average deviation = 0.313

As noted above, WSKOWWIN estimates water solubility with one of two equations. When an experimental melting point is available, WSKOWWIN applies the equation containing both melting point and molecular weight (MW) terms. In the absence of a melting point, the equation containing only the molecular weight is used. All compounds in the 1450-compound training set have known melting points or are known to be liquids at 25 °C. The accuracy statistics for the two equations are as follows:

Statistic        Melt Pt + MW   MW only
r²               0.970          0.934
std deviation    0.409          0.585
avg deviation    0.313          0.442
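The source does not define its three accuracy statistics, so the sketch below uses common definitions as an explicit assumption: r² as the squared Pearson correlation of observed against estimated log S, standard deviation as the sample standard deviation of the residuals, and average deviation as the mean absolute residual. The observed/estimated values in the example are hypothetical.

```python
"""Goodness-of-fit statistics for log S estimates (illustrative sketch)."""

from statistics import mean, stdev


def accuracy_stats(observed, estimated):
    """Return r², standard deviation and average deviation of the residuals.

    Assumed definitions (not stated in the source):
      r2            = squared Pearson correlation of observed vs. estimated log S
      std_deviation = sample standard deviation of residuals (estimated - observed)
      avg_deviation = mean absolute residual
    """
    residuals = [e - o for o, e in zip(observed, estimated)]
    mo, me = mean(observed), mean(estimated)
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return {
        "r2": cov ** 2 / (var_o * var_e),
        "std_deviation": stdev(residuals),
        "avg_deviation": mean(abs(r) for r in residuals),
    }


# Hypothetical observed and estimated log S values (mol/L).
stats = accuracy_stats([-1.0, 0.0, 1.0, 2.0], [-0.8, 0.1, 0.7, 2.2])
print(stats)
```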

Training set estimation error (absolute error in log S units, cumulative percentage of compounds):
within ≤ 0.20 – 42.0%
within ≤ 0.40 – 69.5%
within ≤ 0.50 – 79.1%
within ≤ 0.60 – 86.0%
within ≤ 0.80 – 93.9%
within ≤ 1.00 – 97.4%
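A cumulative error distribution like the one above can be tabulated by counting the share of compounds whose absolute log S error falls at or below each threshold. The sketch uses the thresholds listed above; the error values in the example are hypothetical and unrelated to the training set.

```python
"""Cumulative distribution of absolute log S errors (illustrative sketch)."""


def fraction_within(abs_errors, thresholds=(0.2, 0.4, 0.5, 0.6, 0.8, 1.0)):
    """Return, for each threshold, the percentage of errors <= that threshold."""
    n = len(abs_errors)
    return {t: 100.0 * sum(1 for e in abs_errors if e <= t) / n for t in thresholds}


# Hypothetical absolute errors |estimated - observed| in log units.
distribution = fraction_within([0.1, 0.3, 0.45, 0.9, 1.5])
for threshold, pct in distribution.items():
    print(f"within <= {threshold:.2f}: {pct:.1f}%")
```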

The WSKOWWIN estimation equations were initially validated on two datasets of compounds that were not included in the model training. The first, relatively small dataset consisted of 85 compounds with experimental log Kow values but no available melting points. Many compounds in this 85-compound test set decompose before melting and would theoretically have very high melting points (e.g. amino acids and compounds having multiple nitrogens).

Because no melting points were available, WSKOWWIN applies the MW-only equation to this test set; its accuracy statistics are:
number           85
r²               0.865
std deviation    0.961
avg deviation    0.714

A much larger dataset of 817 compounds was also tested. All 817 compounds had experimental melting points, but none had a reliable experimental log Kow. The log Kow values used for this validation were therefore estimated (primarily with the version of the KOWWIN program available at that time), so the water solubility estimates are based on estimated log Kow values. Using estimated inputs typically reduces estimation accuracy, but this type of validation can still provide insight into the predictive ability of the method.
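Because all 817 compounds had melting points, the melting-point equation applies, and log S depends linearly on log Kow with slope -0.96. The short derivation below shows how an error in the estimated log Kow carries through to log S. The variance sum assumes, as our own simplification rather than a statement from the source, that log Kow estimation errors are independent of the model's own residuals.

```latex
\begin{aligned}
\frac{\partial \log S}{\partial \log K_{ow}} &= -0.96
  \quad \text{(melting-point equation)} \\
\Delta \log S &= -0.96\,\Delta \log K_{ow} \\
\sigma_{\text{total}}^{2} &\approx \sigma_{\text{model}}^{2}
  + \left(0.96\,\sigma_{\log K_{ow}}\right)^{2}
\end{aligned}
```

For example, a log Kow estimation error of 0.5 log units shifts the log S estimate by 0.48 log units, which is one way the larger deviations for this dataset, compared with the training set, could arise.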

The accuracy statistics for this dataset are:
number           817
r²               0.902
std deviation    0.615
avg deviation    0.480

- Mechanistic interpretation:
There is no overt mechanistic basis for the model.

Meylan, W.M. and P.H. Howard. 1994a. Upgrade of PCGEMS Water Solubility Estimation Method (May 1994 Draft). Prepared for Robert S. Boethling, U.S. Environmental Protection Agency, Office of Pollution Prevention and Toxics, Washington, DC; prepared by Syracuse Research Corporation, Environmental Science Center, Syracuse, NY 13210.
Meylan, W.M. and P.H. Howard. 1994b. Validation of Water Solubility Estimation Methods Using Log Kow for Application in PCGEMS & EPI (Sept 1994, Final Report). Prepared for Robert S. Boethling, U.S. Environmental Protection Agency, Office of Pollution Prevention and Toxics, Washington, DC; prepared by Syracuse Research Corporation, Environmental Science Center, Syracuse, NY 13210.
Meylan, W.M., P.H. Howard and R.S. Boethling. 1996. Improved method for estimating water solubility from octanol/water partition coefficient. Environ. Toxicol. Chem. 15: 100-106.

- Descriptor domain:
The substance has a molecular weight of 134.09 and is therefore within the molecular weight range of the compounds in the training set (between 27 and 627).
The substance has a predicted log Kow of -1.68 and is therefore within the log Kow range of the compounds in the training set (between -3.89 and 8.27).
The predicted water solubility is within the range of the compounds in the training set (between 4 × 10⁻⁷ mg/L and completely soluble).
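The descriptor-domain statements above amount to simple range checks against the training-set extremes listed earlier. The sketch recomputes the molecular weight from the molecular formula of malic acid, C4H6O5 (general chemical knowledge, not stated in the dossier), using rounded standard atomic weights, and then checks both descriptors. Function and constant names are hypothetical.

```python
"""Descriptor-domain check against the training-set ranges (illustrative sketch)."""

# Standard atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Training-set ranges quoted in this dossier.
MW_RANGE = (27.03, 627.62)
LOG_KOW_RANGE = (-3.89, 8.27)


def molecular_weight(formula_counts):
    """Sum atomic weights over a {element: count} molecular formula."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in formula_counts.items())


def in_descriptor_domain(mw, log_kow):
    """Report whether each descriptor lies inside the training-set range (inclusive)."""
    return {
        "mw": MW_RANGE[0] <= mw <= MW_RANGE[1],
        "log_kow": LOG_KOW_RANGE[0] <= log_kow <= LOG_KOW_RANGE[1],
    }


# L-malic acid, C4H6O5, with the predicted log Kow of -1.68 from this dossier.
mw = molecular_weight({"C": 4, "H": 6, "O": 5})
print(round(mw, 2), in_descriptor_domain(mw, -1.68))
```

A range check of this kind covers only the descriptor domain; the structural domain (whether every fragment is represented in the training set) needs a separate fragment-level comparison.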

- Structural and mechanistic domains:
All functional groups and structural features of the substance are represented in the training set.

- Other considerations (as appropriate):
The same prediction applies to the isomeric compounds DL-malic acid (CAS 6915-15-7) and D-malic acid (CAS 636-61-3), for which experimental data are available: the experimental water solubility is 1E+006 mg/L (20 °C) for DL-malic acid and 3.64E+005 mg/L (20 °C) for D-malic acid. These data support the use of the in silico model for L-malic acid.

The substance fits in the applicability domain of the model. The prediction is valid and can be used for classification and risk assessment.

Data source

Reference Type:
study report
Report Date:

Materials and methods

Test guideline
other: QSAR

Test material

Test material form:

Results and discussion

Water solubility
Water solubility:
1 000 000 mg/L
Conc. based on:
test mat.
25 °C
Remarks on result:
other: QSAR

Applicant's summary and conclusion