Predicting Polymer Solubility in Solvents Using SMILES Strings

Summary

This paper introduces a deep learning framework that predicts polymer solubility (wt%) in various solvents directly from the SMILES representations of both polymer and solvent. Trained on a dataset of over 8,000 simulated polymer-solvent pairs and validated against experimental data, the model shows strong agreement with reference values and generalizes to unseen pairs, offering a scalable approach to high-throughput solvent screening.

Medical Relevance

Predicting polymer solubility is crucial for pharmaceutical formulation, where polymers are used as excipients, binders, coatings, and drug delivery vehicles. This method can accelerate the design and optimization of drug formulations, ensuring desired drug release profiles, stability, and manufacturing processes.

AI Health Application

The AI model predicts polymer solubility, which is crucial for optimizing the design and development of pharmaceutical formulations. This includes selecting appropriate polymers for drug encapsulation, controlled release mechanisms, and improving drug stability and bioavailability. It enables high-throughput screening of solvents for manufacturing pharmaceuticals and designing new drug delivery systems with tailored properties.

Key Points

  • A deep learning framework was developed to predict polymer solubility (wt%) using SMILES strings as input for both polymers and solvents.
  • The model leverages a 2,394-feature representation per sample, combining molecular descriptors and fingerprints derived from SMILES strings.
  • Training was conducted on a dataset of 8,049 polymer-solvent pairs at 25°C, generated from calibrated molecular dynamics simulations (Zhou et al., 2023).
  • A fully connected neural network with six hidden layers, optimized using Adam and evaluated with mean squared error loss, produced predictions in strong agreement with the reference solubility values.
  • Generalizability was confirmed by maintaining high accuracy on 25 unseen experimental polymer-solvent combinations from the Materials Genome Project.
  • The findings support the viability of SMILES-based machine learning for scalable and high-throughput solubility prediction.
  • This approach has direct implications for applications in pharmaceutical formulation, green chemistry, polymer processing, and advanced materials design.
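The feature construction described in the key points (one vector per polymer-solvent pair, built from both SMILES strings) can be sketched roughly as follows. This is only an illustration: the hashed character n-gram fingerprint is a stand-in assumption for the real molecular descriptors and fingerprints, which would normally come from a cheminformatics toolkit, and the names `toy_fingerprint`, `featurize_pair`, and `N_BITS` are hypothetical. The polymer SMILES convention shown (repeat unit with "*" attachment points) is likewise an assumption, since the source does not state how polymers were encoded.

```python
import hashlib

N_BITS = 64  # toy length per molecule; the paper's combined vector has 2,394 features


def toy_fingerprint(smiles, n_bits=N_BITS, n=3):
    """Hash character n-grams of a SMILES string into a fixed-length 0/1 vector.

    Stand-in (assumption) for real molecular descriptors and fingerprints.
    """
    bits = [0.0] * n_bits
    for i in range(max(len(smiles) - n + 1, 1)):
        gram = smiles[i:i + n]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        # Map the n-gram to one position in the vector and switch it on.
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1.0
    return bits


def featurize_pair(polymer_smiles, solvent_smiles):
    """Concatenate polymer and solvent features into one model input."""
    return toy_fingerprint(polymer_smiles) + toy_fingerprint(solvent_smiles)


# Polystyrene repeat unit (with "*" marking attachment points) and toluene.
polymer = "*CC(*)c1ccccc1"
solvent = "Cc1ccccc1"
x = featurize_pair(polymer, solvent)
```

The point of the sketch is the shape of the input: polymer and solvent are featurized independently and concatenated, so any polymer can be paired with any solvent at screening time without recomputing the other half.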

Methodology

The study employed a deep learning approach, converting SMILES strings of polymers and solvents into a 2,394-feature representation using molecular descriptors and fingerprints. A fully connected neural network with six hidden layers was trained on 8,049 simulated polymer-solvent pairs (at 25°C) from calibrated molecular dynamics. Model training utilized the Adam optimizer with mean squared error (MSE) as the loss function. Performance was validated against unseen experimental data to assess generalizability.
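The training setup described above (a fully connected network with six hidden layers, Adam, MSE loss) can be illustrated with a small dependency-free sketch. Everything beyond those three facts is an assumption made for illustration: the layer widths, ReLU activations, He-style initialization, full-batch updates, the learning rate, and the synthetic data are not taken from the paper, and the source does not name the framework that was actually used.

```python
import math
import random


def init_mlp(n_in, hidden, seed=0):
    """Weight matrices for a fully connected net with one scalar output.

    Each row holds fan_in weights plus a trailing bias term.
    """
    rng = random.Random(seed)
    sizes = [n_in] + list(hidden) + [1]
    return [[[rng.gauss(0.0, math.sqrt(2.0 / a)) for _ in range(a)] + [0.0]
             for _ in range(b)]
            for a, b in zip(sizes[:-1], sizes[1:])]


def zeros_like(layers):
    return [[[0.0] * len(row) for row in W] for W in layers]


def forward(layers, x):
    """Return every layer's activations: input first, prediction last."""
    acts = [list(x)]
    for i, W in enumerate(layers):
        a = acts[-1] + [1.0]  # constant 1.0 feeds the bias column
        z = [sum(w * v for w, v in zip(row, a)) for row in W]
        is_output = i == len(layers) - 1
        acts.append(z if is_output else [max(0.0, v) for v in z])  # ReLU on hidden layers
    return acts


def sample_gradients(layers, x, y):
    """Backpropagate the squared error of one sample; return (grads, error)."""
    acts = forward(layers, x)
    residual = acts[-1][0] - y
    delta = [2.0 * residual]
    grads = [None] * len(layers)
    for i in reversed(range(len(layers))):
        a = acts[i] + [1.0]
        grads[i] = [[d * v for v in a] for d in delta]
        if i > 0:
            # Chain rule through the weights, then through the previous ReLU.
            delta = [sum(layers[i][k][j] * delta[k] for k in range(len(delta)))
                     if acts[i][j] > 0.0 else 0.0
                     for j in range(len(acts[i]))]
    return grads, residual ** 2


def adam_step(layers, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One in-place Adam update with bias-corrected moment estimates."""
    for W, G, M, V in zip(layers, grads, m, v):
        for k, row in enumerate(W):
            for j in range(len(row)):
                g = G[k][j]
                M[k][j] = b1 * M[k][j] + (1.0 - b1) * g
                V[k][j] = b2 * V[k][j] + (1.0 - b2) * g * g
                m_hat = M[k][j] / (1.0 - b1 ** t)
                v_hat = V[k][j] / (1.0 - b2 ** t)
                row[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(layers, X, Y, epochs=200, lr=1e-3):
    """Full-batch training; returns the MSE recorded before each update."""
    m, v, history = zeros_like(layers), zeros_like(layers), []
    for t in range(1, epochs + 1):
        total, mse = zeros_like(layers), 0.0
        for x, y in zip(X, Y):
            grads, err = sample_gradients(layers, x, y)
            mse += err / len(X)
            for T, G in zip(total, grads):
                for k in range(len(G)):
                    for j in range(len(G[k])):
                        T[k][j] += G[k][j] / len(X)
        history.append(mse)
        adam_step(layers, total, m, v, t, lr=lr)
    return history


# Synthetic stand-in data (not from the paper): 4 features, wt%-like targets.
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(16)]
Y = [10.0 + sum(x) for x in X]
model = init_mlp(4, hidden=[8, 8, 8, 8, 8, 8])  # six hidden layers; widths are assumed
history = train(model, X, Y)
```

In practice the same structure would be expressed in a deep learning framework with automatic differentiation, and the input width would be the 2,394 features per sample rather than 4.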

Key Findings

The deep learning model achieved strong agreement between predicted and actual polymer solubility values on the extensive simulated dataset. Crucially, it demonstrated high accuracy when tested on an independent set of 25 experimentally measured polymer-solvent combinations, affirming its generalizability. This validates the effectiveness of SMILES-based machine learning for accurate and scalable polymer solubility prediction.
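Beyond the MSE loss, the source does not say which agreement metrics were reported. As a hedged sketch, the following shows how predicted and measured solubilities (wt%) on a held-out set might be compared using common regression metrics. The function name `regression_metrics` is hypothetical, and the numbers are made up purely for illustration; they are not data from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired lists of solubility values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all reference values are identical.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values purely for illustration; not data from the paper.
measured = [5.0, 12.0, 20.0, 33.0]
predicted = [6.0, 11.0, 22.0, 30.0]
metrics = regression_metrics(measured, predicted)
```

Reporting RMSE or MAE alongside MSE keeps the error in the same units as the target (wt%), which makes results on simulated and experimental sets easier to compare.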

Clinical Impact

This predictive framework can significantly accelerate the development of new pharmaceutical products by rapidly screening potential polymer-solvent combinations, optimizing drug formulation processes, and designing advanced drug delivery systems (e.g., controlled-release matrices, nanoparticles). It reduces the need for extensive experimental trial-and-error, potentially leading to faster market entry for new drugs and safer, more effective medical devices by informing the selection of biocompatible materials and their processing solvents.

Limitations

Although the model generalizes well to the experimental data, the experimental validation set (25 samples) is small relative to the 8,049-pair training set. The model is also limited to 25°C, which restricts its use where solubility must be predicted at other temperatures or pressures, as is common in pharmaceutical processing and *in vivo* environments.
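One way to make the small-sample caveat concrete is a percentile bootstrap interval for the error on a 25-sample validation set. The sketch below is an illustration under stated assumptions, not an analysis from the paper: the data are synthetic, the noise level is arbitrary, and `bootstrap_rmse_interval` is a hypothetical helper name.

```python
import math
import random


def bootstrap_rmse_interval(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE over paired samples."""
    rng = random.Random(seed)
    n = len(y_true)
    sq_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    stats = []
    for _ in range(n_resamples):
        # Resample the paired errors with replacement and recompute RMSE.
        sample = [sq_errors[rng.randrange(n)] for _ in range(n)]
        stats.append(math.sqrt(sum(sample) / n))
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Synthetic stand-in for a 25-sample validation set (not data from the paper).
rng = random.Random(42)
measured = [rng.uniform(1.0, 40.0) for _ in range(25)]
predicted = [m + rng.gauss(0.0, 2.0) for m in measured]
point = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / 25)
low, high = bootstrap_rmse_interval(measured, predicted)
```

With only 25 pairs, the interval around the point estimate is typically wide, which is why a larger experimental set would strengthen any claim of generalizability.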

Future Directions

Future work could involve expanding the dataset with a wider range of experimentally measured polymer-solvent interactions under diverse conditions (e.g., varying temperature, pressure, pH). Further research might explore incorporating more complex deep learning architectures or integrating this predictive capability with automated high-throughput experimental platforms for closed-loop materials discovery and optimization in pharmaceutical and biomedical applications.

Medical Domains

Pharmaceuticals, Drug Delivery, Biomedical Engineering, Pharmacology, Materials Science (applied to medicine)

Keywords

Polymer solubility, SMILES strings, Deep learning, Machine learning, Pharmaceutical formulation, Drug delivery, Solvent screening, Materials informatics

Abstract

Understanding and predicting polymer solubility in various solvents is critical for applications ranging from recycling to pharmaceutical formulation. This work presents a deep learning framework that predicts polymer solubility, expressed as weight percent (wt%), directly from SMILES representations of both polymers and solvents. A dataset of 8,049 polymer-solvent pairs at 25°C was constructed from calibrated molecular dynamics simulations (Zhou et al., 2023), and molecular descriptors and fingerprints were combined into a 2,394-feature representation per sample. A fully connected neural network with six hidden layers was trained using the Adam optimizer and evaluated using mean squared error loss, achieving strong agreement between predicted and actual solubility values. Generalizability was demonstrated using experimentally measured data from the Materials Genome Project, where the model maintained high accuracy on 25 unseen polymer-solvent combinations. These findings highlight the viability of SMILES-based machine learning models for scalable solubility prediction and high-throughput solvent screening, supporting applications in green chemistry, polymer processing, and materials design.