How To Use The RSGC Program
The RSGC program is designed to be versatile: it can be added to your existing Python scripts if you want to couple it with other methods and protocols.
For general use, you can set up the following Python script to run the RSGC program on your existing crystals.
Run_RSGC.py

```python
"""
Run_RSGC.py, Geoffrey Weal, 12/4/24
This script runs the Remove SideGroups from Crystals (RSGC) program on the crystals of interest.
"""
import os, shutil
from RSGC import RSGC, Hydrogen_in_Ring_Exception
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART I: Get the names of the folders holding the crystals you want to remove sidegroups from
# First, give the name of the folder that contains the crystal database you want to remove sidegroups from.
crystal_database_dirname = 'crystal_database'
# Second, give the name of the folder that contains repaired crystals obtained from the ReCrystals program.
repaired_crystal_database_dirname = None # f'repaired_{crystal_database_dirname}'
# Third, give the identifiers of any crystals in the databases that you do not want to remove sidegroups from.
exclude_identifiers = ['ECIGUV']
exclude_identifiers += ['XEZCOX', 'XEZDAK']
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part II: Determine settings for running the RSGC program.
# Fourth, determine whether saturated aliphatic sidegroups should be replaced with
#   * ethyl groups (set this to True), or
#   * methyl groups (set this to False).
leave_as_ethyls = True
# Fifth, determine whether you also want to save the molecules from each crystal individually,
# as well as the full crystal.
save_molecules_individually = True
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part III: Check that the folders of the databases exist.
# Sixth, check that the crystal database you gave exists.
if not os.path.exists(crystal_database_dirname):
    raise Exception(f'Error: {crystal_database_dirname} does not exist in {os.getcwd()}')
# Seventh, check that the folder of repaired crystals you gave exists.
if (repaired_crystal_database_dirname is not None) and (not os.path.exists(repaired_crystal_database_dirname)):
    raise Exception(f'Error: {repaired_crystal_database_dirname} does not exist in {os.getcwd()}')
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part IV: Get the paths to the crystals you want to remove sidegroups from.
# Eighth, get the names of the files contained in the crystal database folder.
crystal_database_filenames = sorted(os.listdir(crystal_database_dirname))
# Ninth, get the names of the files contained in the repaired crystal database folder.
repaired_crystal_database_filenames = sorted(os.listdir(repaired_crystal_database_dirname)) if (repaired_crystal_database_dirname is not None) else []
# Tenth, initialise the list to hold the paths to the crystals to remove sidegroups from.
filepath_names = []
# Eleventh, obtain the paths to the crystals to remove sidegroups from.
for crystal_database_filename in crystal_database_filenames:
    # 11.1: Only process files that end with ".xyz".
    if not crystal_database_filename.endswith('.xyz'):
        continue
    # 11.2: Get the identifier for this crystal (its filename without ".xyz").
    crystal_identifier = crystal_database_filename.replace('.xyz','')
    # 11.3: If the crystal is in the exclude_identifiers list, don't process it.
    if crystal_identifier in exclude_identifiers:
        continue
    # 11.4: Check if the crystal is in the repaired crystal database folder.
    #       * If it is, take the crystal from the repaired folder rather than the original folder.
    crystal_folder_name = repaired_crystal_database_dirname if (crystal_database_filename in repaired_crystal_database_filenames) else crystal_database_dirname
    # 11.5: Add the path to the crystal file to the filepath_names list.
    filepath_names.append(os.path.join(crystal_folder_name, crystal_database_filename))
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART V: Reset RSGC files from previous RSGC runs
# Twelfth, reset the RSGC files.
# 12.1: Remove the output folder (holding crystals with sidegroups removed) from a previous run.
crystal_database_with_removed_sidegroups_folder_name = f'{crystal_database_dirname}_with_removed_sidegroups'
if os.path.exists(crystal_database_with_removed_sidegroups_folder_name):
    shutil.rmtree(crystal_database_with_removed_sidegroups_folder_name)
# 12.2: Remove the file recording the issues found when running the RSGC program.
if os.path.exists('RSGC_issues.txt'):
    os.remove('RSGC_issues.txt')
# 12.3: Remove the file recording which rings contain hydrogens, produced when running the RSGC program.
if os.path.exists('Rings_with_hydrogens_in_them.txt'):
    os.remove('Rings_with_hydrogens_in_them.txt')
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART VI: Run the RSGC program on the crystals you want to remove sidegroups from
# Thirteenth, run the RSGC program on the crystals you want to remove sidegroups from.
# 13.1: Obtain the total number of crystals you want to process with the RSGC program.
total_no_of_crystals = str(len(filepath_names))
# 13.2: Set a counter to record successful RSGC executions.
successful = 0
# 13.3: For each crystal in the filepath_names list:
for counter, filepath in enumerate(filepath_names, start=1):
    # 13.4: Print to screen which crystal is being processed.
    print('Running crystal: '+str(counter)+' out of '+total_no_of_crystals)
    # 13.5: Run the RSGC program.
    try:
        # 13.5.1: Run the RSGC program.
        RSGC(filepath, leave_as_ethyls=leave_as_ethyls, save_molecules_individually=save_molecules_individually)
        # 13.5.2: Record the successful result.
        successful += 1
    except Hydrogen_in_Ring_Exception as exception_message:
        # 13.5.3: If there was an issue with the RSGC program, write the issue in the 'RSGC_issues.txt' file.
        with open('RSGC_issues.txt','a+') as issuesTXT:
            issuesTXT.write(filepath+': '+str(exception_message)+'\n')
# 13.6: Report the number of successful executions.
print('========================')
print('Number of successful runs: '+str(successful)+' out of '+total_no_of_crystals)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
Below we describe each part of the Run_RSGC.py script above so you can understand how it works.
Part I: Get the names of the crystals you want to remove sidegroups from
In the first section, give the names of the folders holding your crystals. Change these variables to match your own files and folders:
- crystal_database_dirname: the name of the folder containing the crystals you want to remove sidegroups from.
- repaired_crystal_database_dirname: the name of the folder containing crystals that were repaired by the ReCrystals program. If a crystal is found in both folders, the repaired version is used. If you do not have a folder of repaired crystals, set this to None (i.e. repaired_crystal_database_dirname = None).
- exclude_identifiers: a list of the identifiers of any crystals you are having problems with and want to exclude in the meantime. A crystal's identifier is its filename without the .xyz extension.
An example of the code for Part I is shown below:
Part I of Run_RSGC.py: Get the names of the crystals you want to remove sidegroups from

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART I: Get the names of the folders holding the crystals you want to remove sidegroups from
# First, give the name of the folder that contains the crystal database you want to remove sidegroups from.
crystal_database_dirname = 'crystal_database'
# Second, give the name of the folder that contains repaired crystals obtained from the ReCrystals program.
repaired_crystal_database_dirname = None # f'repaired_{crystal_database_dirname}'
# Third, give the identifiers of any crystals in the databases that you do not want to remove sidegroups from.
exclude_identifiers = ['ECIGUV']
exclude_identifiers += ['XEZCOX', 'XEZDAK']
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
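In the script, a crystal's identifier is its filename with the .xyz extension removed, and a crystal is skipped only if that identifier exactly matches an entry in exclude_identifiers (the match is case-sensitive). The sketch below isolates that rule; crystal_identifier and is_excluded are hypothetical helper names, not part of the RSGC package.

```python
def crystal_identifier(filename):
    """Return the identifier for a crystal file, e.g. 'ECIGUV.xyz' -> 'ECIGUV'.

    Hypothetical helper: mirrors how Run_RSGC.py derives identifiers, but only
    strips a trailing '.xyz' rather than every occurrence of it.
    """
    suffix = '.xyz'
    return filename[:-len(suffix)] if filename.endswith(suffix) else filename


def is_excluded(filename, exclude_identifiers):
    """Return True if this crystal file should be skipped (exact, case-sensitive match)."""
    return crystal_identifier(filename) in set(exclude_identifiers)


exclude_identifiers = ['ECIGUV']
exclude_identifiers += ['XEZCOX', 'XEZDAK']

print(is_excluded('XEZCOX.xyz', exclude_identifiers))  # True
print(is_excluded('xezcox.xyz', exclude_identifiers))  # False: matching is case-sensitive
print(is_excluded('ABCDEF.xyz', exclude_identifiers))  # False
```

In practice this means the entries in exclude_identifiers should be spelled exactly as the crystal filenames are, without the .xyz extension.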
Part II: Determine settings
In the second section, set the options for running the RSGC program to the settings you want. The settings you can change here are:
- leave_as_ethyls (bool): set this to True to replace any saturated aliphatic sidechains with ethyl groups, or to False to replace them with methyl groups.
- save_molecules_individually (bool): set this to True if you want to save the molecules from each crystal individually, as well as the full crystal.
An example of the code for Part II is shown below:
Part II of Run_RSGC.py: Determine settings

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part II: Determine settings for running the RSGC program.
# Fourth, determine whether saturated aliphatic sidegroups should be replaced with
#   * ethyl groups (set this to True), or
#   * methyl groups (set this to False).
leave_as_ethyls = True
# Fifth, determine whether you also want to save the molecules from each crystal individually,
# as well as the full crystal.
save_molecules_individually = True
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
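If you would rather change these two settings without editing the script, one option is to read them from the command line. This is a minimal sketch under that assumption: the --methyls and --no-individual-molecules flags and the parse_rsgc_settings function are hypothetical and not part of the RSGC program; only the keyword names leave_as_ethyls and save_molecules_individually come from the script above.

```python
import argparse


def parse_rsgc_settings(argv):
    """Parse hypothetical command-line flags into the keyword arguments passed to RSGC.

    Both settings default to True, matching Part II of Run_RSGC.py.
    """
    parser = argparse.ArgumentParser(description='Settings for Run_RSGC.py (hypothetical flags).')
    # action='store_false' means the value is True unless the flag is given.
    parser.add_argument('--methyls', dest='leave_as_ethyls', action='store_false',
                        help='replace saturated aliphatic sidegroups with methyl groups instead of ethyl groups')
    parser.add_argument('--no-individual-molecules', dest='save_molecules_individually',
                        action='store_false',
                        help='save only the full crystal, not each molecule individually')
    args = parser.parse_args(argv)
    return {
        'leave_as_ethyls': args.leave_as_ethyls,
        'save_molecules_individually': args.save_molecules_individually,
    }


print(parse_rsgc_settings(['--methyls']))
# {'leave_as_ethyls': False, 'save_molecules_individually': True}
```

In a script, you would pass sys.argv[1:] and unpack the returned dictionary into each call, e.g. RSGC(filepath, **settings).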
Part III: Database checks
The third section checks that the crystal database folders exist. You can leave this as is, or modify it as you like. An example of the code for Part III is shown below:
Part III of Run_RSGC.py: Database checks

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part III: Check that the folders of the databases exist.
# Sixth, check that the crystal database you gave exists.
if not os.path.exists(crystal_database_dirname):
    raise Exception(f'Error: {crystal_database_dirname} does not exist in {os.getcwd()}')
# Seventh, check that the folder of repaired crystals you gave exists.
if (repaired_crystal_database_dirname is not None) and (not os.path.exists(repaired_crystal_database_dirname)):
    raise Exception(f'Error: {repaired_crystal_database_dirname} does not exist in {os.getcwd()}')
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
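If you reuse these checks across several scripts, they can be wrapped in a function. The sketch below assumes you want a more specific exception than the plain Exception used in the script, so it raises FileNotFoundError and always names the folder that is actually missing. check_database_folders is a hypothetical name, and it uses os.path.isdir, so a file that merely shares the folder's name is also rejected.

```python
import os


def check_database_folders(database_dirname, repaired_dirname=None):
    """Raise FileNotFoundError naming whichever database folder is missing.

    Hypothetical helper equivalent to Part III of Run_RSGC.py.
    """
    if not os.path.isdir(database_dirname):
        raise FileNotFoundError(f'Error: {database_dirname} does not exist in {os.getcwd()}')
    # The repaired folder is optional, so only check it when one was given.
    if repaired_dirname is not None and not os.path.isdir(repaired_dirname):
        raise FileNotFoundError(f'Error: {repaired_dirname} does not exist in {os.getcwd()}')
```

Calling check_database_folders(crystal_database_dirname, repaired_crystal_database_dirname) would then replace the two if statements in Part III.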
Part IV: Gather the paths to the crystal files
In the fourth section, gather the paths of all the crystal files you want to remove sidegroups from. You can leave this as is, or modify it as you like. An example of the code for Part IV is shown below:
Part IV of Run_RSGC.py: Gather the paths to the crystal files

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Part IV: Get the paths to the crystals you want to remove sidegroups from.
# Eighth, get the names of the files contained in the crystal database folder.
crystal_database_filenames = sorted(os.listdir(crystal_database_dirname))
# Ninth, get the names of the files contained in the repaired crystal database folder.
repaired_crystal_database_filenames = sorted(os.listdir(repaired_crystal_database_dirname)) if (repaired_crystal_database_dirname is not None) else []
# Tenth, initialise the list to hold the paths to the crystals to remove sidegroups from.
filepath_names = []
# Eleventh, obtain the paths to the crystals to remove sidegroups from.
for crystal_database_filename in crystal_database_filenames:
    # 11.1: Only process files that end with ".xyz".
    if not crystal_database_filename.endswith('.xyz'):
        continue
    # 11.2: Get the identifier for this crystal (its filename without ".xyz").
    crystal_identifier = crystal_database_filename.replace('.xyz','')
    # 11.3: If the crystal is in the exclude_identifiers list, don't process it.
    if crystal_identifier in exclude_identifiers:
        continue
    # 11.4: Check if the crystal is in the repaired crystal database folder.
    #       * If it is, take the crystal from the repaired folder rather than the original folder.
    crystal_folder_name = repaired_crystal_database_dirname if (crystal_database_filename in repaired_crystal_database_filenames) else crystal_database_dirname
    # 11.5: Add the path to the crystal file to the filepath_names list.
    filepath_names.append(os.path.join(crystal_folder_name, crystal_database_filename))
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
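The same gathering logic can be written as a single function that returns the list of paths, which makes it easier to test on its own. This is a sketch, not part of the RSGC package; gather_crystal_paths is a hypothetical name. As in the script, only crystals listed in the original database folder are considered: a file that exists only in the repaired folder is ignored, and the repaired copy is used whenever both exist.

```python
import os


def gather_crystal_paths(database_dirname, repaired_dirname=None, exclude_identifiers=()):
    """Return the paths of the .xyz crystals to process, in sorted filename order.

    Hypothetical helper equivalent to Part IV of Run_RSGC.py.
    """
    # A set gives fast membership checks; order does not matter here.
    repaired_filenames = set(os.listdir(repaired_dirname)) if repaired_dirname is not None else set()
    excluded = set(exclude_identifiers)
    filepaths = []
    for filename in sorted(os.listdir(database_dirname)):
        if not filename.endswith('.xyz'):
            continue
        identifier = filename[:-len('.xyz')]
        if identifier in excluded:
            continue
        # Prefer the repaired copy of a crystal when one exists.
        folder = repaired_dirname if filename in repaired_filenames else database_dirname
        filepaths.append(os.path.join(folder, filename))
    return filepaths
```

With this helper, Part IV reduces to filepath_names = gather_crystal_paths(crystal_database_dirname, repaired_crystal_database_dirname, exclude_identifiers).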
Part V: Remove existing files from previous RSGC runs
In the fifth section, remove any existing files produced by previous RSGC runs. You can leave this as is, or modify it as you like. An example of the code for Part V is shown below:
Part V of Run_RSGC.py: Remove existing files from previous RSGC runs

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART V: Reset RSGC files from previous RSGC runs
# Twelfth, reset the RSGC files.
# 12.1: Remove the output folder (holding crystals with sidegroups removed) from a previous run.
crystal_database_with_removed_sidegroups_folder_name = f'{crystal_database_dirname}_with_removed_sidegroups'
if os.path.exists(crystal_database_with_removed_sidegroups_folder_name):
    shutil.rmtree(crystal_database_with_removed_sidegroups_folder_name)
# 12.2: Remove the file recording the issues found when running the RSGC program.
if os.path.exists('RSGC_issues.txt'):
    os.remove('RSGC_issues.txt')
# 12.3: Remove the file recording which rings contain hydrogens, produced when running the RSGC program.
if os.path.exists('Rings_with_hydrogens_in_them.txt'):
    os.remove('Rings_with_hydrogens_in_them.txt')
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
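Because this step deletes files, it can help to know exactly what was removed. The sketch below is a hypothetical variant of Part V (reset_previous_outputs and its working_dir parameter are not part of the RSGC program) that uses the same folder and file names as the script and returns the paths it deleted, so you can print or log them.

```python
import os
import shutil


def reset_previous_outputs(crystal_database_dirname, working_dir='.'):
    """Delete outputs left by a previous run and return the paths that were removed.

    Hypothetical helper equivalent to Part V of Run_RSGC.py.
    """
    removed = []
    output_folder = os.path.join(working_dir, f'{crystal_database_dirname}_with_removed_sidegroups')
    if os.path.isdir(output_folder):
        shutil.rmtree(output_folder)  # removes the folder and everything inside it
        removed.append(output_folder)
    for filename in ('RSGC_issues.txt', 'Rings_with_hydrogens_in_them.txt'):
        path = os.path.join(working_dir, filename)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling it a second time returns an empty list, which is a quick way to confirm that the reset worked.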
Part VI: Run the RSGC program
In the sixth section, remove sidegroups from all your crystals of interest using the RSGC program. You can leave this as is, or modify it as you like. An example of the code for Part VI is shown below:
Part VI of Run_RSGC.py: Run the RSGC program

```python
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# PART VI: Run the RSGC program on the crystals you want to remove sidegroups from
# Thirteenth, run the RSGC program on the crystals you want to remove sidegroups from.
# 13.1: Obtain the total number of crystals you want to process with the RSGC program.
total_no_of_crystals = str(len(filepath_names))
# 13.2: Set a counter to record successful RSGC executions.
successful = 0
# 13.3: For each crystal in the filepath_names list:
for counter, filepath in enumerate(filepath_names, start=1):
    # 13.4: Print to screen which crystal is being processed.
    print('Running crystal: '+str(counter)+' out of '+total_no_of_crystals)
    # 13.5: Run the RSGC program.
    try:
        # 13.5.1: Run the RSGC program.
        RSGC(filepath, leave_as_ethyls=leave_as_ethyls, save_molecules_individually=save_molecules_individually)
        # 13.5.2: Record the successful result.
        successful += 1
    except Hydrogen_in_Ring_Exception as exception_message:
        # 13.5.3: If there was an issue with the RSGC program, write the issue in the 'RSGC_issues.txt' file.
        with open('RSGC_issues.txt','a+') as issuesTXT:
            issuesTXT.write(filepath+': '+str(exception_message)+'\n')
# 13.6: Report the number of successful executions.
print('========================')
print('Number of successful runs: '+str(successful)+' out of '+total_no_of_crystals)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
```
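The loop in Part VI follows a common pattern: process each item, record expected failures in a log file, and count the successes. The sketch below is a hypothetical generalisation (run_over_crystals is not part of the RSGC package) that takes the processing function and the exception type(s) to catch as arguments, so the loop can be tried with a stand-in function before pointing it at RSGC. Any exception not listed in handled_exceptions still stops the run, just as in the script.

```python
def run_over_crystals(filepaths, process, handled_exceptions, issues_filename='RSGC_issues.txt'):
    """Call process(filepath) on each crystal, log handled failures, and return the success count.

    Hypothetical helper: in Run_RSGC.py, process would call RSGC(...) and
    handled_exceptions would be Hydrogen_in_Ring_Exception (a tuple of types also works).
    """
    successful = 0
    total = len(filepaths)
    for counter, filepath in enumerate(filepaths, start=1):
        print(f'Running crystal: {counter} out of {total}')
        try:
            process(filepath)
        except handled_exceptions as exception_message:
            # Append, so the issues from every crystal in this run are kept.
            with open(issues_filename, 'a') as issues_file:
                issues_file.write(f'{filepath}: {exception_message}\n')
        else:
            # Only reached when process(filepath) did not raise.
            successful += 1
    print('========================')
    print(f'Number of successful runs: {successful} out of {total}')
    return successful
```

In Run_RSGC.py this might be called as run_over_crystals(filepath_names, functools.partial(RSGC, leave_as_ethyls=leave_as_ethyls, save_molecules_individually=save_molecules_individually), Hydrogen_in_Ring_Exception).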
Output from the RSGC Program
The RSGC program will create a folder called crystals_with_sidechains_removed and save in it the xyz files of the crystals given in your Run_RSGC.py script, with their aliphatic sidechains removed.
- Another folder called crystals_with_sidechains_removed_molecules will also be created. This contains the xyz files of the individual molecules from your crystal files. This folder is created purely so that you can check the molecules that make up each crystal, making it easier to double-check that only aliphatic sidechains containing just sp3 carbons and hydrogens have been removed.
As well as the crystals_with_sidechains_removed and crystals_with_sidechains_removed_molecules folders, a file called RSGC_issues.txt will be created if any issues are encountered while running the RSGC program. In the Run_RSGC.py script above, each crystal that raises a Hydrogen_in_Ring_Exception is written to this file along with its error message.
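After a run, you may want a quick summary of which crystals had problems. The sketch below reads RSGC_issues.txt, assuming the one-line-per-issue format "path: message" that the Run_RSGC.py script writes (it assumes crystal paths do not themselves contain ": " and that messages fit on one line). read_rsgc_issues is a hypothetical helper, not part of the RSGC package.

```python
import os


def read_rsgc_issues(issues_filename='RSGC_issues.txt'):
    """Return a list of (crystal_path, message) pairs from an issues file.

    Returns an empty list if the file does not exist, i.e. no issues were recorded.
    """
    if not os.path.exists(issues_filename):
        return []
    issues = []
    with open(issues_filename) as issues_file:
        for line in issues_file:
            line = line.rstrip('\n')
            if not line:
                continue
            # Split on the first ': ' only, so colons inside the message are kept.
            path, _, message = line.partition(': ')
            issues.append((path, message))
    return issues
```

For example, len(read_rsgc_issues()) gives the number of crystals that failed in the most recent run.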
Example Output Files from the RSGC Program
Click here to find examples of crystals from the CCDC that have been processed using a Run_RSGC.py file.