API reference
gwas.py
- class gwas.ActionAppendDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]
Methods
__call__(parser, namespace, values[, ...])
Call self as a function.
format_usage
- class gwas.ActionStoreDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]
Methods
__call__(parser, namespace, values[, ...])
Call self as a function.
format_usage
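The reference above gives only the signature of ActionStoreDeprecated, not its behavior. As a minimal sketch, assuming from the class name that it stores the value like the default argparse action while flagging the option as deprecated, a custom argparse.Action could look like this. The class name StoreDeprecatedSketch, the option names, and the warning text are hypothetical and not taken from gwas.py.

```python
import argparse
import warnings


class StoreDeprecatedSketch(argparse.Action):
    """Hypothetical stand-in: store the value, but warn that the option is deprecated."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Assumption: a DeprecationWarning is emitted; the real class may log instead.
        warnings.warn(f"option {option_string} is deprecated", DeprecationWarning)
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
# Both spellings write to the same destination; only the old one warns.
parser.add_argument("--old-name", dest="new_name", action=StoreDeprecatedSketch)
parser.add_argument("--new-name", dest="new_name")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = parser.parse_args(["--old-name", "value"])
```

Routing the old and new option to the same dest keeps downstream code unaware of which spelling was used.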
- class gwas.LoadFromFile(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]
Methods
__call__(parser, namespace, values[, ...])
Call self as a function.
format_usage
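LoadFromFile is likewise documented by signature only. A common argparse pattern with this name reads further command-line options from a text file; the sketch below illustrates that pattern under the assumption that gwas.LoadFromFile does something similar. The class LoadFromFileSketch and the options --config, --out and --threads are hypothetical.

```python
import argparse
import os
import tempfile


class LoadFromFileSketch(argparse.Action):
    """Hypothetical stand-in: parse whitespace-separated options from a file."""

    def __call__(self, parser, namespace, values, option_string=None):
        with open(values) as fh:
            tokens = fh.read().split()
        # Re-use the same parser so options from the file land in the same namespace.
        parser.parse_args(tokens, namespace)


parser = argparse.ArgumentParser()
parser.add_argument("--config", action=LoadFromFileSketch)
parser.add_argument("--out", default="result")
parser.add_argument("--threads", type=int, default=1)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "args.txt")
    with open(path, "w") as fh:
        fh.write("--out from_file --threads 4\n")
    # Options given after --config on the command line override those in the file.
    args = parser.parse_args(["--config", path, "--threads", "8"])
```

In this sketch the file is parsed at the position where --config appears, so later command-line options take precedence over values from the file.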
- class gwas.Logger(fh, mode)[source]
Lightweight logging.
Methods
error(msg)
Print to log file, error file and stdout with a single command.
log(msg)
Print to log file and stdout with a single command.
- class gwas.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Methods
default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
encode(o)
Return a JSON string representation of a Python data structure.
iterencode(o[, _one_shot])
Encode the given object and yield each string representation as available.
- default(obj)[source]
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
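The docstring above is inherited from json.JSONEncoder and does not say what NumpyEncoder itself converts. A minimal sketch of the general idea, assuming the encoder turns array-like objects into plain lists: the class ArrayLikeEncoder and the FakeArray stand-in are hypothetical, and duck typing on tolist() is used so the example runs without NumPy installed.

```python
import json


class ArrayLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize objects that expose tolist(), as NumPy arrays do."""

    def default(self, o):
        tolist = getattr(o, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class raise TypeError for anything else.
        return super().default(o)


class FakeArray:
    """Stand-in for a NumPy array (assumption: only tolist() is needed)."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


encoded = json.dumps({"beta": FakeArray([0.1, 0.2])}, cls=ArrayLikeEncoder)
```

Passing the encoder through the cls argument of json.dumps keeps the conversion out of the calling code.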
pgs.pgs
- class pgs.pgs.BasePGS(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='qc-output', **kwargs)[source]
Base PGS object declaration with some shared properties for subclassing
- Parameters:
- sumstats_file: str
summary statistics file (.gz)
- pheno_file: str
phenotype file (for instance, .height)
- phenotype: str or None
if not None, phenotype name (must be a column header in pheno_file)
- phenotype_class: str
phenotype class, either CONTINUOUS or BINARY
- geno_file_prefix: str
path prefix of the QC’d .bed, .bim, .fam files, without file extension (</ENV/path/to/data/file>)
- output_dir: str
path for output files (<path>)
- **kwargs
- Attributes:
- data_prefix: str
file name prefix of .bed, .bim, etc. files
Methods
get_str:
abstract method for returning string with commands
- class pgs.pgs.PGS_LDpred2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_ldpred2_inf', method='auto', file_geno_rds='PGS_ldpred2_inf/EUR.rds', **kwargs)[source]
Helper class for setting up an LDpred2 PRS analysis. Inherits from BasePGS.
- Parameters:
- sumstats_file: str
summary statistics file (.gz)
- pheno_file: str
phenotype file (for instance, .height)
- phenotype: str or None
if not None, phenotype name (must be a column header in pheno_file)
- phenotype_class: str
phenotype class, either ‘CONTINUOUS’ or ‘BINARY’
- geno_file_prefix: str
path prefix of the QC’d .bed, .bim, .fam files, without file extension (</ENV/path/to/data/file>)
- output_dir: str
path for output files (<path>)
- method: str
LDpred2 method, either “auto” (default) or “inf” for infinitesimal
- file_geno_rds: str
base name for .rds file output
- **kwargs
additional keyword/argument pairs passed to the $LDPRED2_SCRIPTS/ldpred2.R script (see the script for the full set of options). For an option that is a flag without a value, set the value to None or an empty string.
Methods
generate_eigenvec_eigenval_files:
get_model_evaluation_str:
get_str:
- generate_eigenvec_eigenval_files(nPCs=6)[source]
Return string which can be included in job script for generating .eigenvec and .eigenval files in the output directory using PLINK
- Parameters:
- nPCs: int
number of PCs to account for
- get_model_evaluation_str(eigenvec_file=None, nPCs=None, covariate_file=None)[source]
Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file
- Parameters:
- eigenvec_file: path
path to file with PCs (no header, columns FID, IID, PC1, PC2, …)
- nPCs: int
number of PCs to account for
- covariate_file: path
path to file with covariates (header, columns FID, IID, <covariate>)
- Returns:
- str
- get_str(create_backing_file=True)[source]
Public method to create commands
- Parameters:
- create_backing_file: bool
if True (default), prepend statements for running the $LDPRED2_SCRIPTS/createBackingFile.R script, generating file_geno_rds
- Returns:
- list of str
list of command line statements for analysis run
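Since get_str returns a list of command-line statements, one plausible way to use it is to write the statements into a job script. The sketch below shows only that last step; the helper write_job_script, the "set -e" line, and the placeholder echo commands are assumptions for illustration and are not part of pgs.pgs.

```python
import os
import tempfile


def write_job_script(commands, path, shebang="#!/bin/bash"):
    """Write a list of command strings to a shell script, one command per line."""
    lines = [shebang, "set -e"] + list(commands)
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return path


# Placeholder commands standing in for the list returned by get_str().
commands = ["echo 'create backing file'", "echo 'run LDpred2'"]

with tempfile.TemporaryDirectory() as tmp:
    script = write_job_script(commands, os.path.join(tmp, "job.sh"))
    with open(script) as fh:
        script_text = fh.read()
```

Keeping command generation separate from script writing makes it easy to inspect the commands before anything is executed.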
- class pgs.pgs.PGS_PRSice2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_prsice2', covariate_file='/REF/examples/prsice2/EUR.cov', eigenvec_file='/REF/examples/prsice2/EUR.eigenvec', nPCs=6, MAF=0.01, INFO=0.8, **kwargs)[source]
Helper class for setting up a PRSice-2 PRS analysis. Inherits from BasePGS.
- Parameters:
- sumstats_file: str
summary statistics file (.gz)
- pheno_file: str
phenotype file (for instance, .height)
- phenotype: str or None
if not None, phenotype name (must be a column header in pheno_file)
- phenotype_class: str
phenotype class, either CONTINUOUS or BINARY
- geno_file_prefix: str
path prefix of the QC’d .bed, .bim, .fam files, without file extension (</ENV/path/to/data/file>)
- output_dir: str
path for output files (<path>)
- covariate_file: str or None
path to covariate file (.cov)
- eigenvec_file: str or None
path to eigenvec file (.eigenvec) with PCs
- nPCs: int
number of Principal Components (PCs) to include in covariate generation
- MAF: float
base-MAF filter threshold (default: 0.01)
- INFO: float
base-INFO filter threshold (default: 0.8)
- **kwargs
additional keyword/argument pairs passed to the Rscripts/PRSice.R script (see the script for the full set of options). For an option that is a flag without a value, set the value to None or an empty string.
- Attributes:
- data_prefix: str
file name prefix of .bed, .bim, etc. files
Methods
get_model_evaluation_str:
get_str:
- class pgs.pgs.PGS_Plink(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='QC_data/EUR', output_dir='PGS_plink', covariate_file='/REF/examples/prsice2/EUR.cov', eigenvec_file='/REF/examples/prsice2/EUR.eigenvec', clump_p1=1, clump_r2=0.1, clump_kb=250, clump_snp_field='SNP', clump_field='P', range_list=None, strat_indep_pairwise=None, nPCs=6, score_columns=None, **kwargs)[source]
Helper class for setting up a PLINK PRS analysis. Inherits from BasePGS.
- Parameters:
- sumstats_file: str
summary statistics file (.gz)
- pheno_file: str
phenotype file (for instance, .height)
- phenotype: str or None
if not None, phenotype name (must be a column header in pheno_file)
- phenotype_class: str
phenotype class, either CONTINUOUS or BINARY
- geno_file_prefix: str
path prefix of the QC’d .bed, .bim, .fam files, without file extension (</ENV/path/to/data/file>)
- output_dir: str
path for output files (<path>)
- covariate_file: str
path to covariate file (.cov)
- eigenvec_file: str or None
None, or path to eigenvec file (.eigenvec)
- clump_p1: float
plink --clump-p1 parameter value (default: 1)
- clump_r2: float
plink --clump-r2 parameter value (default: 0.1)
- clump_kb: float
plink --clump-kb parameter value (default: 250)
- clump_snp_field: str
plink --clump-snp-field parameter value (default: 'SNP')
- clump_field: str
plink --clump-field parameter value (default: 'P')
- range_list: list of floats
list of p-value ranges for plink --q-score-range arg. (default: [0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5])
- strat_indep_pairwise: list of scalars
plink --indep-pairwise parameters for stratification describing window size (kb), step size (variant ct), r^2 threshold (default: [250, 50, 0.25])
- nPCs: int
plink --pca parameter value (default: 6)
- # score_args: list
- # plink --score arguments (default: [3, 4, 12, 'header'])
- score_columns: list of str
for plink’s --score, column names in sumstats_file. Requires header. Default: ['SNP', 'A1', 'BETA']
- **kwargs
- Attributes:
- data_prefix: str
file name prefix of .bed, .bim, etc. files
Methods
get_model_evaluation_str:
get_str:
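The range_list parameter feeds plink's --q-score-range option, which reads a file with one range per line in the form "label lower upper". As a minimal sketch, assuming each threshold t is turned into the range from 0 to t with the threshold itself as label (the convention used in the PRS tutorial; how PGS_Plink builds the file is not documented here), the lines could be generated like this. The function make_range_list_lines is hypothetical.

```python
def make_range_list_lines(thresholds):
    """Return one 'label lower upper' line per p-value threshold."""
    # Assumption: the lower bound is always 0 and the label is the threshold itself.
    return [f"{t} 0 {t}" for t in thresholds]


range_lines = make_range_list_lines([0.001, 0.05, 0.1])
```

Writing these lines to a text file, one per line, gives the kind of input that --q-score-range reads.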
- class pgs.pgs.Standard_GWAS_QC(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='QC_data', phenotype='Height', data_postfix='.QC', QC_target_kwargs=None, QC_prune_kwargs=None, QC_relatedness_prune_kwargs=None, **kwargs)[source]
Helper class for common GWAS QC. Inherits from BasePGS.
Based on the tutorial https://choishingwan.github.io/PRS-Tutorial/target/#qc-of-target-data
Use with caution. This class is not fully tested.
- Parameters:
- sumstats_file: str
summary statistics file (.gz)
- pheno_file: str
phenotype file (for instance, .height)
- geno_file_prefix: str
path prefix of the (raw) .bed, .bim, .fam files, without file extension (</ENV/path/to/data/file>)
- output_dir: str
path for output files (<path>)
- phenotype: str
default: ‘Height’
- data_postfix: str
default: ‘.QC’
- QC_target_kwargs: dict
default: {'maf': 0.01, 'hwe': 1e-6, 'geno': 0.01, 'mind': 0.01}
- QC_prune_kwargs: dict
default: {'indep-pairwise': [200, 50, 0.25]}
- QC_relatedness_prune_kwargs: dict
default: {'rel-cutoff': 0.125}
- **kwargs
Methods
get_str:
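The three QC_*_kwargs parameters default to None in the signature while the descriptions list dict defaults, which suggests the usual None-sentinel pattern for mutable defaults. A minimal sketch of that pattern follows; the helper resolve_kwargs is hypothetical, and whether the class replaces or merges user-supplied dicts is an assumption (replacement is shown here).

```python
# Defaults as listed in the QC_target_kwargs parameter description.
DEFAULT_QC_TARGET_KWARGS = {"maf": 0.01, "hwe": 1e-6, "geno": 0.01, "mind": 0.01}


def resolve_kwargs(user_kwargs, defaults):
    """Return a fresh dict: the defaults if user_kwargs is None, else a copy of user_kwargs."""
    if user_kwargs is None:
        return dict(defaults)
    return dict(user_kwargs)


default_settings = resolve_kwargs(None, DEFAULT_QC_TARGET_KWARGS)
custom_settings = resolve_kwargs({"maf": 0.05}, DEFAULT_QC_TARGET_KWARGS)
```

Copying the dict on every call avoids sharing one mutable default between instances.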
- pgs.pgs.convert_dict_to_str(d, key_prefix='--')[source]
- Parameters:
- d: dict
key, value pairs
- key_prefix: str
string prefix for key names. Default: "--"
- Returns:
- str
string formatted as "--key0 value0 --key1 value1 ...". If a value is iterable, its items are written out after the key, as in "--key0 value0[0] value0[1] ... --key1 ..."
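The formatting described above can be sketched in a few lines. This is an illustration of the documented output format, not the library's implementation; the function name dict_to_option_str is hypothetical, and treating only lists and tuples as iterable values is an assumption.

```python
def dict_to_option_str(d, key_prefix="--"):
    """Format a dict as 'PREFIXkey value ...', expanding list/tuple values item by item."""
    parts = []
    for key, value in d.items():
        parts.append(f"{key_prefix}{key}")
        if isinstance(value, (list, tuple)):
            # Iterable values: each item becomes its own token after the key.
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return " ".join(parts)


options = dict_to_option_str({"maf": 0.01, "indep-pairwise": [200, 50, 0.25]})
```

Strings are deliberately not treated as iterables here, so a value such as "P" stays a single token.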
- pgs.pgs.df_colums_to_file(source_file, output_file, usecols=None, delim_whitespace=True, delimiter=None, **kwargs)[source]
Extract columns from a delimited text file (.csv or similar) and write them to output_file
- Parameters:
- source_file: file path
.csv (or similar) input file read by pandas.read_csv.
- output_file: file path
output file to be written
- usecols: list of str or None
columns to read and write
- delim_whitespace: bool
passed to pandas.read_csv. Default: True
- delimiter: None or str
delimiter. Default: None
- **kwargs
keyword arguments passed to pandas.read_csv
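To illustrate what column extraction from a whitespace-delimited file amounts to, here is a standard-library sketch. It deliberately swaps pandas.read_csv for plain string splitting so that it runs without pandas; the function columns_to_file, the sample data, and the space-separated output format are assumptions and may differ from what df_colums_to_file writes.

```python
import os
import tempfile


def columns_to_file(source_file, output_file, usecols=None):
    """Copy selected columns (by header name) from a whitespace-delimited file."""
    with open(source_file) as src, open(output_file, "w") as dst:
        header = src.readline().split()
        keep = header if usecols is None else list(usecols)
        idx = [header.index(name) for name in keep]
        dst.write(" ".join(keep) + "\n")
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                dst.write(" ".join(fields[i] for i in idx) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "pheno.txt")
    dst_path = os.path.join(tmp, "subset.txt")
    with open(src_path, "w") as fh:
        fh.write("FID IID Height\nf1 i1 170.5\nf2 i2 165.0\n")
    columns_to_file(src_path, dst_path, usecols=["IID", "Height"])
    with open(dst_path) as fh:
        result = fh.read()
```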
- pgs.pgs.post_run_plink(output_dir, data_prefix, best_fit_file='best_fit_prs.csv', score_file='test.score')[source]
Read best-fit predictions and export standardized test.score file to output_dir from class PGS_Plink output
- Parameters:
- output_dir: path
path to output directory
- data_prefix: str
standard file name prefix (for .bed, .bim, .fam, etc.)
- best_fit_file: str
.csv file in output_dir with best fit Threshold value. Default: 'best_fit_prs.csv'
- score_file: str
test score file in output_dir. Default: 'test.score'
- pgs.pgs.post_run_prsice2(output_dir, data_prefix, score_file='test.score')[source]
Read best-fit predictions and export standardized test.score file to output_dir from class PGS_PRSice2 output
- Parameters:
- output_dir: path
path to output directory
- data_prefix: str
standard file name prefix (for .bed, .bim, .fam, etc.)
- score_file: str
test score file in output_dir. Default: 'test.score'