API reference

gwas.py 

class gwas.ActionAppendDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage

class gwas.ActionStoreDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage
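The two deprecated-option actions above are documented only by their signatures. As a rough illustration of what a "store, but warn" argparse action can look like, here is a minimal sketch. The class name `StoreDeprecatedSketch`, the option `--old-out` and the warning text are assumptions made for this example, not taken from gwas.py.

```python
import argparse
import warnings


class StoreDeprecatedSketch(argparse.Action):
    """Hypothetical stand-in: store the value and warn that the option is deprecated."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Assumption: a DeprecationWarning is emitted; the real class may log instead
        warnings.warn(f"Option {option_string} is deprecated", DeprecationWarning)
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
# "--old-out" is a made-up option name used only for this example
parser.add_argument("--old-out", dest="out", action=StoreDeprecatedSketch)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = parser.parse_args(["--old-out", "result"])
```

An append variant would extend a list stored on the namespace instead of overwriting the attribute.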

class gwas.LoadFromFile(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage
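A common way to implement an action like LoadFromFile is to read extra command-line tokens from a file and feed them back into the same parser. The sketch below shows that pattern under stated assumptions: the class name `LoadFromFileSketch` and the options `--argsfile`, `--out` and `--chr2use` are hypothetical, and the real class may parse the file differently.

```python
import argparse
import os
import tempfile


class LoadFromFileSketch(argparse.Action):
    """Hypothetical stand-in: read whitespace-separated arguments from a file."""

    def __call__(self, parser, namespace, values, option_string=None):
        # `values` is an open file object because the option uses type=open
        with values as handle:
            tokens = handle.read().split()
        # Parse the tokens into the namespace that is already being filled
        parser.parse_args(tokens, namespace)


parser = argparse.ArgumentParser()
parser.add_argument("--argsfile", type=open, action=LoadFromFileSketch)
parser.add_argument("--out")
parser.add_argument("--chr2use")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "args.txt")
    with open(path, "w") as handle:
        handle.write("--out result\n--chr2use 1-22\n")
    args = parser.parse_args(["--argsfile", path])
```

Splitting on whitespace means values containing spaces are not supported in this sketch.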

class gwas.Logger(fh, mode)[source]

Lightweight logging.

Methods

`error`(msg)	Print to log file, error file and stdout with a single command.
`log`(msg)	Print to log file and stdout with a single command.

error(msg)[source]: Print to log file, error file and stdout with a single command.

log(msg)[source]: Print to log file and stdout with a single command.
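The reference does not say how `fh` and `mode` are used. The sketch below is one plausible reading, in which `fh` is a path prefix and messages are written to `<fh>.log` and `<fh>.error`. The class name `LoggerSketch`, the file suffixes and the `close` method are assumptions for illustration only.

```python
import os
import tempfile


class LoggerSketch:
    """Hypothetical stand-in for a lightweight logger; file naming is assumed."""

    def __init__(self, fh, mode):
        # Assumption: `fh` is a path prefix, `mode` is the open() mode of the log file
        self.log_fh = open(fh + ".log", mode)
        self.error_path = fh + ".error"

    def log(self, msg):
        # Write the message to the log file and echo it to stdout
        self.log_fh.write(str(msg).rstrip() + "\n")
        self.log_fh.flush()
        print(msg)

    def error(self, msg):
        # Same as log(), plus a copy appended to the error file
        self.log(msg)
        with open(self.error_path, "a") as error_fh:
            error_fh.write(str(msg).rstrip() + "\n")

    def close(self):
        self.log_fh.close()


with tempfile.TemporaryDirectory() as tmp:
    logger = LoggerSketch(os.path.join(tmp, "run"), "w")
    logger.log("started")
    logger.error("missing input")
    logger.close()
    with open(os.path.join(tmp, "run.log")) as handle:
        log_lines = handle.read().splitlines()
    with open(os.path.join(tmp, "run.error")) as handle:
        error_lines = handle.read().splitlines()
```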

class gwas.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Methods

`default`(obj)	Implement this method in a subclass such that it returns a serializable object for `obj`, or calls the base implementation (to raise a `TypeError`).
`encode`(o)	Return a JSON string representation of a Python data structure.
`iterencode`(o[, _one_shot])	Encode the given object and yield each string representation as available.

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for obj, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
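For a NumPy-oriented encoder, the same hook is typically used to turn arrays and NumPy scalars into plain Python values. The sketch below relies on the `tolist()` method that NumPy arrays and scalars provide. To stay runnable without NumPy it uses a small stand-in class, `FakeArray`. Both `ToListEncoder` and `FakeArray` are hypothetical names, and the real NumpyEncoder may test for specific NumPy types instead.

```python
import json


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialize any object that exposes tolist()."""

    def default(self, obj):
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            # numpy.ndarray and NumPy scalars both provide tolist()
            return tolist()
        # Let the base class raise the TypeError for unsupported objects
        return super().default(obj)


class FakeArray:
    """Stand-in for numpy.ndarray so the example runs without NumPy installed."""

    def __init__(self, data):
        self._data = list(data)

    def tolist(self):
        return list(self._data)


encoded = json.dumps({"beta": FakeArray([0.1, 0.2])}, cls=ToListEncoder)
```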

gwas.sec_to_str(t)[source]: Convert seconds to days:hours:minutes:seconds
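The conversion can be sketched with repeated `divmod` calls. The function name `sec_to_str_sketch` is hypothetical, and the exact output format of the real function (zero padding, unit suffixes, omission of leading zero fields) is not documented here, so the plain colon-separated format below is an assumption.

```python
def sec_to_str_sketch(t):
    """Convert a number of seconds to a 'days:hours:minutes:seconds' string."""
    total = int(round(t))
    days, remainder = divmod(total, 86400)      # 24 * 60 * 60 seconds per day
    hours, remainder = divmod(remainder, 3600)  # 60 * 60 seconds per hour
    minutes, seconds = divmod(remainder, 60)
    return f"{days}:{hours}:{minutes}:{seconds}"


elapsed = sec_to_str_sketch(93784)  # 1 day, 2 hours, 3 minutes, 4 seconds
```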

pgs.pgs

class pgs.pgs.BasePGS(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='qc-output', **kwargs)[source]

Base PGS object declaration with some shared properties for subclassing

Parameters:

sumstats_file: str: summary statistics file (.gz)
pheno_file: str: phenotype file (for instance, .height)
phenotype: str or None: if not None, phenotype name (must be a column header in pheno_file)
phenotype_class: str: phenotype class, either CONTINUOUS or BINARY
geno_file_prefix: str: path to QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)
output_dir: str: path for output files (<path>)
**kwargs

Attributes:

data_prefix: str: file name prefix of .bed, .bim, etc. files

Methods

get_str:

abstract method for returning string with commands

abstract get_str()[source]: Required public method
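The subclassing contract can be illustrated with Python's `abc` module. This is a minimal sketch, not the actual BasePGS code: the names `BasePGSSketch` and `EchoPGS` are hypothetical, and deriving `data_prefix` from the last component of `geno_file_prefix` is an assumption.

```python
import abc
import os


class BasePGSSketch(abc.ABC):
    """Hypothetical base class holding shared properties."""

    def __init__(self, geno_file_prefix, output_dir, **kwargs):
        self.geno_file_prefix = geno_file_prefix
        self.output_dir = output_dir
        self.kwargs = kwargs
        # Assumption: data_prefix is the file name part of geno_file_prefix
        self.data_prefix = os.path.basename(geno_file_prefix)

    @abc.abstractmethod
    def get_str(self):
        """Return the command(s) to run; every subclass must implement this."""


class EchoPGS(BasePGSSketch):
    """Toy subclass that returns a single shell command."""

    def get_str(self):
        return [f"echo {self.data_prefix} > {self.output_dir}/done.txt"]


job = EchoPGS("/REF/examples/prsice2/EUR", "qc-output")
commands = job.get_str()
```

Instantiating the base class directly raises a `TypeError`, which is how the abstract `get_str` requirement is enforced in this sketch.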

class pgs.pgs.PGS_LDpred2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_ldpred2_inf', method='auto', file_geno_rds='PGS_ldpred2_inf/EUR.rds', **kwargs)[source]

Helper class for setting up LDpred2 PRS analysis. Inherits from class BasePGS.

Parameters:

sumstats_file: str: summary statistics file (.gz)
pheno_file: str: phenotype file (for instance, .height)
phenotype: str or None: if not None, phenotype name (must be a column header in pheno_file)
phenotype_class: str: phenotype class, either ‘CONTINUOUS’ or ‘BINARY’
geno_file_prefix: str: path to QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)
output_dir: str: path for output files (<path>)
method: str: LDpred2 method, either “auto” (default) or “inf” for infinitesimal
file_geno_rds: str: base name for .rds file output
**kwargs: dict of additional keyword/argument pairs passed to the $LDPRED2_SCRIPTS/ldpred2.R script (see file for full set of options). If the option is only a flag without a value, set the value to None or an empty string.

Methods

generate_eigenvec_eigenval_files:
get_model_evaluation_str:
get_str:

generate_eigenvec_eigenval_files(nPCs=6)[source]

Return string which can be included in job script for generating .eigenvec and .eigenval files in the output directory using PLINK

Parameters:

nPCs: int: number of PCs to account for
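A string of this kind might be assembled as follows. The sketch uses PLINK's `--bfile`, `--pca` and `--out` flags, which write `<out>.eigenvec` and `<out>.eigenval`. The function name `pca_command_sketch`, the bare `plink` executable name and the choice of output prefix are assumptions; the real method may call PLINK through a container or environment variable.

```python
import posixpath


def pca_command_sketch(geno_file_prefix, output_dir, nPCs=6):
    """Build a PLINK command line that computes the first nPCs principal components."""
    # Assumption: output files are named after the genotype file prefix
    out_prefix = posixpath.join(output_dir, posixpath.basename(geno_file_prefix))
    return f"plink --bfile {geno_file_prefix} --pca {nPCs} --out {out_prefix}"


command = pca_command_sketch("/REF/examples/prsice2/EUR", "PGS_ldpred2_inf")
```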

get_model_evaluation_str(eigenvec_file=None, nPCs=None, covariate_file=None)[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Parameters:

eigenvec_file: path: path to file with PCs (no header, columns FID, IID, PC1, PC2, …)
nPCs: int: number of PCs to account for
covariate_file: path: path to file with covariates (header, columns FID, IID, <covariate>)

Returns:

str

get_str(create_backing_file=True)[source]

Public method to create commands

Parameters:

create_backing_file: bool: if True (default), prepend statements for running the $LDPRED2_SCRIPTS/createBackingFile.R script, generating file_geno_rds

Returns:

list of str: list of command line statements for analysis run

class pgs.pgs.PGS_PRSice2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_prsice2', covariate_file='/REF/examples/prsice2/EUR.cov', eigenvec_file='/REF/examples/prsice2/EUR.eigenvec', nPCs=6, MAF=0.01, INFO=0.8, **kwargs)[source]

Helper class for setting up PRSice-2 PRS analysis. Inherits from class BasePGS.

Parameters:

sumstats_file: str: summary statistics file (.gz)
pheno_file: str: phenotype file (for instance, .height)
phenotype: str or None: if not None, phenotype name (must be a column header in pheno_file)
phenotype_class: str: phenotype class, either CONTINUOUS or BINARY
geno_file_prefix: str: path to QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)
output_dir: str: path for output files (<path>)
covariate_file: str or None: path to covariate file (.cov)
eigenvec_file: str or None: path to eigenvec file (.eigenvec) with PCs
nPCs: int: number of Principal Components (PCs) to include in covariate generation
MAF: float: base-MAF upper threshold value (default: 0.01)
INFO: float: base-INFO upper threshold value (default: 0.8)
**kwargs: dict of additional keyword/argument pairs passed to the Rscripts/PRSice.R script (see file for full set of options). If the option is only a flag without a value, set the value to None or an empty string.

Attributes:

data_prefix: str: file name prefix of .bed, .bim, etc. files

Methods

get_model_evaluation_str:
get_str:

get_model_evaluation_str()[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Returns:

str

get_str()[source]

Public method to create commands

Returns:

list of str: list of command line statements for analysis run

class pgs.pgs.PGS_Plink(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='QC_data/EUR', output_dir='PGS_plink', covariate_file='/REF/examples/prsice2/EUR.cov', eigenvec_file='/REF/examples/prsice2/EUR.eigenvec', clump_p1=1, clump_r2=0.1, clump_kb=250, clump_snp_field='SNP', clump_field='P', range_list=None, strat_indep_pairwise=None, nPCs=6, score_columns=None, **kwargs)[source]

Helper class for setting up Plink PRS analysis. Inherits from class BasePGS.

Parameters:

sumstats_file: str: summary statistics file (.gz)
pheno_file: str: phenotype file (for instance, .height)
phenotype: str or None: if not None, phenotype name (must be a column header in pheno_file)
phenotype_class: str: phenotype class, either CONTINUOUS or BINARY
geno_file_prefix: str: path to QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)
output_dir: str: path for output files (<path>)
covariate_file: str: path to covariate file (.cov)
eigenvec_file: str or None: None, or path to eigenvec file (.eigenvec)
clump_p1: float: plink --clump-p1 parameter value (default: 1)
clump_r2: float: plink --clump-r2 parameter value (default: 0.1)
clump_kb: float: plink --clump-kb parameter value (default: 250)
clump_snp_field: str: plink --clump-snp-field parameter value (default: ‘SNP’)
clump_field: str: plink --clump-field parameter value (default: ‘P’)
range_list: list of floats: list of p-value ranges for plink --q-score-range arg. (default: [0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5])
strat_indep_pairwise: list of scalars: plink --indep-pairwise parameters for stratification describing window size (kb), step size (variant count), r^2 threshold (default: [250, 50, 0.25])
nPCs: int: plink --pca parameter value (default: 6)
score_columns: list of str: column names in sumstats_file to use with plink’s --score (the file must have a header). Default: [‘SNP’, ‘A1’, ‘BETA’]
**kwargs
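PLINK's `--q-score-range` option reads a range file in which each line holds a label, a lower bound and an upper bound. The sketch below shows how a `range_list` of p-value thresholds could be turned into such lines, with 0 as the lower bound as in the PRS tutorial this package refers to. The function name `range_list_lines_sketch` is hypothetical, and whether PGS_Plink writes the file exactly this way is an assumption.

```python
def range_list_lines_sketch(range_list=None):
    """Return range-file lines of the form '<label> <lower> <upper>'."""
    if range_list is None:
        # Documented default thresholds
        range_list = [0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
    # Assumption: each threshold is used both as label and as upper bound
    return [f"{threshold} 0 {threshold}" for threshold in range_list]


lines = range_list_lines_sketch()
```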

Attributes:

data_prefix: str: file name prefix of .bed, .bim, etc. files

Methods

get_model_evaluation_str:
get_str:

get_model_evaluation_str()[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Returns:

str

get_str(mode='basic', update_effect_size=False)[source]

Parameters:

mode: str: ‘basic’ or ‘stratification’
update_effect_size: bool: if True, compute PGS using OR

class pgs.pgs.Standard_GWAS_QC(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='QC_data', phenotype='Height', data_postfix='.QC', QC_target_kwargs=None, QC_prune_kwargs=None, QC_relatedness_prune_kwargs=None, **kwargs)[source]

Helper class for common GWAS QC. Inherits from class BasePGS.

Based on the tutorial https://choishingwan.github.io/PRS-Tutorial/target/#qc-of-target-data

Use with caution. This class is not fully tested.

Parameters:

sumstats_file: str: summary statistics file (.gz)
pheno_file: str: phenotype file (for instance, .height)
geno_file_prefix: str: path to (raw) .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)
output_dir: str: path for output files (<path>)
phenotype: str: default: ‘Height’
data_postfix: str: default: ‘.QC’
QC_target_kwargs: dict: default: {‘maf’: 0.01, ‘hwe’: 1e-6, ‘geno’: 0.01, ‘mind’: 0.01}
QC_prune_kwargs: dict: default: {‘indep-pairwise’: [200, 50, 0.25]}
QC_relatedness_prune_kwargs: dict: default: {‘rel-cutoff’: 0.125}
**kwargs

Methods

get_str:

get_str()[source]: Standard GWAS QC

pgs.pgs.convert_dict_to_str(d, key_prefix='--')[source]

Parameters:

d: dict: key, value pairs
key_prefix: str: string prefix for key names. Default: “--”

Returns:

str: string formatted as “--key0 value0 --key1 value1 …”. If a value is iterable, its items are written one after another, as in “--key0 value0[0] value0[1] … --key1 …”
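The formatting described above can be sketched as a small loop over the dictionary. The function name `convert_dict_to_str_sketch` is hypothetical. Treating None or an empty string as a flag without a value follows the convention stated for the `**kwargs` of the PGS classes; that this function applies the same rule is an assumption.

```python
def convert_dict_to_str_sketch(d, key_prefix="--"):
    """Format a dict as a command-line option string."""
    parts = []
    for key, value in d.items():
        parts.append(f"{key_prefix}{key}")
        if isinstance(value, (list, tuple)):
            # Iterable values are written item by item after the key
            parts.extend(str(item) for item in value)
        elif value is None or value == "":
            # Assumption: flag-only options carry no value
            continue
        else:
            parts.append(str(value))
    return " ".join(parts)


options = convert_dict_to_str_sketch(
    {"maf": 0.01, "indep-pairwise": [200, 50, 0.25], "make-bed": None}
)
```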

pgs.pgs.df_colums_to_file(source_file, output_file, usecols=None, delim_whitespace=True, delimiter=None, **kwargs)[source]

Extract columns from a tabular file (.csv or similar) and write them to output_file

Parameters:

source_file: file path: .csv (or similar) input file read by pandas.read_csv.
output_file: file path: output file to be written
usecols: list of str or None: columns to read and write
delim_whitespace: bool: passed to pandas.read_csv. Default: True
delimiter: None or str: delimiter. Default: None
**kwargs: keyword arguments passed to pandas.read_csv
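To show what the column extraction amounts to, here is a standard-library stand-in that handles whitespace-delimited files with a header row. It is not the documented implementation, which reads the file through pandas.read_csv. The function name `columns_to_file_sketch` and the single-space output separator are assumptions.

```python
import os
import tempfile


def columns_to_file_sketch(source_file, output_file, usecols=None):
    """Copy selected columns of a whitespace-delimited file to output_file."""
    with open(source_file) as src, open(output_file, "w") as dst:
        header = src.readline().split()
        columns = header if usecols is None else list(usecols)
        indices = [header.index(name) for name in columns]
        dst.write(" ".join(columns) + "\n")
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines
                dst.write(" ".join(fields[i] for i in indices) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "sumstats.txt")
    target = os.path.join(tmp, "subset.txt")
    with open(source, "w") as handle:
        handle.write("SNP  A1 BETA P\nrs1 A 0.12 0.03\nrs2 G -0.05 0.40\n")
    columns_to_file_sketch(source, target, usecols=["SNP", "BETA"])
    with open(target) as handle:
        subset_lines = handle.read().splitlines()
```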

pgs.pgs.post_run_plink(output_dir, data_prefix, best_fit_file='best_fit_prs.csv', score_file='test.score')[source]

Read best-fit predictions and export standardized test.score file to output_dir from class PGS_Plink output

Parameters:

output_dir: path: path to output directory
data_prefix: str: standard file name prefix (for .bed, .bim, .fam, etc.)
best_fit_file: str: .csv file in output_dir with best fit Threshold value. Default: ‘best_fit_prs.csv’
score_file: str: test score file in output_dir. Default: ‘test.score’

pgs.pgs.post_run_prsice2(output_dir, data_prefix, score_file='test.score')[source]

Read best-fit predictions and export standardized test.score file to output_dir from class PGS_PRSice2 output

Parameters:

output_dir: path: path to output directory
data_prefix: str: standard file name prefix (for .bed, .bim, .fam, etc.)
score_file: str: test score file in output_dir. Default: ‘test.score’

pgs.pgs.run_call(call)[source]: run subprocess call
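A helper of this kind is usually a thin wrapper around `subprocess.run`. The sketch below is one possible shape, not the documented implementation: the function name `run_call_sketch`, the use of `shell=True`, the captured output and the choice to raise on a non-zero exit code are all assumptions.

```python
import subprocess
import sys


def run_call_sketch(call):
    """Run a command-line string and raise CalledProcessError if it fails."""
    # shell=True lets `call` be a single string such as those returned by get_str()
    return subprocess.run(call, shell=True, check=True,
                          capture_output=True, text=True)


# Use the current Python interpreter as a portable example command
result = run_call_sketch(f'"{sys.executable}" -c "print(42)"')
```

Because the string goes through the shell, it should only be built from trusted inputs.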

pgs.pgs.set_env(config)[source]

Function to set environment variables from config.yaml

TODO: add defaults

Parameters:

config: dict: config dictionary from config.yaml (or similar file)
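The layout of config.yaml is not described here, so the following is only a sketch of the general idea: copying key/value pairs from a configuration dictionary into `os.environ`. The function name `set_env_sketch`, the `"environ"` key and the variable `PGS_SKETCH_DEMO` are hypothetical.

```python
import os


def set_env_sketch(config):
    """Export entries of config['environ'] as environment variables."""
    # Assumption: variables live under a top-level "environ" mapping
    for name, value in config.get("environ", {}).items():
        os.environ[name] = str(value)  # environment values must be strings


set_env_sketch({"environ": {"PGS_SKETCH_DEMO": "/opt/scripts"}})
demo_value = os.environ["PGS_SKETCH_DEMO"]
```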
