API reference

gwas.py

class gwas.ActionAppendDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage

class gwas.ActionStoreDeprecated(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage
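
The reference does not describe what the deprecated actions do. As a rough illustration only, a custom argparse action that stores a value and emits a deprecation warning could look like the sketch below. The class name StoreDeprecatedSketch, the option --old-out and the warning text are assumptions, not taken from gwas.py.

```python
import argparse
import warnings


class StoreDeprecatedSketch(argparse.Action):
    """Hypothetical action: store the value, but warn that the option is deprecated."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Emit a warning naming the option that was used on the command line
        warnings.warn(f"Option {option_string} is deprecated.", DeprecationWarning)
        # Behave like the default "store" action otherwise
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
parser.add_argument("--old-out", dest="out", action=StoreDeprecatedSketch)

# Record warnings so the example can show that exactly one was raised
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    args = parser.parse_args(["--old-out", "results"])

print(args.out, len(caught))
```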

class gwas.LoadFromFile(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

Methods

__call__(parser, namespace, values[, ...])

Call self as a function.

format_usage

class gwas.Logger(fh, mode)[source]

Lightweight logging.

Methods

error(msg)

Print to log file, error file and stdout with a single command.

log(msg)

Print to log file and stdout with a single command.

error(msg)[source]

Print to log file, error file and stdout with a single command.

log(msg)[source]

Print to log file and stdout with a single command.
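
A minimal sketch of the behaviour described above, not the gwas.py implementation. It assumes that fh is a base path, that the log goes to fh + ".log" and errors additionally to fh + ".error", and that mode is a file mode such as "w" or "a". The file suffixes and the close() method are hypothetical.

```python
import os
import tempfile


class LoggerSketch:
    """Hypothetical logger: stdout plus <base>.log; errors also go to <base>.error."""

    def __init__(self, fh, mode="w"):
        self.log_fh = open(fh + ".log", mode)
        self.error_fh = None
        self._error_path = fh + ".error"
        self._mode = mode

    def log(self, msg):
        # Print to stdout and append the same line to the log file
        print(msg)
        self.log_fh.write(str(msg) + "\n")
        self.log_fh.flush()

    def error(self, msg):
        self.log(msg)
        # Open the error file lazily so it only exists if an error occurred
        if self.error_fh is None:
            self.error_fh = open(self._error_path, self._mode)
        self.error_fh.write(str(msg) + "\n")
        self.error_fh.flush()

    def close(self):
        self.log_fh.close()
        if self.error_fh is not None:
            self.error_fh.close()


base = os.path.join(tempfile.mkdtemp(), "run")
logger = LoggerSketch(base, "w")
logger.log("starting analysis")
logger.error("ERROR: missing input file")
logger.close()
```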

class gwas.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Methods

default(obj)

Implement this method in a subclass such that it returns a serializable object for obj, or calls the base implementation (to raise a TypeError).

encode(o)

Return a JSON string representation of a Python data structure.

iterencode(o[, _one_shot])

Encode the given object and yield each string representation as available.

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for obj, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
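
The same pattern can be applied to array-like objects. The sketch below is duck-typed, so it runs without NumPy installed: it converts anything that exposes tolist(), which NumPy arrays and scalars do. The names ArrayLikeEncoder and FakeArray are hypothetical, and this is not necessarily how gwas.NumpyEncoder is implemented.

```python
import json


class ArrayLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts objects exposing tolist() to plain Python values."""

    def default(self, obj):
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class default method raise the TypeError
        return super().default(obj)


class FakeArray:
    """Stand-in for an array type, used only so the example is self-contained."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


encoded = json.dumps({"beta": FakeArray([0.1, 0.2])}, cls=ArrayLikeEncoder)
print(encoded)  # {"beta": [0.1, 0.2]}
```
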
gwas.sec_to_str(t)[source]

Convert seconds to days:hours:minutes:seconds
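
One way such a conversion could be written is sketched below. The exact output layout of gwas.sec_to_str is not documented here, so the zero-padded d:hh:mm:ss format and the name sec_to_str_sketch are assumptions.

```python
def sec_to_str_sketch(t):
    """Format a duration in seconds as days:hours:minutes:seconds (assumed layout)."""
    t = int(round(t))
    days, rem = divmod(t, 86400)      # 86400 seconds per day
    hours, rem = divmod(rem, 3600)    # 3600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return f"{days}:{hours:02d}:{minutes:02d}:{seconds:02d}"


print(sec_to_str_sketch(93784))  # 1:02:03:04
```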

pgs.pgs

class pgs.pgs.BasePGS(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='qc-output', **kwargs)[source]

Base PGS object declaration with some shared properties for subclassing

Parameters:
sumstats_file: str

summary statistics file (.gz)

pheno_file: str

phenotype file (for instance, .height)

phenotype: str or None

if not None, phenotype name (must be a column header in pheno_file)

phenotype_class: str

phenotype class, either CONTINUOUS or BINARY

geno_file_prefix: str

path prefix of the QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)

output_dir: str

path for output files (<path>)

**kwargs
Attributes:
data_prefix: str

file name prefix of .bed, .bim, etc. files

Methods

get_str:

abstract method for returning string with commands

abstract get_str()[source]

Required public method

class pgs.pgs.PGS_LDpred2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_ldpred2_inf', method='auto', file_geno_rds='PGS_ldpred2_inf/EUR.rds', **kwargs)[source]

Helper class for setting up LDpred2 PRS analysis. Inherited from class BasePGS

Parameters:
sumstats_file: str

summary statistics file (.gz)

pheno_file: str

phenotype file (for instance, .height)

phenotype: str or None

if not None, phenotype name (must be a column header in pheno_file)

phenotype_class: str

phenotype class, either ‘CONTINUOUS’ or ‘BINARY’

geno_file_prefix: str

path prefix of the QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)

output_dir: str

path for output files (<path>)

method: str

LDpred2 method, either “auto” (default) or “inf” for infinitesimal

file_geno_rds: str

base name for .rds file output

**kwargs

dict of additional keyword/argument pairs passed to the $LDPRED2_SCRIPTS/ldpred2.R script (see file for full set of options). If an option is a flag without a value, set its value to None or an empty string.

Methods

generate_eigenvec_eigenval_files:

get_model_evaluation_str:

get_str:

generate_eigenvec_eigenval_files(nPCs=6)[source]

Return string which can be included in job script for generating .eigenvec and .eigenval files in the output directory using PLINK

Parameters:
nPCs: int

number of PCs to account for

get_model_evaluation_str(eigenvec_file=None, nPCs=None, covariate_file=None)[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Parameters:
eigenvec_file: path

path to file with PCs (no header, columns FID, IID, PC1, PC2, …)

nPCs: int

number of PCs to account for

covariate_file: path

path to file with covariates (header, columns FID, IID, <covariate>)

Returns:
str
get_str(create_backing_file=True)[source]

Public method to create commands

Parameters:
create_backing_file: bool

if True (default), prepend statements for running the $LDPRED2_SCRIPTS/createBackingFile.R script, generating file_geno_rds

Returns:
list of str

list of command line statements for analysis run

class pgs.pgs.PGS_PRSice2(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', phenotype='Height', phenotype_class='CONTINUOUS', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='PGS_prsice2', covariate_file='/REF/examples/prsice2/EUR.cov', eigenvec_file='/REF/examples/prsice2/EUR.eigenvec', nPCs=6, MAF=0.01, INFO=0.8, **kwargs)[source]

Helper class for setting up PRSice-2 PRS analysis. Inherited from class BasePGS

Parameters:
sumstats_file: str

summary statistics file (.gz)

pheno_file: str

phenotype file (for instance, .height)

phenotype: str or None

if not None, phenotype name (must be a column header in pheno_file)

phenotype_class: str

phenotype class, either CONTINUOUS or BINARY

geno_file_prefix: str

path prefix of the QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)

output_dir: str

path for output files (<path>)

covariate_file: str or None

path to covariate file (.cov)

eigenvec_file: str or None

path to eigenvec file (.eigenvec) with PCs

nPCs: int

number of Principal Components (PCs) to include in covariate generation

MAF: float

base-MAF filtering threshold (default: 0.01)

INFO: float

base-INFO filtering threshold (default: 0.8)

**kwargs

dict of additional keyword/argument pairs passed to the Rscripts/PRSice.R script (see file for full set of options). If an option is a flag without a value, set its value to None or an empty string.

Attributes:
data_prefix: str

file name prefix of .bed, .bim, etc. files

Methods

get_model_evaluation_str:

get_str:

get_model_evaluation_str()[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Returns:
str
get_str()[source]

Public method to create commands

Returns:
list of str

list of command line statements for analysis run

Helper class for setting up Plink PRS analysis. Inherited from class BasePGS

Parameters:
sumstats_file: str

summary statistics file (.gz)

pheno_file: str

phenotype file (for instance, .height)

phenotype: str or None

if not None, phenotype name (must be a column header in pheno_file)

phenotype_class: str

phenotype class, either CONTINUOUS or BINARY

geno_file_prefix: str

path prefix of the QC’d .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)

output_dir: str

path for output files (<path>)

covariate_file: str

path to covariate file (.cov)

eigenvec_file: str or None

None, or path to eigenvec file (.eigenvec)

clump_p1: float

plink --clump-p1 parameter value (default: 1)

clump_r2: float

plink --clump-r2 parameter value (default: 0.1)

clump_kb: float

plink --clump-kb parameter value (default: 250)

clump_snp_field: str

plink --clump-snp-field parameter value (default: ‘SNP’)

clump_field: str

plink --clump-field parameter value (default: ‘P’)

range_list: list of floats

list of p-value ranges for plink --q-score-range argument (default: [0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5])

strat_indep_pairwise: list of scalars

plink --indep-pairwise parameters for stratification describing window size (kb), step size (variant ct), r^2 threshold (default: [250, 50, 0.25])

nPCs: int

plink --pca parameter value (default: 6)

# score_args: list
# plink --score arguments (default: [3, 4, 12, ‘header’])
score_columns: list of str

for plink’s --score, column names in sumstats_file. Requires header. Default: [‘SNP’, ‘A1’, ‘BETA’]

**kwargs
Attributes:
data_prefix: str

file name prefix of .bed, .bim, etc. files

Methods

get_model_evaluation_str:

get_str:

get_model_evaluation_str()[source]

Return callable string for fitting a simple linear model between PGS score and phenotype data using R stats::lm, printing stats::lm.fit.summary output to file

Returns:
str
get_str(mode='basic', update_effect_size=False)[source]
Parameters:
mode: str

‘basic’ or ‘stratification’

update_effect_size: bool

if True, compute the PGS using odds ratios (OR)

class pgs.pgs.Standard_GWAS_QC(sumstats_file='/REF/examples/prsice2/Height.gwas.txt.gz', pheno_file='/REF/examples/prsice2/EUR.height', geno_file_prefix='/REF/examples/prsice2/EUR', output_dir='QC_data', phenotype='Height', data_postfix='.QC', QC_target_kwargs=None, QC_prune_kwargs=None, QC_relatedness_prune_kwargs=None, **kwargs)[source]

Helper class for common GWAS QC. Inherited from class BasePGS

Based on the tutorial https://choishingwan.github.io/PRS-Tutorial/target/#qc-of-target-data

Use with caution. This class is not fully tested.

Parameters:
sumstats_file: str

summary statistics file (.gz)

pheno_file: str

phenotype file (for instance, .height)

geno_file_prefix: str

path prefix of the (raw) .bed, .bim, .fam files (without file extension) (</ENV/path/to/data/file>)

output_dir: str

path for output files (<path>)

phenotype: str

default: ‘Height’

data_postfix: str

default: ‘.QC’

QC_target_kwargs: dict

default: {‘maf’: 0.01, ‘hwe’: 1e-6, ‘geno’: 0.01, ‘mind’: 0.01}

QC_prune_kwargs: dict

default: {‘indep-pairwise’: [200, 50, 0.25]}

QC_relatedness_prune_kwargs: dict

default: {‘rel-cutoff’: 0.125}

**kwargs

Methods

get_str:

get_str()[source]

Standard GWAS QC

pgs.pgs.convert_dict_to_str(d, key_prefix='--')[source]
Parameters:
d: dict

key, value pairs

key_prefix: str

string prefix for key names. Default: “--”

Returns:
str

string formatted as “--key0 value0 --key1 value1 …”. If a value is iterable, its items are written out in sequence, as in “--key0 value0[0] value0[1] … --key1 …”
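
The formatting described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the pgs.pgs source: the name convert_dict_to_str_sketch is hypothetical, only lists and tuples are treated as iterable values, and None or an empty string is assumed to mark a flag without a value (as described for **kwargs elsewhere in this reference).

```python
def convert_dict_to_str_sketch(d, key_prefix="--"):
    """Join dict items into a command-line style option string."""
    parts = []
    for key, value in d.items():
        parts.append(f"{key_prefix}{key}")
        if value is None or value == "":
            # Assumed convention: a flag without a value
            continue
        if isinstance(value, (list, tuple)):
            # Iterable values are written out item by item
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return " ".join(parts)


options = {"maf": 0.01, "indep-pairwise": [200, 50, 0.25], "allow-no-sex": None}
print(convert_dict_to_str_sketch(options))
# --maf 0.01 --indep-pairwise 200 50 0.25 --allow-no-sex
```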

pgs.pgs.df_colums_to_file(source_file, output_file, usecols=None, delim_whitespace=True, delimiter=None, **kwargs)[source]

Read selected columns from a tabular file on disk (.csv or similar) and write them to output_file

Parameters:
source_file: file path

.csv (or similar) input file read by pandas.read_csv.

output_file: file path

output file to be written

usecols: list of str or None

columns to read and write

delim_whitespace: bool

passed to pandas.read_csv. Default: True

delimiter: None or str

delimiter. Default: None

**kwargs

keyword arguments passed to pandas.read_csv

Read best-fit predictions and export standardized test.score file to output_dir from class PGS_Plink output

Parameters:
output_dir: path

path to output directory

data_prefix: str

standard file name prefix (for .bed, .bim, .fam, etc.)

best_fit_file: str

.csv file in output_dir with best fit Threshold value. Default: ‘best_fit_prs.csv’

score_file: str

test score file in output_dir. Default: ‘test.score’

pgs.pgs.post_run_prsice2(output_dir, data_prefix, score_file='test.score')[source]

Read best-fit predictions and export standardized test.score file to output_dir from class PGS_PRSice2 output

Parameters:
output_dir: path

path to output directory

data_prefix: str

standard file name prefix (for .bed, .bim, .fam, etc.)

score_file: str

test score file in output_dir. Default: ‘test.score’

pgs.pgs.run_call(call)[source]

run subprocess call
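
A minimal sketch of running one command line statement with the standard library. It assumes that call is a single shell command string, as might be produced by the get_str methods; whether pgs.pgs.run_call uses a shell, captures output or raises on failure is not stated here. The name run_call_sketch is hypothetical.

```python
import subprocess


def run_call_sketch(call):
    """Run a shell command string and return its standard output."""
    result = subprocess.run(
        call,
        shell=True,            # assumption: the statement is a shell command line
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on a nonzero exit status
    )
    return result.stdout


output = run_call_sketch("echo hello")
print(output.strip())
```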

pgs.pgs.set_env(config)[source]

Function to set environment variables from config.yaml

TODO: add defaults

Parameters:
config: dict

config dictionary from config.yaml (or similar file)
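
A minimal sketch of exporting configuration values as environment variables. The layout of config.yaml is not documented here, so the "environ" section, the variable N_THREADS, the path value and the name set_env_sketch are all assumptions made for illustration.

```python
import os


def set_env_sketch(config, section="environ"):
    """Copy key/value pairs from one section of a config dict into os.environ."""
    for key, value in config.get(section, {}).items():
        # Environment variables must be strings
        os.environ[key] = str(value)


# Hypothetical config dictionary, e.g. as loaded from a YAML file
config = {
    "environ": {
        "LDPRED2_SCRIPTS": "/hypothetical/path/to/ldpred2/scripts",
        "N_THREADS": 4,
    }
}
set_env_sketch(config)
print(os.environ["LDPRED2_SCRIPTS"], os.environ["N_THREADS"])
```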