eXplainable PRS (XPRS)
Inputs
Mandatory Inputs
1. genotype.file
genotype.file is a mandatory input of the platform.
XPRS accepts genotype data in PLINK binary format (.bed, .bim, .fam).
Provide the full path and prefix of the PLINK fileset for the target dataset.
- cohort.genotype.file
- reference.genotype.file & individual.genotype.file
If you have a cohort.genotype.file and want to identify risk genes within the cohort as well as each individual's risk genes and risk SNPs, go to case 1.
If you do NOT have a cohort.genotype.file, you can use 1000 Genomes (1000G) or UK Biobank (UKBB) as a reference. Provide a reference.genotype.file to identify risk genes within the reference population, and add individual.genotype.files to see each individual's risk genes and SNPs. To do so, go to case 2.
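Because genotype files are passed as a PLINK prefix, a quick pre-flight check can confirm that all three files exist before a run and decide which case applies. The sketch below is a minimal illustration, not part of XPRS; the helper names plink_fileset and choose_case are hypothetical, and the case logic simply mirrors the two cases described above.

```python
import os
from typing import Dict, Optional

PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def plink_fileset(prefix: str) -> Dict[str, str]:
    """Expand a PLINK prefix into its .bed/.bim/.fam paths and check they exist."""
    paths = {ext.lstrip("."): prefix + ext for ext in PLINK_EXTENSIONS}
    missing = [p for p in paths.values() if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing PLINK file(s): " + ", ".join(missing))
    return paths


def choose_case(cohort: Optional[str] = None,
                reference: Optional[str] = None,
                individual: Optional[str] = None) -> int:
    """Return 1 for a cohort genotype file, 2 for reference + individual files."""
    if cohort is not None:
        return 1
    if reference is not None and individual is not None:
        return 2
    raise ValueError(
        "provide cohort.genotype.file, or both "
        "reference.genotype.file and individual.genotype.file"
    )
```

For example, plink_fileset("/data/target") expects /data/target.bed, /data/target.bim and /data/target.fam to be present.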
2. PRS.scoring.file
PRS.scoring.file is a mandatory input of the platform. It is obtained from a PRS construction method such as lassosum or PRS-CS, or downloaded from the PGS Catalog. The file must have the following header line.
chr
snpid
pos
A1
A2
beta
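A missing or misspelled column is an easy mistake to catch before uploading. The sketch below checks a header line against the required column names; it assumes the file is whitespace- or tab-delimited and that names are case-sensitive, and the helper names missing_columns and check_header are hypothetical.

```python
from typing import Iterable, List

# Required columns of PRS.scoring.file, as listed above.
PRS_SCORING_COLUMNS = ["chr", "snpid", "pos", "A1", "A2", "beta"]


def missing_columns(header_line: str, required: Iterable[str]) -> List[str]:
    """Return required column names absent from a whitespace-delimited header line."""
    present = set(header_line.split())
    return [col for col in required if col not in present]


def check_header(path: str, required: Iterable[str]) -> None:
    """Raise ValueError if the first line of `path` lacks a required column."""
    with open(path) as handle:
        header = handle.readline()
    missing = missing_columns(header, required)
    if missing:
        raise ValueError(f"{path}: missing column(s) {', '.join(missing)}")
```

The same helper can be reused for the other input files by passing their own column lists.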
3. GWAS.association.file
GWAS.association.file is a mandatory input of the platform. It is obtained from the GWAS Catalog. The file must have the following header line.
MAPPED_GENE
P_VALUE
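To get a feel for what the two required columns carry, the sketch below collapses an association file to the smallest P_VALUE per MAPPED_GENE. It is only an illustration: it assumes a tab-delimited file, and the genome-wide significance threshold of 5e-8 and the helper name min_p_per_gene are hypothetical choices, not how XPRS itself uses this file.

```python
import csv
from typing import Dict, TextIO


def min_p_per_gene(handle: TextIO, threshold: float = 5e-8) -> Dict[str, float]:
    """Smallest P_VALUE per MAPPED_GENE, keeping only genes at or below threshold."""
    best: Dict[str, float] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = row["MAPPED_GENE"].strip()
        if not gene:
            continue  # skip associations without a mapped gene
        p = float(row["P_VALUE"])
        if p <= threshold and p < best.get(gene, float("inf")):
            best[gene] = p
    return best
```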
Optional Inputs
4. sumstat.file
sumstat.file is an optional input of the platform. It contains the GWAS summary statistics used to construct the PRS.scoring.file. If you built the PRS.scoring.file yourself with a PRS construction method, please provide the GWAS summary statistics you used to construct it. The file must have the following header line.
CHR
SNPID
POS
A1
A2
P.VALUE
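Since sumstat.file should be the same summary statistics used to build the PRS.scoring.file, most scoring SNPs should also appear in it. The sketch below measures that overlap; note that the two files use different column names (snpid versus SNPID). It assumes tab-delimited files, and the helper names read_ids and overlap_fraction are hypothetical.

```python
import csv
from typing import Set, TextIO


def read_ids(handle: TextIO, column: str) -> Set[str]:
    """Collect SNP identifiers from one column of a tab-delimited file."""
    return {row[column] for row in csv.DictReader(handle, delimiter="\t")}


def overlap_fraction(scoring: TextIO, sumstat: TextIO) -> float:
    """Fraction of PRS.scoring.file SNPs (snpid) also present in sumstat.file (SNPID)."""
    scoring_ids = read_ids(scoring, "snpid")
    sumstat_ids = read_ids(sumstat, "SNPID")
    if not scoring_ids:
        return 0.0
    return len(scoring_ids & sumstat_ids) / len(scoring_ids)
```

A low fraction usually points to mismatched SNP identifiers (for example rsIDs in one file and chr:pos in the other) rather than to a problem with the scores.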
5. annotation.file
annotation.file is provided with the platform by default. It is obtained from the UCSC Genome Browser and is used for positional mapping in the platform. You can also supply your own annotation.file. The file must have the following header line.
chr
symbol
start
end
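Positional mapping assigns a SNP to the genes whose coordinates cover it. The sketch below is a generic illustration of that idea using the annotation.file columns; it assumes the window extends each gene by the same distance on both sides, which may differ from XPRS's exact rule, and the names Gene, norm_chr and map_snp_to_genes are hypothetical. It also normalizes UCSC-style chromosome names ("chr1") so they match plain names ("1").

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """One row of annotation.file; `chrom` holds the file's chr column."""
    chrom: str
    symbol: str
    start: int
    end: int


def norm_chr(name: str) -> str:
    """Drop a leading 'chr' so UCSC-style 'chr1' matches '1'."""
    return name[3:] if name.lower().startswith("chr") else name


def map_snp_to_genes(chrom: str, pos: int, genes: List[Gene], window: int = 0) -> List[str]:
    """Symbols of genes whose [start - window, end + window] interval contains pos."""
    target = norm_chr(chrom)
    return [
        g.symbol
        for g in genes
        if norm_chr(g.chrom) == target and g.start - window <= pos <= g.end + window
    ]
```

A linear scan keeps the sketch short; for genome-wide SNP sets, sorting genes per chromosome and using binary search would be the usual next step.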
6. cS2G.file
cS2G.file is provided with the platform by default. It is obtained from the paper "Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity" and is used for cS2G mapping in the platform. If your input files do not use rsIDs as SNP identifiers, you MUST modify the cS2G.file so that its SNP column matches your identifiers, and upload it for use. The file must have the following header line.
SNP
GENE
cS2G
INFO
annotation
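If your files use identifiers other than rsIDs, the SNP column of the cS2G.file has to be rewritten. The sketch below does that given a lookup from rsID to your identifier; it assumes a tab-delimited cS2G.file, and both the helper name relabel_cs2g and the source of the rsid_to_id mapping (for example a dbSNP-derived table) are hypothetical. SNPs with no known mapping are dropped.

```python
import csv
from typing import Dict, TextIO


def relabel_cs2g(src: TextIO, dst: TextIO, rsid_to_id: Dict[str, str]) -> int:
    """Copy a cS2G file, replacing SNP rsIDs with your own IDs; return rows written."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        new_id = rsid_to_id.get(row["SNP"])
        if new_id is None:
            continue  # no mapping for this rsID: leave it out
        row["SNP"] = new_id
        writer.writerow(row)
        kept += 1
    return kept
```

All other columns (GENE, cS2G, INFO, annotation) are copied unchanged, so the output keeps the required header line.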
Parameters
1. Window size
Window size is one of the parameters of the platform.
Default window size: 200kb
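The default is written as "200kb". If you script your runs, converting such values to base pairs keeps the arithmetic unambiguous (200kb is 200,000 bp). The sketch below is a hypothetical parser, window_to_bp; whether XPRS takes the window as a string with units or as a plain number is not stated here.

```python
import re

# Accepted unit suffixes (case-insensitive) and their size in base pairs.
_UNITS = {"": 1, "bp": 1, "kb": 1_000, "mb": 1_000_000}


def window_to_bp(text: str) -> int:
    """Convert a window size such as '200kb', '1Mb' or '500' to base pairs."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*", text)
    if m is None or m.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognised window size: {text!r}")
    return int(round(float(m.group(1)) * _UNITS[m.group(2).lower()]))
```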
2. Number of CPU cores
Number of CPU cores is one of the parameters of the platform.
Choose the number of CPU cores according to your server's specifications.
Default number of CPU cores: 8
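Requesting more cores than the machine has does not speed anything up, so it helps to clamp the value. The sketch below caps the requested count at what the operating system reports and runs a function over items in a process pool; effective_cores and run_parallel are hypothetical helpers that illustrate the idea, not XPRS internals.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def effective_cores(requested: int = 8) -> int:
    """Clamp the requested core count to between 1 and the cores this machine reports."""
    available = os.cpu_count() or 1
    return max(1, min(requested, available))


def run_parallel(func: Callable[[T], R], items: Iterable[T], requested: int = 8) -> List[R]:
    """Apply a picklable top-level func to items across worker processes, preserving order."""
    with ProcessPoolExecutor(max_workers=effective_cores(requested)) as pool:
        return list(pool.map(func, items))
```

When using run_parallel in a script, call it under `if __name__ == "__main__":` so that platforms which spawn worker processes do not re-run the pool setup on import.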
3. SNP heritability top cut off
SNP heritability top cut off is one of the parameters of the platform.
If the PRS.scoring.file contains more than 100,000 SNPs, by default only the top 50% of SNPs ranked by heritability are used, to streamline the process and focus on the most heritable SNPs. If the PRS.scoring.file contains 100,000 SNPs or fewer, all SNPs are used in the analysis.
Default SNP heritability top cut off: 0.5
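The cut-off rule above can be expressed in a few lines. The sketch below takes per-SNP heritability values as given, since how they are computed is not described here, and applies the top fraction only when there are more than 100,000 SNPs. The function name select_snps and its parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_snps(h2: Sequence[Tuple[str, float]], top_fraction: float = 0.5,
                max_snps_without_cutoff: int = 100_000) -> List[str]:
    """Keep all SNPs if there are at most 100,000; otherwise keep the top fraction by h2."""
    if len(h2) <= max_snps_without_cutoff:
        return [snp for snp, _ in h2]
    ranked = sorted(h2, key=lambda item: item[1], reverse=True)
    keep = int(len(ranked) * top_fraction)
    return [snp for snp, _ in ranked[:keep]]
```

With the default of 0.5, a scoring file of 100,002 SNPs is reduced to its 50,001 most heritable SNPs, while a file of 100,000 SNPs is used in full.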