eXplainable PRS (XPRS)


Inputs

The snpid used in genotype.file, PRS.scoring.file and GWAS.association.file should preferably be an rsid. If it is not, you must provide your own cS2G.file, created by converting the rsids in the original cS2G.file provided with the platform to your snpid format.
In PRS.scoring.file and GWAS.association.file, A1 represents the alternative allele and A2 represents the reference allele.

Mandatory Inputs

1. genotype.file

genotype.file is a mandatory input of the platform. The platform accepts the PLINK binary format (bed, bim, fam). Provide the full path and the prefix of the PLINK files for the target data set, without a file extension.

  • Correct: /home/folder_name/test_file_name
  • Wrong: /home/folder_name/test_file_name.bim

  • cohort.genotype.file: If you have a cohort.genotype.file, you can use it to observe risk genes within the cohort and to identify the risk genes or risk SNPs of each individual. In this situation, go to case 1.

  • reference.genotype.file & individual.genotype.file: If you do NOT have a cohort.genotype.file, you can use 1000G or UKBB as a reference. Use the reference.genotype.file to observe risk genes within the cohort, and add individual.genotype.file inputs to see the risk genes and SNPs of each individual. In this situation, go to case 2.
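The prefix rule above can be checked before a run. The following is a minimal sketch, not part of the platform: the function name `check_plink_prefix` is hypothetical, and it only assumes that the three PLINK files share one prefix as described above.

```python
from pathlib import Path

PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def check_plink_prefix(prefix: str) -> list[str]:
    """Return a list of problems with a PLINK prefix (an empty list means OK)."""
    problems = []
    # Only the prefix is expected, so a trailing PLINK extension is an error.
    if prefix.lower().endswith(PLINK_EXTENSIONS):
        problems.append(f"remove the file extension from '{prefix}'")
        return problems
    # All three files must exist under the same prefix.
    for ext in PLINK_EXTENSIONS:
        if not Path(prefix + ext).is_file():
            problems.append(f"missing file: {prefix}{ext}")
    return problems


print(check_plink_prefix("/home/folder_name/test_file_name.bim"))
```

An empty list means the prefix is usable; otherwise each entry names one thing to fix.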

2. PRS.scoring.file

PRS.scoring.file is a mandatory input of the platform. It is the file produced by a PRS construction method, such as lassosum or PRS-CS. Alternatively, you can obtain it from the PRS catalog. The file must have a header line with the following columns.

    chr
    snpid
    pos
    A1
    A2
    beta
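A quick header check can catch a wrongly named column before the file is uploaded. This is a minimal sketch under assumptions: the file is taken to be tab-delimited (the delimiter is not stated here), and `missing_columns` is a hypothetical helper, not a platform function.

```python
import csv
import io

REQUIRED_PRS_COLUMNS = ["chr", "snpid", "pos", "A1", "A2", "beta"]


def missing_columns(handle, required, delimiter="\t"):
    """Return the required column names that are absent from the first line."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Tiny in-memory example; a real file would be opened with open(path, newline="").
example = io.StringIO("chr\tsnpid\tpos\tA1\tA2\tbeta\n1\trs123\t1000\tA\tG\t0.02\n")
print(missing_columns(example, REQUIRED_PRS_COLUMNS))  # prints []
```

The comparison is case-sensitive on purpose, since the column names listed above mix lower and upper case.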

3. GWAS.association.file

GWAS.association.file is a mandatory input of the platform. It is the association file obtained from the GWAS catalog. The file must have a header line that includes the following columns.

    MAPPED_GENE
    P_VALUE

Optional Inputs

4. sumstat.file

sumstat.file is an optional input of the platform. It is the GWAS summary statistics file used to construct the PRS.scoring.file. If you used a PRS construction method to build the PRS.scoring.file, please provide the GWAS summary statistics that you used. The file must have a header line with the following columns.

    CHR
    SNPID
    POS
    A1
    A2
    P.VALUE

5. annotation.file

annotation.file is supplied with the platform. It is obtained from the UCSC genome browser and is used for positional mapping. You may provide your own annotation.file instead. The file must have a header line with the following columns.

    chr
    symbol
    start
    end
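To illustrate what positional mapping with these four columns can look like, here is a minimal sketch. It is an assumption, not the platform's documented algorithm: it treats the window as an extension on each side of the gene, and the names `Gene` and `genes_near_snp` and the example genes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    chr: str
    symbol: str
    start: int
    end: int


def genes_near_snp(snp_chr, snp_pos, genes, window=200_000):
    """Return symbols of genes whose [start - window, end + window] span contains the SNP."""
    hits = []
    for gene in genes:
        if gene.chr != snp_chr:
            continue  # genes on other chromosomes can never match
        if gene.start - window <= snp_pos <= gene.end + window:
            hits.append(gene.symbol)
    return hits


# Made-up annotation rows in the chr, symbol, start, end layout.
genes = [
    Gene("1", "GENE_A", 1_000_000, 1_050_000),
    Gene("1", "GENE_B", 2_000_000, 2_100_000),
    Gene("2", "GENE_C", 1_000_000, 1_050_000),
]
print(genes_near_snp("1", 1_200_000, genes))  # prints ['GENE_A']
```

A linear scan is enough for a sketch; for a full annotation file an interval index per chromosome would be the more usual choice.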

6. cS2G.file

cS2G.file is supplied with the platform. It is obtained from the paper "Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity" and is used for cS2G mapping. If your input files do not use rsids, you MUST modify the cS2G.file so that its SNP column matches your snpid format, and upload the modified file. The file must have a header line with the following columns.

    SNP
    GENE
    cS2G
    INFO
    annotation
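If your inputs do not use rsids, the SNP column of the cS2G.file has to be rewritten. The sketch below shows one way to do that, assuming a tab-delimited file and an rsid-to-snpid mapping that you supply yourself. The function name `convert_cs2g_ids` and the `1:1000:A:G` identifier style are hypothetical examples.

```python
import csv
import io


def convert_cs2g_ids(src, dst, rsid_to_snpid, delimiter="\t"):
    """Copy a cS2G table, replacing rsids in the SNP column.

    Rows whose rsid has no mapping are dropped; the number dropped is returned.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    skipped = 0
    for row in reader:
        new_id = rsid_to_snpid.get(row["SNP"])
        if new_id is None:
            skipped += 1  # no mapping available for this rsid
            continue
        row["SNP"] = new_id
        writer.writerow(row)
    return skipped


# In-memory example with made-up rows; real use would pass open file handles.
src = io.StringIO(
    "SNP\tGENE\tcS2G\tINFO\tannotation\n"
    "rs1\tGENE_A\t0.9\tx\ty\n"
    "rs2\tGENE_B\t0.8\tx\ty\n"
)
dst = io.StringIO()
skipped = convert_cs2g_ids(src, dst, {"rs1": "1:1000:A:G"})
print(skipped)
```

Reporting the number of dropped rows makes it easy to notice an incomplete mapping before the file is uploaded.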


Parameters

1. Window size

Window size is one of the parameters of the platform.

  • Default window size: 200kb

2. Number of CPU cores

Number of CPU cores is one of the parameters of the platform. Choose the number of CPU cores according to your server specification.

  • Default number of CPU cores: 8

3. SNP heritability top cut off

SNP heritability top cut off is one of the parameters of the platform. If the PRS.scoring.file contains more than 100,000 SNPs, the platform by default uses only the top 50% of SNPs ranked by heritability, which streamlines the process and focuses the analysis on the most heritable SNPs. If the PRS.scoring.file contains 100,000 SNPs or fewer, all SNPs are used in the analysis.

  • Default SNP Heritability top cut off: 0.5
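The cut-off rule can be sketched as follows. This is an illustration under assumptions: the per-SNP heritability values are taken as given (how the platform estimates them is not described here), the number of SNPs kept is rounded down, and `select_snps` is a hypothetical name.

```python
def select_snps(snp_h2, max_snps=100_000, top_fraction=0.5):
    """snp_h2 maps snpid -> per-SNP heritability; return the snpids to keep."""
    # At or below the threshold, every SNP is kept.
    if len(snp_h2) <= max_snps:
        return list(snp_h2)
    # Above the threshold, keep the top fraction ranked by heritability.
    keep = int(len(snp_h2) * top_fraction)
    ranked = sorted(snp_h2, key=snp_h2.get, reverse=True)
    return ranked[:keep]


# Toy example with a lowered threshold so that the cut-off applies.
toy = {"rs1": 0.1, "rs2": 0.4, "rs3": 0.2, "rs4": 0.3}
print(select_snps(toy, max_snps=3))  # prints ['rs2', 'rs4']
```

With the default threshold of 100,000 the same toy input is returned unchanged, which mirrors the second case described above.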