Command-Line Interfaces

yasim_sc

yasim_sc art

art.py -- Single-Cell LLRG adapter for ART, an NGS DNA-Seq simulator

SYNOPSIS: python -m yasim_sc art [-h] -F [FASTAS] [-j [JOBS]] [--simulator_name [SIMULATOR_NAME]] [-e [LLRG_EXECUTABLE_PATH]] -d [DEPTH] -o [OUT]
                                 [--sequencer_name [{GA1,GA2,HS10,HS20,HS25,HSXn,HSXt,MinS,MSv1,MSv3,NS50}]] [--read_length [READ_LENGTH]]
                                 [--pair_end_fragment_length_mean [PAIR_END_FRAGMENT_LENGTH_MEAN]] [--pair_end_fragment_length_std [PAIR_END_FRAGMENT_LENGTH_STD]]
                                 [--is_pair_end] [--preserve_intermediate_files]

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  -F [FASTAS], --fastas [FASTAS]
                        [REQUIRED] Type: str; No defaults
                        Directory of transcribed cDNA sequences in FASTA format from the `transcribe` step
  -j [JOBS], --jobs [JOBS]
                        [OPTIONAL] Type: int; Default: 36
                        Number of LLRGs to be executed in parallel
  --simulator_name [SIMULATOR_NAME]
                        [OPTIONAL] Type: str; Default: None
                        Custom simulator name. Used in FASTQ tags
                        This step is performed in assemble, so this option has no effect if --not_perform_assemble is set.
  -e [LLRG_EXECUTABLE_PATH], --llrg_executable_path [LLRG_EXECUTABLE_PATH]
                        [OPTIONAL] Type: str; Default: art_illumina
                        Executable name or absolute path of the ART executable
  -d [DEPTH], --depth [DEPTH]
                        [REQUIRED] Type: str; No defaults
                        Path to Isoform-Level Depth directory generated by `generate_barcoded_isoform_replicates`
  -o [OUT], --out [OUT]
                        [REQUIRED] Type: str; No defaults
                        Path to output Directory.
  --sequencer_name [{GA1,GA2,HS10,HS20,HS25,HSXn,HSXt,MinS,MSv1,MSv3,NS50}]
                        [OPTIONAL] Type: str; Default: HS25
                        Name of Illumina Sequencer to Simulate: GA1 -- GenomeAnalyzer I, GA2 -- GenomeAnalyzer II, HS10 -- HiSeq 1000, HS20 -- HiSeq 2000, HS25 -- HiSeq 2500, HSXn -- HiSeqX PCR free, HSXt -- HiSeqX TruSeq, MinS -- MiniSeq TruSeq, MSv1 -- MiSeq v1, MSv3 -- MiSeq v3, NS50 -- NextSeq500 v2
                        CHOICES:
                            GA1
                            GA2
                            HS10
                            HS20
                            HS25
                            HSXn
                            HSXt
                            MinS
                            MSv1
                            MSv3
                            NS50
  --read_length [READ_LENGTH]
                        [OPTIONAL] Type: int; Default: 0
                        Read length. Sequencer -- Read Length Table: GenomeAnalyzer I -- [36, 44], GenomeAnalyzer II -- [50, 75], HiSeq 1000 -- [100], HiSeq 2000 -- [100], HiSeq 2500 -- [125, 150], HiSeqX PCR free -- [150], HiSeqX TruSeq -- [150], MiniSeq TruSeq -- [50], MiSeq v1 -- [250], MiSeq v3 -- [250], NextSeq500 v2 -- [75]
  --pair_end_fragment_length_mean [PAIR_END_FRAGMENT_LENGTH_MEAN]
                        [OPTIONAL] Type: int; Default: 0
                        [PE Only] The mean size of DNA/RNA fragments for paired-end simulations
  --pair_end_fragment_length_std [PAIR_END_FRAGMENT_LENGTH_STD]
                        [OPTIONAL] Type: int; Default: 0
                        [PE Only] The standard deviation of DNA/RNA fragment size for paired-end simulations.
  --is_pair_end
                        [OPTIONAL] Default: False
                        Whether to use paired-end (PE) simulation
  --preserve_intermediate_files
                        [OPTIONAL] Default: False
                        Do not remove intermediate files.
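
As a rough illustration of how the options above combine, the following Python sketch assembles a paired-end invocation and checks the requested read length against the Sequencer -- Read Length Table from the --read_length help text. It assumes the table lists the read lengths accepted for each sequencer. `build_art_argv` is a hypothetical helper, not part of yasim_sc, and the paths and fragment-length values are placeholders.

```python
import shlex

# Read lengths per --sequencer_name choice, transcribed from the help text above.
READ_LENGTHS = {
    "GA1": [36, 44],
    "GA2": [50, 75],
    "HS10": [100],
    "HS20": [100],
    "HS25": [125, 150],
    "HSXn": [150],
    "HSXt": [150],
    "MinS": [50],
    "MSv1": [250],
    "MSv3": [250],
    "NS50": [75],
}


def build_art_argv(fastas, depth, out, sequencer_name="HS25", read_length=125,
                   fragment_mean=None, fragment_std=None):
    """Hypothetical helper: build an argv list for `python -m yasim_sc art`."""
    allowed = READ_LENGTHS[sequencer_name]
    if read_length not in allowed:
        raise ValueError(
            f"read length {read_length} is not listed for {sequencer_name}: {allowed}"
        )
    argv = [
        "python", "-m", "yasim_sc", "art",
        "-F", fastas,
        "-d", depth,
        "-o", out,
        "--sequencer_name", sequencer_name,
        "--read_length", str(read_length),
    ]
    # The two fragment-length options are marked [PE Only], so they are added
    # together with --is_pair_end.
    if fragment_mean is not None and fragment_std is not None:
        argv += [
            "--is_pair_end",
            "--pair_end_fragment_length_mean", str(fragment_mean),
            "--pair_end_fragment_length_std", str(fragment_std),
        ]
    return argv


if __name__ == "__main__":
    # Placeholder paths and fragment sizes, for illustration only.
    print(shlex.join(build_art_argv("cdna.d", "depth.d", "out.d",
                                    fragment_mean=300, fragment_std=20)))
```

The resulting list can be passed to `subprocess.run` unchanged; building it as a list avoids shell quoting issues with unusual paths.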

yasim_sctcr

yasim_sctcr convert_anndata

convert_anndata -- Convert Scanpy AnnData object to TSV/Apache Parquet for ``scaffold``

SYNOPSIS: python -m yasim_sctcr convert_anndata [-h] --adata ADATA --annotation_column_name ANNOTATION_COLUMN_NAME -o OUT

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  --adata ADATA
                        [REQUIRED] Type: str; No defaults
                        scRNA-Seq file readable by anndata
  --annotation_column_name ANNOTATION_COLUMN_NAME
                        [REQUIRED] Type: str; No defaults
                        Column name for cell type annotation defined in ``adata.obs``.
  -o OUT, --out OUT
                        [REQUIRED] Type: str; No defaults
                        Output file, should be in Apache Parquet or TSV format. Note that reading scRNA-Seq files in Apache Parquet format requires Apache Arrow or FastParquet to be installed.

yasim_sctcr generate_tcr_cache

generate_tcr_cache.py -- Generation of TCR Cache.

SYNOPSIS: python -m yasim_sctcr generate_tcr_cache [-h] --tcr_cdna_fa_path TCR_CDNA_FA_PATH --tcr_pep_fa_path TCR_PEP_FA_PATH -o [OUT]

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  --tcr_cdna_fa_path TCR_CDNA_FA_PATH
                        [REQUIRED] Type: str; No defaults
                        TCR cDNA FASTA path. The sequence names should conform to the regular expression TR[AB][VDJC].+. A copy extracted from human is available at ZENODO TODO.
  --tcr_pep_fa_path TCR_PEP_FA_PATH
                        [REQUIRED] Type: str; No defaults
                        TCR Peptide FASTA path. The sequence names should be the same as those in `tcr_cdna_fa_path`. A copy extracted from human is available at ZENODO TODO.
  -o [OUT], --out [OUT]
                        [REQUIRED] Type: str; No defaults
                        Output TCR Cache
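
The naming requirement on the input FASTA files can be checked before building the cache. The sketch below is one possible pre-flight check, not part of yasim_sctcr. It assumes that the sequence name is the first whitespace-delimited token of each header line and that the whole name must match the pattern. `nonconforming_names` and the example names are hypothetical.

```python
import re

# Pattern quoted in the --tcr_cdna_fa_path help text.
TCR_NAME_REGEX = re.compile(r"TR[AB][VDJC].+")


def nonconforming_names(fasta_lines):
    """Return FASTA sequence names that do not match TR[AB][VDJC].+ in full."""
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumption: the name is the first whitespace-delimited token.
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if TCR_NAME_REGEX.fullmatch(name) is None:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Illustrative records only; real input would be read from the FASTA file.
    example = [">TRAV1-1*01", "ACGT", ">TRBJ2-7*01", "ACGT", ">IGHV1-2*01", "ACGT"]
    print(nonconforming_names(example))
```

Running the same check on the peptide FASTA and comparing the two sets of names would also confirm that both files use identical sequence names.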

yasim_sctcr generate_tcr_clonal_expansion

generate_tcr_clonal_expansion.py -- Generate ground-truth TCR Contigs

SYNOPSIS: python -m yasim_sctcr generate_tcr_clonal_expansion [-h] -b [BARCODES] -i SRC_TCR_STATS_TSV -o DST_NT_FASTA [--alpha [ALPHA]]

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  -b [BARCODES], --barcodes [BARCODES]
                        [REQUIRED] Type: str; No defaults
                        Path to input barcode TXT. Can be compressed.
  -i SRC_TCR_STATS_TSV, --src_tcr_stats_tsv SRC_TCR_STATS_TSV
                        [REQUIRED] No defaults
                        Path to simulated TCR status TSV.
  -o DST_NT_FASTA, --dst_nt_fasta DST_NT_FASTA
                        [REQUIRED] No defaults
                        Output nucleotide FASTA for each barcode
  --alpha [ALPHA]
                        [OPTIONAL] Type: int; Default: 1
                        Zipf's coefficient; larger values give larger differences between clone sizes
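
To give a feel for the --alpha option, the sketch below computes normalised weights under the standard Zipf law, where the weight of the clone at rank r is proportional to 1 / r**alpha. How generate_tcr_clonal_expansion maps alpha onto barcodes is not described here, so this only illustrates the general law. `zipf_weights` is a hypothetical helper.

```python
def zipf_weights(n_clones, alpha=1):
    """Normalised Zipf weights: rank r gets weight proportional to 1 / r**alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_clones + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


if __name__ == "__main__":
    for alpha in (1, 2):
        weights = zipf_weights(4, alpha)
        # Ratio between the largest and the smallest clone grows with alpha.
        print(alpha, [round(w, 3) for w in weights], weights[0] / weights[-1])
```

With four clones, the ratio between the largest and smallest weight is 4 for alpha = 1 and 16 for alpha = 2, so a larger coefficient concentrates more cells in the top-ranked clones.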

yasim_sctcr generate_tcr_depth

generate_tcr_depth.py -- Generate scTCR depth TSV without clonal expansion

SYNOPSIS: python -m yasim_sctcr generate_tcr_depth [-h] -b [BARCODES] -o [OUT] -d [DEPTH]

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  -b [BARCODES], --barcodes [BARCODES]
                        [REQUIRED] Type: str; No defaults
                        Path to input barcode TXT. Can be compressed.
  -o [OUT], --out [OUT]
                        [REQUIRED] Type: str; No defaults
                        Path to output depth TSV
  -d [DEPTH], --depth [DEPTH]
                        [REQUIRED] Type: int; No defaults
                        Simulated depth
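
Conceptually, a depth table without clonal expansion gives every barcode the same depth. The sketch below shows that idea under stated assumptions: the barcode TXT holds one barcode per line, and "compressed" means gzip when the file name ends in .gz. The layout of the TSV that generate_tcr_depth actually writes is not specified here, so the sketch stops at an in-memory mapping. `read_barcodes` and `uniform_depth` are hypothetical helpers.

```python
import gzip


def read_barcodes(path):
    """Read one barcode per line; assumes gzip when the path ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Skip blank lines and strip trailing newlines.
        return [line.strip() for line in handle if line.strip()]


def uniform_depth(barcodes, depth):
    """Assign the same depth to every barcode (no clonal expansion)."""
    return {barcode: depth for barcode in barcodes}
```

For example, `uniform_depth(read_barcodes("barcodes.txt.gz"), 20)` would map every barcode in a placeholder file `barcodes.txt.gz` to a depth of 20.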

yasim_sctcr scaffold

scaffold.py -- Generate paired scRNA-Seq and scTCR-Seq data from configuration files or options

SYNOPSIS: python -m yasim_sctcr scaffold [-h] --transcript_gene_mapping TRANSCRIPT_GENE_MAPPING --src_sc_data SRC_SC_DATA [--n_genes N_GENES] [--mean_depth MEAN_DEPTH]
                                         --out OUT --t_cell_regex T_CELL_REGEX

OPTIONS:
  -h, --help
                        [OPTIONAL]
                        show this help message and exit
  --transcript_gene_mapping TRANSCRIPT_GENE_MAPPING
                        [REQUIRED] No defaults
                        A headerless two-column TSV whose first column contains transcript IDs and whose second column contains gene IDs.
  --src_sc_data SRC_SC_DATA
                        [REQUIRED] No defaults
                        A TSV or Apache Parquet file, one cell per column, with gene names in the 'FEATURE' column. The column names should contain cell type information that allows T-cells to be distinguished from other cells using a regular expression. Note that if you want to read scRNA-Seq files in Apache Parquet format, you need to install Apache Arrow or FastParquet. If you're using anndata, please convert it using `convert_anndata`.
  --n_genes N_GENES
                        [OPTIONAL] Type: int; Default: 500
                        Number of genes to sample.
  --mean_depth MEAN_DEPTH
                        [OPTIONAL] Type: float; Default: 1.0
                        Mean sequencing depth.
  --out OUT
                        [REQUIRED] No defaults
                        Path to output directory.
  --t_cell_regex T_CELL_REGEX
                        [REQUIRED] No defaults
                        Regular expression matching T-cell names
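
The relationship between --src_sc_data and --t_cell_regex can be sketched with a small in-memory TSV: one cell per column, gene names in the 'FEATURE' column, and a regular expression applied to the column names. The column names, counts, and `split_columns` helper below are hypothetical. The sketch uses `re.search`, and whether scaffold itself searches or requires a full match is an assumption.

```python
import csv
import io
import re

# Toy input: two T-cell columns and one B-cell column (made-up names and counts).
TSV = (
    "FEATURE\tCD8T_0\tCD8T_1\tB_0\n"
    "CD3E\t5\t3\t0\n"
    "MS4A1\t0\t0\t7\n"
)


def split_columns(tsv_text, t_cell_regex):
    """Split cell columns into (T-cell columns, other columns) by regex."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # Every column except 'FEATURE' is treated as one cell.
    cells = [name for name in header if name != "FEATURE"]
    pattern = re.compile(t_cell_regex)
    t_cells = [name for name in cells if pattern.search(name)]
    others = [name for name in cells if not pattern.search(name)]
    return t_cells, others


if __name__ == "__main__":
    print(split_columns(TSV, r"^CD8T"))
```

Testing the expression against the real column names in this way, before running scaffold, helps catch a pattern that matches no cells or matches every cell.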