Command-Line Interfaces
yasim_sc
yasim_sc art
art.py -- Single-Cell LLRG adapter for ART, a NGS DNA-Seq simulator
SYNOPSIS: python -m yasim_sc art [-h] -F [FASTAS] [-j [JOBS]] [--simulator_name [SIMULATOR_NAME]] [-e [LLRG_EXECUTABLE_PATH]] -d [DEPTH] -o [OUT]
[--sequencer_name [{GA1,GA2,HS10,HS20,HS25,HSXn,HSXt,MinS,MSv1,MSv3,NS50}]] [--read_length [READ_LENGTH]]
[--pair_end_fragment_length_mean [PAIR_END_FRAGMENT_LENGTH_MEAN]] [--pair_end_fragment_length_std [PAIR_END_FRAGMENT_LENGTH_STD]]
[--is_pair_end] [--preserve_intermediate_files]
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
-F [FASTAS], --fastas [FASTAS]
[REQUIRED] Type: str; No defaults
Directory of transcribed cDNA sequences in FASTA format from `transcribe` step
-j [JOBS], --jobs [JOBS]
[OPTIONAL] Type: int; Default: 36
Number of LLRGs to be executed in parallel
--simulator_name [SIMULATOR_NAME]
[OPTIONAL] Type: str; Default: None
Custom simulator name. Used in FASTQ tags
The tag is applied during the assemble step, so this option has no effect if --not_perform_assemble is set.
-e [LLRG_EXECUTABLE_PATH], --llrg_executable_path [LLRG_EXECUTABLE_PATH]
[OPTIONAL] Type: str; Default: art_illumina
Executable name or absolute path of art
-d [DEPTH], --depth [DEPTH]
[REQUIRED] Type: str; No defaults
Path to Isoform-Level Depth directory generated by `generate_barcoded_isoform_replicates`
-o [OUT], --out [OUT]
[REQUIRED] Type: str; No defaults
Path to output Directory.
--sequencer_name [{GA1,GA2,HS10,HS20,HS25,HSXn,HSXt,MinS,MSv1,MSv3,NS50}]
[OPTIONAL] Type: str; Default: HS25
Name of the Illumina sequencer to simulate: GA1 -- GenomeAnalyzer I, GA2 -- GenomeAnalyzer II, HS10 -- HiSeq 1000, HS20 -- HiSeq 2000, HS25 -- HiSeq 2500, HSXn -- HiSeqX PCR free, HSXt -- HiSeqX TruSeq, MinS -- MiniSeq TruSeq, MSv1 -- MiSeq v1, MSv3 -- MiSeq v3, NS50 -- NextSeq500 v2
CHOICES:
GA1
GA2
HS10
HS20
HS25
HSXn
HSXt
MinS
MSv1
MSv3
NS50
--read_length [READ_LENGTH]
[OPTIONAL] Type: int; Default: 0
Read length. Sequencer -- Read Length Table: GenomeAnalyzer I -- [36, 44], GenomeAnalyzer II -- [50, 75], HiSeq 1000 -- [100], HiSeq 2000 -- [100], HiSeq 2500 -- [125, 150], HiSeqX PCR free -- [150], HiSeqX TruSeq -- [150], MiniSeq TruSeq -- [50], MiSeq v1 -- [250], MiSeq v3 -- [250], NextSeq500 v2 -- [75]
--pair_end_fragment_length_mean [PAIR_END_FRAGMENT_LENGTH_MEAN]
[OPTIONAL] Type: int; Default: 0
[PE Only] The mean size of DNA/RNA fragments for paired-end simulations
--pair_end_fragment_length_std [PAIR_END_FRAGMENT_LENGTH_STD]
[OPTIONAL] Type: int; Default: 0
[PE Only] The standard deviation of DNA/RNA fragment size for paired-end simulations.
--is_pair_end
[OPTIONAL] Default: False
Whether to use paired-end (PE) simulation.
--preserve_intermediate_files
[OPTIONAL] Default: False
Do not remove intermediate files.
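As an illustration of how the options above combine, the following Python sketch assembles the argument list for a `yasim_sc art` run and checks the requested read length against the sequencer table given under --read_length. The helper name `build_art_command` and all paths and fragment-length values are hypothetical. The sketch only builds the command and does not run it; whether yasim_sc itself rejects a read length missing from the table is not stated above.

```python
from typing import List

# Read lengths per sequencer, copied from the --read_length help text.
READ_LENGTHS = {
    "GA1": [36, 44],
    "GA2": [50, 75],
    "HS10": [100],
    "HS20": [100],
    "HS25": [125, 150],
    "HSXn": [150],
    "HSXt": [150],
    "MinS": [50],
    "MSv1": [250],
    "MSv3": [250],
    "NS50": [75],
}


def build_art_command(
    fastas: str,
    depth: str,
    out: str,
    sequencer_name: str = "HS25",
    read_length: int = 125,
    pair_end: bool = False,
    fragment_mean: int = 0,
    fragment_std: int = 0,
) -> List[str]:
    """Hypothetical helper: build (but do not run) a `yasim_sc art` command."""
    if sequencer_name not in READ_LENGTHS:
        raise ValueError(f"Unknown sequencer: {sequencer_name}")
    if read_length not in READ_LENGTHS[sequencer_name]:
        raise ValueError(
            f"{sequencer_name} lists read lengths {READ_LENGTHS[sequencer_name]}, "
            f"got {read_length}"
        )
    cmd = [
        "python", "-m", "yasim_sc", "art",
        "-F", fastas,
        "-d", depth,
        "-o", out,
        "--sequencer_name", sequencer_name,
        "--read_length", str(read_length),
    ]
    if pair_end:
        # The fragment-length options are documented as PE-only.
        cmd += [
            "--is_pair_end",
            "--pair_end_fragment_length_mean", str(fragment_mean),
            "--pair_end_fragment_length_std", str(fragment_std),
        ]
    return cmd


# Example with made-up directory names and fragment sizes.
print(" ".join(build_art_command(
    "transcribed.d", "depth.d", "art_out.d",
    pair_end=True, fragment_mean=300, fragment_std=30,
)))
```

The resulting list can be passed to `subprocess.run` once the input directories exist.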
yasim_sctcr
yasim_sctcr convert_anndata
convert_anndata -- Convert Scanpy AnnData object to TSV/Apache Parquet for ``scaffold``
SYNOPSIS: python -m yasim_sctcr convert_anndata [-h] --adata ADATA --annotation_column_name ANNOTATION_COLUMN_NAME -o OUT
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
--adata ADATA
[REQUIRED] Type: str; No defaults
scRNA-Seq file readable by anndata
--annotation_column_name ANNOTATION_COLUMN_NAME
[REQUIRED] Type: str; No defaults
Column name for cell type annotation defined in ``adata.obs``.
-o OUT, --out OUT
[REQUIRED] Type: str; No defaults
Output file, in Apache Parquet or TSV format. Note that reading scRNA-Seq files in Apache Parquet format requires Apache Arrow or FastParquet to be installed.
yasim_sctcr generate_tcr_cache
generate_tcr_cache.py -- Generation of TCR Cache.
SYNOPSIS: python -m yasim_sctcr generate_tcr_cache [-h] --tcr_cdna_fa_path TCR_CDNA_FA_PATH --tcr_pep_fa_path TCR_PEP_FA_PATH -o [OUT]
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
--tcr_cdna_fa_path TCR_CDNA_FA_PATH
[REQUIRED] Type: str; No defaults
TCR cDNA FASTA path. Sequence names should conform to the regular expression TR[AB][VDJC].+. A version extracted from human is available at ZENODO TODO.
--tcr_pep_fa_path TCR_PEP_FA_PATH
[REQUIRED] Type: str; No defaults
TCR peptide FASTA path. Sequence names should be the same as those in `tcr_cdna_fa_path`. A version extracted from human is available at ZENODO TODO.
-o [OUT], --out [OUT]
[REQUIRED] Type: str; No defaults
Output TCR Cache
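The naming requirement on the input FASTA files can be checked before building the cache. The sketch below is a minimal illustration, not part of yasim_sctcr: the function name `non_conforming_names` and the example records are hypothetical, and it assumes that the whole header line after `>` is the sequence name and must match the expression in full.

```python
import re
from typing import Iterable, List

# Regular expression quoted in the --tcr_cdna_fa_path help text.
TCR_NAME_REGEX = re.compile(r"TR[AB][VDJC].+")


def non_conforming_names(fasta_lines: Iterable[str]) -> List[str]:
    """Return FASTA sequence names that do not match TR[AB][VDJC].+"""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the name is everything after '>' on the header line.
            name = line[1:].strip()
            if TCR_NAME_REGEX.fullmatch(name) is None:
                bad.append(name)
    return bad


# Made-up records: two TCR segment names and one immunoglobulin name.
example = [
    ">TRAV1-1*01", "ATGC",
    ">TRBJ2-7*01", "ACGT",
    ">IGHV1-2*02", "ACGT",
]
print(non_conforming_names(example))  # ['IGHV1-2*02']
```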
yasim_sctcr generate_tcr_clonal_expansion
generate_tcr_clonal_expansion.py -- Generate ground-truth TCR Contigs
SYNOPSIS: python -m yasim_sctcr generate_tcr_clonal_expansion [-h] -b [BARCODES] -i SRC_TCR_STATS_TSV -o DST_NT_FASTA [--alpha [ALPHA]]
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
-b [BARCODES], --barcodes [BARCODES]
[REQUIRED] Type: str; No defaults
Path to input barcode TXT. Can be compressed.
-i SRC_TCR_STATS_TSV, --src_tcr_stats_tsv SRC_TCR_STATS_TSV
[REQUIRED] No defaults
Path to simulated TCR status TSV.
-o DST_NT_FASTA, --dst_nt_fasta DST_NT_FASTA
[REQUIRED] No defaults
Output nucleotide FASTA for each barcode
--alpha [ALPHA]
[OPTIONAL] Type: int; Default: 1
Zipf's coefficient. Larger values give larger differences.
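To show what a Zipf coefficient does in general, the sketch below computes textbook Zipf weights, where the clone of rank r receives a share proportional to 1 / r^alpha. This is an illustration of the distribution only; how generate_tcr_clonal_expansion applies --alpha internally is not documented here, and the function name `zipf_weights` is hypothetical.

```python
from typing import List


def zipf_weights(n_clones: int, alpha: float) -> List[float]:
    """Normalized Zipf weights: rank r gets a share proportional to 1 / r**alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_clones + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# Compare the share of the largest and smallest of four clones.
for alpha in (1, 2):
    weights = zipf_weights(4, alpha)
    print(f"alpha={alpha}: largest={weights[0]:.3f}, smallest={weights[-1]:.3f}")
```

With a larger alpha the top-ranked clone takes a larger share and the tail shrinks, which matches the "larger for larger differences" description of the option.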
yasim_sctcr generate_tcr_depth
generate_tcr_depth.py -- Generate scTCR depth TSV without clonal expansion
SYNOPSIS: python -m yasim_sctcr generate_tcr_depth [-h] -b [BARCODES] -o [OUT] -d [DEPTH]
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
-b [BARCODES], --barcodes [BARCODES]
[REQUIRED] Type: str; No defaults
Path to input barcode TXT. Can be compressed.
-o [OUT], --out [OUT]
[REQUIRED] Type: str; No defaults
Path to output depth TSV
-d [DEPTH], --depth [DEPTH]
[REQUIRED] Type: int; No defaults
Simulated depth
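The barcode file passed with -b is described as a TXT file that can be compressed. The sketch below shows one way such a file could be read in Python. It assumes one barcode per line and gzip compression, neither of which is confirmed above, and the function name `read_barcodes` is hypothetical.

```python
import gzip
from typing import List


def read_barcodes(path: str) -> List[str]:
    """Read one barcode per line from a plain or gzip-compressed text file."""
    # gzip streams start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as reader:
        # Skip blank lines and strip trailing newlines.
        return [line.strip() for line in reader if line.strip()]
```

Detecting the format from the file content rather than the file extension keeps the helper usable for files that were renamed after compression.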
yasim_sctcr scaffold
scaffold.py -- Generate paired scRNA-Seq and scTCR-Seq data from configuration files or options
SYNOPSIS: python -m yasim_sctcr scaffold [-h] --transcript_gene_mapping TRANSCRIPT_GENE_MAPPING --src_sc_data SRC_SC_DATA [--n_genes N_GENES] [--mean_depth MEAN_DEPTH]
--out OUT --t_cell_regex T_CELL_REGEX
OPTIONS:
-h, --help
[OPTIONAL]
show this help message and exit
--transcript_gene_mapping TRANSCRIPT_GENE_MAPPING
[REQUIRED] No defaults
A headless two-column TSV whose first column contains transcript IDs and whose second column contains gene IDs.
--src_sc_data SRC_SC_DATA
[REQUIRED] No defaults
A TSV or Apache Parquet file, one cell per column, with gene names in the 'FEATURE' column. The column names should contain cell type information that allows T-cells to be distinguished from other cells using a regular expression. Note that reading scRNA-Seq files in Apache Parquet format requires Apache Arrow or FastParquet to be installed. If you're using anndata, please convert it using `convert_anndata`.
--n_genes N_GENES
[OPTIONAL] Type: int; Default: 500
Number of genes to sample.
--mean_depth MEAN_DEPTH
[OPTIONAL] Type: float; Default: 1.0
Mean sequencing depth.
--out OUT
[REQUIRED] No defaults
Path to output directory.
--t_cell_regex T_CELL_REGEX
[REQUIRED] No defaults
Regular expression that matches T-cell names.
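The two inputs described above can be pictured with a short Python sketch: parsing a headless two-column transcript-to-gene TSV, and picking T-cell columns out of the scRNA-Seq column names with a regular expression. All identifiers, column names, and the example pattern are made up, the helper names are hypothetical, and the use of `re.search` rather than a full match is an assumption about how --t_cell_regex is applied.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, TextIO


def read_transcript_gene_mapping(handle: TextIO) -> Dict[str, str]:
    """Parse a headless TSV: column 1 is the transcript ID, column 2 the gene ID."""
    reader = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def select_t_cell_columns(column_names: Iterable[str], t_cell_regex: str) -> List[str]:
    """Return the cell columns whose names match the T-cell regular expression."""
    pattern = re.compile(t_cell_regex)
    return [
        name for name in column_names
        if name != "FEATURE" and pattern.search(name)  # 'FEATURE' holds gene names
    ]


# Made-up mapping with two transcripts of one gene and one of another.
mapping = read_transcript_gene_mapping(
    io.StringIO("TX0001\tGENE_A\nTX0002\tGENE_A\nTX0003\tGENE_B\n")
)
print(mapping["TX0002"])  # GENE_A

# Made-up column names that encode the cell type as a prefix.
columns = ["FEATURE", "CD8T_cell_001", "B_cell_001", "CD4T_cell_002"]
print(select_t_cell_columns(columns, r"^CD[48]T"))
# ['CD8T_cell_001', 'CD4T_cell_002']
```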