How to use the CLI

XspecT comes with a built-in command line interface (CLI), which enables quick classifications without the need to use the web interface. The command line interface can also be used to download and train models.

After installing XspecT, a list of available commands can be viewed by running:

xspect --help

In general, XspecT commands will prompt you for any parameters that are not provided. You can also pass parameters directly on the command line, which is useful in scripts or with workload managers such as Slurm. Run any command with the --help option to see all of its available parameters.
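
When calling XspecT from a script, it can help to look up a subcommand's parameters programmatically. The following is a minimal sketch in Python that relies only on the --help option described above. The helper names help_command and show_help are hypothetical and not part of XspecT.

```python
import shutil
import subprocess


def help_command(*subcommand: str) -> list[str]:
    """Build the argument list for `xspect <subcommand...> --help`."""
    return ["xspect", *subcommand, "--help"]


def show_help(*subcommand: str) -> str:
    """Run the help command and return its text output.

    Assumes the `xspect` executable is installed and on PATH.
    """
    if shutil.which("xspect") is None:
        raise FileNotFoundError("xspect executable not found on PATH")
    result = subprocess.run(
        help_command(*subcommand),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, show_help("models", "train", "ncbi") would return the help text for the NCBI training command, provided XspecT is installed.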

Model Management

At its core, XspecT uses models to classify and filter samples. These models are based on kmer indices trained on publicly available genomes and, in some cases, on a support vector machine (SVM) classifier.

To manage models, the xspect models command can be used. This command allows you to download, train, and view available models.

Viewing Available Models

To view a list of available models, run:

xspect models list

This will show all available models, grouped by type (species, genus, MLST).

Downloading Models

To download a basic set of pre-trained models (Acinetobacter and Salmonella), run:

xspect models download

Model Training

Models can be trained based on data from NCBI, which is automatically downloaded and processed by XspecT.

To train a model with NCBI data, run the following command:

xspect models train ncbi

If you would like to train models with manually curated data from a directory, you can use:

xspect models train directory

Your directory should have the following structure:

your-directory/
├── cobs
│   ├── species1
│   │   ├── genome1.fna
│   │   ├── genome2.fna
│   │   └── ...
│   ├── species2
│   │   ├── genome1.fna
│   │   ├── genome2.fna
│   │   └── ...
│   └── ...
└── svm
    ├── species1
    │   ├── genome1.fna
    │   ├── genome2.fna
    │   └── ...
    ├── species2
    │   ├── genome1.fna
    │   ├── genome2.fna
    │   └── ...
    └── ...
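
Before starting a training run, it may be worth checking that a directory matches the layout shown above. The following Python sketch does this under two assumptions taken from the tree: that both the cobs and svm folders are required, and that genomes use the .fna extension. The function name check_training_directory is hypothetical, and XspecT may perform its own, different validation.

```python
from pathlib import Path


def check_training_directory(root: Path) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    # Assumption: both top-level folders from the tree above are required.
    for section in ("cobs", "svm"):
        section_dir = root / section
        if not section_dir.is_dir():
            problems.append(f"missing directory: {section}/")
            continue
        species_dirs = sorted(p for p in section_dir.iterdir() if p.is_dir())
        if not species_dirs:
            problems.append(f"no species folders in {section}/")
        for species_dir in species_dirs:
            # Assumption: genome files end in .fna, as in the example tree.
            if not any(species_dir.glob("*.fna")):
                problems.append(f"no .fna files in {section}/{species_dir.name}/")
    return problems
```

Calling check_training_directory(Path("your-directory")) returns human-readable messages that can be printed before running the training command.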

To train models for MLST classifications, run:

xspect models train mlst

Classification

To classify samples, the command xspect classify can be used. This command will classify the sample based on the models available in your XspecT installation.

Genus Classification

To classify a sample based on its genus, run the following command:

xspect classify genus

XspecT will prompt you for the genus and path to your sample directory.

Species Classification

To classify a sample based on its species, run the following command:

xspect classify species

XspecT will prompt you for the genus and path to your sample directory.

Sparse Sampling

XspecT uses a kmer-based approach to classify samples. By default, the entire sample is analyzed, which can be time-consuming for large samples. To speed up the analysis, you can use the --sparse-sampling-step option to consider only every nth kmer.

Example:

xspect classify species --sparse-sampling-step 10 Acinetobacter path

This will only consider every 10th kmer in the sample.
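
To illustrate the idea, the following Python sketch extracts every nth kmer from a sequence. It is a conceptual example and not XspecT's implementation. The function name sparse_kmers, the value of k, and the choice to apply the step to kmer start positions are all assumptions.

```python
def sparse_kmers(sequence: str, k: int, step: int = 1) -> list[str]:
    """Return the kmers of `sequence` that start at every `step`-th position."""
    if k <= 0 or step <= 0:
        raise ValueError("k and step must be positive")
    last_start = len(sequence) - k  # last position where a full kmer fits
    return [sequence[i:i + k] for i in range(0, last_start + 1, step)]


sequence = "ACGTACGTAC"
all_kmers = sparse_kmers(sequence, k=4)             # 7 kmers
sampled_kmers = sparse_kmers(sequence, k=4, step=3)  # ["ACGT", "TACG", "GTAC"]
```

With a step of n, roughly one nth of the kmers are looked up, which reduces run time at the cost of using less of the sample.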

MLST Classification

Samples can also be classified using multilocus sequence typing (MLST) schemes. To MLST-classify a sample, run:

xspect classify mlst

Filtering

XspecT can also be used to filter samples based on their classification results. This is useful when analyzing metagenomic samples, for example when looking at genomic bycatch.

To filter samples, the command xspect filter can be used. This command will filter the samples based on the specified criteria.

Filtering by Genus

To filter samples by genus, run the following command:

xspect filter genus

XspecT will prompt you for the genus and path to your sample directory, as well as for a threshold to use for filtering.

Filtering by Species

To filter samples by species, run the following command:

xspect filter species

You will be prompted for the genus and the path to your sample directory, as well as for the species to filter by and a threshold to use for filtering. In addition to normal threshold-based filtering, you can enter a threshold of -1 to include a contig only if the selected species is its maximum-scoring species.
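
The two filtering modes can be sketched in Python as follows. This is an illustration of the rule described above, not XspecT's code. The function name keep_contig, the per-contig score dictionary, the use of "greater than or equal" for the threshold comparison, and the handling of ties are assumptions.

```python
def keep_contig(scores: dict[str, float], species: str, threshold: float) -> bool:
    """Decide whether a contig passes the species filter.

    `scores` maps species names to this contig's classification scores.
    """
    target = scores.get(species, 0.0)
    if threshold == -1:
        # Special mode: keep the contig only if the selected species
        # has the highest score (ties count as a match here).
        return bool(scores) and target == max(scores.values())
    # Normal mode: keep the contig if the score reaches the threshold.
    return target >= threshold


scores = {"species1": 0.8, "species2": 0.6}
keep_contig(scores, "species2", 0.5)  # True: 0.6 reaches the threshold
keep_contig(scores, "species2", -1)   # False: species1 scores higher
```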