tfclass_predict.predictor
=========================

.. py:module:: tfclass_predict.predictor


Classes
-------

.. autoapisummary::

   tfclass_predict.predictor.ClassAUC
   tfclass_predict.predictor.Predictor


Module Contents
---------------

.. py:class:: ClassAUC(name='ClassAUC', **kwargs)

   Bases: :py:obj:`tensorflow.metrics.AUC`

   Metric used in training steps; it must be kept for model usage.


.. py:class:: Predictor(bed_data, tokenizer, model_path, genome_file)

   The Predictor class takes care of prediction and model execution.

   .. py:attribute:: tokenizer

   .. py:attribute:: bed_data

   .. py:attribute:: SequenceProcessor

   .. py:attribute:: model

   .. py:method:: _init_model(model_path)

      Load the TFClass model. Initializes the TFBert model.

      :param model_path: Path to TFClass model.
      :return: Initialized TFClass model.

   .. py:method:: predict_bed_data(subseq_length, stride_length, batch_size)

      Processes genomic sequences from the ``bed_data`` DataFrame, extracts
      subsequences, converts them into tokenized k-mers, and uses the TFClass
      model to make predictions on these sequences. The predictions are
      aggregated and associated with their corresponding sequence indices.

      Workflow:

      1. Initializes lists to store aggregated predictions and their
         corresponding sequence indices.
      2. Iterates over each row in the ``bed_data`` DataFrame.
      3. For each row:

         - Extracts the genomic sequence based on ``seqnames``, ``start``, and
           ``end`` with a desired length of 150.
         - Skips sequences that are empty or shorter than the desired length.
         - Generates subsequences from the full sequence.
         - Converts each subsequence into k-mers and then tokenizes them.
         - Accumulates tokenized sequences until the batch size is reached.
         - Uses the TFClass model to make predictions on the batch of
           tokenized sequences.
         - Stores the predictions and their corresponding indices in the
           aggregated lists.

      4. Processes any remaining sequences that did not form a complete batch.

      :param subseq_length: Length of the subsequences into which a read is
         split (i.e. k-mer size).
      :param stride_length: Number of base pairs the window moves in each step
         (``1`` gives a sliding window, ``subseq_length`` gives non-overlapping
         k-mer splits).
      :param batch_size: Number of intervals processed in one batch.
      :return:
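The interplay of ``subseq_length``, ``stride_length`` and ``batch_size`` described above can be illustrated with a minimal, standalone sketch. It is not the package's implementation: ``generate_subsequences`` and ``batched`` are hypothetical helper names introduced only for this example, and tokenization and the model call are left out.

```python
from typing import Iterable, Iterator, List


def generate_subsequences(sequence: str, subseq_length: int, stride_length: int) -> List[str]:
    """Split a sequence into windows of subseq_length, moving stride_length bases per step.

    Hypothetical helper for illustration only; not part of tfclass_predict.
    """
    if subseq_length <= 0 or stride_length <= 0:
        raise ValueError("subseq_length and stride_length must be positive")
    # Last start position at which a full-length window still fits.
    last_start = len(sequence) - subseq_length
    # If the sequence is shorter than subseq_length, the range is empty.
    return [
        sequence[start:start + subseq_length]
        for start in range(0, last_start + 1, stride_length)
    ]


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size items, including a final partial batch."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush whatever did not fill a complete batch (step 4 of the workflow).
    if batch:
        yield batch


# stride_length=1 gives a sliding window over the sequence.
windows_sliding = generate_subsequences("ACGTACGT", subseq_length=4, stride_length=1)

# stride_length=subseq_length gives non-overlapping splits.
windows_split = generate_subsequences("ACGTACGT", subseq_length=4, stride_length=4)

# Group the sliding windows into batches of two; the last batch is partial.
batches = list(batched(windows_sliding, batch_size=2))
```

In this sketch a sequence shorter than ``subseq_length`` simply yields no windows, which mirrors the "skip short sequences" step of the workflow without claiming to reproduce the library's exact behaviour.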