tdhock

directlabels - Direct Labels for Multicolor Plots

An extensible framework for automatically placing direct labels onto multicolor 'lattice' or 'ggplot2' plots. Label positions are described using Positioning Methods which can be re-used across several different plots. There are heuristics for examining "trellis" and "ggplot" objects and inferring an appropriate Positioning Method.

11.66 score 88 stars 18 dependents 2.2k scripts 12k downloads

animint2 - Animated Interactive Grammar of Graphics

Functions are provided for defining animated, interactive data visualizations in R code, and for rendering them on a web page. The 2018 Journal of Computational and Graphical Statistics paper <doi:10.1080/10618600.2018.1513367> describes the concepts implemented.

9.61 score 76 stars 344 scripts 453 downloads

atime - Asymptotic Timing

Computing and visualizing comparative asymptotic timings of different algorithms and code versions. Also includes functionality for comparing empirical timings with expected references such as linear or quadratic <https://en.wikipedia.org/wiki/Asymptotic_computational_complexity>, and for measuring asymptotic memory and other quantities.

7.15 score 9 stars 110 scripts 718 downloads
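The idea behind asymptotic timing comparisons can be sketched in Python: time the same code at several data sizes N, then compare how fast the time grows with linear or quadratic references. This is not the atime R interface. Fitting a least-squares slope on a log-log scale is one simple way to summarize the growth rate and is an assumption of this sketch; the names `time_once`, `loglog_slope` and `linear_work` are hypothetical.

```python
import math
import time
from typing import Callable, Sequence


def time_once(fn: Callable[[int], object], n: int) -> float:
    """Return wall-clock seconds for a single call fn(n)."""
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


def loglog_slope(sizes: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of log(values) against log(sizes).

    A slope near 1 is consistent with linear growth, near 2 with quadratic.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def linear_work(n: int) -> int:
    """Hypothetical example workload whose cost grows linearly in n."""
    return sum(range(n))


if __name__ == "__main__":
    sizes = [10**4, 10**5, 10**6]
    # the minimum over repetitions reduces noise from other processes
    times = [min(time_once(linear_work, n) for _ in range(3)) for n in sizes]
    print("estimated exponent:", round(loglog_slope(sizes, times), 2))
```

Measured timings are noisy, especially at small N, so the printed exponent is only expected to be roughly 1 for this workload.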

nc - Named Capture to Data Tables

User-friendly functions for extracting a data table (row for each match, column for each group) from non-tabular text data using regular expressions, and for melting columns that match a regular expression. Patterns are defined using a readable syntax that makes it easy to build complex patterns in terms of simpler, re-usable sub-patterns. Named R arguments are translated to column names in the output; capture groups without names are used internally in order to provide a standard interface to three regular expression 'C' libraries ('PCRE', 'RE2', 'ICU'). Output can also include numeric columns via user-specified type conversion functions.

6.93 score 19 stars 1 dependent 62 scripts 693 downloads
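The named-capture idea (one row per match, one column per named group, optional type conversion, patterns built from re-usable sub-patterns) can be illustrated with Python's standard `re` module. This is an analogy, not the nc R API. The helper names `capture_all`, `int_with_commas`, `position`, and the genomic-range pattern are hypothetical examples.

```python
import re
from typing import Callable, Dict, List, Optional


def capture_all(subject: str, pattern: str,
                converters: Optional[Dict[str, Callable[[str], object]]] = None
                ) -> List[Dict[str, object]]:
    """One row (dict) per match, one column (key) per named group."""
    converters = converters or {}
    rows = []
    for match in re.finditer(pattern, subject):
        row = {}
        for name, text in match.groupdict().items():
            convert = converters.get(name, str)  # default: keep the text
            row[name] = None if text is None else convert(text)
        rows.append(row)
    return rows


def int_with_commas(text: str) -> int:
    """Type conversion function: '213,054,000' -> 213054000."""
    return int(text.replace(",", ""))


def position(name: str) -> str:
    """Re-usable sub-pattern: a comma-separated integer in a named group."""
    return rf"(?P<{name}>[0-9,]+)"


# A more complex pattern assembled from simpler sub-patterns.
CHROM = r"(?P<chrom>chr\w+)"
RANGE = CHROM + ":" + position("chromStart") + "-" + position("chromEnd")

if __name__ == "__main__":
    text = "chr10:213,054,000-213,055,000 chrM:111,000-222,000"
    conv = {"chromStart": int_with_commas, "chromEnd": int_with_commas}
    for row in capture_all(text, RANGE, conv):
        print(row)
```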

mlr3resampling - Resampling Algorithms for 'mlr3' Framework

A supervised learning algorithm inputs a train set, and outputs a prediction function, which can be used on a test set. If each data point belongs to a subset (such as geographic region, year, etc.), then how do we know if subsets are similar enough so that we can get accurate predictions on one subset, after training on Other subsets? And how do we know if training on All subsets would improve prediction accuracy, relative to training on the Same subset? SOAK, Same/Other/All K-fold cross-validation, <doi:10.1002/sam.70055> can be used to answer these questions, by fixing a test subset, training models on Same/Other/All subsets, and then comparing test error rates (Same versus Other and Same versus All). Also provides code for estimating how many train samples are required to get accurate predictions on a test set.

cpp

6.38 score 6 stars 7 scripts 353 downloads
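The Same/Other/All splits described above can be sketched as index sets. This is a minimal Python illustration, not the mlr3resampling API; the helper `soak_split` is hypothetical, and it assumes every data point already has a fold ID and that all three train sets exclude the test fold.

```python
from typing import Dict, List, Sequence


def soak_split(subset: Sequence[str], fold: Sequence[int],
               test_subset: str, test_fold: int) -> Dict[str, List[int]]:
    """Row indices for one SOAK split: a test set plus Same/Other/All train sets."""
    rows = range(len(subset))
    test = [i for i in rows if subset[i] == test_subset and fold[i] == test_fold]
    # assumption: every train set excludes the test fold
    train_all = [i for i in rows if fold[i] != test_fold]
    return {
        "test": test,
        "same": [i for i in train_all if subset[i] == test_subset],
        "other": [i for i in train_all if subset[i] != test_subset],
        "all": train_all,
    }


if __name__ == "__main__":
    subset = ["A", "A", "A", "B", "B", "B"]
    fold = [1, 2, 3, 1, 2, 3]
    # one split per (test subset, test fold) pair
    for s in sorted(set(subset)):
        for k in sorted(set(fold)):
            split = soak_split(subset, fold, s, k)
            print(s, k, {name: len(idx) for name, idx in split.items()})
```

A model would then be trained on each of the three train sets and evaluated on the same test set, so that the test error rates can be compared (Same versus Other, Same versus All).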

penaltyLearning - Penalty Learning

Implementations of algorithms from "Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression" by Hocking, Rigaill, Vert, and Bach <http://proceedings.mlr.press/v28/hocking13.html>, published in the proceedings of ICML 2013.

cpp

6.06 score 16 stars 2 dependents 119 scripts 892 downloads
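In interval regression for penalty learning, each training example comes with a target interval of log(penalty) values (assumed here to be those that give minimal label error), and the model is penalized when its prediction is not inside that interval with some margin. The Python sketch below assumes a squared hinge loss with margin 1, one common choice; the function names are hypothetical and the package's exact loss and parameters may differ.

```python
from typing import Sequence


def squared_hinge(x: float, margin: float = 1.0) -> float:
    """(margin - x)^2 when x < margin, else 0."""
    return (margin - x) ** 2 if x < margin else 0.0


def interval_loss(pred: float, lower: float, upper: float,
                  margin: float = 1.0) -> float:
    """Loss for a predicted log(penalty) and a target interval [lower, upper].

    Zero when lower + margin <= pred <= upper - margin. Infinite limits are
    allowed: the corresponding term is then always zero.
    """
    return squared_hinge(pred - lower, margin) + squared_hinge(upper - pred, margin)


def mean_interval_loss(preds: Sequence[float], lowers: Sequence[float],
                       uppers: Sequence[float], margin: float = 1.0) -> float:
    """Average loss over a training set; the objective a linear model would minimize."""
    losses = [interval_loss(p, lo, hi, margin)
              for p, lo, hi in zip(preds, lowers, uppers)]
    return sum(losses) / len(losses)
```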

WeightedROC - Fast, Weighted ROC Curves

Fast computation of Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for weighted binary classification problems (weights are example-specific cost values).

5.87 score 27 stars 136 scripts 463 downloads
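A weighted ROC curve can be computed with one sort and one cumulative pass: rank examples by decreasing score, accumulate the weights of positives (true positives) and negatives (false positives), and divide by the total weights. The Python sketch below is an illustration of that idea, not the WeightedROC R API; `weighted_roc` and `auc` are hypothetical names, and it assumes label 1 means positive and any other label means negative.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def weighted_roc(scores: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float]) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR), from the strictest threshold to the loosest.

    labels: 1 for positive, anything else for negative.
    weights: non-negative example-specific costs.
    """
    pos_total = sum(w for y, w in zip(labels, weights) if y == 1)
    neg_total = sum(w for y, w in zip(labels, weights) if y != 1)
    ranked = sorted(zip(scores, labels, weights), key=lambda t: -t[0])
    tp = fp = 0.0
    points = [(0.0, 0.0)]
    # examples with tied scores cross the threshold together
    for _, group in groupby(ranked, key=lambda t: t[0]):
        for _, y, w in group:
            if y == 1:
                tp += w
            else:
                fp += w
        points.append((fp / neg_total, tp / pos_total))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with scores `[0.9, 0.1, 0.5]`, labels `[1, 1, 0]` and weights `[3, 1, 2]`, the heavier positive is ranked above the negative and the lighter one below it, giving a weighted AUC of 0.75. The sort dominates, so the cost is O(n log n).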

PeakSegDisk - Disk-Based Constrained Change-Point Detection

Disk-based implementation of Functional Pruning Optimal Partitioning with up-down constraints <doi:10.18637/jss.v101.i10> for single-sample peak calling (independently for each sample and genomic problem); it can handle huge data sets (10^7 or more).

cpp

4.66 score 4 stars 38 scripts 673 downloads

binsegRcpp - Efficient Implementation of Binary Segmentation

Standard template library containers are used to implement an efficient binary segmentation algorithm, which is log-linear on average and quadratic in the worst case.

cpp

4.26 score 7 stars 13 scripts 872 downloads
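Binary segmentation repeatedly splits the segment whose best split most decreases the loss. Keeping candidate splits in a priority queue means each step only rescans the two new segments, which is what makes the average cost log-linear (balanced splits) and the worst case quadratic (very unbalanced splits). The Python sketch below uses `heapq` as a stand-in for the C++ containers and assumes square loss (change in mean) for illustration; `binseg` is a hypothetical name, not the binsegRcpp API.

```python
import heapq
from typing import List, Optional, Tuple


def binseg(y: List[float], max_segments: int) -> List[int]:
    """Binary segmentation with square loss (change in mean).

    Returns sorted change positions t, meaning a new segment starts at y[t].
    """
    n = len(y)
    # cumulative sums give an O(1) cost for any segment
    csum, csum2 = [0.0], [0.0]
    for v in y:
        csum.append(csum[-1] + v)
        csum2.append(csum2[-1] + v * v)

    def cost(a: int, b: int) -> float:
        """Square loss of y[a:b] around its mean."""
        s = csum[b] - csum[a]
        return csum2[b] - csum2[a] - s * s / (b - a)

    def best_split(a: int, b: int) -> Optional[Tuple[float, int]]:
        """(loss decrease, split position) for the best split of y[a:b]."""
        if b - a < 2:
            return None
        t = min(range(a + 1, b), key=lambda s: cost(a, s) + cost(s, b))
        return cost(a, b) - cost(a, t) - cost(t, b), t

    heap: List[Tuple[float, int, int, int]] = []

    def push(a: int, b: int) -> None:
        found = best_split(a, b)
        if found is not None:
            decrease, t = found
            # heapq is a min-heap, so store the negated decrease
            heapq.heappush(heap, (-decrease, t, a, b))

    push(0, n)
    changes: List[int] = []
    while heap and len(changes) + 1 < max_segments:
        _, t, a, b = heapq.heappop(heap)
        changes.append(t)
        push(a, t)
        push(t, b)
    return sorted(changes)
```

For example, `binseg([0, 0, 0, 10, 10, 10, 0, 0, 0], 3)` first splits at 3 (or 6, which ties) and then at the other position, returning `[3, 6]`.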

FLOPART - Functional Labeled Optimal Partitioning

Provides efficient 'C++' code for computing an optimal segmentation model with Poisson loss, up-down constraints, and label constraints, as described by Kaufman et al. (2024) <doi:10.1080/10618600.2023.2293216>.

cpp

3.70 score 3 scripts 555 downloads

PeakSegOptimal - Optimal Segmentation Subject to Up-Down Constraints

Computes optimal changepoint models using the Poisson likelihood for non-negative count data, subject to the PeakSeg constraint: the first change must be up, second change down, third change up, etc. For more info about the models and algorithms, read "Constrained Dynamic Programming and Supervised Penalty Learning Algorithms for Peak Detection" <https://jmlr.org/papers/v21/18-843.html> by TD Hocking et al.

cpp

3.65 score 6 stars 37 scripts 287 downloads
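The PeakSeg constraint (first change up, second down, third up, and so on) and the Poisson loss it is paired with can be written down directly. This Python sketch only checks a candidate model; it does not compute the optimal one. It assumes equal adjacent means are allowed (non-strict inequalities) and drops the log(y!) term, which does not depend on the model. The names `satisfies_up_down` and `poisson_loss` are hypothetical.

```python
import math
from typing import Sequence


def satisfies_up_down(segment_means: Sequence[float]) -> bool:
    """Check the up-down constraint on consecutive segment means.

    Change k (between segments k-1 and k) must go up when k is odd and
    down when k is even; equal adjacent means are allowed (assumption).
    """
    for k in range(1, len(segment_means)):
        diff = segment_means[k] - segment_means[k - 1]
        if k % 2 == 1 and diff < 0:  # odd-numbered change must go up
            return False
        if k % 2 == 0 and diff > 0:  # even-numbered change must go down
            return False
    return True


def poisson_loss(counts: Sequence[int], mean: float) -> float:
    """Poisson negative log-likelihood of counts with one mean > 0, without log(y!)."""
    return sum(mean - y * math.log(mean) for y in counts)
```

Under this constraint, the odd-numbered segments (second, fourth, ...) are the peaks and the others are background.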

inlinedocs - Convert Inline Comments to Documentation

Generates Rd files from R source code with comments. The main features of the default syntax are that (1) docs are defined in comments near the relevant code, (2) function argument names are not repeated in comments, and (3) examples are defined in R code, not comments. It is also easy to define a new syntax.

3.57 score 2 stars 48 scripts 3.8k downloads

PeakSegJoint - Joint Peak Detection in Several ChIP-Seq Samples

Jointly segment several ChIP-seq samples to find the peaks which are the same and different across samples. The fast approximate maximum Poisson likelihood algorithm is described in "PeakSegJoint: fast supervised peak detection via joint segmentation of multiple count data samples" <doi:10.48550/arXiv.1506.01286> by TD Hocking and G Bourque.

3.51 score 5 stars 65 scripts 711 downloads

plotHMM - Plot Hidden Markov Models

Hidden Markov Models are useful for modeling sequential data. This package provides several functions implemented in C++ for explaining the algorithms used for Hidden Markov Models (forward, backward, decoding, learning).

cpp

3.30 score 4 scripts 171 downloads
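The forward algorithm, one of the Hidden Markov Model algorithms mentioned above, computes the probability of an observed sequence by summing over hidden state paths one time step at a time. This is a plain-probability Python sketch for discrete emissions, not the plotHMM API; the function name and the list-of-lists parameter layout are assumptions.

```python
from typing import Sequence


def forward(init: Sequence[float], trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]], obs: Sequence[int]) -> float:
    """Forward algorithm for a discrete-emission HMM; returns P(obs).

    init[s]: probability of starting in state s
    trans[r][s]: probability of moving from state r to state s
    emit[s][o]: probability that state s emits symbol o
    """
    n_states = len(init)
    # alpha[s] = P(observations so far, current state is s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)
```

Each step costs O(S^2) for S states, instead of enumerating all S^T paths. Products of probabilities underflow on long sequences, so practical implementations usually work with logs or rescale `alpha` at each step.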

aum - Area Under Minimum of False Positives and Negatives

Efficient algorithms <https://jmlr.org/papers/v24/21-0751.html> for computing Area Under Minimum, directional derivatives, and line search optimization of a linear model, with objective defined as either max Area Under the Curve or min Area Under Minimum.

cpp

3.30 score 2 stars 3 scripts 624 downloads
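For plain binary classification, the Area Under Minimum can be sketched as follows: add a constant c to every prediction, count false positives FP(c) and false negatives FN(c), and integrate min(FP(c), FN(c)) over c. Both counts only change at c = -prediction, so the integral is a sum over intervals between sorted breakpoints. This Python sketch uses unweighted counts and a simple quadratic-time loop, which are assumptions; the package implements efficient sort-based algorithms and also covers directional derivatives and line search, which are not shown. The name `aum` is hypothetical here.

```python
from typing import Sequence


def aum(pred: Sequence[float], label: Sequence[int]) -> float:
    """Area Under Min(FP, FN) for binary labels (1 = positive, 0 = negative).

    An example is classified positive when pred + c > 0, so FP(c) and FN(c)
    are step functions of c that change only at c = -pred[i]. Below the
    smallest breakpoint FP = 0, above the largest FN = 0, so only the
    intervals between breakpoints contribute.
    """
    breaks = sorted(set(-p for p in pred))
    total = 0.0
    for left, right in zip(breaks, breaks[1:]):
        c = (left + right) / 2  # FP and FN are constant on (left, right)
        fp = sum(1 for p, y in zip(pred, label) if y == 0 and p + c > 0)
        fn = sum(1 for p, y in zip(pred, label) if y == 1 and p + c <= 0)
        total += min(fp, fn) * (right - left)
    return total
```

A perfectly ranked pair such as predictions `[2.0, -1.0]` with labels `[1, 0]` gives 0, while the mis-ranked pair `[-1.0, 2.0]` gives 3, the width of the interval where both an FP and an FN occur.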

PeakError - Compute the Label Error of Peak Calls

Chromatin immunoprecipitation DNA sequencing results in genomic tracks that show enriched regions or peaks where proteins are bound. This package implements fast C code that computes the true and false positives with respect to a database of annotated region labels.

3.28 score 4 stars 1 dependent 16 scripts 236 downloads

neuroblastoma - Neuroblastoma Copy Number Profiles

Annotated neuroblastoma copy number profiles, a benchmark data set for change-point detection algorithms, as described by Hocking et al. <doi:10.1186/1471-2105-14-164>.

2.70 score 1 star 2 scripts 237 downloads

slurm - Running and Parsing Slurm Commands

User-friendly functions which parse output of command line programs used to query Slurm. Morris A. Jette and Tim Wickberg (2023) <doi:10.1007/978-3-031-43943-8_1> describe Slurm in detail.

2.60 score 2 stars 7 scripts 545 downloads

PeakSegDP - Dynamic Programming Algorithm for Peak Detection in ChIP-Seq Data

A quadratic time dynamic programming algorithm can be used to compute an approximate solution to the problem of finding the most likely changepoints with respect to the Poisson likelihood, subject to a constraint on the number of segments and to changes that must alternate: up, down, up, down, etc. For more info read "PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data" <http://proceedings.mlr.press/v37/hocking15.html> by TD Hocking et al., proceedings of ICML 2015.

2.40 score 25 scripts 747 downloads

LOPART - Labeled Optimal Partitioning

Change-point detection algorithm with label constraints and a penalty for each change outside of labels. Read TD Hocking, A Srivastava (2023) <doi:10.1007/s00180-022-01238-z> for details.

cpp

2.00 score 1 star 2 scripts 207 downloads