TODOs
Regularized isotonic regression solver, square loss and penalized
(FPOP) parameterization.
2017.07.12
cast to double in PDPA as well.
2017.07.11
cast to double before taking log, to avoid compile error on solaris
(in PeakSegFPOP).
instead of printf, use Rprintf from R.h, to avoid CRAN check NOTE.
2017.06.20
Register native routines.
removed interval regression code, added dependency on penaltyLearning.
removed problems.R, which is now in the PeakSegPipeline pkg.
2017.04.17
PeakSegPDPAInf which always uses Inf as upper limit of intervals,
instead of max(data). This results in more intervals and is thus
slower than PeakSegPDPA by a constant factor.
2017.03.31
Import data.table >= 1.9.8 (from CRAN not GitHub).
2017.02.27
PREV_NOT_SET in min-less since set_prev_seg_end is always called after.
clarify docs, example shows how to recover solution in terms of (M,C)
variables.
mass.spec test case and bugfix.
problem.target now considers infeasible models (before, we assigned
infeasible models infinite cost).
problem.predict now allows predicting infeasible models (before, we
took the closest model inside the target interval). Now the target
interval is not used at all during prediction.
since model training now includes fitting a Gaussian model to the
log(bases) values of all peaks which are perfectly predicted in the
training data, we use that model during prediction to filter all peaks
outside of the 95% confidence interval.
2016.11.01
no longer printf when more than NEWTON_STEPS in root finding C++ code.
New functions oracleModelComplexity, PPN.cores, mclapply.or.stop,
problem.predict.allSamples
simplify arguments of problem.* functions.
more robust checking of penalty_segments.bed in problem.PeakSegFPOP
problem.coverage uses coverage.bigwig if it is present
2016.10.16
Many fixes for min-less/more/env bugs which were encountered while
running the PeakSegFPOP command line program on large data sets (test
cases in test-cosegData.R).
problems.R contains problem.* functions which run the PeakSegFPOP
command line program and parse its output (compute the target
interval, features, predicted peaks).
2016.09.26
Bugfix for min-less: only add a constant piece at the end if we need
to. New simulated data set in test-simulation.R which was crashing
before the bugfix.
2016.09.25
Bugfix in C++ Minimize function which sometimes returned infinite cost
when the minimum is exactly on the interval border.
2016.09.16
New test-simulation.R which previously was failing the run-time
min-more check. Fixed by not adding an interval at the end of
min-more, if we end on a degenerate linear function.
New H3K4me3_PGP_immune_chunk24 data set, one of the samples for which
the PeakSegPDPA solution with 13 segments was not as likely as the
PeakSegDP solution. I figured out that this was being caused by a bug
in the "not equal on the sides, zero crossing points" code. When we
compare the cost of the first interval (with min_log_mean at -Inf), we
still use the cost at the middle, but now we compute the middle in the
original space (so the smaller interval limit is at least 0, never
-Inf).
2016.08.06
Limited memory FPOP version at https://github.com/tdhock/PeakSegFPOP
To avoid root finding problems in large data sets, we now store the
mean cost rather than total cost in the C++ code. The R functions
still report the total cost.
2016.08.02
PeakSegFPOPall C++ function which can start or end either up or down.
PeakSegFPOPLog C++ function forces starts and ends down. This is used
by the R function PeakSegFPOP.
Bugfix in min_more computation for intersections between constant cost
and degenerate linear cost.
Bugfix in decoding: we no longer call Minimize for segments other than
the last one. Now we store the previous segment mean (which was
already computed by the min-less/more operator). No more bool
equality_constraint_active, instead test prev_seg_mean == INFINITY.
2016.07.13
FPOP version of PeakSeg constrained, Poisson loss problem.
2016.07.12
For numerical stability, the PeakSegPDPA intervals are in now the log
space. That means if the data min is 0, then there will be an interval
with -Inf for its lower limit. Smaller root finding is in the
f(m)=Linear*e^m+Log*m+Constant space, and larger root finding is in
the g(x)=Linear*x+Log*log(x)+Constant space.
2016.07.06
The Lambert W root finding method was numerically unstable, for
example the roots of the cost function -628*x+log(x)-776.140660 could
not be computed correctly since exp(-776) is exactly zero using
doubles. Instead I implemented a bound-constrained Newton root finding
method which seems to be more stable.
2016.07.03
Some bugs fixed in the min env and min less computations.
2016.06.18
First working C++ solver for Segment Neighborhood problem with Poisson
Loss and PeakSeg constraint (PeakSegPDPA function).