Modeling Noisy Time Series: Least Squares Spectral Analysis
Today we consider the problem of modeling periodic behavior in a noisy time series. This is an old problem from Astrophysics that a has taken center stage again because of it is used to detecting planets orbiting around distant stars. As usual, it is also important in modeling many other real world phenomena. We will prep the discussion with some simple examples and a brief review of traditional Fourier Analysis, and then look at the classic solution called the Lomb-Scargle Periodogram. We will look at an example using the Python AstroML toolkit . Later (next blog) we will will then look at in detail at some modern machine learning style solutions such as Basis Pursuit and Compressed Sensing. Finally, we use this approach to look at a hard problem–how to detect the onset of Discrete Scale Invariance and Log-Periodic behavior in a noisy time series. Stay tuned
Suppose that every so often we observe some werewolves and we note this down in our diary. Let us plot this data after a year of recording.
We suspect that the Werewolves only come out during a full moon. How much confidence do we have in this? Our goal is to determine if the distribution of Werewolves sightings is periodic, or if it is random.
As pointed out by my friend Mike Plotz, totally random sightings would most be governed by a Poisson process, meaning that the arrival times would be exponentially distributed. So we should plot the distribution of arrival times and check.
Although recent research in biophysics suggest that random Werewolves would actually exhibit scale invariance . We will look at this in our next post. Here, we look at a statistic that is also suitable for arrival times…
Rayleigh Power Statistic
The first thing we can do is simply check a simple, unbinned statistic for a given frequency, called the Rayleigh Power Statistic
By unbinned, we mean that we simply calculate the statistic for all data points without computing any histograms or using any kind of sampling.
What does this statistic measure? Suppose our times are uniformly distributed (i.e. werewolves come out at random). Then We can compute a confidence of periodicity by using the statistic for a uniform distribution
For this data set, S=8167 for N=885, and days, our confidence is
So we have a very high confidence that werewolves do come out every 28 days.
What if we did not know the exact frequency ? No problem, we can just plot vs and look for the peaks
So why not just do this? Where does this break down?
- does not work for binned / sampled data
- may perform poorly if the data is not evenly sampled?
- not entirely clearly how to handle multiple frequencies
Traditional Spectral Analysis
Given a signal , we are frequently interested in the power spectrum or energy signal density
where is the continuous Fourier Transform of
We usually would approximate with the discrete-time Fourier Transform (DFT)
although typically to obtain a good result we need to sample on equally spaced points
We then can now express the Periodogram
which is clearly the binned version of the Rayleigh Power Statistic, suitable for multiple frequencies.
We can express the Periodogram as the solution of the Least Squares (LS) optimization problem
Because our data is real, however, this expression is really
We see that LS problem includes a spurious term that does not depend on .
Consequently, in noisy time series, with uneven sampling, the standard Fourier approach will boost long-periodic noise. We need a different, more robust approach.
LSSA: Least Squares Spectral Analysis
What is LLSA? This approach is similar to the Periodogram above in that we formulate the problem as a LS optimization. We replace the spurious term above, however, by including an adjustable parameter in the cosine basis functions. Consider the following constrained minimization problem
where and , and we constrain so that all the frequency components are mutually orthogonal at the sample times
Rather than sample on regular time intervals, and look at all frequency components, we will use regression and / or machine learning techniques to find the optimal
We now can apply some old trigonometric identities. The constraint on can be expressed as
and we can transform the overall function minimization to a constrained linear optimization problem using the identity
This leads to
Let us now define the linear equation
This is now a linear constrained problem, normally solved by Ordinary Least Squares (OLS). We choose a quadratic loss function to illustrate, but, later, we will see we can also solve this problem as a Machine Learning optimization using an norm or any number of methods. Using OLS, we obtain
We can now express the Power Spectrum, or Periodgram, as
Without going into all the details yet, we can now express the spectral density in closed form:
where and are the empirical mean and standard deviation, resp.
I will work out the proof soon
Extensions: Weighted Least Squares Spectral Analysis
When we have the covariance of our sample data, or some other model of the noise, we can include this into the OLS problem (see ). For example, we model the covariance at each frequency as
where is the diagonal matrix
We may plug in any estimate of the covariance to the OLS problem–as long as it is not ill-conditioned–yielding the new quadratic minimization functional
This leads to an improved expression for
but the Periodgram stil has the form of an expectation value of
This should also lead to a similar, analytic expression for the Lomb-Scargle Periodgram [to come soon, see 11 and references therein]
Issues with LSSA
We note that neither FFT nor LLSA approaches are robust when the number of samples is small, when the sampling is uneven, or when the time between samples is longer than the period being sampled See . This case arises frequently in Astronomy, and needs to be handled using Machine Learning (or other) techniques such as a Maximum Likelihood Approach to the Rayleigh Power analysis (see ), or a Generalized Lomb-Scargle Periodogram 
Generalized Lomb-Scargle Periodogram: Weights and Offsets
There are several modern research papers , (more to come) discussing the success and limitations of LLSA and its variants. Here we describe the variant provided in the AstroML package (below, to come)
Example: AstroML Package
The AstroML package  is very new and is built upon the very popular Scikits Learn Machine Learning library for Python. AstroML implements both the classical Lomb Scargle Periodogram and the Generalized version . Here is an example
Here we have generated random data distributed around the function
and we compute the Periodogram for periods . We see the Periodgram is quite noisy, but there is a dominant peak at .
Can we do better?
Machine Learning Alternatives
Just a hint of whats to come. Using traditional (odl school) machine learning / Kernel SVM techniques, one can model the covariances and/or other noise by using a non-linear, kernel-SVM (i.e. ) But we note that we can also recast the quadratic optimization problem as a linear programming problem, called Basis Pursuit
or as a convex optimization, sometimes called Basis Pursuit Denoising
When would such approached be useful? The LSSA method would clearly be deficient when the OLS problem is pathological, or
is ill-conditioned. (See the Appendix of reference  for more details).
In particular, when the LSSA problem is underdetermined, then we need advanced techniques. Which leads us to the next blog, where we will discuss the modern variants of these old AstroPhysics and Geophysics methods–called Compressed Sensing. Stay Tuned.
 Chapter 5, Kernel Methods and Their Application to Structured Data
 2009 M. Zechmeister &M. K¨urster, The generalised Lomb-Scargle periodogram: A new formalism for the ﬂoating-mean and Keplerian periodograms