TradingView
Trendoscope
Jan 19, 2023 16:50

DataCorrelation 

Ethereum / TetherUS (Binance)

Description

Library "DataCorrelation"
Implementation of functions for data correlation calculations. The formulas have been rearranged so that, instead of running loops over the samples, the functions use time series to build up the required sums bar by bar. This lets the calculations run on unbounded series and/or larger numbers of samples.

🎲 Simplifying Covariance

Original Formula
//For Sample
Covₓᵧ = ∑ ((xᵢ-x̄)(yᵢ-ȳ)) / (n-1)

//For Population
Covₓᵧ = ∑ ((xᵢ-x̄)(yᵢ-ȳ)) / n


Now, if we look at the numerator, it can be expanded and regrouped as follows
∑ ((xᵢ-x̄)(yᵢ-ȳ))
=> (x₁-x̄)(y₁-ȳ) + (x₂-x̄)(y₂-ȳ) + (x₃-x̄)(y₃-ȳ) ... + (xₙ-x̄)(yₙ-ȳ)
=> (x₁y₁ + x̄ȳ - x₁ȳ - y₁x̄) + (x₂y₂ + x̄ȳ - x₂ȳ - y₂x̄) + (x₃y₃ + x̄ȳ - x₃ȳ - y₃x̄) ... + (xₙyₙ + x̄ȳ - xₙȳ - yₙx̄)
=> (x₁y₁ + x₂y₂ + x₃y₃ ... + xₙyₙ) + (x̄ȳ + x̄ȳ + x̄ȳ ... + x̄ȳ) - (x₁ȳ + x₂ȳ + x₃ȳ ... + xₙȳ) - (y₁x̄ + y₂x̄ + y₃x̄ ... + yₙx̄)
=> ∑xᵢyᵢ + n(x̄ȳ) - ȳ∑xᵢ - x̄∑yᵢ


So, the overall formula can be simplified for use in Pine as
//For Sample
Covₓᵧ = (∑xᵢyᵢ + n(x̄ȳ) - ȳ∑xᵢ - x̄∑yᵢ) / (n-1)

//For Population
Covₓᵧ = (∑xᵢyᵢ + n(x̄ȳ) - ȳ∑xᵢ - x̄∑yᵢ) / n
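
The simplified covariance formula only needs four running totals (n, ∑xᵢ, ∑yᵢ, ∑xᵢyᵢ), so it can be updated one sample at a time. Below is a minimal Python sketch of that idea. It is not the library's Pine Script code; the class name RunningCovariance and its methods are hypothetical names chosen for illustration.

```python
class RunningCovariance:
    """Keeps the running totals used by the simplified covariance formula."""

    def __init__(self):
        self.n = 0          # number of included samples
        self.sum_x = 0.0    # running total of x
        self.sum_y = 0.0    # running total of y
        self.sum_xy = 0.0   # running total of x * y

    def update(self, x, y, include=True):
        # Called once per new sample; skipped samples leave the totals untouched.
        if include:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xy += x * y

    def value(self, biased=False):
        # biased=True divides by n (population), otherwise by n - 1 (sample).
        divisor = self.n if biased else self.n - 1
        if divisor <= 0:
            return float("nan")
        mean_x = self.sum_x / self.n
        mean_y = self.sum_y / self.n
        numerator = (self.sum_xy + self.n * mean_x * mean_y
                     - mean_y * self.sum_x - mean_x * self.sum_y)
        return numerator / divisor


cov = RunningCovariance()
for x, y in zip([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]):
    cov.update(x, y)
print(cov.value())             # sample covariance: 1.5
print(cov.value(biased=True))  # population covariance: 1.2
```

No earlier sample is revisited when a new one arrives, which is what lets this approach work on a series of any length.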


🎲 Simplifying Standard Deviation

Original Formula
//For Sample
σ = √(∑(xᵢ-x̄)² / (n-1))

//For Population
σ = √(∑(xᵢ-x̄)² / n)


Now, if we look at the numerator within the square root
∑(xᵢ-x̄)²
=> (x₁² + x̄² - 2x₁x̄) + (x₂² + x̄² - 2x₂x̄) + (x₃² + x̄² - 2x₃x̄) ... + (xₙ² + x̄² - 2xₙx̄)
=> (x₁² + x₂² + x₃² ... + xₙ²) + (x̄² + x̄² + x̄² ... + x̄²) - (2x₁x̄ + 2x₂x̄ + 2x₃x̄ ... + 2xₙx̄)
=> ∑xᵢ² + nx̄² - 2x̄∑xᵢ
=> ∑xᵢ² + x̄(nx̄ - 2∑xᵢ)


So, the overall formula can be simplified for use in Pine as
//For Sample
σ = √((∑xᵢ² + x̄(nx̄ - 2∑xᵢ)) / (n-1))

//For Population
σ = √((∑xᵢ² + x̄(nx̄ - 2∑xᵢ)) / n)
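
The same trick applies to the standard deviation: only n, ∑xᵢ and ∑xᵢ² have to be carried forward. Below is a minimal Python sketch, not the library's Pine Script code; the class name RunningStdDev is a hypothetical name used for illustration.

```python
import math


class RunningStdDev:
    """Keeps the running totals used by the simplified standard deviation formula."""

    def __init__(self):
        self.n = 0          # number of included samples
        self.sum_x = 0.0    # running total of x
        self.sum_x2 = 0.0   # running total of x squared

    def update(self, x, include=True):
        # Called once per new sample; skipped samples leave the totals untouched.
        if include:
            self.n += 1
            self.sum_x += x
            self.sum_x2 += x * x

    def value(self, biased=False):
        # biased=True divides by n (population), otherwise by n - 1 (sample).
        divisor = self.n if biased else self.n - 1
        if divisor <= 0:
            return float("nan")
        mean = self.sum_x / self.n
        numerator = self.sum_x2 + mean * (self.n * mean - 2 * self.sum_x)
        # Rounding can push the numerator slightly below zero, so clamp it.
        return math.sqrt(max(numerator, 0.0) / divisor)


sd = RunningStdDev()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    sd.update(x)
print(sd.value(biased=True))  # population standard deviation: 2.0
```

One caveat of this form: when values are large and close together, ∑xᵢ² and the mean term nearly cancel, so floating-point precision can suffer. The clamp above guards only against a negative square root, not against the loss of precision itself.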


🎲 Using BinaryInsertionSort library

The Chatterjee correlation and Spearman correlation functions use the BinaryInsertionSort library to speed up sorting. That library inserts each new value directly into its sorted position, which greatly reduces the sorting load and lets the functions work on larger sample sizes.
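
As a rough illustration of the insert-in-sorted-order idea, here is a minimal Python sketch built on the standard bisect module. It does not reproduce the BinaryInsertionSort library's API; the function name insert_sorted is hypothetical.

```python
import bisect


def insert_sorted(sorted_values, value):
    """Insert value into an ascending list and return the index it landed at."""
    # Binary search finds the position in O(log n) comparisons.
    index = bisect.bisect_left(sorted_values, value)
    sorted_values.insert(index, value)
    return index


values = []
for v in [5.0, 1.0, 4.0, 2.0, 3.0]:
    insert_sorted(values, v)
print(values)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Because the list is already sorted before each insertion, it never has to be sorted from scratch, and the returned index can double as the rank of the new value.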

🎲 Function Documentation

chatterjeeCorrelation(x, y, sampleSize, plotSize)
  Calculates the Chatterjee correlation between two series. Formula is - ξₙ = 1 - (3 * ∑ |rᵢ₊₁ - rᵢ|) / (n²-1)
  Parameters:
    x: First series for which correlation needs to be calculated
    y: Second series for which correlation needs to be calculated
    sampleSize: Number of samples to be considered for calculation of correlation. Default is 20000
    plotSize: How many historical values need to be plotted on the chart.
  Returns: float correlation - Chatterjee correlation value if it falls within plotSize, else na
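
To make the formula concrete, here is a minimal Python sketch. It assumes there are no ties in x or y, and it does not model the sampleSize or plotSize parameters or the library's incremental sorting. The function name chatterjee_correlation is hypothetical.

```python
import math


def chatterjee_correlation(xs, ys):
    """Chatterjee's xi for paired samples, assuming no tied values."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return math.nan
    # Rank of each y value (1 = smallest).
    rank_y = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: ys[i]), start=1):
        rank_y[i] = rank
    # Arrange the y ranks in order of increasing x.
    order_x = sorted(range(n), key=lambda i: xs[i])
    r = [rank_y[i] for i in order_x]
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(r[k + 1] - r[k]) for k in range(n - 1))
    return 1 - 3 * total / (n * n - 1)


print(chatterjee_correlation([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 0.5
```

For a perfectly monotonic relationship the consecutive rank differences are all 1, so the value is 1 - 3/(n+1). It approaches 1 only as n grows, which is why the 5-sample example gives 0.5.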

spearmanCorrelation(x, y, sampleSize, plotSize)
  Calculates the Spearman correlation between two series. Formula is - ρ = 1 - 6∑dᵢ² / (n(n²-1))
  Parameters:
    x: First series for which correlation needs to be calculated
    y: Second series for which correlation needs to be calculated
    sampleSize: Number of samples to be considered for calculation of correlation. Default is 20000
    plotSize: How many historical values need to be plotted on the chart.
  Returns: float correlation - Spearman correlation value if it falls within plotSize, else na
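
Here dᵢ is the difference between the rank of xᵢ and the rank of yᵢ. Below is a minimal Python sketch of the formula, assuming no tied values and ignoring the sampleSize and plotSize parameters. The names ranks and spearman_correlation are hypothetical.

```python
import math


def ranks(values):
    """Return the rank (1 = smallest) of each value, assuming no ties."""
    result = [0] * len(values)
    order = sorted(range(len(values)), key=lambda i: values[i])
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_correlation(xs, ys):
    """Spearman's rho from rank differences, valid when there are no ties."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return math.nan
    # Sum of squared differences between the paired ranks.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


print(spearman_correlation([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))  # 0.8
```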

covariance(x, y, include, biased)
  Calculates covariance between two series of unbounded length. Formula is Covₓᵧ = ∑ ((xᵢ-x̄)(yᵢ-ȳ)) / (n-1) for sample and Covₓᵧ = ∑ ((xᵢ-x̄)(yᵢ-ȳ)) / n for population
  Parameters:
    x: First series for which covariance needs to be calculated
    y: Second series for which covariance needs to be calculated
    include: boolean flag used for selectively including a sample
    biased: boolean flag; if true, population covariance is calculated instead of sample covariance
  Returns: float covariance - covariance of the selected samples of the two series x, y

stddev(x, include, biased)
  Calculates the standard deviation of a series. Formula is σ = √( ∑(xᵢ-x̄)² / (n-1) ) for sample and σ = √( ∑(xᵢ-x̄)² / n ) for population
  Parameters:
    x: Series for which standard deviation needs to be calculated
    include: boolean flag used for selectively including a sample
    biased: boolean flag; if true, population standard deviation is calculated instead of sample standard deviation
  Returns: float stddev - standard deviation of the selected samples of series x

correlation(x, y, include)
  Calculates the Pearson correlation between two series of unbounded length. Formula is r = Covₓᵧ / (σₓσᵧ)
  Parameters:
    x: First series for which correlation needs to be calculated
    y: Second series for which correlation needs to be calculated
    include: boolean flag used for selectively including a sample
  Returns: float correlation - correlation between the selected samples of the two series x, y
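
Putting the pieces together, the Pearson correlation can be computed from five running sums, because the (n-1) or n divisors of the covariance and the two standard deviations cancel. Below is a minimal Python sketch with an optional per-sample include mask. It is not the library's Pine Script code, and the function name pearson_from_sums is hypothetical. The loop stands in for the bar-by-bar accumulation a time series would do.

```python
import math


def pearson_from_sums(xs, ys, include=None):
    """Pearson r built from running sums; include is an optional list of booleans."""
    n = 0
    sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0.0
    for k, (x, y) in enumerate(zip(xs, ys)):
        if include is not None and not include[k]:
            continue  # sample excluded from the calculation
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    if n < 2:
        return math.nan
    mean_x, mean_y = sum_x / n, sum_y / n
    # Numerators of the simplified covariance and variance formulas.
    cov_num = sum_xy + n * mean_x * mean_y - mean_y * sum_x - mean_x * sum_y
    var_x_num = sum_x2 + mean_x * (n * mean_x - 2 * sum_x)
    var_y_num = sum_y2 + mean_y * (n * mean_y - 2 * sum_y)
    denominator = math.sqrt(max(var_x_num, 0.0) * max(var_y_num, 0.0))
    return cov_num / denominator if denominator > 0 else math.nan


# The last pair is an outlier that the include mask filters out.
print(pearson_from_sums([1, 2, 3, 100], [2, 4, 6, -50],
                        include=[True, True, True, False]))  # 1.0
```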
Comments
HALDRO
Hey man, are you a prodigy? your work so smart
Trendoscope
@HALDRO, I am too old to be prodigy!! Just learning and sharing whatever I can :)
akinloluojo1
🔥
What other scripts have you written specifically for better strategies in filtering out false positive
Trendoscope
@akinloluojo1, Thanks for the comment. I am afraid I haven't explored this much and only beginning my journey in this subject.
akinloluojo1
@HeWhoMustNotBeNamed, ok. I will keep in touch as always!
Thanks too for all your work
RomeoNovemberRN
I see nothing
Trendoscope
@RomeoNovemberRN, it's a library. Not an indicator :)
ConConMcG
Off the charts in terms of maths and converting it into code, but I would love to see you work more on Trading Strategies and finding profitable systems. (I don't mean to be disrespectful as your work is incredible, but if I had a fraction of the brainpower you possess I would be hammering pinescript with trading strategies for highest probability trades?)
Trendoscope
@conormcgurk1, It is all part of the game. These functions can play important role in confirming the defined edge and removing false positives.