Machine learning for algorithmic trading

  • Status: Closed
  • Prize: $250
  • Entries Received: 11
  • Winner: kebatex

Contest Brief

Develop an ML/AI model to predict the result of trades. The value to be predicted is in the first column of the training data, "result" (the response variable). The underlying price and volatility data are provided in separate files; these two time series are to be used to predict the result. (You may also use any other market data you see fit.)

R2 is to be computed by leave-one-out cross-validation (k-fold cross-validation with the number of folds equal to the number of values). Please post this value with your entry (the highest one wins). Please include code and results in your PDF file, clearly showing your method, the leave-one-out prediction for each value in the "result" column of the training set, your feature generation method (if applicable) and the overall R2. I should be able to replicate your method and verify your results from your document.
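As a rough illustration of the metric described above, the sketch below produces one leave-one-out prediction per training row and then computes a single R2 over the pooled predictions (R2 cannot be computed on a one-row fold by itself). The one-feature least-squares model, the function names and the numbers are placeholders assumed for the example; they are not part of the contest data or of any entry.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (placeholder model, assumed)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def loo_predictions(xs, ys):
    """Predict each ys[i] from a model fitted on every row except row i."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds

def r2_score(ys, preds):
    """R2 = 1 - SS_res / SS_tot, computed once over all pooled predictions."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up feature and "result" values, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
preds = loo_predictions(xs, ys)
print("leave-one-out predictions:", [round(p, 3) for p in preds])
print("leave-one-out R2:", round(r2_score(ys, preds), 4))
```

Any model can be swapped in for `fit_line`, provided it is refitted from scratch inside the loop so that the held-out row never influences its own prediction.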

The data to be used for prediction is time-series in nature (as shown in the two underlying files), so you CANNOT use any data from after the trade date (shown in the start date column of the training data) for prediction. Entries using any data from dates after the trade date are not eligible.
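One way to respect this rule when generating features is to cut each time series at the trade date before computing anything from it. The sketch below is a minimal example under stated assumptions: the function names, the trailing-return feature and the sample prices are hypothetical, and it treats the trade date's own value as usable. If only strictly earlier dates should count, `bisect_left` can be used in place of `bisect_right`.

```python
from bisect import bisect_right
from datetime import date

def history_up_to(trade_date, dates, values):
    """Return the values dated on or before trade_date.

    `dates` must be sorted ascending and aligned with `values`.
    """
    cutoff = bisect_right(dates, trade_date)
    return values[:cutoff]

def trailing_return(trade_date, dates, closes, window=5):
    """Return over the last `window` observations known at trade_date.

    Gives None when there is not enough history before the trade date.
    """
    hist = history_up_to(trade_date, dates, closes)
    if len(hist) <= window:
        return None
    return hist[-1] / hist[-1 - window] - 1.0

# Made-up closing prices, for illustration only.
dates = [date(2016, 1, d) for d in (4, 5, 6, 7, 8)]
closes = [100.0, 102.0, 101.0, 105.0, 110.0]

# Only the first three closes are visible to a trade dated 6 January.
print(history_up_to(date(2016, 1, 6), dates, closes))
print(trailing_return(date(2016, 1, 6), dates, closes, window=2))
```

Routing every feature through a helper like `history_up_to` makes the cut-off easy to audit, since no later row can reach the feature code.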

Those with strong entries may be contracted for additional tasks.

Prize guaranteed.

Public Clarification Board

  • ashikmohann
    ashikmohann
    • 1 year ago

    #extended #extended #extended #extended

    • 1 year ago
  • Bouidad
    Bouidad
    • 1 year ago

    #extended #extended #extended #extended #extended #extended

    • 1 year ago
  • casras
    casras
    • 1 year ago

    There are 2 different entries for same date 03/04/2016 in the train file.

    • 1 year ago
    1. casras
      casras
      • 1 year ago

      There are 9 dates with this problem. Used the first entry for all in the entry I submitted. I think the contest as it is is not well defined and encourages overfit solutions which aren't useful. Feel free to contact me after the contest if you want more details and what I'd propose for something more useful.

      • 1 year ago
    2. ts111
      Contest Holder
      • 1 year ago

      The contest states that the measure should be a leave-one-out R2 (or a k-fold method). This takes care of the overfitting issue. If there is any clarification on this metric, please contact me and I can explain more.

      • 1 year ago
  • casras
    casras
    • 1 year ago

    Could you expand on the reason for choosing R2 as the contest metric, as opposed to more traditional trading metrics such as total return, Sharpe ratio or drawdown?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      The aim is not to evaluate trading strategies but to predict returns from one.

      • 1 year ago
  • vijaykrishna0497
    vijaykrishna0497
    • 1 year ago

    Hi, I will submit my entry soon. You can also have a look at my profile. Thank you.

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      Good luck

      • 1 year ago
  • wejdansolutions
    wejdansolutions
    • 1 year ago

    Dear sir, thank you for posting the contest. May I confirm: 1) do the dates in all three Excel files correspond to each other, i.e. is 7 October 2017 in files 1, 2 and 3 the same day? 2) I understand one of the CSV files is the share price, but what about the file named "underlying"? Does it indicate Dow Jones index volume? There was a market crash in 1987, and I would like to know what the attribute represents.

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      1. Dates all correspond, yes. 2. Underlying is the price of the share, underlying vol is the volatility, and the data in the training set is the "result" of the trade that needs to be predicted.

      • 1 year ago
  • zizopixels
    zizopixels
    • 1 year ago

    Any chance for RL entries?

    • 1 year ago
    1. zizopixels
      zizopixels
      • 1 year ago

      I can make it work, but its metric/benchmark is different than R2.
      It computes stability, return, volatility, omega, Sharpe, Calmar and Sortino ratios and the mean as benchmarks.

      • 1 year ago
    2. ts111
      Contest Holder
      • 1 year ago

      I see, I won't be able to compare it apples-to-apples with the other entries then.

      • 1 year ago
  • Bouidad
    Bouidad
    • 1 year ago

    Could you explain this part further, please? Data is time-series in nature, so CANNOT use data after trade date (shown as start date column in training data) for training/features/prediction. Entries using any data from dates after trade date are not eligible.

    • 1 year ago
    1. Bouidad
      Bouidad
      • 1 year ago

      Thank you, one last question please: what is the difference between the Open, High, Low and Close columns of Underlying.csv and those of 'Underlying_vol.csv'?

      • 1 year ago
    2. ts111
      Contest Holder
      • 1 year ago

      Those are the market open, market close etc. values for each day. You can use just the 'close' column if you want.

      • 1 year ago
  • palthode
    palthode
    • 1 year ago

    Kindly make the contest #guaranteed

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      The prize is guaranteed as long as a valid entry is received, and the contest will be updated to reflect this.

      • 1 year ago
    2. ts111
      Contest Holder
      • 1 year ago

      Guaranteed now

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    The data in underlying_vol and underlying are different even though the dates are the same! Can you explain this?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      They are two different time series (one for the underlying and one for volatility). Both can be used for features.

      • 1 year ago
  • waqarrao498
    waqarrao498
    • 1 year ago

    CANNOT use data after trade date? Kindly tell us the exact trade date.

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      Each prediction is for a particular date (as shown in the training set). The time series to use for prediction are in the other two files. When using the time series, you cannot use any data from after the trade date.

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    What is to be predicted, and which data will be given for the prediction? What is the result label in the Training_set CSV? Please explain it to us more clearly!

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      Response variable is "result" in the training_set.csv -- this is what is to be predicted. The other two files contain the data to be used for training.

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    I don't know anything about tread! What is the result of a tread?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      what is a tread?

      • 1 year ago
    2. Zubayerskd
      Zubayerskd
      • 1 year ago

      Sorry, it's trade!

      • 1 year ago
  • ahmedashraf13131
    ahmedashraf13131
    • 1 year ago

    1st: what are the data and columns to use, and what is the label? 2nd: what is the Volume column?

    • 1 year ago
    1. ahmedashraf13131
      ahmedashraf13131
      • 1 year ago

      I meant: what is the label you want to predict in [Underlying and Underlying_vol]?

      • 1 year ago
    2. ts111
      Contest Holder
      • 1 year ago

      None, they are the "features" files. Prediction only for the response variable.

      • 1 year ago
  • palthode
    palthode
    • 1 year ago

    Is it OK if I code the model in R?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      Sure, as long as I can understand/replicate and verify

      • 1 year ago
  • shahibur55
    shahibur55
    • 1 year ago

    working on it

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      good luck

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    Do you want just the Python code, or the model in a specific format (.model, .joblib, .h5)?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      both, but the model is more important

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    Can you please explain the project more clearly?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      It is to write an ML algorithm to predict the result value. What do you want clarified?

      • 1 year ago
  • Zubayerskd
    Zubayerskd
    • 1 year ago

    I am not so good at trading. How is the result value calculated?

    • 1 year ago
    1. ts111
      Contest Holder
      • 1 year ago

      The result value is not calculated; it is given in the data.

      • 1 year ago
