Machine learning for algorithmic trading

  • Status: Closed
  • Prize: $250
  • Entries Received: 11
  • Winner: kebatex

Contest Brief

Develop an ML/AI model to predict the result of trades. The value to be predicted is shown in the first column of the training data, "result" (the response variable). The underlying and volatility data are provided in separate files; these two time series are to be used to predict the result. (You may also use any other market data you see fit.)

R2 is to be computed using leave-one-out cross-validation (k-fold cross-validation with the number of folds equal to the number of values). Please post this value with your entry (the highest one wins). Please include code and results in your PDF file, clearly showing your method, the leave-one-out prediction for each value in the "result" column of the training set, your feature generation method (if applicable) and the overall R2. I should be able to replicate your method and verify your results from your document.
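
A minimal sketch of the metric described above, under two assumptions that the brief does not spell out: the R2 is computed once over the pooled leave-one-out predictions (R2 is undefined on a single held-out value), and ordinary least squares stands in as a placeholder model. The function names `loo_predictions` and `r2_score` are illustrative, not part of the contest materials.

```python
import numpy as np

def loo_predictions(X, y):
    """Predict each value from a model fitted on all the other values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prepend an intercept column to the feature matrix.
    Xb = np.column_stack([np.ones(n), X])
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every row except the held-out one
        coef, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
        preds[i] = Xb[i] @ coef
    return preds

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

With a feature matrix `X` (one row per trade) and the "result" column as `y`, the reported score would be `r2_score(y, loo_predictions(X, y))`. Any other model can replace the least-squares fit inside the loop, as long as it is refitted without the held-out row each time.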

The data to be used for prediction is time-series in nature (as shown in the two underlying files), so you CANNOT use any data after the trade date (shown as the start date column in the training data) for prediction. Entries using any data from dates after the trade date are not eligible.
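
One way to respect the no-look-ahead rule above is to cut each time series at the trade date before computing any feature. The sketch below assumes the series is sorted by date and that data from the trade date itself is allowed (the brief only excludes dates after it); swap `bisect_right` for `bisect_left` to exclude the trade date too. The helper names and the trailing-return feature are hypothetical examples.

```python
from bisect import bisect_right
from datetime import date

def history_up_to(dates, values, trade_date):
    """Return the values dated on or before trade_date.

    `dates` must be sorted ascending and aligned with `values`.
    """
    cutoff = bisect_right(dates, trade_date)
    return values[:cutoff]

def trailing_return(dates, closes, trade_date, lookback=5):
    """Return over the last `lookback` observations known at trade_date."""
    hist = history_up_to(dates, closes, trade_date)
    if len(hist) <= lookback:
        return None  # not enough history before the trade date
    return hist[-1] / hist[-1 - lookback] - 1.0
```

Every feature built on top of `history_up_to` only sees observations available at the trade date, so later prices cannot leak into the prediction.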

Those with strong entries may be contracted for additional tasks.

Prize guaranteed.

Public Clarification Board

  • ashikmohann
    ashikmohann
    • 3 years ago

    #extended #extended #extended #extended

    • 3 years ago
  • Bouidad
    Bouidad
    • 3 years ago

    #extended #extended #extended #extended #extended #extended

    • 3 years ago
  • casras
    casras
    • 3 years ago

    There are 2 different entries for same date 03/04/2016 in the train file.

    • 3 years ago
    1. casras
      casras
      • 3 years ago

      There are 9 dates with this problem. I used the first entry for each of them in the entry I submitted. I think the contest as it stands is not well defined and encourages overfit solutions, which aren't useful. Feel free to contact me after the contest if you want more details and what I'd propose for something more useful.

      • 3 years ago
    2. ts111
      Contest Holder
      • 3 years ago

      The contest states that the measure should be a leave-one-out R2 (or a k-fold method). This takes care of the overfitting issue. If there is any clarification on this metric, please contact me and I can explain more.

      • 3 years ago
  • casras
    casras
    • 3 years ago

    Could you expand on the reason for choosing R2 as the contest metric, as opposed to more traditional trading metrics like total return, Sharpe ratio or drawdown?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      The aim is not to evaluate trading strategies but to predict returns from one.

      • 3 years ago
  • vijaykrishna0497
    vijaykrishna0497
    • 3 years ago

    Hi, I will make my entry soon. You can have a look on my profile also. Thank you.

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      Good luck

      • 3 years ago
  • wejdansolutions
    wejdansolutions
    • 3 years ago

    Dear sir, thank you for posting the contest. May I confirm: 1) Do the dates in all three Excel files correspond to each other, i.e. is 7 October 2017 in files 1, 2 and 3 the same day? 2) I understand one of the CSV files is the share price; what about the Excel file named "underlying"? Does it indicate Dow Jones index volume? There was a market crash in 1987 and I wanted to know what the attribute represents.

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      1. Dates all correspond, yes. 2. Underlying is the price of the share, underlying vol is the volatility, and the data in the training set is the "result" of the trade that needs to be predicted.

      • 3 years ago
  • zizopixels
    zizopixels
    • 3 years ago

    Any chance for RL entries?

    • 3 years ago
    1. zizopixels
      zizopixels
      • 3 years ago

      I can make it work, but its metric/benchmark is different than R2.
      It computes stability, return, volatility, omega, Sharpe, Calmar and Sortino ratios and mean as benchmarks.

      • 3 years ago
    2. ts111
      Contest Holder
      • 3 years ago

      I see, I won't be able to compare it apples-to-apples with the other entries then.

      • 3 years ago
  • Bouidad
    Bouidad
    • 3 years ago

    Could you explain this part in more detail, please? Data is time-series in nature, so CANNOT use data after trade date (shown as start date column in training data) for training/features/prediction. Entries using any data from dates after trade date are not eligible.

    • 3 years ago
    1. Bouidad
      Bouidad
      • 3 years ago

      Thank you, one last question please: what is the difference between the Open, High, Low, and Close columns of 'Underlying.csv' and those of 'Underlying_vol.csv'?

      • 3 years ago
    2. ts111
      Contest Holder
      • 3 years ago

      Those are the market open, market close etc. values for each day. You can use just the 'close' column if you want.

      • 3 years ago
  • palthode
    palthode
    • 3 years ago

    Kindly make the contest #guaranteed

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      Prize guaranteed as long as valid entry received and contest will be updated to reflect so.

      • 3 years ago
    2. ts111
      Contest Holder
      • 3 years ago

      Guaranteed now

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    The data in underlying_vol and underlying are different even though the dates are the same! Can you explain it?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      They are two different time series (one for the underlying and one for volatility). Both can be used for features.

      • 3 years ago
  • waqarrao498
    waqarrao498
    • 3 years ago

    CANNOT use data after trade date? Kindly tell us the exact trade date.

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      Each prediction is for a particular date (as shown in the training set). The time series to use for prediction are in the other two files. When using the time series, you cannot use any data after the trade date.

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    What is to be predicted, and which data will be given to predict it? What's the "result" label in the Training_set CSV? Please explain it better!

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      Response variable is "result" in the training_set.csv -- this is what is to be predicted. The other two files contain the data to be used for training.

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    I don't know anything about tread! What is the result of a tread?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      what is a tread?

      • 3 years ago
    2. Zubayerskd
      Zubayerskd
      • 3 years ago

      Sorry, it's trade!

      • 3 years ago
  • ahmedashraf13131
    ahmedashraf13131
    • 3 years ago

    1st: what are the data and columns to use, and what is the label? 2nd: what is the Volume column?

    • 3 years ago
    1. ahmedashraf13131
      ahmedashraf13131
      • 3 years ago

      I meant, what is the label you want to predict in [Underlying and Underlying_vol]?

      • 3 years ago
    2. ts111
      Contest Holder
      • 3 years ago

      None, they are the "features" files. Prediction only for the response variable.

      • 3 years ago
  • palthode
    palthode
    • 3 years ago

    Is it OK if I code the model in R?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      Sure, as long as I can understand/replicate and verify

      • 3 years ago
  • shahibur55
    shahibur55
    • 3 years ago

    working on it

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      good luck

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    Do you want just the Python code, or the model in a specific format (.model, .joblib, .h5)?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      both, but the model is more important

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    Can you please explain the project more clearly?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      It is to write an ML algorithm to predict the result value. What do you want to clarify?

      • 3 years ago
  • Zubayerskd
    Zubayerskd
    • 3 years ago

    I am not so good at trading. How to calculate the result value?

    • 3 years ago
    1. ts111
      Contest Holder
      • 3 years ago

      The result value is not calculated, it is the given data

      • 3 years ago
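
On the duplicate-date issue raised by casras earlier on this board, the "use the first entry" choice described there could be sketched as follows. This assumes the training rows are loaded as dictionaries in file order and that the date column is called `start_date`; both the column name and the helper name `keep_first_per_date` are assumptions for illustration, and keeping the first row is only one participant's convention, not a rule from the contest holder.

```python
def keep_first_per_date(rows, key="start_date"):
    """Keep only the first row seen for each date, preserving file order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique_rows.append(row)
    return unique_rows

# Hypothetical rows, not taken from the contest files.
rows = [
    {"start_date": "03/04/2016", "result": 1.0},
    {"start_date": "03/04/2016", "result": 2.0},
    {"start_date": "03/07/2016", "result": 3.0},
]
deduplicated = keep_first_per_date(rows)
```

Whichever convention is used, stating it in the entry keeps the reported leave-one-out R2 reproducible.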
