SmartQuant Discussion

Automated Quantitative Strategy Development, SmartQuant Product Discussion and Technical Support Forums

PostPosted: Sun Oct 21, 2012 12:44 am 

Joined: Thu Jun 08, 2006 3:56 pm
Posts: 537
Location: BC Canada
I'm wondering what the main hardware/software limitation on backtesting speed is. Some posts suggest it is I/O from the market data storage database, because on most multi-core machines these days most (if not all) of the cores sit severely underutilized during a backtest. If we could figure out how to get past the current limits, it would probably make OQ faster and a lot of people happier.

Q1. Has anyone run experiments comparing backtesting speeds on SSDs versus conventional 7200 rpm hard disks?

Q2. Anton, perhaps you could make some sort of "official" statement about your experience with limits on backtesting speed? Is it disk I/O? Single-threaded simulation engine limits? Or something else, such as memory or raw CPU? (That last one seems unlikely -- many people seem to have both memory and CPU cores to spare...) Thanks.


PostPosted: Sun Oct 21, 2012 10:00 am 

Joined: Tue Aug 05, 2003 3:43 pm
Posts: 6808
Hi Kevin,

I think the backtesting speed issue in the current framework is more about hashtables and EventArgs :roll: The framework is too flexible (the FIX layer is based on FIX tag-value pairs, which means that Order.Qty and other similar properties are actually hashtable entries in a FIX message) and too generic (it tries to do too many things behind the scenes to be proactive, for example emitting all kinds of events, updating portfolios, calculating equity, filling data series, etc. on every tick).
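To make the hashtable point concrete, here is a hypothetical micro-benchmark (PlainOrder and TagValueOrder are made-up classes for illustration, not SmartQuant code) contrasting a plain field with a property that goes through a tag/value dictionary, roughly the way a generic FIX message stores Order.Qty:

Code:
// Hypothetical micro-benchmark, not SmartQuant source: a plain field versus a
// property backed by a tag/value dictionary.
using System;
using System.Collections.Generic;
using System.Diagnostics;

class PlainOrder
{
    public double Qty;                                   // direct field access
}

class TagValueOrder
{
    // FIX tag 38 = OrderQty; every access is a hash lookup plus boxing/unboxing.
    private readonly Dictionary<int, object> fields = new Dictionary<int, object>();

    public double Qty
    {
        get { return (double)fields[38]; }
        set { fields[38] = value; }
    }
}

class Program
{
    static void Main()
    {
        const int N = 10000000;
        var plain = new PlainOrder();
        var fix = new TagValueOrder();
        double sum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) { plain.Qty = i; sum += plain.Qty; }
        Console.WriteLine("plain field      : {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) { fix.Qty = i; sum += fix.Qty; }
        Console.WriteLine("tag/value lookup : {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sum);  // keep the JIT from optimizing the loops away
    }
}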

But this should change soon since we are working on a new framework, which is order(s) of magnitude faster than the present one and supports multithreading, event queues, multicore optimization, etc.
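To give a feeling for the event queue part (this is only a minimal sketch built on BlockingCollection, not the new engine): the data reader pushes events onto a queue and a dedicated consumer thread drains it, so the reader never waits for strategy code.

Code:
// Illustrative sketch only -- a single-producer / single-consumer event queue.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class MarketEvent
{
    public string Symbol;
    public double Price;
}

class Program
{
    static void Main()
    {
        // Bounded queue so a fast producer cannot exhaust memory.
        var queue = new BlockingCollection<MarketEvent>(100000);

        // Consumer thread: stands in for the strategy engine.
        var consumer = Task.Factory.StartNew(() =>
        {
            foreach (var e in queue.GetConsumingEnumerable())
            {
                // strategy / portfolio / equity logic would run here
            }
        }, TaskCreationOptions.LongRunning);

        // Producer: stands in for the data reader.
        for (int i = 0; i < 1000000; i++)
            queue.Add(new MarketEvent { Symbol = "AAPL", Price = 100 + i * 0.0001 });

        queue.CompleteAdding();   // signal end of data
        consumer.Wait();
        Console.WriteLine("done");
    }
}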

Cheers,
Anton


PostPosted: Sun Oct 21, 2012 5:00 pm 

Joined: Thu Jun 08, 2006 3:56 pm
Posts: 537
Location: BC Canada
Wow, what an interesting answer. Past forum postings all pointed to disk I/O speed from the database as the major factor, and now you say it's probably the FIX layer.

The new framework sounds like it will be lightning fast. I can hardly wait to try it out, even though I'm not bothered by long simulations (my longest one is only 5 or 10 minutes). But I do love to see efficient software designs work their magic!

How does the new framework bypass or change the FIX layer event/hash table problems? (If it's not too hard or proprietary to explain in a sentence or two...)


PostPosted: Sun Oct 21, 2012 8:09 pm 

Joined: Tue Aug 05, 2003 3:43 pm
Posts: 6808
Hi Kevin,

Actually, the major factor depends on your strategy. I guess if it's a single-instrument strategy that trades once a day, doesn't keep positions open for long, and is backtested with trade data, then I/O is going to be the major factor. If it's a multi-instrument, multi-timeframe strategy, then the strategy engine (which distributes data streams between strategies and instruments) may consume a significant share of CPU. If you open many positions, then the portfolio and equity calculations performed on every tick may become the bottleneck. If you submit orders very frequently or have stop/limit orders sitting in the execution simulator, then the FIX layer starts to play the major role.

Regards,
Anton


PostPosted: Sun Oct 21, 2012 8:32 pm 

Joined: Thu Jun 08, 2006 3:56 pm
Posts: 537
Location: BC Canada
Thanks for the detailed answer, Anton. The only thing I can think of to add to this thread is to ask whether there is any easy way, or easy new feature, that could give users an idea of where their backtesting bottlenecks are.

For example, when I've done performance profiling in the past, I've often added one or two lines of code at key locations to record entry/exit timestamps for important (i.e. time-consuming) pieces of code. Then after a program run (or after many runs, if I wrote the stats to disk with another few lines of code), I could see where, or in what relative percentage, the time was spent in these key sections.
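Something along these lines is all I have in mind -- just a sketch in plain C# with made-up names (SectionTimer, "DataIO", "FIX"), not an existing OQ API:

Code:
// Sketch only: wrap the hot spots (database read, FIX handling, portfolio
// update, ...) and print each section's share of the total at the end of a run.
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class SectionTimer
{
    private static readonly Dictionary<string, Stopwatch> sections =
        new Dictionary<string, Stopwatch>();

    public static void Enter(string name)
    {
        Stopwatch sw;
        if (!sections.TryGetValue(name, out sw))
            sections[name] = sw = new Stopwatch();
        sw.Start();
    }

    public static void Exit(string name)
    {
        sections[name].Stop();
    }

    // Print each section's total time and its share of the grand total.
    public static void Report()
    {
        long total = 0;
        foreach (var sw in sections.Values)
            total += sw.ElapsedMilliseconds;

        foreach (var kv in sections)
        {
            double pct = total == 0 ? 0 : 100.0 * kv.Value.ElapsedMilliseconds / total;
            Console.WriteLine("{0,-12} {1,8} ms {2,6:F1} %",
                kv.Key, kv.Value.ElapsedMilliseconds, pct);
        }
    }
}

class Demo
{
    static void Main()
    {
        SectionTimer.Enter("DataIO");
        System.Threading.Thread.Sleep(50);   // stands in for a database read
        SectionTimer.Exit("DataIO");

        SectionTimer.Enter("FIX");
        System.Threading.Thread.Sleep(10);   // stands in for order/fill handling
        SectionTimer.Exit("FIX");

        SectionTimer.Report();
    }
}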

Do you think it would be possible to add a couple of lines of timestamp code to the framework at key points, so you could show some relative numbers in the simulation stats? I can imagine one collection point around database read operations (to measure total I/O cost). Maybe another around FIX operations (as you mentioned in your previous post).

You list four candidates (database I/O, the strategy engine, portfolio calculations, and FIX). If you could identify four very high-level timestamp collection points for those areas and print the results in the simulation stats, it would tell users where their bottlenecks are.

Timestamp collection and summarization is relatively cheap, fast, and easy to implement, so I wonder if you'd be kind enough to think about the collection points in the background. Maybe it's an easy thing to fit into your dev path, and it would be quite interesting (and probably productive) to compare the sequential stats against your new parallel implementation. It seems to me this is a small, easy-to-implement bit of work with a high bang per buck. (Not to heap yet more work on Alex and Sergey, of course... :-)


PostPosted: Sun Oct 21, 2012 9:11 pm 

Joined: Tue Aug 05, 2003 3:43 pm
Posts: 6808
I believe you can simply use the Visual Studio profiler (shipped with MSVS starting from the Professional edition). It works very nicely. For example, we can easily see that one of the problems with the new framework is slow BinaryReader.ReadInt32/ReadDouble (by the way, the C++/Qt streamers are much faster than the .NET ones, though the native C++ new operator is a showstopper in the C++ framework).
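To give an idea of what "slow BinaryReader" means, here is a rough standalone illustration (not our actual streamer code): reading values one at a time through BinaryReader versus reading the whole block once and converting it with BitConverter.

Code:
// Rough illustration only: per-value BinaryReader calls versus one bulk Read
// followed by BitConverter.
using System;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main()
    {
        const int n = 5000000;
        var data = new byte[n * sizeof(double)];   // ~40 MB of raw doubles
        new Random(1).NextBytes(data);

        // 1) one ReadDouble() call per value
        var sw = Stopwatch.StartNew();
        double sum = 0;
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            for (int i = 0; i < n; i++)
                sum += reader.ReadDouble();
        }
        Console.WriteLine("BinaryReader.ReadDouble : {0} ms", sw.ElapsedMilliseconds);

        // 2) read the whole block once, then convert in place
        sw = Stopwatch.StartNew();
        sum = 0;
        using (var stream = new MemoryStream(data))
        {
            var buffer = new byte[data.Length];
            stream.Read(buffer, 0, buffer.Length);
            for (int i = 0; i < n; i++)
                sum += BitConverter.ToDouble(buffer, i * sizeof(double));
        }
        Console.WriteLine("bulk Read + BitConverter: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine(sum);   // keep the JIT from removing the loops
    }
}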

Regards,
Anton


PostPosted: Sun Oct 21, 2012 9:37 pm 

Joined: Thu Jun 08, 2006 3:56 pm
Posts: 537
Location: BC Canada
I was thinking of something much simpler and self-contained within OQ, just to give users (and you, in a support role) some basic information on what limited the speed of a backtest.

Profiling is much more costly and complex than what I'm suggesting. You have to have paid for the Professional edition of VS, know how to use it, take the time to learn profiling, and then understand and summarize the results... you get the idea. And heaven help you if you're a non-technical trader rather than a programmer. Profiling is not easily accessible.

In contrast, imagine reading a paragraph in the manual that says

"Look at these 4 numbers on your statistics output.

If the I/O number is 5x the size of the others, it means your backtest is more I/O-bound than CPU-bound. Consider buying an SSD, or using a RAM disk. If the CPU number is the biggest, consider upgrading your computer to a faster CPU."

These numbers would be simpler, more accessible, and more understandable than profiling, especially for non-technical users. I can even imagine a menu item on the Help menu called "For faster backtests..." that pops up the numbers and paragraphs I mention above.

Just my two bits for more user friendliness...(and you know I love the products already anyway...)

