July 2010

Midas: Secondary Data

The end (of the first phase) is in sight. I started this project very myopically, writing code with small, short-term goals in mind, but I find great joy this morning in having the rest of the code worked out in my mind (and on paper, and in MS Word). I know exactly what is left to be implemented; everything is settled.

Secondary Data

To date, Midas Data Miner coding efforts have focused on mining and storing raw data, which lives in the binary tree aptly named “primary”.

Batting better than .452 (with sources)

I remain pleased with my ZODB implementation.

Since my last post, I have successfully downloaded all the market data I could find off of Yahoo! Finance. The program downloads all the market data for a specified ticker off of Yahoo’s servers, and then this information is saved into a binary tree in my ZODB, which is keyed by the tuple (date,ticker).

The download took 7 hours and covered 8 years of market data for all my tickers.

I’m not sure if everything is packed optimally or not, but the resulting ZODB pickle file is sixteen gigabytes.

Midas: ZODB and Binary Trees

Sequentially un-pickling and re-pickling days of market data between my hard drive and my RAM to conserve memory was a solution, but there was a better one to be found.

I am now using the Zope Object Database (ZODB) and its OOBTree class. ZODB is much slicker than my homegrown pickling scheme, and its trees (strictly speaking B-trees rather than binary trees) are a much better fit for what I am trying to do than the mere dictionaries I was using before.