So the latest name for my historical stock screener project is Midas’ Data Miner… It’s good symbolism for what I hope to accomplish with an investing strategy some day.
Besides, mythology only ever contains a grain of truth (if that). For all we know, King Midas owned a MacBook and traded stocks online, and the bards merely fancied telling the tall tale of him turning trash into gold with his fingertips, rather than the dull truth of his amassing wealth in bonds and shares. Not at all anachronistic, right?
I am now developing it in Python. Originally, I had planned on C++, but Python is much more fun to develop in, and performance isn’t going to be as much of an issue as before (discussed next).
Historical stock data sets are enormous. If I were to achieve my goal of decades of stock quotes for ten-thousand-plus firms, that’s a lot of data. Consider that each stock quote yields twenty or more distinct pieces of price or fundamental data, and multiply this by 300+ days per year, by however many thousands of firms, and however many years… and the pile of quotes grows.
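To put a rough number on that pile, here is a back-of-envelope sketch. The twenty fields, 300 days, and ten thousand firms come from the paragraph above; the twenty-year span and the 8 bytes per value are my own assumptions for illustration, so treat the result as an order of magnitude only.

```python
# Back-of-envelope size estimate. All constants are illustrative assumptions.
FIELDS_PER_QUOTE = 20   # "twenty or more" pieces of data per quote
DAYS_PER_YEAR = 300     # the per-year figure used in the text
FIRMS = 10_000          # "ten-thousand-plus firms"
YEARS = 20              # assumed: "decades" read as two decades
BYTES_PER_VALUE = 8     # assumed: every value stored as a 64-bit float

# Multiply everything out to get the number of individual values.
total_values = FIELDS_PER_QUOTE * DAYS_PER_YEAR * FIRMS * YEARS

# Convert to (decimal) gigabytes of raw storage, ignoring any overhead.
total_gigabytes = total_values * BYTES_PER_VALUE / 1e9

print(f"{total_values:,} values, roughly {total_gigabytes:.1f} GB as raw floats")
```

Under those assumptions that is over a billion values before counting tickers, dates, or any container overhead.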
The full data set is large enough that I cannot hope to load it all in RAM. Fortunately, I’ve come around to the idea of accessing any data set one day at a time, and this is how data sets are implemented in MDM. I can’t access my PZTMDB databases randomly; MDM can only iterate through them sequentially, one day at a time.
It’s very elegant, and perfectly in line with the demands of this app, as historical portfolios will also navigate the data one day at a time.
So far, in writing my classes, I’ve really only played with one XLS (Excel spreadsheet) source for fundamental data going back to January 2001. However, any other class I might write to access a different data set (perhaps Yahoo! market data through the web) must use the same interface as the already-implemented XLS class, so there isn’t really any critical thinking remaining… Just peon work.
Speaking of which, the Database (iterator) class and its children all have three public members: a date object, a dictionary object, and a next() method.
database.date is a datetime object for the iterator’s position in data set
database.dict is a keyed dictionary of the form { TICKER -> STOCK QUOTE DATA }
database.next() cycles to next day in dataset, and updates the aforementioned date and dict members
So, it becomes moot whether “Database” actually describes the information I’m working with as long as the object fulfills these three members. My next task is to try to write a copy of this class that accesses data over the web using the Python urllib module, and define a simulacrum iterator for Yahoo! data using only these three members.
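The three-member interface described above might be sketched roughly like this. It is not the actual MDM code: the InMemoryDatabase child class, the sample tickers, and the quote fields are all hypothetical, made up only to show the shape of date, dict, and next().

```python
import datetime


class Database:
    """Base iterator: children must update .date and .dict in next()."""

    def __init__(self):
        self.date = None   # datetime for the iterator's current position
        self.dict = {}     # { TICKER -> STOCK QUOTE DATA } for that day

    def next(self):
        raise NotImplementedError("child classes implement next()")


class InMemoryDatabase(Database):
    """Hypothetical child class backed by a plain dict, for illustration."""

    def __init__(self, quotes_by_day):
        super().__init__()
        # Walk the days in chronological order, one at a time.
        self._days = iter(sorted(quotes_by_day.items()))

    def next(self):
        # The builtin next() raises StopIteration once the data runs out.
        self.date, self.dict = next(self._days)


# Made-up sample data: two days, deliberately given out of order.
sample = {
    datetime.datetime(2001, 1, 3): {"ABC": {"close": 10.5}},
    datetime.datetime(2001, 1, 2): {"ABC": {"close": 10.0},
                                    "XYZ": {"close": 55.0}},
}

db = InMemoryDatabase(sample)
seen = []
while True:
    try:
        db.next()
    except StopIteration:
        break
    seen.append((db.date, sorted(db.dict)))
    print(db.date.date(), sorted(db.dict))
```

Any code that consumes the data only touches those three members, so a web-backed or Excel-backed child could be swapped in without changing the loop.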
Information is information, regardless of whether it is printed on an Excel sheet or available on the net.
The next dull step of this project is to find any variety of market data sets freely available on the internet, and to implement each as Database child classes, whether Excel files or otherwise.
And one fine day, I shall have a plethora of market data in hand to do historical stock-trading system testing (say that 10x fast).
Once the Data Mining feature(s) are complete, the further features I have planned are enumerated below:
I would guarantee completing this before the end of Spring semester, but all bets are off if I can’t find the time (i.e., my status quo).
/s/ Patrick