Midas and Yahoo! Finance

Good news: I implemented a Yahoo database class that interfaces with my other defined classes (i.e. PZTMDB files) to import data.

Bad news: My import method takes forever.

Sequential access

Earlier design choices led me to save my PZTMDB data as persistent Python objects on the hard drive (pickling). As a consequence, I can only access the data sequentially, not randomly at will.

This means that when I want to import data, I must begin at day zero and iterate through all the days, updating each day that has new data. I can start on January 1 and go through December 31 day by day, but it is impossible to retrieve or write to May 9 directly.
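A minimal sketch of the limitation, assuming each day is pickled one after another into a single stream. The record layout and the helper names `write_days` and `read_day` are made up for illustration; they are not the actual PZTMDB code.

```python
import io
import pickle


def write_days(stream, days):
    """Pickle each day's record back to back into the same stream."""
    for record in days:
        pickle.dump(record, stream)


def read_day(stream, target_date):
    """Walk the stream from the start until the target date turns up.

    Returns the record (or None) plus how many records had to be unpickled.
    """
    stream.seek(0)
    records_read = 0
    while True:
        try:
            record = pickle.load(stream)
        except EOFError:
            # Ran off the end of the stream without finding the date.
            return None, records_read
        records_read += 1
        if record["date"] == target_date:
            return record, records_read


# Ten made-up days of data, held in memory instead of a real file.
days = [{"date": f"2010-01-{d:02d}", "close": 100.0 + d} for d in range(1, 11)]
stream = io.BytesIO()
write_days(stream, days)

# Reaching the 9th day means unpickling the 8 days before it as well.
record, records_read = read_day(stream, "2010-01-09")
```

Nothing in the stream says where a given day starts, so the only way to reach a day is to unpickle everything before it.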

8000 Yahoo DB instances…

My initial implementation of Yahoo historical market data creates one database instance per company and downloads all of that company’s data from Yahoo. The advantage of this is that I only have to hold 1/8000 of the information in system memory at any one time.

However, to import to the PZTMDB database, I must call the import method on the PZTMDB once for each company’s instance, resulting in ~8000 calls to import data.

It follows that, since the PZTMDB database was designed to be accessed sequentially, I must iterate through the 300 or so days in every year, updating data, once for each of the 8000 companies. Not very efficient. That is a lot of CPU time spent re-walking the same days.
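To put rough numbers on that, here is a back-of-the-envelope count of how many stored days get walked. It assumes 10 years of data and the round figures above; `traversal_steps` is just an illustrative helper.

```python
DAYS_PER_YEAR = 300  # the rough figure used above
YEARS = 10           # assumed span of data on hand
COMPANIES = 8000


def traversal_steps(import_calls, years=YEARS, days_per_year=DAYS_PER_YEAR):
    """Each import call walks every stored day once, from day zero to the end."""
    return import_calls * years * days_per_year


per_company_import = traversal_steps(COMPANIES)  # one import call per company
single_import = traversal_steps(1)               # one import call in total
```

The number of individual quote updates is the same either way. What changes is how many times each stored day has to be loaded and walked past.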

…becomes 1 Yahoo DB Instance?

Fortunately, it won’t be difficult to remediate this. I think I will have sufficient RAM to store all the historical data for companies sourced from Yahoo in one Yahoo DB instance.
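A sketch of what a single instance could look like, assuming the quotes are regrouped by date before the import. The dictionary layout, the tickers, and the names `merge_by_date` and `import_once` are all hypothetical.

```python
from collections import defaultdict


def merge_by_date(quotes_by_ticker):
    """Regroup {ticker: {date: quote}} into {date: {ticker: quote}}."""
    by_date = defaultdict(dict)
    for ticker, history in quotes_by_ticker.items():
        for date, quote in history.items():
            by_date[date][ticker] = quote
    return dict(by_date)


def import_once(database_days, by_date):
    """One sequential pass: visit each stored day once, apply every company's quote."""
    days_visited = 0
    for day in database_days:
        day["quotes"].update(by_date.get(day["date"], {}))
        days_visited += 1
    return days_visited


# Two made-up companies and two made-up days.
quotes_by_ticker = {
    "AAA": {"2010-01-04": 10.0, "2010-01-05": 10.5},
    "BBB": {"2010-01-05": 20.0},
}
database_days = [
    {"date": "2010-01-04", "quotes": {}},
    {"date": "2010-01-05", "quotes": {}},
]
days_visited = import_once(database_days, merge_by_date(quotes_by_ticker))
```

With the data keyed by date, the sequential walk happens once no matter how many companies are loaded.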

Yahoo historical quotes are only 5 pieces of data, as opposed to the conceivable 20-30 pieces I may store in the full PZTMDB database. I believe it will be possible to download all ~8000 companies’ historical quote CSV files in one burst, write them all to RAM, and then perform only one import method call on the PZTMDB (instead of 8000). By my rough math this only consumes 1.5GB of RAM at 30 years of market data (I only have sources for 10 years so far, regardless).
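One set of assumptions that lands near that figure is 300 days per year, 5 values per quote, and 4 bytes per value. These inputs are guesses at the rough math, not the numbers actually used, and `estimate_bytes` is an illustrative helper.

```python
from array import array


def estimate_bytes(companies, years, days_per_year, fields, bytes_per_value):
    """Raw payload size if every value is stored packed, with no per-object overhead."""
    return companies * years * days_per_year * fields * bytes_per_value


# array("f") stores single-precision floats; itemsize is 4 on common platforms.
bytes_per_value = array("f").itemsize

estimate = estimate_bytes(
    companies=8000,
    years=30,
    days_per_year=300,
    fields=5,
    bytes_per_value=bytes_per_value,
)
estimate_gb = estimate / 1e9
```

This only holds if the values are stored packed, for example in `array` objects. Plain Python floats held in lists or dicts cost noticeably more per value, so the real footprint depends on how the instance keeps its data.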

To-do

But I’m checking out for tonight/this week/whatever, so will pick up this problem next time I find time for the project.

/s/ Patrick