April 9, 2010

Data Sources and Software Lists

To simplify things for myself, I am making a "data sources" page that merges a large, scattered set of references into one place. If you find it helpful, I'd be pleased to hear what it helped you with, and perhaps get a pointer to the results.

In a similar vein, I have started a "software" page. There are so many moving parts in a complete analytical system that I find it helpful to have a list of which pieces can go where. The list runs from ETL (extract-transform-load) to presentation, with useful stops at analysis, programming, data management, and even operating systems and environments. It also has side notes on geodata/GIS capability, in case that needs to be in the system too. I hope to have a post at some point giving a couple of recommended flows through these, for people of various means and needs.

Building the list has already made one thing clear: SAS, C/C++, SQL, C#, Perl, Python, and friends cover a lot, at least up to the tens of terabytes, but I see I still have some gaps at the low-cost petabyte scale. I might do well to sharpen my Java skills and add Pig (the programming language) skills.

