Sunday, August 24, 2014

Public Data Sets

Data has become a vital part of our lives, helping us become more knowledgeable and make better decisions. Below are links to various publicly available data sets:

  • List from Data Science Central - http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
  • 100+ Interesting Data Sets for Statistics - http://rs.io/2014/05/29/list-of-data-sets.html
  • Microsoft Azure Datasets - http://datamarket.azure.com/
  • Google's Datasets Search Engine - https://www.google.com/cse/publicurl?cx=002720237717066476899:v2wv26idk7m&utm_source=hootsuite&utm_campaign=hootsuite
  • ImportIO - build your own data sets from web pages - https://import.io/
  • Wikipedia articles in database format (DBpedia) - http://wiki.dbpedia.org/Downloads39
  • Journalistic data sets (ProPublica Data Store) - https://projects.propublica.org/data-store/
  • United States government data sets - http://www.data.gov/
  • United States government statistics - http://www.usa.gov/Topics/Reference-Shelf/Data.shtml
  • United States weather data - http://www.ncdc.noaa.gov/
  • World Bank Data - https://finances.worldbank.org/
  • US financial analysis data from New York University - http://pages.stern.nyu.edu/~adamodar/New_Home_Page/data.html
  • UCI Machine Learning Repository - probably one of the most popular and comprehensive sources available, with 295 data sets in total at the time of writing.
  • Internet Measurement Data Catalog (DataCat) - an excellent resource if you are looking for large volumes of internet traffic data.
  • Wikipedia - this one may not be obvious, but aside from being one of the most popular websites in the world, Wikipedia is also committed to publishing the metadata it collects. For example, you can check out Wikipedia's page view statistics.
  • Data.gov - a catalog of U.S. government data with more than 112,000 data sets. Not all of them are large, but they are definitely worth a look. There are, of course, other Open Data sources you might find interesting as well (e.g. the Datahub project).
  • AWS Public Data Sets - Amazon provides a central data set repository with convenient browsing capabilities.
  • Microsoft Azure Marketplace - last but not least, you can find some data on the Azure platform too. As the name implies, though, not all of it is free.
Original Sources of Information: Buckwoody and BigX Blogs