Up and Running with Public Data Sets

NOTES FROM

Up and Running with Public Datasets

Curt Frye

1. U.S. Census Bureau and Securities and Exchange Commission

American FactFinder

link: factfinder.census.gov

Most US residents know the United States Census Bureau as the government organization that counts the American population every 10 years. In fact, the Census Bureau, which is part of the Department of Commerce, is tasked with gathering a wide range of data for individuals, for government entities at the national, state, and local levels, and for industry.

The main page has an easy link to search for community facts, where you can find popular facts, such as population or income, about a particular community.

State and County Quick Facts

link: quickfacts.census.gov

It’s easy to think that policy and business decisions are made based on national or international data, but most governments and businesses operate at the state and local level. As the name implies, this data focuses on information at the state and county level within the United States.

CenStats Databases

link: censtats.census.gov

The United States Census Bureau is well known for gathering information about U.S. citizens, but it also collects and analyzes many other categories of data. You can use the CenStats Databases to find trade data that could help you analyze foreign and domestic markets. You can look at County Business Patterns, both by Standard Industrial Classification and by the North American Industry Classification System, the latter starting in 2003. You can also look at International Trade Data.

Census Bureau geographic data sets

link: census.gov/geography.html

One very useful way to analyze your data is by plotting it geographically. If you don’t have access to a full, up-to-date geographic information system, you can download the TIGER map shapefiles from the United States Census Bureau’s geography website.

Census Bureau population projections

link: census.gov/population/projections

A lot of products and services appeal to certain demographic groups more than others. Some television advertisers covet the 18-to-29-year-old group because its members tend to spend their disposable income on items that are more fun than practical, but other companies offer services to individuals who are over 50 years of age. Judging the size of each age group, even down to a single year, helps companies estimate the potential reach of their goods and services. Because this is a Census Bureau site, it also links to other data areas, such as Topics by Population or Economy, Geography, the Library, Data, and information about the Census Bureau.
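To make the age-group idea concrete, here is a minimal Python sketch of how single-year-of-age counts could be rolled up into the two groups mentioned above. The function names and the sample numbers are assumptions made up for illustration; they are not Census Bureau figures or part of any Census Bureau tool.

```python
def population_in_range(counts_by_age, low, high):
    """Sum the counts for ages low..high inclusive; missing ages count as zero."""
    return sum(counts_by_age.get(age, 0) for age in range(low, high + 1))


def share_of_total(counts_by_age, low, high):
    """Return the fraction of the whole population that falls in low..high."""
    total = sum(counts_by_age.values())
    if total == 0:
        return 0.0
    return population_in_range(counts_by_age, low, high) / total


# Hypothetical counts keyed by single year of age (invented numbers, not real data).
sample_counts = {17: 40, 18: 50, 25: 60, 29: 50, 30: 45, 50: 55, 51: 20, 65: 30}

young_adults = population_in_range(sample_counts, 18, 29)   # the 18-to-29 group
over_fifty = population_in_range(sample_counts, 51, 120)    # "over 50" read as 51 and up

print(young_adults)  # 160
print(over_fifty)    # 50
print(round(share_of_total(sample_counts, 18, 29), 3))
```

With real projections you would load the single-year counts from a downloaded file instead of typing them in, but the roll-up step would look much the same.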

EDGAR

link: sec.gov/edgar.html

Every public company in the United States, meaning every company that offers shares of stock for sale on a stock exchange, must file certain documents with the United States Securities and Exchange Commission. These filings include financial accounting statements, commentary on how the numbers in the statements were derived, and disclosures of executive compensation.

2. Other U.S. Government Agencies

Bureau of Justice Statistics

link: bjs.gov

The American judicial system covers a wide variety of areas, ranging from court hearings to corrections, with many areas in between. The US Department of Justice oversees the country’s programs at the federal level and, through the Bureau of Justice Statistics, gathers data at the federal and state levels.

Internal Revenue Service migration data

link: irs.gov/uac/Tax-Stats-2

No one likes to pay taxes, but the good news is that the filings generate a lot of useful data. In the U.S., the Internal Revenue Service provides data collections through its Tax Statistics service, which you can find online through the IRS’s website.

U.S. Bureau of Economic Analysis

link: bea.gov

If you do business in or with the United States, it’s important to keep careful track of the country’s economic trends. The Department of Commerce’s Bureau of Economic Analysis gathers statistics relating to the U.S. economy that let you gain useful insights into the state of personal and business economic health in the U.S.

FedStats

link: fedstats.sites.usa.gov

Started in 1997, FedStats is a U.S. government website that aggregates links to, and statistics generated by, government agencies. The benefit of looking for data through FedStats is that you don’t need to know which agency produced a particular statistic. In addition to the latest news, the site has links to other U.S. government agencies: the Bureau of Economic Analysis, the Bureau of Justice Statistics, the Bureau of Labor Statistics, and so on.

U.S. Department of Education Data and Research

link: ed.gov/rschstat

Education provides the foundation for a productive society. The United States Department of Education gathers statistics from educational institutions around the U.S. and makes them available through the Department’s data and research site, enabling analysts to examine and evaluate education in the U.S.

U.S. Bureau of Labor Statistics

link: bls.gov

You can get information on the salaries paid in various fields, examine employment trends, and look through the Occupational Outlook Handbook, which looks at the future prospects for a variety of professions. There’s a lot of other data available as well.

Bureau of Transportation Statistics

link: rita.dot.gov/bts

The Bureau of Transportation Statistics, which is part of the Department of Transportation, gathers statistics on highway, water, rail, and intermodal transportation in the US. If you work in manufacturing, or need a baseline for national and international transportation trends, this website provides the data you need to make good decisions.

U.S. Patent and Trademark Office

link: uspto.gov

The U.S. federal government, like almost all governments around the world, lets its citizens register inventions, trademarks, and other forms of intellectual property to protect those valuable ideas against unauthorized use. In general terms, patents protect processes while trademarks protect words, phrases, and images that are used to identify companies, products, and services. If you want to search for existing patents and trademarks in the U.S., you can go to the U.S. Patent and Trademark Office.

3. Non-U.S. Data Sources

World Bank

link: data.worldbank.org

The World Bank’s data site gives you information on the world economy, health (such as life expectancy), education, and so on.

CIA World Factbook

link: cia.gov/library/publications/the-world-factbook

The CIA, which is the United States Central Intelligence Agency, is tasked with gathering, analyzing, and assessing information about foreign countries. The goal of this gathering is to discover the current conditions in and intentions of countries other than the U.S. As part of its outreach to U.S. citizens, the CIA publishes its World Factbook, which offers basic information about the governments, citizens, and economies of countries throughout the world.

United Nations

link: data.un.org

The United Nations, or UN, is an international organization that provides a forum for its 193 member states to express their views on relevant issues and to coordinate action. As part of its mission, the UN provides access to national data services and also its own data collections.

Government of Canada Open Government Portal

link: statcan.gc.ca

Statistics Canada, or StatCan, as it’s known informally, is a Canadian government agency that gathers statistics from industry and government institutions around Canada, and makes those numbers and commentary available on its website.

EuroStat

link: ec.europa.eu/eurostat

Eurostat, which is the European Union’s statistics agency, provides access to data about the countries of the European Union and, as a subset of those entities, the countries of the eurozone. The latter group comprises the countries, such as Germany, France, and Finland, that have adopted the euro as their national currency.

Organisation for Economic Co-operation and Development

link: data.oecd.org

The Organisation for Economic Co-operation and Development, or OECD, is an international organization that promotes policies to improve the economic and social well-being of people around the world. Data gathering informs policy creation and evaluation, of course, so the OECD shares its data online for free.

4. Data Search Engines and Portals

Quandl

link: quandl.com

Search engines provide ways to find data based on certain search terms. You might search for stock prices, oil production figures, or other benchmarks you use in your business. One search engine, Quandl, provides both a search interface and a set of curated data collections to streamline the discovery process.

The University of Maryland INFORUM

link: inforumweb.umd.edu

Inforum is the inter-industry forecasting project at the University of Maryland. The Inforum team builds models to forecast future performance of the US and other economies. On their website you can find details of the econometric models they use, the data they work with, and links to software that lets you work with their data as well.

Google Public Data

link: google.com/publicdata

Google is an exceptionally popular search engine, but the company also makes public data available through its Google Public Data collection. As of this recording, the collection contains 136 data sets covering a wide variety of economic and technology topics.

Amazon Web Services Public Data Sets

link: aws.amazon.com/datasets

Amazon.com is best known as the internet’s bookseller of choice, but it also provides cloud computing services such as remote data storage and processing. As part of its operations, Amazon provides access to big data sets in several scientific fields.

Data.gov

link: data.gov

Data.gov is the United States Federal Government’s data clearinghouse, where you can find links to data from all federal websites and many state and local collections as well. If you’re not sure which government site or bureau has the data you’re looking for, searching at data.gov is a great place to start.

Google Ngram Viewer

link: books.google.com/Ngrams

Google’s public data service is covered in another part of these notes; the Google Ngram Viewer is a separate tool. An Ngram is a series of characters of a given length. For example, a two-character string is a bigram, a three-character string is a trigram, and so on. The Google Ngram Viewer finds the popularity of certain character strings and words in books that were published from about 1800 to 2008, so if you perform linguistic analysis and want to search word usage over that period, it is a great tool to keep in mind.
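The bigram and trigram definitions are easy to see with a sliding window. The short Python sketch below is only an illustration of the idea, not how the Ngram Viewer itself is implemented; the function names are made up for this example. The second function applies the same idea to whole words, which goes beyond the character-based definition in the notes.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(char_ngrams("data", 2))              # ['da', 'at', 'ta']
print(char_ngrams("data", 3))              # ['dat', 'ata']
print(word_ngrams("public data sets", 2))  # ['public data', 'data sets']
```

If the text is shorter than n, both functions return an empty list, because there is no complete window to take.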

Corpus of Contemporary American English

link: corpus.byu.edu/coca

The Corpus of Contemporary American English tracks English word usage in books, magazines, television, films, and other media.


About thewisedeveloper

Hi all, I am a Junior/Senior Computer Science student at Worcester State University, Worcester, Massachusetts.