5 Free Public Datasets Every New Analyst Should Practice With

By Softcraft Studio

Jump-start your data analysis skills with these five free, real-world datasets — perfect for building your portfolio and practising everything from data cleaning to machine learning.

Introduction

“If you’re just starting out as a data analyst — or looking to sharpen your skills — one of the best ways to learn is by getting your hands dirty with real data.”

The challenge is finding datasets that are interesting enough to stay motivated, complex enough to teach you something, and accessible enough not to require special permissions (at most, a free account on the hosting site). Here are five free public datasets that hit all three marks.

Dataset 1: Netflix Movies and TV Shows

Source: Kaggle (Netflix Titles)
Best For: Data cleaning, filtering, visual exploration

This collection encompasses thousands of Netflix titles with information including cast, genre, release year, ratings, and duration. It’s an excellent resource for developing data preparation abilities and conducting exploratory analysis.

Practice ideas:

  • Identify trends in genres over the years
  • Analyse content production by country
  • Build a dashboard showing what’s trending on Netflix
  • Clean and standardise the multi-value genre and cast columns

This dataset is popular for good reason: it’s messy enough to be realistic, yet structured enough to be approachable.
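
As a starting point for the genre clean-up idea, here is a minimal sketch using only the Python standard library. It assumes the genres sit in one comma-separated column; the column names `listed_in` and `release_year` and the sample rows are illustrative assumptions, so check them against the file you download.

```python
from collections import Counter

# Illustrative rows only. The column names are assumptions about the
# Kaggle file, and the titles are invented.
rows = [
    {"title": "Show A", "release_year": "2019", "listed_in": "Dramas, International Movies"},
    {"title": "Show B", "release_year": "2019", "listed_in": "dramas ,Comedies"},
    {"title": "Show C", "release_year": "2020", "listed_in": ""},
    {"title": "Show D", "release_year": "2020", "listed_in": "Comedies"},
]


def split_multi_value(cell, sep=","):
    """Split a delimited cell into trimmed, lower-cased values, skipping blanks."""
    return [part.strip().lower() for part in cell.split(sep) if part.strip()]


def genre_counts_by_year(records):
    """Count one (year, genre) pair for every genre listed on each title."""
    counts = Counter()
    for record in records:
        year = int(record["release_year"])
        for genre in split_multi_value(record["listed_in"]):
            counts[(year, genre)] += 1
    return counts


counts = genre_counts_by_year(rows)
for (year, genre), n in sorted(counts.items()):
    print(year, genre, n)
```

Lower-casing is one simple way to standardise labels; a real clean-up would probably also need a mapping for near-duplicate genre names. The same splitting function should work for the cast column.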

Dataset 2: Airbnb Listings

Source: Inside Airbnb (insideairbnb.com)
Best For: Spatial analysis, pricing strategies, correlation work

This comprehensive dataset features listings from cities worldwide, including pricing, availability, location, and host information. It’s particularly valuable for understanding geographic data patterns and pricing mechanisms.

Practice ideas:

  • Map listings by price and location using Python or Tableau
  • Explore seasonal pricing trends
  • Predict which features most impact nightly rate (regression analysis)
  • Compare host response rates across neighbourhoods

The data is available for dozens of cities, so you can pick one you know well or compare cities across countries.
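
Before any pricing analysis, the price column usually needs converting to numbers. The sketch below assumes prices arrive as text such as "$1,250.00" and that each listing has a `neighbourhood` field; both are assumptions about the export, and the rows are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Illustrative rows. The field names and the price format are assumptions,
# so confirm them against the listings file for your chosen city.
listings = [
    {"neighbourhood": "Centre", "price": "$120.00"},
    {"neighbourhood": "Centre", "price": "$1,250.00"},
    {"neighbourhood": "Centre", "price": "$95.00"},
    {"neighbourhood": "Harbour", "price": "$80.00"},
    {"neighbourhood": "Harbour", "price": ""},
]


def parse_price(text):
    """Convert a price string such as "$1,250.00" to a float, or None if blank."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def median_price_by_area(records):
    """Return {neighbourhood: median nightly price}, ignoring missing prices."""
    prices = defaultdict(list)
    for record in records:
        price = parse_price(record["price"])
        if price is not None:
            prices[record["neighbourhood"]].append(price)
    return {area: median(values) for area, values in prices.items()}


print(median_price_by_area(listings))
```

The median is used here rather than the mean because a few very expensive listings can pull an average well away from the typical nightly rate.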

Dataset 3: COVID-19 Data Repository by Johns Hopkins University

Source: GitHub (JHU CSSE)
Best For: Time series analysis, real-world impact studies

“This is one of the most comprehensive COVID datasets, offering case numbers, deaths, and recovery statistics across countries and regions.” The repository stopped updating in March 2023, so it now works as a fixed archive, which makes it well suited to practising with date-indexed data over an extended period.

Practice ideas:

  • Visualise global case trends over time using line charts
  • Analyse policy impact by region (e.g., lockdown start dates vs. case trajectories)
  • Build a rolling 7-day average to smooth daily noise
  • Forecast trends using time series models

Working with this dataset builds strong skills in time series manipulation, a core competency for any data analyst.
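
The rolling-average idea can be sketched in a few lines of plain Python. This assumes the counts for a region are cumulative totals, one per day, so they are first converted to daily increases; the numbers below are hypothetical.

```python
from collections import deque

# Hypothetical cumulative case counts for one region, one value per day.
cumulative = [0, 10, 30, 60, 100, 150, 210, 280, 360]


def daily_new(cumulative_counts):
    """Turn cumulative totals into day-over-day increases."""
    pairs = zip(cumulative_counts, cumulative_counts[1:])
    return [later - earlier for earlier, later in pairs]


def rolling_mean(values, window=7):
    """Trailing mean over `window` values; None until the window is full."""
    recent = deque(maxlen=window)  # old values drop off automatically
    result = []
    for value in values:
        recent.append(value)
        result.append(sum(recent) / window if len(recent) == window else None)
    return result


new_cases = daily_new(cumulative)
smoothed = rolling_mean(new_cases, window=7)
print(new_cases)
print(smoothed)
```

One thing to watch for in real data: if a cumulative total is revised downwards, the daily series will contain a negative value, and it is worth deciding how to handle those before smoothing.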

Dataset 4: Spotify Tracks Dataset

Source: Kaggle (Spotify Dataset 1921–2020)
Best For: Data wrangling, audio feature exploration, clustering

This dataset includes extensive track metadata with characteristics such as tempo, energy, danceability, valence, and acousticness. It’s particularly helpful for working with mixed variable types and numerical feature analysis.

Practice ideas:

  • Cluster songs based on audio features using K-means
  • Track the evolution of music styles by decade
  • Build a simple song recommender based on feature similarity
  • Correlate audio features with popularity scores

This is a fun dataset that keeps you motivated — it’s hard not to stay curious when the data involves music you recognise.
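
For the recommender idea, one simple approach (not the only one) is to rescale each audio feature to the same range and pick the track with the smallest Euclidean distance. The track names and values below are invented, and the feature list is just a subset of those mentioned above.

```python
import math

# Hypothetical tracks with invented feature values.
tracks = {
    "Track A": {"tempo": 120.0, "energy": 0.80, "danceability": 0.70},
    "Track B": {"tempo": 124.0, "energy": 0.75, "danceability": 0.72},
    "Track C": {"tempo": 70.0, "energy": 0.20, "danceability": 0.30},
}
FEATURES = ["tempo", "energy", "danceability"]


def min_max_scale(data, features):
    """Rescale each feature to 0..1 so tempo does not dominate the others."""
    scaled = {name: [] for name in data}
    for feature in features:
        values = [row[feature] for row in data.values()]
        low, high = min(values), max(values)
        span = (high - low) or 1.0  # guard against a constant feature
        for name, row in data.items():
            scaled[name].append((row[feature] - low) / span)
    return scaled


def most_similar(name, data, features):
    """Return the other track closest to `name` in scaled feature space."""
    vectors = min_max_scale(data, features)
    others = [other for other in vectors if other != name]
    return min(others, key=lambda other: math.dist(vectors[name], vectors[other]))


print(most_similar("Track A", tracks, FEATURES))
```

Scaling matters because tempo is measured in beats per minute while the other features run from 0 to 1; without it, the distance would be driven almost entirely by tempo.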

Dataset 5: US Census and Demographic Data

Source: Data.gov
Best For: Demographic analysis, dashboards, joining multiple datasets

“Data.gov offers a range of US Census datasets that you can use to explore population trends, income distribution, education levels, and more.” This resource supports investigation of socioeconomic patterns and integration of multiple data sources.

Practice ideas:

  • Create heatmaps by income level or education using Tableau or Power BI
  • Join with employment data for richer cross-variable insights
  • Build a report on socio-economic disparities across states
  • Practise data wrangling by joining county-level and state-level tables

This dataset is also great for practising SQL joins if you load it into a local database.
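
To try the SQL idea without installing a database server, Python ships with SQLite. The sketch below builds two small in-memory tables and joins county rows to their state; the table names, column names, and figures are hypothetical placeholders, not real Census fields.

```python
import sqlite3

# In-memory database with hypothetical tables and invented figures.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state_code TEXT PRIMARY KEY, state_name TEXT);
    CREATE TABLE counties (county_name TEXT, state_code TEXT, population INTEGER);
    INSERT INTO states VALUES ('AA', 'State A'), ('BB', 'State B');
    INSERT INTO counties VALUES
        ('County 1', 'AA', 1000),
        ('County 2', 'AA', 3000),
        ('County 3', 'BB', 500);
""")

# Join each county to its state, then roll the counties up to state level.
query = """
    SELECT s.state_name,
           COUNT(c.county_name) AS county_count,
           SUM(c.population)    AS total_population
    FROM states AS s
    JOIN counties AS c ON c.state_code = s.state_code
    GROUP BY s.state_name
    ORDER BY total_population DESC
"""
state_totals = conn.execute(query).fetchall()
print(state_totals)
```

Swapping `JOIN` for `LEFT JOIN` would keep states that have no matching county rows, which is a useful way to spot gaps between tables.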

How to Make the Most of These Datasets

Don’t just download and explore casually. Treat each dataset as a real project:

  1. Frame a question — “Which Netflix genres have grown most since 2015?” is more useful than just “explore Netflix data”
  2. Clean before you analyse — Every dataset has issues. Find them before drawing conclusions
  3. Create a mock stakeholder report — Present findings as if to a real business audience
  4. Build something visual — A dashboard, a chart, or even a simple slide deck

This methodical approach develops practical expertise and gives you portfolio pieces to show prospective employers.
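
For step 2, a quick audit can show where the problems are before any analysis starts. This is a minimal sketch that counts blank values per column and exact duplicate rows; the records are hypothetical stand-ins for rows read with `csv.DictReader`.

```python
from collections import Counter

# Hypothetical records standing in for rows loaded from a CSV file.
records = [
    {"id": "1", "country": "France", "year": "2019"},
    {"id": "2", "country": "", "year": "2020"},
    {"id": "2", "country": "", "year": "2020"},
    {"id": "3", "country": "Japan", "year": None},
]


def audit(rows):
    """Count blank or missing values per column and exact duplicate rows."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                missing[column] += 1
    # Rows are compared as sorted (column, value) pairs so key order is ignored.
    seen = Counter(tuple(sorted(row.items(), key=lambda kv: kv[0])) for row in rows)
    duplicates = sum(n - 1 for n in seen.values())
    return {"rows": len(rows), "missing": dict(missing), "duplicate_rows": duplicates}


print(audit(records))
```

An audit like this only finds the obvious issues; outliers and inconsistent formatting still need a closer look at the values themselves.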

Final Thoughts

The gap between learning analysis theory and being a capable analyst is bridged by practice with real data. These five datasets cover a range of domains, complexity levels, and analytical techniques — enough to build solid skills across cleaning, exploration, visualisation, and modelling.

Pick one, frame a question, and get started. The best dataset to learn from is the one you actually open.

