QUICK START LOCALLY

Choose a build and a package manager, then run the matching install command from the INSTALLATION section below. Stable is the most recently tested and supported version of kaggledatasets and should suit most users. Preview provides the latest nightly 1.0 builds, which are not yet fully tested or supported. Depending on your package manager, make sure the prerequisites (e.g., numpy) are installed first. Previous versions of kaggledatasets can also be installed.

INSTALLATION

Anaconda

conda install kaggledatasets

Pip

pip install kaggledatasets

Source

# get the kaggledatasets source
git clone --recursive https://github.com/kaggledatasets/kaggledatasets
cd kaggledatasets

# install the dependencies
pip install -r requirements.txt

# install kaggledatasets
pip install .
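
To confirm which version ended up installed (for example, after switching between Stable, Preview, or a previous version), a small check like the one below may help. This is a sketch that assumes the installed distribution is named kaggledatasets. The helper installed_version is hypothetical and not part of the package; it relies only on the standard library's importlib.metadata (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "kaggledatasets") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper; assumes the distribution is named "kaggledatasets".
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("kaggledatasets is not installed in this environment")
    else:
        print(f"kaggledatasets {found} is installed")
```

Run it with the same interpreter (or virtual environment) you installed into. Otherwise it will report the package as missing.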

CONFIGURATION

This package uses the Kaggle API, so you need Kaggle API credentials. Follow the Kaggle API credentials guide, or the steps below:
  • Sign up for a Kaggle account at https://www.kaggle.com
  • Then go to the Account tab of your user profile (https://www.kaggle.com/<username>/account) and select Create API Token
  • This will trigger the download of kaggle.json, a file containing your API credentials
  • Setting up kaggle.json:
    • Linux/macOS:
      Place this file at ~/.kaggle/kaggle.json
      For your security, make sure other users of your computer cannot read your credentials. You can do this with the following command:
      chmod 600 ~/.kaggle/kaggle.json
    • Windows:
      Place this file at C:\Users\<Windows-username>\.kaggle\kaggle.json
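
The placement steps above can also be scripted. The sketch below does three things. It copies a downloaded kaggle.json into a .kaggle folder in your home directory, which resolves to C:\Users\<Windows-username>\.kaggle on Windows. It checks that the file has the "username" and "key" fields that kaggle.json normally contains. It also applies the equivalent of chmod 600 on Linux/macOS. The function install_kaggle_credentials and its parameters are hypothetical and are not part of kaggledatasets or the Kaggle API.

```python
import json
import os
import shutil
import stat
from pathlib import Path
from typing import Optional


def install_kaggle_credentials(downloaded: Path, home: Optional[Path] = None) -> Path:
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner.

    Hypothetical helper: `home` defaults to the current user's home directory.
    """
    downloaded = Path(downloaded)
    # Sanity-check the file: kaggle.json holds a JSON object with "username" and "key"
    creds = json.loads(downloaded.read_text())
    missing = {"username", "key"} - set(creds)
    if missing:
        raise ValueError(f"{downloaded} is missing fields: {sorted(missing)}")

    # ~/.kaggle on Linux/macOS, C:\Users\<name>\.kaggle on Windows
    base = Path(home) if home is not None else Path.home()
    target_dir = base / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)

    # Equivalent of `chmod 600`: owner read/write only (POSIX systems only)
    if os.name == "posix":
        os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, install_kaggle_credentials(Path.home() / "Downloads" / "kaggle.json") assumes your browser saved the token to a Downloads folder. Adjust the path to wherever the file actually landed.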

USAGE

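import kaggledatasets as kd

# Download the Heart Disease UCI dataset from Kaggle (requires the credentials configured above)
heart_disease = kd.structured.HeartDiseaseUCI(download=True)

# Returns a pandas DataFrame for use with scikit-learn or any other framework
df = heart_disease.data_frame()

# Returns a tf.data.Dataset compatible with TensorFlow 2.0
dataset = heart_disease.load()

# Inspect one batch: each element is a (features, label) pair, where
# features is a dict mapping column names to tensors
for batch, label in dataset.take(1):
    for key, value in batch.items():
        print(f"{key}: {value.numpy()}")
    print(f"label: {label.numpy()}")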