Kaggle Datasets

QUICK START LOCALLY

Select your preferences and run the install command. Stable is the most thoroughly tested and supported version of kaggledatasets and should suit most users. Preview provides the latest 1.0 builds, generated nightly, which are not fully tested or supported. Make sure you have met the prerequisites (e.g., numpy) for your package manager. You can also install previous versions of kaggledatasets.

# Download options will be available soon

Previous versions of kaggledatasets

INSTALLATION

Anaconda

conda install kaggledatasets

Pip

pip install kaggledatasets

Source

# get the kaggledatasets source
git clone --recursive https://github.com/kaggledatasets/kaggledatasets
cd kaggledatasets

# install the dependencies
pip install -r requirements.txt

# install kaggledatasets
python setup.py install
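
Whichever install route you use, a quick way to confirm that the package is visible to your interpreter is to look it up without importing it. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of kaggledatasets.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the current interpreter."""
    # find_spec returns None when no importable module of that name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("kaggledatasets"):
        print("kaggledatasets is installed")
    else:
        print("kaggledatasets was not found; re-run one of the install commands above")
```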

CONFIGURATION

This package uses the Kaggle API. See the Kaggle API credentials guide, or follow the steps below:

Sign up for a Kaggle account at https://www.kaggle.com
Then go to the Account tab of your user profile (https://www.kaggle.com/<username>/account) and select Create API Token
This will trigger the download of kaggle.json, a file containing your API credentials
Setting up kaggle.json:

Linux/macOS:
Place this file in the location ~/.kaggle/kaggle.json
For your security, ensure that other users of your computer do not have read access to your credentials. You can do this with the following command:
chmod 600 ~/.kaggle/kaggle.json
Windows:
Place this file in the location C:\Users\<Windows-username>\.kaggle\kaggle.json
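
The placement steps above can also be scripted. The following is a minimal sketch, not part of kaggledatasets: the function name install_kaggle_credentials and its parameters are hypothetical, and it assumes the downloaded token is a JSON file with username and key fields. On Windows, os.chmod only toggles the read-only flag, so the permission step is mainly meaningful on Linux/macOS.

```python
import json
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(downloaded_file, home=None):
    """Copy a downloaded kaggle.json to <home>/.kaggle/kaggle.json and restrict access.

    Hypothetical helper for illustration; `home` defaults to the current user's home.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded_file, target)

    # Same effect as `chmod 600`: read/write for the owner, nothing for anyone else.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)

    # Sanity check (assumption: the token holds "username" and "key" fields).
    with open(target) as fh:
        credentials = json.load(fh)
    missing = {"username", "key"} - set(credentials)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")
    return target
```

For example, install_kaggle_credentials(Path.home() / "Downloads" / "kaggle.json") would copy the token from a typical download folder; adjust the path to wherever your browser saved the file.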

USAGE

import kaggledatasets as kd

heart_disease = kd.structured.HeartDiseaseUCI(download=True)

# Returns a pandas DataFrame for use with scikit-learn or any other framework
df = heart_disease.data_frame()

# Returns a TensorFlow dataset compatible with TF 2.0
dataset = heart_disease.load()
for batch, label in dataset.take(1):
    for key, value in batch.items():
        ...