Tutorial#

Pushing a local dataset to the 🤗 hub#

We can upload an ImageFolder style dataset to the hub directly using hugit. An ImageFolder dataset is where the labels are encoded in the part of the folder structure. This often looks something like:

data/
    Dog/
        Image1.jpg
    Cat/
        Image1.jpg

Where dog and cat refer to the label of the images contained witin that folder. This type of folder structure is often used for sharing machine learning datasets. It is also one of the possible output formats we might have from an annotation tool. To upload our local data from our machine (or server) to the Hugging Face hub.

Let’s have a look at the help for the push_image_dataset command.

Usage: hugit push_image_dataset [OPTIONS] DIRECTORY

  Load an ImageFolder style dataset.

Options:
  --train-directory TEXT        Name of train directory
  --valid-directory TEXT        name of valid directory
  --test-directory TEXT         name of test directory
  --repo-id TEXT                Repo id for the Hugging Face Hub  [required]
  --private / --no-private      Whether to keep dataset private on the Hub
                                [default: private]
  --do-resize / --no-do-resize  Whether to resize images before upload
                                [default: do-resize]
  --size INTEGER                Size to resize image. This will be used on the
                                shortest side of the image i.e. the aspect rato
                                will be maintained  [default: 224]
  --help                        Show this message and exit.

As you can see we have to pass hugit some required arguments and some options.

hugit load_image_dataset cifar10 --repo-id davanstrin/cifar10

Configuration#

When we upload an image to the Hugging Face Hub using hugit we have a few settings we can configure. These settings include the hugginface hub ID for where the model will be stored e.g. davanstrien/CIFAR10 and whether to resize your images before uploading. There are two types of setting:

  • optional: these you can specify or not

  • required: these you must tell hugit about

There are two main ways in which we can specify these settings:

  • through the command line interface of hugit

  • through a TOML configuration file.

Passing settings through the Command-Line#

--do-resize

Storing settings in a configuration file#

You can also specify your setting in a TOML configuration file. TOML

As an example configuration

[tool.huggit]
hub_id = "davanstrien/CIFAR10"
do_resize = true
size = 224

Which format to use?#

The command line overwrites the toml configs settings which don’t change much can be stored in config

Example#