Hugit#

Read the documentation at https://hugit.readthedocs.io/

Warning: this code is very much a work in progress and is primarily intended for a particular workflow. It may not work well (or at all) for your workflow.

hugit is a command line tool for loading ImageFolder style datasets into a 🤗 datasets Dataset and pushing to the 🤗 hub.

The primary goal of hugit is to help quickly get a local dataset into a format that can be used for training computer vision models. hugit was developed to support the workflow for flyswot, where we wanted quicker iteration between creating new training data, training a model, and using the new model inside flyswot.

hugit workflow diagram

Supported formats#

At the moment, hugit supports ImageFolder style datasets, i.e. one subdirectory per label:

data/
    dog/
        dog1.jpg
    cat/
        cat.1.jpg
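
As a minimal sketch of how a layout like this maps to labelled examples, the snippet below walks the directory and uses each subdirectory name as the label. This is not hugit's actual implementation: the function name `list_image_folder` and the `IMAGE_EXTENSIONS` set are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Hypothetical set of file extensions treated as images in this sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_folder(root):
    """Return sorted (relative_path, label) pairs for an ImageFolder style directory."""
    root = Path(root)
    examples = []
    # Each immediate subdirectory is assumed to be one class.
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                # The parent directory name is used as the label.
                relative = image_path.relative_to(root).as_posix()
                examples.append((relative, class_dir.name))
    return examples
```

For the tree above, `list_image_folder("data")` would pair `cat/cat.1.jpg` with the label `cat` and `dog/dog1.jpg` with the label `dog`.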

Features#

  • A command line interface for quickly loading a dataset stored on disk into a 🤗 datasets.Dataset

  • Push your local dataset to the 🤗 hub

  • Get statistics about your dataset. These statistics focus on ‘high level’ statistics that would be useful to include in Datasheets and Model Cards. Currently these statistics include:

    • label frequencies, organised by split

    • train, test, valid split sizes
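
The two statistics listed above can be sketched with the standard library alone. This is an illustrative sketch, not hugit's own code: the function names `label_frequencies` and `split_sizes`, and the input shape of `(split, label)` pairs, are assumptions made for the example.

```python
from collections import Counter, defaultdict


def label_frequencies(examples):
    """Count how often each label occurs within each split.

    `examples` is assumed to be a list of (split, label) pairs.
    """
    counts = defaultdict(Counter)
    for split, label in examples:
        counts[split][label] += 1
    # Convert to plain dicts so the result is easy to print or serialise.
    return {split: dict(counter) for split, counter in counts.items()}


def split_sizes(examples):
    """Count the number of examples in each split."""
    return dict(Counter(split for split, _ in examples))
```

Given a list of pairs such as `("train", "dog")` and `("test", "cat")`, `label_frequencies` returns a nested mapping from split to label to count, and `split_sizes` returns a mapping from split to the number of examples.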

Installation#

You can install Hugit via pip from PyPI. Inside a virtual environment, install hugit using

$ pip install hugit

Alternatively, you can use pipx to install hugit

$ pipx install hugit

Usage#

You can see help for hugit using hugit --help

Usage: hugit [OPTIONS] COMMAND [ARGS]...

  Hugit Command Line

Options:
  --help  Show this message and exit.

Commands:
  convert_images      Convert images in directory to `save_format`
  push_image_dataset  Load an ImageFolder style dataset.

To load an ImageFolder style dataset onto the 🤗 Hub you can use the push_image_dataset command.

Usage: hugit push_image_dataset [OPTIONS] DIRECTORY

  Load an ImageFolder style dataset.

Options:
  --repo-id TEXT                  Repo id for the Hugging Face Hub  [required]
  --private / --no-private        Whether to keep dataset private on the Hub
                                  [default: private]
  --do-resize / --no-do-resize    Whether to resize images before upload
                                  [default: do-resize]
  --size INTEGER                  Size to resize image. This will be used on the
                                  shortest side of the image i.e. the aspect
                                  rato will be maintained  [default: 224]
  --preserve-file-path / --no-preserve-file-path
                                  preserve_orginal_file_path  [default:
                                  preserve-file-path]
  --help                          Show this message and exit.
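
The `--size` option is described as applying to the shortest side while keeping the aspect ratio. The arithmetic behind that can be sketched as follows. This is an assumption about the calculation, not hugit's actual resizing code: the function name `resized_dimensions` is hypothetical, and rounding to the nearest pixel is a choice made for this example.

```python
def resized_dimensions(width, height, size=224):
    """Scale (width, height) so the shortest side equals `size`, keeping the aspect ratio."""
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    # One scale factor for both sides preserves the aspect ratio.
    scale = size / min(width, height)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With the default size of 224, a 640×480 image would be scaled by 224/480, giving roughly 299×224, so the shorter side lands on 224 and the longer side scales proportionally.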

Under the hood, hugit uses typed-settings, which means that configuration can be done either through the command line or through a TOML file. See the usage documentation for a more detailed discussion of how to use hugit.

Contributing#

Hugit may only work for our particular workflow. That said, if you have suggestions, please open an issue.

License#

Distributed under the terms of the MIT license, Hugit is free and open source software.

Issues#

If you encounter any problems, please file an issue along with a detailed description.

Credits#

This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.