Hugit#
Warning: this code is very much a work in progress and is primarily intended for a particular workflow. It may not work well (or at all) for your workflow.
hugit is a command line tool for loading ImageFolder style datasets into a 🤗 datasets Dataset and pushing to the 🤗 hub.
The primary goal of hugit is to help quickly get a local dataset into a format that can be used for training computer vision models. hugit was developed to support the workflow for flyswot, where we wanted a quicker iteration between creating new training data, training a model, and using the new model inside flyswot.
Supported formats#
At the moment hugit supports ImageFolder style datasets, i.e.:
data/
    dog/
        dog1.jpg
    cat/
        cat.1.jpg
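To make the layout concrete, here is a minimal sketch of how class labels can be read off such a directory using only the Python standard library. It is not hugit's implementation: the function name `scan_image_folder` and the list of image extensions are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def scan_image_folder(root):
    """Return sorted (relative_path, label) pairs for an ImageFolder style directory.

    The label of each image is the name of its parent directory.
    """
    root = Path(root)
    pairs = []
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image in sorted(label_dir.iterdir()):
            # Skip anything that is not a file with a known image extension.
            if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image.relative_to(root).as_posix(), label_dir.name))
    return pairs
```

For the tree above, `scan_image_folder("data")` would return `[("cat/cat.1.jpg", "cat"), ("dog/dog1.jpg", "dog")]`.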
Features#
A command line interface for quickly loading a dataset stored on disk into a 🤗 datasets.Dataset
Push your local dataset to the 🤗 hub
Get statistics about your dataset. These focus on "high level" statistics that would be useful to include in Datasheets and Model Cards. Currently these statistics include:
label frequencies, organised by split
train, test, valid split sizes
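As an illustration of the statistics listed above, the following sketch computes split sizes and per-split label frequencies from a list of (split, label) pairs. The function name `dataset_statistics` and the sample data are hypothetical, and this is not necessarily how hugit computes its statistics.

```python
from collections import Counter


def dataset_statistics(examples):
    """Summarise (split, label) pairs.

    Returns a tuple of (split_sizes, label_frequencies), where
    label_frequencies maps each split name to a Counter of labels.
    """
    split_sizes = Counter()
    label_frequencies = {}
    for split, label in examples:
        split_sizes[split] += 1
        label_frequencies.setdefault(split, Counter())[label] += 1
    return dict(split_sizes), label_frequencies


# Made-up sample data for illustration only.
examples = [
    ("train", "dog"),
    ("train", "dog"),
    ("train", "cat"),
    ("valid", "cat"),
    ("test", "dog"),
]
split_sizes, label_frequencies = dataset_statistics(examples)
print(split_sizes)  # {'train': 3, 'valid': 1, 'test': 1}
print(label_frequencies["train"].most_common())  # [('dog', 2), ('cat', 1)]
```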
Installation#
You can install Hugit via pip from PyPI. Inside a virtual environment, install hugit using:
$ pip install hugit
Alternatively, you can use pipx to install hugit:
$ pipx install hugit
Usage#
You can see help for hugit using hugit --help:
Usage: hugit [OPTIONS] COMMAND [ARGS]...
Hugit Command Line
╭─ Options ──────────────────────────────────────────────────────────────╮
│ --help                Show this message and exit.                      │
╰────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────╮
│ convert_images        Convert images in directory to `save_format`     │
│ push_image_dataset    Load an ImageFolder style dataset.               │
╰────────────────────────────────────────────────────────────────────────╯
To load an ImageFolder style dataset onto the 🤗 Hub you can use the push_image_dataset command.
Usage: hugit push_image_dataset [OPTIONS] DIRECTORY
Load an ImageFolder style dataset.
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --repo-id                                         TEXT     Repo id for the Hugging Face Hub [required]                                                              │
│   --private/--no-private                                     Whether to keep dataset private on the Hub [default: private]                                            │
│   --do-resize/--no-do-resize                                 Whether to resize images before upload [default: no-do-resize]                                           │
│   --size                                            INTEGER  Size to resize image. This will be used on the shortest side of the image i.e. the aspect ratio will be  │
│                                                              maintained                                                                                               │
│                                                              [default: 224]                                                                                           │
│   --preserve-file-path/--no-preserve-file-path               preserve original file path [default: preserve-file-path]                                                │
│   --ignore-verifications/--no-ignore-verifications           Whether to perform verifications on the file before loading into dataset [default: ignore-verifications] │
│   --huggingface-hub-token                           TEXT     Hugging Face Hub authentication token [default: ***]                                                     │
│   --help                                                     Show this message and exit.                                                                              │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
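The --size option resizes relative to the shortest side while keeping the aspect ratio. The arithmetic behind that can be sketched as follows. The function name `resize_shortest_side` is hypothetical, and rounding to the nearest pixel is an assumption; hugit's actual resizing code may round differently.

```python
def resize_shortest_side(width, height, size=224):
    """Return (new_width, new_height) with the shortest side scaled to `size`.

    Both sides are multiplied by the same factor, so the aspect ratio is kept
    (up to rounding to whole pixels).
    """
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    scale = size / min(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(resize_shortest_side(1024, 768))  # (299, 224)
print(resize_shortest_side(500, 1000))  # (224, 448)
```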
Under the hood hugit uses typed-settings, which means that configuration can be done either through the command line or through a TOML file. See [usage] for a more detailed discussion of how to use hugit.
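For the command line route, a call to push_image_dataset can be assembled from a script. The sketch below only builds and prints the argument list, using flags taken from the help output above. The helper name `build_push_command` and the repo id `my-user/my-dataset` are placeholders, not part of hugit.

```python
import shlex


def build_push_command(directory, repo_id, private=True, do_resize=False, size=224):
    """Build the argument list for `hugit push_image_dataset`."""
    command = ["hugit", "push_image_dataset", str(directory), "--repo-id", repo_id]
    command.append("--private" if private else "--no-private")
    if do_resize:
        # --size is only passed along when resizing is requested.
        command += ["--do-resize", "--size", str(size)]
    return command


# "my-user/my-dataset" is a placeholder repo id.
command = build_push_command("data", "my-user/my-dataset", do_resize=True)
print(shlex.join(command))
# To actually run it (requires hugit to be installed and a valid Hub token):
# import subprocess; subprocess.run(command, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems if the directory name contains spaces.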
Contributing#
Hugit may only work for our particular workflow. With that said, if you have suggestions, please open an issue.
License#
Distributed under the terms of the MIT license, Hugit is free and open source software.
Issues#
If you encounter any problems, please file an issue along with a detailed description.
Credits#
This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.