Hugit#
Warning: this code is very much a work in progress and is primarily intended for a particular workflow. It may not work well (or at all) for your workflow.
hugit is a command line tool for loading ImageFolder style datasets into a 🤗 datasets Dataset and pushing to the 🤗 hub.
The primary goal of hugit is to help quickly get a local dataset into a format that can be used for training computer vision models. hugit was developed to support the workflow for flyswot, where we wanted a quicker iteration between creating new training data, training a model, and using the new model inside flyswot.
Supported formats#
At the moment hugit supports ImageFolder style datasets, i.e.:
data/
    dog/
        dog1.jpg
    cat/
        cat.1.jpg
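To make the layout concrete, here is a minimal sketch in plain Python of how such a directory maps to (image path, label) pairs, with each sub-directory name treated as the label. The function name `list_image_folder` and the set of accepted extensions are illustrative assumptions and are not part of hugit.

```python
from pathlib import Path

# Assumed set of image extensions; hugit's own list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_folder(root):
    """Return sorted (path, label) pairs, using each sub-directory name as the label."""
    root = Path(root)
    pairs = []
    for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for image_path in sorted(label_dir.iterdir()):
            # Skip anything that is not a regular file with a known image extension.
            if image_path.is_file() and image_path.suffix.lower() in IMAGE_EXTENSIONS:
                pairs.append((image_path, label_dir.name))
    return pairs
```

For the example tree above, this would yield one pair labelled `cat` and one labelled `dog`.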
Features#
A command line interface for quickly loading a dataset stored on disk into a 🤗 datasets.Dataset
Push your local dataset to the 🤗 hub
Get statistics about your dataset. These statistics focus on ‘high level’ statistics that would be useful to include in Datasheets and Model Cards. Currently these statistics include:
label frequencies, organised by split
train, test, valid split sizes
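As a rough illustration of what these two statistics amount to, the sketch below counts labels per split and examples per split from a list of (split, label) pairs. The function names and the input shape are hypothetical and chosen for the example; hugit's own implementation and output format may differ.

```python
from collections import Counter, defaultdict


def label_frequencies(examples):
    """Count labels per split from a list of (split, label) pairs."""
    counts = defaultdict(Counter)
    for split, label in examples:
        counts[split][label] += 1
    return {split: dict(counter) for split, counter in counts.items()}


def split_sizes(examples):
    """Count the number of examples in each split."""
    return dict(Counter(split for split, _ in examples))


# Hypothetical toy data: three training examples and one test example.
examples = [("train", "dog"), ("train", "cat"), ("train", "dog"), ("test", "cat")]
frequencies = label_frequencies(examples)  # {"train": {"dog": 2, "cat": 1}, "test": {"cat": 1}}
sizes = split_sizes(examples)              # {"train": 3, "test": 1}
```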
Installation#
You can install Hugit via pip from PyPI. Inside a virtual environment, install hugit using:
$ pip install hugit
Alternatively, you can use pipx to install hugit:
$ pipx install hugit
Usage#
You can see help for hugit using hugit --help:
Usage: hugit [OPTIONS] COMMAND [ARGS]...

  Hugit Command Line

Options:
  --help  Show this message and exit.

Commands:
  convert_images      Convert images in directory to `save_format`
  push_image_dataset  Load an ImageFolder style dataset.
To load an ImageFolder style dataset onto the 🤗 Hub you can use the push_image_dataset command.
Usage: hugit push_image_dataset [OPTIONS] DIRECTORY

  Load an ImageFolder style dataset.

Options:
  --repo-id TEXT                  Repo id for the Hugging Face Hub  [required]
  --private / --no-private        Whether to keep dataset private on the Hub
                                  [default: private]
  --do-resize / --no-do-resize    Whether to resize images before upload
                                  [default: do-resize]
  --size INTEGER                  Size to resize image. This will be used on the
                                  shortest side of the image i.e. the aspect
                                  rato will be maintained  [default: 224]
  --preserve-file-path / --no-preserve-file-path
                                  preserve_orginal_file_path  [default:
                                  preserve-file-path]
  --help                          Show this message and exit.
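The --size option is described as applying to the shortest side of the image while keeping the aspect ratio. The sketch below shows the arithmetic this implies; it is an illustration only, and the function name `resize_shortest_side` and the rounding behaviour are assumptions rather than hugit's actual code.

```python
def resize_shortest_side(width, height, size=224):
    """Return (new_width, new_height) with the shortest side scaled to `size`."""
    if width <= 0 or height <= 0 or size <= 0:
        raise ValueError("width, height and size must be positive")
    # One scale factor for both sides keeps the aspect ratio unchanged.
    scale = size / min(width, height)
    # Round to the nearest pixel; a real implementation might truncate instead.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a 1000x500 image becomes 448x224, and a 500x1000 image becomes 224x448.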
Under the hood hugit uses typed-settings, which means that configuration can either be done through the command line or through a TOML file. See [usage] for more detailed discussion of how to use hugit.
Contributing#
It is likely that Hugit will only work for our particular workflow. With that said, if you have suggestions, please open an issue.
License#
Distributed under the terms of the MIT license, Hugit is free and open source software.
Issues#
If you encounter any problems, please file an issue along with a detailed description.
Credits#
This project was generated from @cjolowicz’s Hypermodern Python Cookiecutter template.