Usage of MaRDA Metadata Extractors Schema

Usage example

This repository is intended to be included as a git submodule in your downstream code. As an example, consider the MaRDA Metadata Extractors Registry:

A screenshot of the MaRDA Metadata Extractors Registry. Note that schemas is a git submodule pointing to the Schema repository at a specific commit (here: c03a732, corresponding to the 0.2 release).

After initializing and updating the submodule, the YAML files defining the FileType and Extractor schemas are available in the <submodule>/schemas/ directory.
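The initialization step can be sketched as a small shell helper. This is a minimal sketch, not part of this repository: the function name init_schema_submodule and the default submodule path "schemas" (taken from the screenshot description) are assumptions, so adjust them to your own layout.

```shell
# Sketch: fetch the schema submodule inside an existing clone and confirm
# that the schema files are present.
# Assumption: the submodule is registered under the path "schemas".
init_schema_submodule() {
    repo="${1:-.}"             # root of the downstream repository
    submodule="${2:-schemas}"  # path of the submodule inside that repository

    # Check out the submodule at the commit pinned by the downstream repository.
    git -C "$repo" submodule update --init "$submodule" || return 1

    # The schema definitions live in <submodule>/schemas/.
    if [ -f "$repo/$submodule/schemas/filetype.yml" ]; then
        echo "schemas available in $repo/$submodule/schemas"
    else
        echo "filetype.yml not found in $repo/$submodule/schemas" >&2
        return 1
    fi
}
```

Calling init_schema_submodule with no arguments operates on the current directory. Because the submodule is pinned to a commit, every clone of the downstream repository gets the same schema version.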

Validation

The schema definitions contained in this repository can be used to locally validate your own FileTypes and Extractors. Several examples are provided for this purpose in the examples folder.

To get started, first make sure LinkML is installed in your Python environment:

pip install linkml~=1.3

Then, you can check the validity of your filetype or extractor definition against the provided schemas using linkml-validate. For example, to validate the provided example FileType definition in netcdf.yml against the FileType schema, run:

linkml-validate -s <submodule>/schemas/filetype.yml -C FileType <submodule>/examples/filetype/netcdf.yml

If validation succeeds, linkml-validate prints "No problems found".
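To check several definitions at once, the same command can be wrapped in a loop. The following is a minimal sketch under two assumptions: linkml-validate exits with a non-zero status when a file is invalid, and the definitions use the .yml extension. The function name validate_dir and the LINKML_VALIDATE override variable are hypothetical and not part of LinkML.

```shell
# Sketch: validate every *.yml file in a directory against one schema class.
# LINKML_VALIDATE is a hypothetical override for the validator command;
# it defaults to linkml-validate.
validate_dir() {
    schema="$1"  # e.g. <submodule>/schemas/filetype.yml
    class="$2"   # e.g. FileType
    dir="$3"     # e.g. <submodule>/examples/filetype
    failed=0

    for file in "$dir"/*.yml; do
        [ -e "$file" ] || continue  # the glob matched nothing
        # Assumption: a non-zero exit status means the file is invalid.
        if "${LINKML_VALIDATE:-linkml-validate}" -s "$schema" -C "$class" "$file"; then
            echo "ok:   $file"
        else
            echo "FAIL: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

The function keeps going after a failure so that one run reports every invalid file, and its exit status is non-zero if any file failed, which makes it usable as a CI step.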

Translation

The LinkML schemas provided here can be automatically translated to other formats, including JSON Schema, Python dataclasses, or Pydantic classes:

gen-json-schema <submodule>/schemas/filetype.yml > filetype.json
gen-python <submodule>/schemas/filetype.yml > filetype_dataclasses.py
gen-pydantic <submodule>/schemas/filetype.yml > filetype_pydantic.py

The generated files are intended to be used in downstream code, such as the validation function of the MaRDA Metadata Extractors Registry.
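The three generators can be combined into one repeatable step. This is a minimal sketch: the function name generate_all, the default output directory "generated", and the output file names are all hypothetical choices. Each generator writes to its own file, and plain ">" redirection is used so that re-running the step replaces earlier output instead of appending to it.

```shell
# Sketch: translate one LinkML schema into JSON Schema, Python dataclasses
# and Pydantic classes, one output file per target format.
generate_all() {
    schema="$1"               # e.g. <submodule>/schemas/filetype.yml
    outdir="${2:-generated}"  # hypothetical output directory
    name=$(basename "$schema" .yml)

    mkdir -p "$outdir" || return 1

    # ">" overwrites, so stale output from an earlier run never accumulates.
    gen-json-schema "$schema" > "$outdir/$name.json" &&
    gen-python "$schema" > "$outdir/${name}_dataclasses.py" &&
    gen-pydantic "$schema" > "$outdir/${name}_pydantic.py"
}
```

For example, generate_all <submodule>/schemas/filetype.yml would write filetype.json, filetype_dataclasses.py and filetype_pydantic.py into the generated directory. The function stops at the first generator that fails and returns its exit status.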