Agent Skill
2/7/2026

jgi-lakehouse

Query JGI Lakehouse (Dremio) for genomics metadata across GOLD/IMG/Mycocosm/Phytozome and download genome files via IMG taxon OIDs.

fmschulz
npx skills add fmschulz/omics-skills

SKILL.md

Name: jgi-lakehouse
Description: Query JGI Lakehouse (Dremio) for genomics metadata across GOLD/IMG/Mycocosm/Phytozome and download genome files via IMG taxon OIDs.


JGI Lakehouse

Use JGI Lakehouse (Dremio) for metadata queries and the JGI filesystem for sequence downloads.

Instructions

  1. Authenticate to Dremio using a personal access token (PAT).
  2. Explore schemas and tables to find the required metadata.
  3. Run SQL queries for project/sample/taxon discovery.
  4. Use IMG taxon OIDs to fetch genome packages from the filesystem.
  5. Validate outputs and record provenance.
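
Steps 1 and 3 can be sketched in Python as follows. This is a minimal sketch, assuming the Lakehouse exposes Dremio's standard REST SQL endpoint (`POST /api/v3/sql` with a bearer-token header). The host `https://lakehouse.example.org` is a placeholder rather than the real JGI address, and `build_sql_request` is a hypothetical helper; docs/authentication.md remains the reference for actual connection details.

```python
import json
import urllib.request

# Placeholder host: the real Lakehouse URL is not given here (assumption).
BASE_URL = "https://lakehouse.example.org"


def build_sql_request(sql, token, base_url=BASE_URL):
    """Build (but do not send) a request that submits SQL to Dremio.

    Assumes Dremio's REST endpoint POST /api/v3/sql with a bearer token;
    confirm the details against docs/authentication.md.
    """
    if not token:
        raise ValueError("DREMIO_PAT is empty; export it before querying")
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v3/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A caller would typically pass `os.environ["DREMIO_PAT"]` as `token` and hand the result to `urllib.request.urlopen`. Building the request separately from sending it keeps the token handling easy to inspect without touching the network.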

Quick Reference

Task               Action
Auth setup         See docs/authentication.md
SQL cheatsheet     See docs/sql-quick-reference.md
Table catalog      See docs/data-catalog.md
GOLD exploration   See docs/explore_gold.md

Input Requirements

  • DREMIO_PAT token (for Lakehouse access)
  • Query intent (taxonomy, ecosystem, project IDs, etc.)
  • JGI filesystem access for downloads
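
The input requirements above can be checked before any query runs. The following is a rough preflight sketch: `check_inputs` is a hypothetical helper, and `genome_root` stands in for whatever filesystem location holds the genome packages, which this document does not specify.

```python
import os


def check_inputs(env, genome_root):
    """Return a list of human-readable problems; an empty list means ready.

    `env` is a mapping such as os.environ; `genome_root` is an assumed
    directory for genome packages (the real path is not given here).
    """
    problems = []
    if not env.get("DREMIO_PAT"):
        problems.append("DREMIO_PAT is not set")
    if not os.path.isdir(genome_root):
        problems.append(f"genome root not found: {genome_root}")
    elif not os.access(genome_root, os.R_OK | os.X_OK):
        problems.append(f"genome root not readable: {genome_root}")
    return problems
```

Calling `check_inputs(os.environ, "/path/to/genomes")` reports everything that is missing in one pass instead of failing on the first problem.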

Output

  • Query results (tables or CSVs)
  • Lists of taxon OIDs or accessions
  • Downloaded genome packages (FNA/FAA/GFF)
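
As one way to produce the first two outputs, the sketch below turns query rows into CSV text and collects distinct taxon OIDs. It assumes rows arrive as a list of dictionaries and that the OID column is named `taxon_oid`. Both are illustrative assumptions, and the helper names are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize query rows (a list of dicts) to CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently skip keys not listed in columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def unique_taxon_oids(rows, column="taxon_oid"):
    """Collect distinct taxon OIDs as strings, preserving first-seen order."""
    seen = {}
    for row in rows:
        value = row.get(column)
        if value is not None:
            seen.setdefault(str(value), None)
    return list(seen)
```

Keeping OIDs as strings avoids accidental numeric reformatting when they are later used to build filesystem paths.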

Quality Gates

  • SQL queries return expected row counts
  • Taxon OIDs map to existing filesystem packages
  • Downloaded files pass basic integrity checks
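
The three gates above could be expressed as small checks like the ones below. This is a sketch under stated assumptions: packages are assumed to live in one directory per taxon OID under a root directory, and "basic integrity" is interpreted as "the file is non-empty and its first line starts with `>`", which only applies to uncompressed FASTA files such as FNA and FAA. All function names are hypothetical.

```python
import os


def check_row_count(rows, minimum=1, maximum=None):
    """True if the number of rows falls within the expected bounds."""
    count = len(rows)
    return count >= minimum and (maximum is None or count <= maximum)


def missing_packages(taxon_oids, genome_root):
    """Return OIDs with no directory under genome_root (assumed layout)."""
    return [
        oid
        for oid in taxon_oids
        if not os.path.isdir(os.path.join(genome_root, str(oid)))
    ]


def looks_like_fasta(path):
    """Basic check: file is readable and its first line starts with '>'."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return handle.readline().startswith(">")
    except OSError:
        return False
```

GFF files and compressed downloads need a different check, for example comparing file sizes or checksums if the source provides them.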

Examples

Example 1: Basic GOLD query

SELECT gold_id, project_name
FROM "gold-db-2 postgresql".gold.project
WHERE is_public = 'Yes'
LIMIT 5;

Troubleshooting

Issue: Authentication failures
Solution: Re-create the PAT and confirm it is exported before querying.

Issue: Missing genome files
Solution: Verify IMG taxon OIDs and filesystem path permissions.

Skills Info
Original Name: jgi-lakehouse
Author: fmschulz