Agent Skill
2/7/2026
jgi-lakehouse
Query JGI Lakehouse (Dremio) for genomics metadata across GOLD/IMG/Mycocosm/Phytozome and download genome files via IMG taxon OIDs.
fmschulz
npx skills add fmschulz/omics-skills
SKILL.md
| Name | jgi-lakehouse |
| Description | Query JGI Lakehouse (Dremio) for genomics metadata across GOLD/IMG/Mycocosm/Phytozome and download genome files via IMG taxon OIDs. |
JGI Lakehouse
Use JGI Lakehouse (Dremio) for metadata queries and the JGI filesystem for sequence downloads.
Instructions
- Authenticate to Dremio using a personal access token (PAT).
- Explore schemas and tables to find the required metadata.
- Run SQL queries for project/sample/taxon discovery.
- Use IMG taxon OIDs to fetch genome packages from the filesystem.
- Validate outputs and record provenance.
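The query steps above can be scripted. Below is a minimal Python sketch. It assumes the Lakehouse exposes Dremio's standard v3 REST API (`POST /api/v3/sql`, then `GET /api/v3/job/{id}` and `/api/v3/job/{id}/results`) and accepts the PAT as a Bearer token. `DREMIO_URL` is a hypothetical environment variable for the server's base URL, which this page does not give. Check docs/authentication.md for the access route JGI actually supports. If the deployment only offers another interface, this sketch does not apply.

```python
import json
import os
import time
import urllib.parse
import urllib.request

# Dremio job states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def auth_headers(pat: str) -> dict:
    """Dremio accepts personal access tokens as Bearer tokens."""
    return {"Authorization": f"Bearer {pat}", "Content-Type": "application/json"}


def build_sql_request(base_url: str, pat: str, sql: str) -> urllib.request.Request:
    """Build the POST /api/v3/sql request; Dremio answers with a job id."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v3/sql",
        data=body,
        headers=auth_headers(pat),
        method="POST",
    )


def _get_json(url: str, pat: str) -> dict:
    req = urllib.request.Request(url, headers=auth_headers(pat), method="GET")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_query(sql: str, poll_seconds: float = 1.0, page_size: int = 500) -> list:
    """Submit SQL, wait for the job to finish, and return all rows as dicts."""
    base_url = os.environ["DREMIO_URL"]  # hypothetical variable name (assumption)
    pat = os.environ["DREMIO_PAT"]
    with urllib.request.urlopen(build_sql_request(base_url, pat, sql)) as resp:
        job_id = json.load(resp)["id"]

    # Poll the job status until it reaches a terminal state (no timeout here).
    job_url = f"{base_url.rstrip('/')}/api/v3/job/{job_id}"
    while True:
        job = _get_json(job_url, pat)
        if job["jobState"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if job["jobState"] != "COMPLETED":
        raise RuntimeError(
            f"Dremio job {job_id} ended as {job['jobState']}: {job.get('errorMessage', '')}"
        )

    # Results are paged; Dremio's documented maximum page size is 500 rows.
    rows, offset = [], 0
    while offset < job.get("rowCount", 0):
        params = urllib.parse.urlencode({"offset": offset, "limit": page_size})
        page = _get_json(f"{job_url}/results?{params}", pat)
        rows.extend(page["rows"])
        offset += page_size
    return rows
```

For example, `run_query('SELECT gold_id FROM "gold-db-2 postgresql".gold.project LIMIT 5')` would return a list of row dicts. Polling is kept deliberately simple; a longer-running script would add a timeout.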
Quick Reference
| Task | Action |
|---|---|
| Auth setup | See docs/authentication.md |
| SQL cheatsheet | See docs/sql-quick-reference.md |
| Table catalog | See docs/data-catalog.md |
| GOLD exploration | See docs/explore_gold.md |
Input Requirements
- DREMIO_PAT environment variable holding a Dremio personal access token (PAT) for Lakehouse access
- Query intent (taxonomy, ecosystem, project IDs, etc.)
- JGI filesystem access for downloads
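Missing inputs are easier to catch before any query runs. The following Python sketch checks that `DREMIO_PAT` is exported and clean. It also checks that the source and destination directories are usable. `source_root` stands for wherever the JGI filesystem is mounted for you, which this page does not specify. `download_dir` is wherever you want files written. Both parameter names are illustrative.

```python
import os
from pathlib import Path


def preflight(source_root: str, download_dir: str) -> list:
    """Check the inputs this skill needs; an empty list means no problems found."""
    problems = []
    pat = os.environ.get("DREMIO_PAT", "")
    if not pat.strip():
        problems.append("DREMIO_PAT is not exported or is empty")
    elif pat != pat.strip():
        # A stray newline from copy/paste would break the Authorization header.
        problems.append("DREMIO_PAT has leading or trailing whitespace")
    src = Path(source_root)
    if not src.is_dir() or not os.access(src, os.R_OK | os.X_OK):
        problems.append(f"JGI filesystem root {src} is not a readable directory")
    dst = Path(download_dir)
    if not dst.is_dir() or not os.access(dst, os.W_OK):
        problems.append(f"download directory {dst} is not a writable directory")
    return problems
```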
Output
- Query results (tables or CSVs)
- Lists of taxon OIDs or accessions
- Downloaded genome packages (FNA/FAA/GFF)
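The outputs above include CSVs, and the Instructions ask for provenance. One way to keep the two together is a Python sketch that assumes query results arrive as a list of dicts, which is how Dremio's REST results are shaped. It writes the CSV plus a JSON sidecar recording the SQL, row count, UTC retrieval time, and a SHA-256 of the CSV. The sidecar name and fields are illustrative choices, not a JGI convention.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_with_provenance(rows: list, sql: str, out_csv: str) -> Path:
    """Write query rows to CSV and a JSON sidecar recording how they were made."""
    out = Path(out_csv)
    # Column order follows the first row; an empty result still gets a file.
    fieldnames = list(rows[0].keys()) if rows else []
    with out.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # e.g. results.csv -> results.provenance.json
    sidecar = out.with_suffix(".provenance.json")
    sidecar.write_text(json.dumps({
        "sql": sql,
        "row_count": len(rows),
        "retrieved_utc": datetime.now(timezone.utc).isoformat(),
        "csv_sha256": hashlib.sha256(out.read_bytes()).hexdigest(),
    }, indent=2))
    return sidecar
```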
Quality Gates
- SQL queries return expected row counts
- Taxon OIDs map to existing filesystem packages
- Downloaded files pass basic integrity checks
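The last gate can start with a structural check of each FASTA file (FNA or FAA, optionally gzipped). The Python sketch below treats "basic integrity" as three conditions. Headers must be present, no record may be empty, and characters must fall within IUPAC nucleotide or amino-acid codes. That definition is an assumption, not a JGI-defined test, and GFF files would need their own check.

```python
import gzip
import string
from pathlib import Path

# IUPAC nucleotide codes plus gap; lower-case (soft-masked) bases are upper-cased first.
NUCLEOTIDES = set("ACGTUNRYSWKMBDHV-")
# Any letter (covers the 20 standard residues plus B, J, O, U, X, Z), stop and gap.
AMINO_ACIDS = set(string.ascii_uppercase) | {"*", "-"}


def check_fasta(path: str, alphabet: set) -> list:
    """Return a list of problems found in a FASTA file; empty means it passed."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    problems, n_records, seq_len = [], 0, 0
    with opener(p, "rt") as fh:
        for lineno, raw in enumerate(fh, 1):
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record; it must have had sequence.
                if n_records and seq_len == 0:
                    problems.append(f"record before line {lineno} has no sequence")
                n_records += 1
                seq_len = 0
            elif n_records == 0:
                problems.append(f"line {lineno}: sequence data before first header")
                break
            else:
                bad = set(line.upper()) - alphabet
                if bad:
                    problems.append(f"line {lineno}: unexpected characters {sorted(bad)}")
                seq_len += len(line)
    if n_records == 0 and not problems:
        problems.append("no FASTA records found")
    elif n_records and seq_len == 0:
        problems.append("last record has no sequence")
    return problems
```

Call it as `check_fasta("genome.fna.gz", NUCLEOTIDES)`, and pass `AMINO_ACIDS` for FAA files.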
Examples
Example 1: Basic GOLD query
```sql
-- List five public GOLD projects
SELECT gold_id, project_name
FROM "gold-db-2 postgresql".gold.project
WHERE is_public = 'Yes'
LIMIT 5;
```
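Adapting Example 1 to other tables is easier once you know their column names and types, for instance whether a flag such as `is_public` is stored as text. The Python helper below builds that lookup against Dremio's `INFORMATION_SCHEMA.COLUMNS`. It assumes Dremio reports the schema path dot-joined, as `gold-db-2 postgresql.gold`. Literals are quoted by doubling single quotes, the standard SQL escape.

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def columns_query(schema: str, table: str) -> str:
    """Build a query listing a table's columns and types in declaration order."""
    return (
        "SELECT COLUMN_NAME, DATA_TYPE\n"
        "FROM INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE TABLE_SCHEMA = {sql_literal(schema)}\n"
        f"  AND TABLE_NAME = {sql_literal(table)}\n"
        "ORDER BY ORDINAL_POSITION"
    )
```

For example, `columns_query("gold-db-2 postgresql.gold", "project")` yields SQL you can paste into the Dremio UI or send through any client.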
Troubleshooting
- Issue: Authentication failures. Solution: Re-create the PAT and confirm it is exported as DREMIO_PAT before querying.
- Issue: Missing genome files. Solution: Verify the IMG taxon OIDs and the filesystem path permissions.
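When genome files are missing, it helps to separate three causes. The OID may be malformed, the package may not be where you expect, or it may exist but be unreadable to you. The same check also covers the quality gate that taxon OIDs map to existing packages. In the Python sketch below, `package_path` is a hypothetical caller-supplied function mapping an OID to its expected location, because this page does not document the filesystem layout. The digit check assumes IMG taxon OIDs are numeric.

```python
import os
from pathlib import Path
from typing import Callable, Dict, Iterable


def diagnose_packages(
    oids: Iterable[str], package_path: Callable[[str], Path]
) -> Dict[str, str]:
    """Classify each taxon OID as 'ok', 'missing', 'unreadable', or 'malformed OID'."""
    status = {}
    for raw in oids:
        oid = str(raw).strip()
        if not oid.isdigit():
            status[oid] = "malformed OID"
            continue
        path = Path(package_path(oid))  # layout supplied by the caller (assumption)
        if not path.exists():
            status[oid] = "missing"
        elif not os.access(path, os.R_OK):
            status[oid] = "unreadable"  # exists, so likely a permissions issue
        else:
            status[oid] = "ok"
    return status
```

OIDs reported as `missing` point back to the query or the assumed layout. OIDs reported as `unreadable` point to filesystem permissions.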
Skills Info
Original Name: jgi-lakehouse
Author: fmschulz