About & Methodology - Health AI Hub

Our Mission

Health AI Hub curates and summarizes the latest medical and health-related artificial intelligence research from arXiv, making cutting-edge findings accessible to researchers, clinicians, and healthcare professionals.

Collection Methodology

1. Paper Discovery

We automatically query arXiv daily for papers matching medical and health-related keywords across multiple categories, including:

cs.LG - Machine Learning
cs.CV - Computer Vision (medical imaging)
cs.AI - Artificial Intelligence
q-bio - Quantitative Biology
physics.med-ph - Medical Physics
stat.ML - Machine Learning (Statistics)
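
As a rough illustration of this step, the sketch below builds a query URL for the public arXiv API. The keyword list, the function name build_query_url, and the choice of q-bio.QM as a stand-in for the q-bio archive are assumptions made for the example, not the exact query our pipeline runs.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Categories from the list above. "q-bio" is an archive rather than a single
# category, so one of its subcategories is used here as an assumed example.
CATEGORIES = ["cs.LG", "cs.CV", "cs.AI", "q-bio.QM", "physics.med-ph", "stat.ML"]

# Hypothetical keywords; the real keyword list is not shown on this page.
KEYWORDS = ["medical", "clinical", "health"]


def build_query_url(categories, keywords, max_results=100):
    """Return an arXiv API URL for recent papers in any category matching any keyword."""
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    kw_clause = " OR ".join(f"all:{k}" for k in keywords)
    params = {
        "search_query": f"({cat_clause}) AND ({kw_clause})",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(CATEGORIES, KEYWORDS, max_results=50))
```

The API responds with an Atom feed, so a real run would fetch this URL and parse the entries; that part is left out to keep the sketch free of network calls.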

2. Relevance Filtering

Each discovered paper is analyzed by an AI model to determine its medical and health relevance. The model assigns a relevance score from 0 to 1 based on:

Direct medical applications
Health-related datasets or experiments
Clinical relevance and potential impact
Intersection with healthcare technology

Only papers scoring above our relevance threshold are included in the curated collection.
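
The filtering rule can be sketched in a few lines. The threshold value of 0.5, the ScoredPaper type, and the function name are hypothetical; this page does not state the actual cutoff, and the scores themselves come from the AI model rather than from this code.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the real threshold is not stated on this page.
RELEVANCE_THRESHOLD = 0.5


@dataclass
class ScoredPaper:
    arxiv_id: str
    title: str
    relevance: float  # score on the 0-1 scale described above


def filter_relevant(papers, threshold=RELEVANCE_THRESHOLD):
    """Keep only papers whose score is strictly above the threshold."""
    kept = []
    for paper in papers:
        # Reject scores outside the 0-1 scale instead of silently keeping them.
        if not 0.0 <= paper.relevance <= 1.0:
            raise ValueError(f"score out of range for {paper.arxiv_id}: {paper.relevance}")
        if paper.relevance > threshold:
            kept.append(paper)
    return kept
```

A strict comparison is used because the text says "above" the threshold, so a paper scoring exactly at the cutoff is excluded in this sketch.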

3. AI Summarization

Curated papers are processed through Google Gemini 2.5 Flash, our primary AI provider, to generate:

Concise executive summaries (1-2 paragraphs)
Key findings and methodologies
Medical relevance analysis
Potential healthcare applications
Relevant medical domains (e.g., Oncology, Radiology, Public Health)
Extracted keywords for discoverability

We also support alternative AI providers (OpenAI GPT-4, Anthropic Claude, xAI Grok) for redundancy.
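
One way to picture provider redundancy is an ordered fallback chain: try the primary provider, and move to the next one only if the call fails. The sketch below uses plain callables in place of real provider clients, so the function name and the provider labels are illustrative assumptions rather than our actual integration code.

```python
from typing import Callable, Sequence, Tuple

# A summarizer takes paper text and returns a summary string.
Summarizer = Callable[[str], str]


def summarize_with_fallback(
    text: str, providers: Sequence[Tuple[str, Summarizer]]
) -> Tuple[str, str]:
    """Try each (name, summarizer) pair in order; return the first success.

    Returns (provider_name, summary). Raises RuntimeError if every provider fails.
    """
    errors = []
    for name, summarize in providers:
        try:
            return name, summarize(text)
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-in providers for demonstration only; no real API is called.
    def unavailable(_text: str) -> str:
        raise TimeoutError("provider unavailable")

    def truncating(text: str) -> str:
        return text[:40]

    chain = [("primary", unavailable), ("backup", truncating)]
    print(summarize_with_fallback("An example abstract about medical imaging.", chain))
```

Returning the provider name alongside the summary makes it possible to record which model produced each summary.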

4. Domain Classification

Papers are automatically categorized into medical specialty domains to help specialists find relevant research:

Oncology
Radiology & Diagnostic Imaging
Drug Discovery & Pharmacology
Clinical Informatics
Public Health & Epidemiology
Pathology
Genomics & Personalized Medicine
...and 60+ more specialized domains

Data Storage & Updates

All curated papers are stored in a JSON-based database with metadata including:

arXiv ID, title, authors, abstract
Publication date and categories
AI-generated summary and analysis
Medical domains and keywords
Relevance score
Citation data (when available via Semantic Scholar API)
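
To make the record shape concrete, here is a minimal sketch of one stored paper as a JSON document. The field names and the PaperRecord type are assumptions chosen to mirror the list above; the real schema may differ, and the sketch uses only the Python standard library rather than the database layer itself.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PaperRecord:
    arxiv_id: str
    title: str
    authors: List[str]
    abstract: str
    published: str            # publication date as an ISO 8601 string
    categories: List[str]     # arXiv categories, e.g. "cs.CV"
    summary: str              # AI-generated summary and analysis
    domains: List[str]        # medical domains, e.g. "Radiology"
    keywords: List[str]
    relevance: float          # 0-1 relevance score
    citation_count: Optional[int] = None  # set only when citation data is available


def to_json(record: PaperRecord) -> str:
    """Serialize a record to a JSON string suitable for a JSON-based store."""
    return json.dumps(asdict(record), indent=2)


def from_json(text: str) -> PaperRecord:
    """Rebuild a record from its JSON form."""
    return PaperRecord(**json.loads(text))
```

Keeping citation_count optional reflects that citation data is present only for some papers.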

Update Frequency: The system runs daily via GitHub Actions, automatically discovering and processing new papers.

Quality Assurance

Automated Validation: Papers must pass relevance thresholds and domain matching
Source Verification: All papers link directly to authoritative arXiv sources
AI Provider Fallback: Multiple AI providers ensure continuous operation
Human Oversight: Regular review of filtering criteria and summary quality

Technology Stack

arXiv API: Official arXiv API for paper discovery
AI Providers: Gemini 2.5 Flash (primary), GPT-4, Claude, Grok
Database: TinyDB (JSON-based, version-controlled)
Static Site Generation: Python + Jinja2 templates
Hosting: GitHub Pages with custom domain
Automation: GitHub Actions for daily updates

About the Curator

Bryan Tegomoh is the creator and maintainer of Health AI Hub. This project combines expertise in AI/ML, healthcare informatics, and software engineering to bridge the gap between cutting-edge research and practical medical applications.

Contact: bryan@arxiv-health.org

Twitter/X: @ArXiv_Health

Newsletter: Subscribe on Substack

Open Source

Health AI Hub is fully open source and available on GitHub. Contributions, suggestions, and discussions are welcome!

Repository: github.com/BryanTegomoh/arxiv-health

License: MIT License

Disclaimer

Health AI Hub provides research summaries for informational and educational purposes only. Content is AI-generated and may contain errors. Always consult original papers and qualified healthcare professionals for medical decisions. This site is not affiliated with arXiv.org.