About Health AI Hub

Our Methodology & Mission

Our Mission

Health AI Hub curates and summarizes the latest medical and health-related artificial intelligence research from arXiv, making cutting-edge findings accessible to researchers, clinicians, and healthcare professionals.

Collection Methodology

1. Paper Discovery

We automatically query arXiv daily for papers matching medical and health-related keywords across multiple categories including:

  • cs.LG - Machine Learning
  • cs.CV - Computer Vision (medical imaging)
  • cs.AI - Artificial Intelligence
  • q-bio - Quantitative Biology
  • physics.med-ph - Medical Physics
  • stat.ML - Machine Learning (Statistics)
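As a rough illustration of the discovery step, the sketch below builds a query URL for the public arXiv API from a category list and a keyword list. The keyword list, the function name build_query_url, and the use of q-bio.QM as a single stand-in for the q-bio archive are assumptions made for this example; the actual keywords and query logic used by Health AI Hub are not stated here.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Categories from the list above. q-bio is an archive with several
# subcategories; q-bio.QM is used here only as one illustrative example.
CATEGORIES = ["cs.LG", "cs.CV", "cs.AI", "q-bio.QM", "physics.med-ph", "stat.ML"]

# Hypothetical keywords; the real keyword list is not given in this document.
KEYWORDS = ["medical", "clinical", "health"]


def build_query_url(categories, keywords, max_results=100):
    """Return an arXiv API URL matching any category AND any keyword."""
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    kw_clause = " OR ".join(f'all:"{k}"' for k in keywords)
    params = {
        "search_query": f"({cat_clause}) AND ({kw_clause})",
        "start": 0,
        "max_results": max_results,
        # Newest submissions first, which suits a daily run.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(CATEGORIES, KEYWORDS, max_results=50))
```

The sketch only constructs the URL. Fetching it and parsing the Atom feed it returns would be a separate step.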

2. Relevance Filtering

Each discovered paper is analyzed by AI to determine its medical and health relevance. Papers are scored on a 0-1 scale based on:

  • Direct medical applications
  • Health-related datasets or experiments
  • Clinical relevance and potential impact
  • Intersection with healthcare technology

Only papers scoring above our relevance threshold are included in the curated collection.
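The thresholding step can be sketched as a simple filter over scored papers. The cutoff of 0.7, the ScoredPaper type, and the placeholder IDs are hypothetical; the actual threshold value is not stated in this document, and the scores themselves would come from the AI relevance analysis described above.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold is not given in this document.
RELEVANCE_THRESHOLD = 0.7


@dataclass
class ScoredPaper:
    arxiv_id: str
    title: str
    relevance: float  # score on the 0-1 scale


def filter_relevant(papers, threshold=RELEVANCE_THRESHOLD):
    """Keep papers scoring strictly above the threshold, highest score first."""
    for paper in papers:
        if not 0.0 <= paper.relevance <= 1.0:
            raise ValueError(f"{paper.arxiv_id}: score {paper.relevance} is outside 0-1")
    kept = [p for p in papers if p.relevance > threshold]
    return sorted(kept, key=lambda p: p.relevance, reverse=True)


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample = [
        ScoredPaper("placeholder-1", "Example imaging paper", 0.92),
        ScoredPaper("placeholder-2", "Example optimization paper", 0.15),
    ]
    for paper in filter_relevant(sample):
        print(paper.arxiv_id, paper.relevance)
```

A strict comparison is used because the text says "above" the threshold, so a paper scoring exactly at the cutoff is excluded in this sketch.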

3. AI Summarization

Curated papers are processed through Google Gemini 2.5 Flash, our primary AI provider, to generate:

  • Concise executive summaries (1-2 paragraphs)
  • Key findings and methodologies
  • Medical relevance analysis
  • Potential healthcare applications
  • Relevant medical domains (e.g., Oncology, Radiology, Public Health)
  • Extracted keywords for discoverability

We also support alternative AI providers (OpenAI GPT-4, Anthropic Claude, xAI Grok) for redundancy.
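One way such redundancy could work is an ordered fallback chain: try the primary provider first and move to the next one only when a call fails. The sketch below is an assumption about the general pattern, not the project's actual code. Each provider is represented by a plain function that takes text and returns a summary, so no real SDK calls appear here.

```python
from typing import Callable, Sequence, Tuple

# A provider is modeled as any function that maps paper text to a summary.
Summarizer = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has raised an error."""


def summarize_with_fallback(
    text: str, providers: Sequence[Tuple[str, Summarizer]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, summary) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    # Hypothetical stand-ins for real provider clients.
    def primary(text: str) -> str:
        raise TimeoutError("primary unavailable")

    def backup(text: str) -> str:
        return "Summary: " + text[:40]

    used, summary = summarize_with_fallback(
        "A study of deep learning for chest X-ray triage.",
        [("primary", primary), ("backup", backup)],
    )
    print(used, summary)
```

Returning the provider name alongside the summary makes it possible to record which model produced each entry.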

4. Domain Classification

Papers are automatically categorized into medical specialty domains to help specialists find relevant research:

  • Oncology
  • Radiology & Diagnostic Imaging
  • Drug Discovery & Pharmacology
  • Clinical Informatics
  • Public Health & Epidemiology
  • Pathology
  • Genomics & Personalized Medicine
  • ...and 60+ more specialized domains

Data Storage & Updates

All curated papers are stored in a JSON-based database with metadata including:

  • arXiv ID, title, authors, abstract
  • Publication date and categories
  • AI-generated summary and analysis
  • Medical domains and keywords
  • Relevance score
  • Citation data (when available via Semantic Scholar API)
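The metadata fields above could be modeled roughly as follows. This sketch uses only the Python standard library (a dataclass serialized with json) rather than TinyDB, and the field names and the PaperRecord type are assumptions chosen to mirror the list; the real schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PaperRecord:
    # Field names are illustrative and follow the metadata list above.
    arxiv_id: str
    title: str
    authors: List[str]
    abstract: str
    published: str  # ISO 8601 date string
    categories: List[str]
    summary: str
    domains: List[str]
    keywords: List[str]
    relevance: float
    citation_count: Optional[int] = None  # absent when no citation data is available


def to_json(record: PaperRecord) -> str:
    """Serialize one record to a stable, diff-friendly JSON string."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = PaperRecord(
        arxiv_id="placeholder-id",
        title="Example title",
        authors=["A. Author"],
        abstract="Example abstract.",
        published="2025-01-01",
        categories=["cs.LG"],
        summary="Example summary.",
        domains=["Radiology"],
        keywords=["imaging"],
        relevance=0.9,
    )
    print(to_json(record))
```

Sorting keys and indenting the output keeps changes small and readable when the JSON file is tracked in version control.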

Update Frequency: The system runs daily via GitHub Actions, automatically discovering and processing new papers.

Quality Assurance

  • Automated Validation: Papers must pass relevance thresholds and domain matching
  • Source Verification: All papers link directly to authoritative arXiv sources
  • AI Provider Fallback: Multiple AI providers allow processing to continue when one provider is unavailable
  • Human Oversight: Regular review of filtering criteria and summary quality

Technology Stack

  • arXiv API: Official arXiv API for paper discovery
  • AI Providers: Gemini 2.5 Flash (primary), GPT-4, Claude, Grok
  • Database: TinyDB (JSON-based, version-controlled)
  • Static Site Generation: Python + Jinja2 templates
  • Hosting: GitHub Pages with custom domain
  • Automation: GitHub Actions for daily updates

About the Curator

Bryan Tegomoh is the creator and maintainer of Health AI Hub. This project combines expertise in AI/ML, healthcare informatics, and software engineering to bridge the gap between cutting-edge research and practical medical applications.

Contact: bryan@arxiv-health.org

Twitter/X: @ArXiv_Health

Newsletter: Subscribe on Substack

Open Source

Health AI Hub is fully open source and available on GitHub. Contributions, suggestions, and discussions are welcome!

Repository: github.com/BryanTegomoh/arxiv-health

License: MIT License

Disclaimer

Health AI Hub provides research summaries for informational and educational purposes only. Content is AI-generated and may contain errors. Always consult original papers and qualified healthcare professionals for medical decisions. This site is not affiliated with arXiv.org.