Our Mission
Health AI Hub curates and summarizes the latest medical and health-related artificial intelligence research from arXiv, making cutting-edge findings accessible to researchers, clinicians, and healthcare professionals.
Collection Methodology
1. Paper Discovery
We automatically query arXiv daily for papers matching medical and health-related keywords across several arXiv categories, including:
- cs.LG - Machine Learning
- cs.CV - Computer Vision (medical imaging)
- cs.AI - Artificial Intelligence
- q-bio - Quantitative Biology
- physics.med-ph - Medical Physics
- stat.ML - Machine Learning (Statistics)
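As a rough illustration of the discovery step, the sketch below builds a query URL for the public arXiv API that combines category and keyword clauses. It is not the project's actual code. The keyword list, the `build_query_url` helper, and the use of `q-bio.QM` as a stand-in for the q-bio archive are assumptions made for this example.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Categories taken from the list above. q-bio is an archive rather than a
# single category, so one subcategory (q-bio.QM) stands in for it here.
CATEGORIES = ["cs.LG", "cs.CV", "cs.AI", "q-bio.QM", "physics.med-ph", "stat.ML"]

# Hypothetical keywords: the real keyword list is not documented on this page.
KEYWORDS = ["medical", "clinical", "health"]


def build_query_url(categories, keywords, max_results=100):
    """Return an arXiv API URL matching any category AND any keyword."""
    cat_clause = " OR ".join(f"cat:{c}" for c in categories)
    kw_clause = " OR ".join(f'abs:"{k}"' for k in keywords)
    params = {
        "search_query": f"({cat_clause}) AND ({kw_clause})",
        "start": 0,
        "max_results": max_results,
        # Newest submissions first, which suits a daily run.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url(CATEGORIES, KEYWORDS, max_results=50))
```

The API responds with an Atom feed, so a real pipeline would still need to fetch and parse the result. That part is omitted here.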
2. Relevance Filtering
Each discovered paper is analyzed by AI to determine its medical and health relevance. Papers are scored on a 0-1 scale based on:
- Direct medical applications
- Health-related datasets or experiments
- Clinical relevance and potential impact
- Intersection with healthcare technology
Only papers scoring above our relevance threshold are included in the curated collection.
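The threshold step can be pictured with a minimal sketch like the one below. The threshold value of 0.5, the `ScoredPaper` type, and the `filter_relevant` function are hypothetical. The page does not state the real cutoff or how the score is stored.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the actual threshold is not stated on this page.
RELEVANCE_THRESHOLD = 0.5


@dataclass
class ScoredPaper:
    arxiv_id: str
    title: str
    relevance: float  # AI-assigned score on the 0-1 scale


def filter_relevant(papers, threshold=RELEVANCE_THRESHOLD):
    """Keep only papers whose score is strictly above the threshold."""
    kept = []
    for paper in papers:
        if not 0.0 <= paper.relevance <= 1.0:
            raise ValueError(f"score out of range for {paper.arxiv_id}")
        if paper.relevance > threshold:
            kept.append(paper)
    return kept
```

Rejecting out-of-range scores up front is a design choice in this sketch. It surfaces a malformed AI response early instead of silently including or dropping the paper.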
3. AI Summarization
Curated papers are processed through Google Gemini 2.5 Flash, our primary AI provider, to generate:
- Concise executive summaries (1-2 paragraphs)
- Key findings and methodologies
- Medical relevance analysis
- Potential healthcare applications
- Relevant medical domains (e.g., Oncology, Radiology, Public Health)
- Extracted keywords for discoverability
We also support alternative AI providers (OpenAI GPT-4, Anthropic Claude, xAI Grok) for redundancy.
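One way to picture provider redundancy is an ordered fallback loop, sketched below under stated assumptions. The `summarize_with_fallback` function and the idea of passing each provider as a plain callable are illustrative only. No real provider SDK is called, and the project's actual fallback logic may differ.

```python
from typing import Callable, Sequence, Tuple

# Each provider is a (name, function) pair. The function takes an abstract
# and returns a summary. In practice it would wrap a provider's client.
Provider = Tuple[str, Callable[[str], str]]


def summarize_with_fallback(abstract: str, providers: Sequence[Provider]):
    """Try providers in order and return (provider_name, summary)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(abstract)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Listing the primary provider first and the alternatives after it gives the behavior described above: the alternatives are used only when an earlier provider raises an error.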
4. Domain Classification
Papers are automatically categorized into medical specialty domains to help specialists find relevant research:
- Oncology
- Radiology & Diagnostic Imaging
- Drug Discovery & Pharmacology
- Clinical Informatics
- Public Health & Epidemiology
- Pathology
- Genomics & Personalized Medicine
- ...and 60+ more specialized domains
Data Storage & Updates
All curated papers are stored in a JSON-based database with metadata including:
- arXiv ID, title, authors, abstract
- Publication date and categories
- AI-generated summary and analysis
- Medical domains and keywords
- Relevance score
- Citation data (when available via Semantic Scholar API)
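To make the record layout concrete, the sketch below models one stored paper as a dataclass and serializes it with the standard `json` module. The field names and the `PaperRecord` type are assumptions based on the list above, not the project's real schema, and plain `json` is used here in place of the project's database layer.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PaperRecord:
    # Field names are illustrative; the real schema may differ.
    arxiv_id: str
    title: str
    authors: List[str]
    abstract: str
    published: str          # publication date as an ISO 8601 string
    categories: List[str]   # arXiv categories
    summary: str            # AI-generated summary and analysis
    domains: List[str]      # medical domains
    keywords: List[str]
    relevance: float        # score on the 0-1 scale
    citation_count: Optional[int] = None  # absent when no citation data exists


def to_json(record: PaperRecord) -> str:
    """Serialize a record to a stable, human-readable JSON string."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)
```

Sorting keys and indenting the output keeps diffs small and readable when the JSON file is tracked in version control.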
Update Frequency: The system runs daily via GitHub Actions, automatically discovering and processing new papers.
Quality Assurance
- Automated Validation: Papers must pass relevance thresholds and domain matching
- Source Verification: All papers link directly to authoritative arXiv sources
- AI Provider Fallback: Multiple AI providers ensure continuous operation
- Human Oversight: Regular review of filtering criteria and summary quality
Technology Stack
- arXiv API: Official arXiv API for paper discovery
- AI Providers: Gemini 2.5 Flash (primary), GPT-4, Claude, Grok
- Database: TinyDB (JSON-based, version-controlled)
- Static Site Generation: Python + Jinja2 templates
- Hosting: GitHub Pages with custom domain
- Automation: GitHub Actions for daily updates
About the Curator
Bryan Tegomoh is the creator and maintainer of Health AI Hub. This project combines expertise in AI/ML, healthcare informatics, and software engineering to bridge the gap between cutting-edge research and practical medical applications.
Contact: bryan@arxiv-health.org
Twitter/X: @ArXiv_Health
Newsletter: Subscribe on Substack
Open Source
Health AI Hub is fully open source and available on GitHub. Contributions, suggestions, and discussions are welcome!
Repository: github.com/BryanTegomoh/arxiv-health
License: MIT License
Disclaimer
Health AI Hub provides research summaries for informational and educational purposes only. Content is AI-generated and may contain errors. Always consult original papers and qualified healthcare professionals for medical decisions. This site is not affiliated with arXiv.org.