Parsing GEPRIS for the list of funded NFDI projects with GEPRIS IDs and descriptions

Check out the GEPRIS user interface for advanced search: https://gepris.dfg.de/gepris/OCTOPUS?task=doSearchExtended&context=projekt&keywords_criterion=NFDI&nurProjekteMitAB=false&findButton=Finden&person=&location=&fachlicheZuordnung=#&pemu=32&peu=#&zk_transferprojekt=false&teilprojekte=false&teilprojekte=true&bewilligungsStatus=&beginOfFunding=&gefoerdertIn=&oldContinentId=#&continentId=#&oldSubContinentId=##&subContinentId=##&oldCountryId=###&countryKey=###&einrichtungsart=-1

Getting HTML via requests

We use the requests library to fetch the HTML of that page into the variable text and print its first 36 characters.

import requests
GEPRIS_URL = "https://gepris.dfg.de/gepris/OCTOPUS"
params = {'keywords_criterion': '',
          'nurProjekteMitAB': 'false',
          'findButton': 'Finden',
          'task': 'doSearchExtended',
          'pemu': 32,
          'context': 'projekt',
          'language': 'en',
          'hitsPerPage': 50,
          'index': 0}
r = requests.get(GEPRIS_URL, params=params)
text = r.text
print(text[0:36])
<?xml version="1.0" encoding="utf-8"
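
To see how the params dictionary ends up in the request URL, the following sketch rebuilds the query string with the standard library alone, so it runs without network access. The helper name build_search_url and its arguments are hypothetical and only serve the illustration; requests performs an equivalent encoding internally.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

GEPRIS_URL = "https://gepris.dfg.de/gepris/OCTOPUS"


def build_search_url(index=0, hits_per_page=50):
    """Return the search URL for a given result offset (hypothetical helper)."""
    params = {
        'keywords_criterion': '',
        'nurProjekteMitAB': 'false',
        'findButton': 'Finden',
        'task': 'doSearchExtended',
        'pemu': 32,
        'context': 'projekt',
        'language': 'en',
        'hitsPerPage': hits_per_page,
        'index': index,
    }
    # urlencode converts every value to a string and percent-encodes it
    return GEPRIS_URL + "?" + urlencode(params)


url = build_search_url()
# Parse the query string back to check what the server would receive
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
print(query['pemu'], query['hitsPerPage'])
```

Parsing the URL back with parse_qs is a cheap way to confirm that empty values such as keywords_criterion are still sent.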

Parsing search results from HTML via BeautifulSoup

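We use the BeautifulSoup library to parse the HTML. The number of pages of search results is read from the result-info element; if that element is missing, we assume a single page.

from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'html.parser')
try:
    pages = int(soup.find('span', id="result-info").find('strong').text.split()[0])
except (AttributeError, IndexError, ValueError):
    # result-info element missing or not numeric: assume a single page
    pages = 1
print(pages)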
1
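
If a search returned more hits than fit on one page, the request would have to be repeated with a different index. The following is a minimal sketch, assuming that index is a zero-based offset of the first hit on a page and that the total number of hits is known. Both are assumptions about GEPRIS that the text above does not confirm, and page_offsets is a hypothetical helper.

```python
import math


def page_offsets(total_hits, hits_per_page=50):
    """Return the assumed 'index' value for every result page (hypothetical helper)."""
    # Always request at least one page, even for an empty result set
    pages = max(1, math.ceil(total_hits / hits_per_page))
    return [page * hits_per_page for page in range(pages)]


print(page_offsets(19))    # 19 hits fit on a single page
print(page_offsets(120))   # 120 hits need three pages of 50
```

Each offset could then be passed as the index parameter of a separate request.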

All search results are found via

soup = BeautifulSoup(text, 'html.parser')
results = soup.find_all("div", class_="results")
print(results)
[<div class="results">
<h2><a href="/gepris/projekt/441914366">GHGA – German Human Genome-Phenome Archive</a></h2> <span id="icons"><a href="/gepris/projekt/441914366?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/441926934">NFDI4Cat – NFDI for Catalysis-Related Sciences</a></h2> <span id="icons"><a href="/gepris/projekt/441926934?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/441958017">NFDI4Culture – Consortium for research data on material and immaterial cultural heritage</a></h2> <span id="icons"><a href="/gepris/projekt/441958017?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/441958208">NFDI4Chem – Chemistry Consortium in the NFDI</a></h2> <span id="icons"><a href="/gepris/projekt/441958208?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/442032008">NFDI4BioDiversity – Biodiversity, Ecology &amp; Environmental Data</a></h2> <span id="icons"><a href="/gepris/projekt/442032008?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/442077441">DataPLANT – Data in PLANT research</a></h2> <span id="icons"><a href="/gepris/projekt/442077441?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/442146713">NFDI4Ing – National Research Data Infrastructure for Engineering Services</a></h2> <span id="icons"><a href="/gepris/projekt/442146713?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/442326535">NFDI4Health – National Research Data Infrastructure for Personal Health Data</a></h2> <span id="icons"><a href="/gepris/projekt/442326535?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/442494171">KonsortSWD – Consortium for the Social, Behavioural, Educational, and Economic Sciences</a></h2> <span id="icons"><a href="/gepris/projekt/442494171?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460033370">Text+</a></h2> <span id="icons"><a href="/gepris/projekt/460033370?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460036893">NFDI4Earth - NFDI Consortium Earth System Sciences</a></h2> <span id="icons"><a href="/gepris/projekt/460036893?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460037581">BERD@NFDI - NFDI for Business, Economic and Related Data</a></h2> <span id="icons"><a href="/gepris/projekt/460037581?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460129525">NFDI4Microbiota - National Research Data Infrastructure for Microbiota Research</a></h2> <span id="icons"><a href="/gepris/projekt/460129525?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460135501">MaRDI - Mathematical Research Data Initiative</a></h2> <span id="icons"><a href="/gepris/projekt/460135501?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460197019">FAIRmat – FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids</a></h2> <span id="icons"><a href="/gepris/projekt/460197019?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460234259">NFDI4DS - NFDI for Data Science and Artificial Intelligence</a></h2> <span id="icons"><a href="/gepris/projekt/460234259?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460247524">NFDI-MatWerk - National Research Data Infrastructure for Materials Science &amp; Engineering</a></h2> <span id="icons"><a href="/gepris/projekt/460247524?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460248186">PUNCH4NFDI - Particles, Universe, NuClei and Hadrons for the NFDI</a></h2> <span id="icons"><a href="/gepris/projekt/460248186?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>, <div class="results">
<h2><a href="/gepris/projekt/460248799">DAPHNE4NFDI - DAta from PHoton and Neutron Experiments for NFDI</a></h2> <span id="icons"><a href="/gepris/projekt/460248799?displayMode=print&amp;findButton=Finden&amp;hitsPerPage=50&amp;index=0&amp;keywords_criterion=&amp;language=en&amp;nurProjekteMitAB=false&amp;pemu=32" rel="nofollow" target="_blank" title="Open print view"><img alt="Print View" src="/gepris/images/iconPrint.gif"/></a></span></div>]

Let’s process these results a bit: we split each link text into a title and a description and build the full project URL.

consortia = []
for result in results:
    a = result.find('a')
    t = a.get_text().replace(' – ', ' - ')
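    try:
        # split only at the first separator so extra ' - ' stay in the description
        title, description = t.split(' - ', 1)
    except ValueError:
        # no separator (e.g. "Text+"): keep the whole text as the title
        title, description = t, ''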
    consortia.append(["https://gepris.dfg.de" + a.get('href'), title, description])
print(consortia)
[['https://gepris.dfg.de/gepris/projekt/441914366', 'GHGA', 'German Human Genome-Phenome Archive'], ['https://gepris.dfg.de/gepris/projekt/441926934', 'NFDI4Cat', 'NFDI for Catalysis-Related Sciences'], ['https://gepris.dfg.de/gepris/projekt/441958017', 'NFDI4Culture', 'Consortium for research data on material and immaterial cultural heritage'], ['https://gepris.dfg.de/gepris/projekt/441958208', 'NFDI4Chem', 'Chemistry Consortium in the NFDI'], ['https://gepris.dfg.de/gepris/projekt/442032008', 'NFDI4BioDiversity', 'Biodiversity, Ecology & Environmental Data'], ['https://gepris.dfg.de/gepris/projekt/442077441', 'DataPLANT', 'Data in PLANT research'], ['https://gepris.dfg.de/gepris/projekt/442146713', 'NFDI4Ing', 'National Research Data Infrastructure for Engineering Services'], ['https://gepris.dfg.de/gepris/projekt/442326535', 'NFDI4Health', 'National Research Data Infrastructure for Personal Health Data'], ['https://gepris.dfg.de/gepris/projekt/442494171', 'KonsortSWD', 'Consortium for the Social, Behavioural, Educational, and Economic Sciences'], ['https://gepris.dfg.de/gepris/projekt/460033370', 'Text+', ''], ['https://gepris.dfg.de/gepris/projekt/460036893', 'NFDI4Earth', 'NFDI Consortium Earth System Sciences'], ['https://gepris.dfg.de/gepris/projekt/460037581', 'BERD@NFDI', 'NFDI for Business, Economic and Related Data'], ['https://gepris.dfg.de/gepris/projekt/460129525', 'NFDI4Microbiota', 'National Research Data Infrastructure for Microbiota Research'], ['https://gepris.dfg.de/gepris/projekt/460135501', 'MaRDI', 'Mathematical Research Data Initiative'], ['https://gepris.dfg.de/gepris/projekt/460197019', 'FAIRmat', 'FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids'], ['https://gepris.dfg.de/gepris/projekt/460234259', 'NFDI4DS', 'NFDI for Data Science and Artificial Intelligence'], ['https://gepris.dfg.de/gepris/projekt/460247524', 'NFDI-MatWerk', 'National Research Data Infrastructure for Materials Science & Engineering'], ['https://gepris.dfg.de/gepris/projekt/460248186', 'PUNCH4NFDI', 'Particles, Universe, NuClei and Hadrons for the NFDI'], ['https://gepris.dfg.de/gepris/projekt/460248799', 'DAPHNE4NFDI', 'DAta from PHoton and Neutron Experiments for NFDI']]
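
The splitting step can also be tried in isolation, without fetching anything. The sketch below uses a regular expression instead of the replace-then-split approach above, and it additionally extracts the numeric GEPRIS ID from a link. The names split_title and project_id are hypothetical helpers, and treating the last path segment as the ID is an assumption based on the URLs shown in the output.

```python
import re

# A separator is an en dash or a hyphen surrounded by whitespace
SEPARATOR = re.compile(r"\s[–-]\s")


def split_title(text):
    """Split 'Title – Description' at the first separator (hypothetical helper)."""
    parts = SEPARATOR.split(text, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    # No separator found: the whole text is the title
    return text, ''


def project_id(href):
    """Return the last path segment of a project link, assumed to be the GEPRIS ID."""
    return href.rstrip('/').rsplit('/', 1)[-1]


print(split_title("GHGA – German Human Genome-Phenome Archive"))
print(split_title("NFDI-MatWerk - National Research Data Infrastructure"))
print(split_title("Text+"))
print(project_id("/gepris/projekt/441914366"))
```

Hyphens inside words, as in NFDI-MatWerk or Genome-Phenome, are left alone because the pattern requires whitespace on both sides.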

Finally, we create a pandas DataFrame:

import pandas as pd
nfdi = pd.DataFrame(consortia, columns=['GEPRIS', 'Title', 'Description'])
nfdi
    GEPRIS                                          Title              Description
0   https://gepris.dfg.de/gepris/projekt/441914366  GHGA               German Human Genome-Phenome Archive
1   https://gepris.dfg.de/gepris/projekt/441926934  NFDI4Cat           NFDI for Catalysis-Related Sciences
2   https://gepris.dfg.de/gepris/projekt/441958017  NFDI4Culture       Consortium for research data on material and i...
3   https://gepris.dfg.de/gepris/projekt/441958208  NFDI4Chem          Chemistry Consortium in the NFDI
4   https://gepris.dfg.de/gepris/projekt/442032008  NFDI4BioDiversity  Biodiversity, Ecology & Environmental Data
5   https://gepris.dfg.de/gepris/projekt/442077441  DataPLANT          Data in PLANT research
6   https://gepris.dfg.de/gepris/projekt/442146713  NFDI4Ing           National Research Data Infrastructure for Engi...
7   https://gepris.dfg.de/gepris/projekt/442326535  NFDI4Health        National Research Data Infrastructure for Pers...
8   https://gepris.dfg.de/gepris/projekt/442494171  KonsortSWD         Consortium for the Social, Behavioural, Educat...
9   https://gepris.dfg.de/gepris/projekt/460033370  Text+
10  https://gepris.dfg.de/gepris/projekt/460036893  NFDI4Earth         NFDI Consortium Earth System Sciences
11  https://gepris.dfg.de/gepris/projekt/460037581  BERD@NFDI          NFDI for Business, Economic and Related Data
12  https://gepris.dfg.de/gepris/projekt/460129525  NFDI4Microbiota    National Research Data Infrastructure for Micr...
13  https://gepris.dfg.de/gepris/projekt/460135501  MaRDI              Mathematical Research Data Initiative
14  https://gepris.dfg.de/gepris/projekt/460197019  FAIRmat            FAIR Data Infrastructure for Condensed-Matter ...
15  https://gepris.dfg.de/gepris/projekt/460234259  NFDI4DS            NFDI for Data Science and Artificial Intelligence
16  https://gepris.dfg.de/gepris/projekt/460247524  NFDI-MatWerk       National Research Data Infrastructure for Mate...
17  https://gepris.dfg.de/gepris/projekt/460248186  PUNCH4NFDI         Particles, Universe, NuClei and Hadrons for th...
18  https://gepris.dfg.de/gepris/projekt/460248799  DAPHNE4NFDI        DAta from PHoton and Neutron Experiments for NFDI

Let’s save the DataFrame to a CSV file.

nfdi.to_csv("../../../data/GEPRIS_NFDI_all.csv", index=False, encoding='utf-8')
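
To get a feel for the resulting file layout, the sketch below writes two of the rows above in the same column order and reads them back. It swaps in the standard-library csv module and an in-memory buffer instead of pandas and the real file path, so it is an illustration of the format rather than the notebook's own method.

```python
import csv
import io

# Two sample rows taken from the output above
rows = [
    ['https://gepris.dfg.de/gepris/projekt/441914366', 'GHGA',
     'German Human Genome-Phenome Archive'],
    ['https://gepris.dfg.de/gepris/projekt/460033370', 'Text+', ''],
]

# Write a header line followed by the rows; no index column is written
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['GEPRIS', 'Title', 'Description'])
writer.writerows(rows)

# Read the data back as dictionaries keyed by the header names
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]['Title'])
print(repr(records[1]['Description']))
```

Note that the csv module returns the empty description as an empty string, whereas pandas.read_csv by default reads empty fields as NaN.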