Parsing the DFG website for the list of funded NFDI consortia with homepages and descriptions
We google “accepted NFDI consortia” and find the DFG page that lists the funded consortia: https://www.dfg.de/en/research_funding/programmes/nfdi/funded_consortia/index.html.
Getting the HTML via requests
We use the requests library to download the HTML of that page into the variable text and print its first 33 characters.
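import requests
NFDI_URL = "https://www.dfg.de/en/research_funding/programmes/nfdi/funded_consortia/index.html"
r = requests.get(NFDI_URL, timeout=30)
r.raise_for_status()  # stop early on a 4xx/5xx response
text = r.text  # decoded with the charset requests infers from the response headers
print(text[0:33])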
<!DOCTYPE html>
<html lang="en">
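The next step re-encodes text as latin1 and decodes it as UTF-8. The reason is a charset mismatch: when a response's Content-Type header is text/* without a charset, requests falls back to ISO-8859-1 (latin1) for r.text, so UTF-8 bytes such as the “ä” in NFDI4BioDiversität come out garbled. That the DFG server omitted the charset is our assumption, inferred from the fix the notebook applies. A minimal standard-library sketch of the effect and the repair:

```python
# Bytes as a server would send them for a UTF-8 page
raw = "NFDI4BioDiversität".encode("utf-8")

# What r.text yields when those bytes are decoded as latin1 (ISO-8859-1)
garbled = raw.decode("latin-1")
print(garbled)  # NFDI4BioDiversitÃ¤t

# The repair: latin1 maps each character back to its original byte,
# and decoding those bytes as UTF-8 restores the umlaut
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # NFDI4BioDiversität
```

Setting r.encoding = "utf-8" before reading r.text is an equivalent fix that avoids the round trip.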
Parsing all tables from the HTML via pandas
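The text variable indeed contains HTML. The pandas function read_html parses every table in it into a list of DataFrames (it needs lxml, or BeautifulSoup4 with html5lib, to be installed). Before parsing, we encode the text as latin1 and decode it again as UTF-8; this repairs non-ASCII characters such as the “ä” in NFDI4BioDiversität, which requests apparently decoded as latin1 instead of UTF-8.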
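from io import StringIO
import pandas as pd
pd.set_option('display.width', 1000)
# Re-decode: requests apparently read the UTF-8 page as latin1
html = text.encode('latin1').decode('utf8')
# Wrap the literal HTML in StringIO (passing a raw string is deprecated since pandas 2.1)
df_list = pd.read_html(StringIO(html))
for df in df_list:
    print(df)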
Titel Link
0 DataPLANT - Data in Plant research (Biology)Ex... Externer Linkhttp://nfdi4plants.de/
1 GHGA - German Human Genome Archive (Medicine)E... Externer Linkhttps://ghga.dkfz.de/
2 KonsortSWD - Consortium for the Social, Behavi... Externer Linkhttps://www.konsortswd.de/
3 NFDI4BioDiversität - Biodiversity, Ecology & E... Externer Linkhttps://www.nfdi4biodiversity.org/
4 NFDI4Cat - NFDI for Catalysis-Related Sciences... Externer Linkhttp://gecats.org/NFDI4Cat.html
5 NFDI4Chem - Chemistry Consortium in the NFDI (... Externer Linkhttps://www.nfdi4chem.de/
6 NFDI4Culture - Consortium for research data on... Externer Linkhttps://nfdi4culture.de/
7 NFDI4Health - National Research Data Infrastru... Externer Linkhttps://www.nfdi4health.de/
8 NFDI4Ing - National Research Data Infrastructu... Externer Linkhttps://nfdi4ing.de/
Titel Link
0 BERD@NFDI - NFDI for Business, Economic and Re... Externer Linkhttps://www.berd-nfdi.de/
1 DAPHNE4NFDI - DAta from PHoton and Neutron Exp... Externer Linkhttps://www.sni-portal.de/de/daph...
2 FAIRmat - FAIR Data Infrastructure for Condens... Externer Linkhttps://www.fair-di.eu/fairmat/fa...
3 MaRDI - Mathematical Research Data Initiative ... Externer Linkhttps://www.mardi4nfdi.de/
4 NFDI4DataScience - NFDI for Data Science and A... No website yet
5 NFDI4Earth - NFDI Consortium Earth System Scie... Externer Linkhttps://www.nfdi4earth.de/
6 NFDI4Microbiota - National Research Data Infra... Externer Linkhttps://nfdi4microbiota.de/
7 NFDI-MatWerk - National Research Data Infrastr... Externer Linkhttps://nfdi-matwerk.de/
8 PUNCH4NFDI - Particles, Universe, NuClei and H... Externer Linkhttps://www.punch4nfdi.de/
9 Text+ - Language and Text Based Research Data ... Externer Linkhttps://www.text-plus.org/
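The column Titel (German for “title”) contains both the acronym and the description of each consortium. The column Link contains the label “Externer Link” (German for “external link”) glued to the homepage URL of the consortium, or “No website yet” where there is none.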
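To see why splitting at “ - ” separates the two parts, here is the idea on a single cell. The sample string is an assumption, reconstructed from the truncated output above and from the GEPRIS suffix that the cleaning code strips, and split_titel is a hypothetical helper. Splitting only at the first “ - ” keeps any later “ - ” inside the description intact.

```python
# Illustrative Titel cell (assumed; the printed table above is truncated)
cell = "DataPLANT - Data in Plant research (Biology)Externer Link- Project in GEPRIS"


def split_titel(value):
    """Hypothetical helper: split a Titel cell into (acronym, description).

    str.partition splits only at the first ' - ', so a ' - ' inside the
    description survives; the GEPRIS link text appended by the page is
    removed. Without a separator the description is ''.
    """
    acronym, _, description = value.partition(" - ")
    description = description.replace("Externer Link- Project in GEPRIS", "")
    return acronym, description


print(split_titel(cell))               # ('DataPLANT', 'Data in Plant research (Biology)')
print(split_titel("ABC - Foo - Bar"))  # ('ABC', 'Foo - Bar')
```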
Processing tables
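Let’s clean up those tables. We split Titel at “ - ” into the acronym (kept in Titel) and a new Description column, and drop the “Externer Link- Project in GEPRIS” text from the description. In Link we remove the “Externer Link” label and replace “No website yet” with an empty string. We also replace “NFDI4BioDiversität” with “NFDI4BioDiversity”.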
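for df in df_list:
    # Everything after the first ' - ' is the description; drop the GEPRIS link text
    df['Description'] = df['Titel'].apply(lambda x: x.split(' - ', 1)[-1].replace('Externer Link- Project in GEPRIS', ''))
    # Everything before it is the acronym; replace the umlaut in NFDI4BioDiversität
    df['Titel'] = df['Titel'].apply(lambda x: x.split(' - ', 1)[0].replace('NFDI4BioDiversität', 'NFDI4BioDiversity'))
    # Strip the link label and turn the "No website yet" placeholder into ''
    df['Link'] = df['Link'].apply(lambda x: x.replace('Externer Link', '').replace('No website yet', ''))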
df
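The same cleaning can also be written without apply, using the vectorised string methods of pandas: Series.str.split with n=1 and expand=True returns the acronym and the description as two columns. A sketch on a small hand-made frame; the cell values are assumed for illustration (the real ones come from pd.read_html), and the third row is a made-up placeholder.

```python
import pandas as pd

# Hand-made stand-in for one table returned by pd.read_html (values assumed)
df = pd.DataFrame({
    "Titel": [
        "DataPLANT - Data in Plant research (Biology)Externer Link- Project in GEPRIS",
        "GHGA - German Human Genome Archive (Medicine)Externer Link- Project in GEPRIS",
        "Example - Placeholder row",
    ],
    "Link": ["Externer Linkhttp://nfdi4plants.de/", "Externer Linkhttps://ghga.dkfz.de/", "No website yet"],
})

# Split once at the first ' - ': column 0 = acronym, column 1 = description
parts = df["Titel"].str.split(" - ", n=1, expand=True)
df["Description"] = parts[1].str.replace("Externer Link- Project in GEPRIS", "", regex=False)
df["Titel"] = parts[0].str.replace("NFDI4BioDiversität", "NFDI4BioDiversity", regex=False)

# Drop the link label and the placeholder text, leaving '' when there is no URL
df["Link"] = (df["Link"]
              .str.replace("Externer Link", "", regex=False)
              .str.replace("No website yet", "", regex=False))
print(df)
```

Passing regex=False makes the replacements literal, which matters for strings such as “Text+” where + would otherwise be a regex operator.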
The NFDI consortia funded in 2020:
df_list[0]
| | Titel | Link | Description |
|---|---|---|---|
| 0 | DataPLANT | http://nfdi4plants.de/ | Data in Plant research (Biology) |
| 1 | GHGA | https://ghga.dkfz.de/ | German Human Genome Archive (Medicine) |
| 2 | KonsortSWD | https://www.konsortswd.de/ | Consortium for the Social, Behavioural, Educat... |
| 3 | NFDI4BioDiversity | https://www.nfdi4biodiversity.org/ | Biodiversity, Ecology & Envi-ronmental Data (B... |
| 4 | NFDI4Cat | http://gecats.org/NFDI4Cat.html | NFDI for Catalysis-Related Sciences (Chemistry) |
| 5 | NFDI4Chem | https://www.nfdi4chem.de/ | Chemistry Consortium in the NFDI (Chemistry) |
| 6 | NFDI4Culture | https://nfdi4culture.de/ | Consortium for research data on ma-terial and ... |
| 7 | NFDI4Health | https://www.nfdi4health.de/ | National Research Data Infrastructure for Pers... |
| 8 | NFDI4Ing | https://nfdi4ing.de/ | National Research Data Infrastructure for Engi... |
The NFDI consortia funded in 2021:
df_list[1]
| | Titel | Link | Description |
|---|---|---|---|
| 0 | BERD@NFDI | https://www.berd-nfdi.de/ | NFDI for Business, Economic and Related Data (... |
| 1 | DAPHNE4NFDI | https://www.sni-portal.de/de/daphne-nfdi/daphn... | DAta from PHoton and Neutron Experiments for N... |
| 2 | FAIRmat | https://www.fair-di.eu/fairmat/fairmat_/consor... | FAIR Data Infrastructure for Condensed-Matter ... |
| 3 | MaRDI | https://www.mardi4nfdi.de/ | Mathematical Research Data Initiative (Mathema... |
| 4 | NFDI4DataScience | | NFDI for Data Science and Artificial Intellige... |
| 5 | NFDI4Earth | https://www.nfdi4earth.de/ | NFDI Consortium Earth System Sciences (Geoscie... |
| 6 | NFDI4Microbiota | https://nfdi4microbiota.de/ | National Research Data Infrastructure for Micr... |
| 7 | NFDI-MatWerk | https://nfdi-matwerk.de/ | National Research Data Infrastructure for Mate... |
| 8 | PUNCH4NFDI | https://www.punch4nfdi.de/ | Particles, Universe, NuClei and Hadrons for th... |
| 9 | Text+ | https://www.text-plus.org/ | Language and Text Based Research Data Infrastr... |
Let’s save the dataframes as CSV files.
df_list[0].to_csv("../../../data/DFG_NFDI_2020.csv", index=False, encoding='utf-8')
df_list[1].to_csv("../../../data/DFG_NFDI_2021.csv", index=False, encoding='utf-8')
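One thing to keep in mind when loading these files again: the empty Link of a consortium without a website is written as an empty field, and read_csv turns empty fields into NaN by default. Passing keep_default_na=False keeps them as empty strings. A self-contained sketch that writes to a temporary directory instead of the ../../../data/ paths; the two-row frame is illustrative, taken from the 2021 table.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame: one consortium with a homepage, one without (Link == '')
df = pd.DataFrame({
    "Titel": ["NFDI4Earth", "NFDI4DataScience"],
    "Link": ["https://www.nfdi4earth.de/", ""],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "consortia.csv")
    df.to_csv(path, index=False, encoding="utf-8")

    # Default parsing: the empty field comes back as NaN
    default = pd.read_csv(path, encoding="utf-8")
    # keep_default_na=False (and no na_values): the empty field stays ''
    kept = pd.read_csv(path, encoding="utf-8", keep_default_na=False)

print(default["Link"].isna().tolist())  # [False, True]
print(kept["Link"].tolist())            # ['https://www.nfdi4earth.de/', '']
```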