Hello everyone,
My name is Jerome and I am working on an embedding model for trait encoding. I am currently parsing the traits visible on the EOL pages, and I have created a modest application to retrieve the trait patterns and statistical distributions from these pages.
These are my main problems today:
1- As many of us have noticed, the website is returning a lot of 500 errors.
2- I cannot find an efficient way to list all the page URLs. I have already downloaded the traits bulk data, which includes a pages.csv file, but it contains “only” 2.4 million page IDs, which seems to be just a fraction of what is available.
So, is it possible to download the entire collection of pages as bulk data?
It could be a win-win situation, because I intend to share the refactored data through a web application with a strong backend that would be open to the public.
Of course, this would be in accordance with the IP, copyright, and licensing terms that EOL and its contributors share with me.
Thanks a lot
Best regards
Jerome
Thanks for inquiring, @jeromemassot. I’m sorry about the site performance; our server upgrade has come up against some unexpected delays. We’re hopeful that we can stabilize our direct services in the next couple of months, but I suspect everything you want is in the zenodo files.
We don’t have a service for the unrecognized taxon pages; they are placeholders for data that our import process could not map to a recognized taxonomic name; most of them have either misspelled or otherwise uninterpretable taxonomy. The 2.4M pages represented in the all-traits archive are the ones that can be reliably interpreted as known species or higher taxa.
Good luck!
Jen
Hi Jen,
Thanks for the reply and information. Would you be interested in having a discussion about how my organization could help EOL.org in its mission to serve data to the public?
My work is, for the moment, firmly grounded in the EOL data: the 2.4 million pages (hopefully, if I can successfully retrieve them), but also all the other pages, where some cross-checking and additional data compilation may be needed.
We certainly have some resources for a win-win collaboration.
Please let me know if you are interested.
Best regards
Jerome
Additionally, you mentioned the zenodo files, but so far I have only found the traits data, which uses a lot of ontologies to serve the data. The “translation” of these ontological artifacts into plain English words takes an extremely long time. It is perfect for knowledge graph search but not optimal for feeding an LLM, for example.
Are the 2.4M pages located somewhere in the zenodo repository?
If yes, that is great news for me.
Thanks
Best regards
Jerome
If you’re finding it tedious to refer to each separate ontology from which we borrow, you might prefer our own terms file, which covers our whole borrowed vocabulary.
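To illustrate how a single terms file could replace per-ontology lookups, here is a rough Python sketch. It assumes the terms file can be read as CSV with one column of term URIs and one of human-readable names. The column names (`uri`, `name`), the trait record keys (`page_id`, `predicate`, `value_uri`), and the example.org URIs are all hypothetical placeholders, so adjust them to whatever the downloaded files actually contain.

```python
import csv
import io

# Hypothetical column names: the real terms file may use different headers.
TERM_URI_COL = "uri"
TERM_NAME_COL = "name"


def load_term_labels(terms_csv_text):
    """Build a {uri: label} lookup from the text of a terms CSV."""
    reader = csv.DictReader(io.StringIO(terms_csv_text))
    return {row[TERM_URI_COL]: row[TERM_NAME_COL] for row in reader}


def label_for(uri, labels):
    """Return the plain-English label for a URI, or the URI itself if unknown."""
    return labels.get(uri, uri)


def trait_to_sentence(trait, labels):
    """Render one trait record (a dict with assumed keys) as a short phrase."""
    predicate = label_for(trait["predicate"], labels)
    value = label_for(trait["value_uri"], labels)
    return f"page {trait['page_id']}: {predicate} = {value}"


if __name__ == "__main__":
    # Made-up terms and trait record, for illustration only.
    terms_text = (
        "uri,name\n"
        "http://example.org/terms/habitat,habitat\n"
        "http://example.org/terms/marine,marine\n"
    )
    labels = load_term_labels(terms_text)
    trait = {
        "page_id": "123",
        "predicate": "http://example.org/terms/habitat",
        "value_uri": "http://example.org/terms/marine",
    }
    print(trait_to_sentence(trait, labels))
```

Unknown URIs fall back to the raw URI rather than raising an error, so gaps in the lookup stay visible in the output and can be checked by hand.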
I’m not sure I understand what you mean by “Are the 2.4M pages located somewhere in the zenodo repository?” We don’t have an archive of html files. Which parts of the taxon pages are you interested in? For the most part, different data types are covered by different services. I’m sorry the zenodo community is so hard to navigate. We’ve learned a lot since we migrated our resources in there. I’d quite like to wipe the slate and start over, but of course deleting something once it’s posted is not an option on zenodo.
I’ll alert my leadership to your offer; they probably won’t be ready to discuss new collabs until the dust has settled from the server upgrade, so it is unlikely to be quick.
Jen