Easy way to download trait data for specified taxa?

Hello, I’m trying to download the general trait data for a few particular insect taxa for a manuscript I am working on with some other people. Of most interest to me is the trophic guild, but other listed trait data would also be nice to have. There are few enough taxa that I can get away with just reading through all of them, but the site has been rather slow and has been giving me a lot of 503 errors. Based on other posts in this forum, I believe this is because the migration still hasn’t taken place, but it would be good to know if it’s some other issue.

At first my plan was to click the traitbank, filter for particular taxa, and click “download tsv.” It took a while to process, and unfortunately the downloaded file didn’t seem to include trait data at all, and only had the taxa names.

There were some R packages that I thought might allow me to get the data directly into R, like the traits package, but that package was recently taken off CRAN because one of the dependencies of its dependencies was no longer supported. I dug the function out of the GitHub repository, but it requires an API key. The only place where I’ve found instructions on acquiring one (here) doesn’t seem to be working, as all I get is:

{
  "title": "You are not authorized to use the web services.",
  "status": "403 Unauthorized"
}

Notably, this is a different message from what it was when I wasn’t logged in. I think something may have changed and now it’s harder to get an API key? Or API keys aren’t relevant anymore? I see some mentions of the “old API” in the forum but I am not sure what it really means.

I also tried to download the trait data from the open data portal, “All trait data” in particular, but after read.csv’ing it into R and filtering scientific_name down to the taxa I was interested in, all I had left were records from a paleobiology portal saying the families were still extant, and from GBIF saying they were extant in the Republic of Mozambique (that’s how I interpreted it, at least). I think what might be going on is that the trait data on the web pages largely comes from different taxonomic categories (e.g. all beetles get labeled as holometabolous and bilaterally symmetric because they are both beetles and bilaterians). If that’s the main barrier, is there a way around it?

In short, I’m having inordinate trouble with this, and that’s everything I have tried. Solutions or more information regarding any of those would be much appreciated.

Best,
Shane

Hi, Shane! Sorry the services are so convoluted. It’s hard to make trait data simple, and some things are in transition atm. You should be able to get the records that you’re missing from the all traits archive by including the inferred.csv file. All the beetles from your example are listed individually there.

We recently migrated our downloadables from the old CKAN instance to Zenodo, so you’ll find all traits here now. Nothing is very far out of sync yet, but the CKAN instance will be decommissioned sometime this year.

If you do find that you want the cypher service, and if I understand you right that you don’t have a key for it, then you can request one by email per the instructions.

:slight_smile:
Jen

(oh, and if you want records rather than a taxon list from the user interface, select the “records” radio button when constructing your query. But really, the interface is going to be very frustrating to use for the next month or two, until the server migration. Sorry!)

Thank you so much for the quick reply!

Looking into it more, it seems downloading the trait data only lets you actually see the traits the codes correspond to if you’re using neo4j? I’m not at all familiar with that software and not interested in graph databases. Is there any other way to get the trait data? It doesn’t seem like any of the files in the linked download contain a real link between the codes under the inferred_trait column of inferred.csv and the actual meaning of those traits. Sorry if there’s something obvious I’m missing, this really isn’t something I’m used to.

Hello, I noticed you are a new user, and my curiosity got the best of me :sweat_smile: Which specific insect taxa/species are you looking for?

Mostly it’s a big list of different families, particularly taxa in the area we’re sampling that regularly show up on the sticky traps (and are identifiable in that state).

The idea is to look at the overall composition of the insects in terms of these kinds of ecological roles/traits, kind of like how they look at functional feeding groups when assessing streams.

Ah, I see, so you’re looking at the ecological roles of specific taxa, based on the insects that have been caught on “sticky traps.” Very interesting!

No, you shouldn’t need neo4j; the columns of the assorted files all meet somewhere, let me just have a look at the archive. We should really include a schema file in that product. Maybe I’ll just add something to the description for now…

OK, I have confirmed this with one example anyway, though I can see how hard it would be to detect by inspection. The inferred_trait column connects to the eol_pk column in the traits file. That column contains identifiers of at least two vintages that don’t look the same, but if you try, for instance, R499-PK289647419, from the top of the inferred file, you’ll find it wayyyyy down in eol_pk.

Please let me know if that doesn’t help, or if you find examples from inferred that are missing from traits. You never know, we might have a sync problem or a job that choked and produced an incomplete file.
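For anyone who would rather script the lookup than search by eye, here is a rough sketch of the join described above, in Python with only the standard library. It is not an official EOL recipe. It assumes inferred.csv has a page identifier column next to inferred_trait (called page_id here, which is a guess about the archive's layout) and that traits.csv has the eol_pk column. The function name traits_for_pages is made up for illustration.

```python
import csv


def traits_for_pages(inferred_path, traits_path, page_ids, page_col="page_id"):
    """Return (matched trait rows, inferred ids that were not found in traits).

    Assumption: inferred.csv has a page identifier column (page_col) and an
    inferred_trait column; traits.csv has an eol_pk column.
    """
    wanted = {str(p) for p in page_ids}

    # Pass 1: collect the trait ids inferred for the pages of interest.
    trait_ids = set()
    with open(inferred_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row[page_col] in wanted:
                trait_ids.add(row["inferred_trait"])

    # Pass 2: stream the (large) traits file and keep only the matching rows,
    # so the whole file never has to fit in memory.
    matched = []
    with open(traits_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["eol_pk"] in trait_ids:
                matched.append(row)

    # Ids listed in inferred.csv that never showed up in traits.csv.
    missing = trait_ids - {row["eol_pk"] for row in matched}
    return matched, missing
```

Calling traits_for_pages("inferred.csv", "traits.csv", [some_page_id]) returns the trait rows for that page plus a set of ids that could not be found. A non-empty set would be the kind of example worth reporting.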

:slight_smile:

Jen

Hmmm. When I used View(filter(traits, eol_pk == "R499-PK289647419")) after importing the data into R, it just showed up blank. Which is weird, because filtering that way with some of the other eol_pk values works fine. Even when I open the file in Notepad and search directly for that string, it says it isn’t in there.
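A text editor may struggle with a file of this size, so one way to double-check the search is to stream the file line by line. The following is a minimal Python sketch with a made-up helper name, find_lines. It does a plain substring search on physical lines and knows nothing about CSV quoting.

```python
def find_lines(path, needle, limit=5):
    """Return up to `limit` (line_number, line) pairs whose text contains needle.

    The file is read one line at a time, so it never has to fit in memory.
    Undecodable bytes are replaced rather than raising an error.
    """
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if needle in line:
                hits.append((lineno, line.rstrip("\n")))
                if len(hits) >= limit:
                    break
    return hits
```

An empty list from find_lines("traits.csv", "R499-PK289647419") would suggest the identifier really is absent from that copy of the file, rather than hidden by the editor or the import step.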

After redownloading the files, I noticed some odd discrepancies in how the file sizes are displayed. Within the zip file and on the website, it’s 1.9 GB. When you click Properties while it’s zipped, or look at the right-hand side when it’s unzipped, it says 1,880,260 KB. But when you click Properties when it’s unzipped, it says the size is “1.79 GB (1,925,385,402 bytes).” I’m not sure how reliable the file size estimates usually are, so that might just be a red herring, but I do think some of the data might have been lost to errors. Windows’ native unzipping program refused to unzip the file, so I had to use 7-Zip, and it reports a data error when extracting (like in this thread). I think this is a common occurrence, considering I’ve had those errors across multiple devices and others are posting about it. In principle I should be able to just merge the files, but that doesn’t seem to work.
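For what it’s worth, the three size figures quoted here look like one byte count expressed in different units, so on their own they probably don’t indicate lost data. The extraction error is a separate matter. The minimal Python sketch below shows the unit arithmetic and a way to check the archive with the standard zipfile module. The helper names and the example zip path are made up for illustration, and the sketch assumes the website reports decimal gigabytes.

```python
import zipfile


def describe_size(n_bytes):
    """Express one byte count in the units different tools tend to display."""
    return {
        "GB (10^9 bytes)": round(n_bytes / 10**9, 2),
        "GiB (2^30 bytes)": round(n_bytes / 2**30, 2),
        # Ceiling division: kibibytes rounded up to a whole number.
        "KiB, rounded up": -(-n_bytes // 1024),
    }


def first_bad_member(zip_path):
    """Return the name of the first archive member that fails its CRC check,
    or None if every member reads back cleanly."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.testzip()
```

describe_size(1925385402) gives about 1.93 GB, 1.79 GiB and 1,880,260 KiB, which lines up with the three numbers above. first_bad_member("traits.zip"), with whatever the downloaded file is actually called, would show whether the archive itself is damaged.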

This link worked for me:

That’s it, it works for me, thanks! I’m getting the right number of characters now, too.

However, I don’t think it’s just an issue with the zip compression. This traits.csv file is 5.79 GB, which is over three times bigger than the file on Zenodo (according to the website). To me, this indicates that on top of whatever issues there are in uncompressing the zip (I only had a little over 3.5 million lines originally, so the number people are getting is variable), there are also issues in what data has actually been uploaded. I might be wrong if Zenodo is only displaying the zipped size or something.

The November issue strikes again? Thanks for that confirmation! I was never able to reproduce the problem myself. I haven’t tested the Zenodo file summary display at all, but I think it’s plausible that we can’t trust it. The zip issue remains mysterious. The original file is the same. I just downloaded it from Zenodo, unzipped and zipped it again locally, and stuck it in my Google Drive.

All our processes are late this winter (#server_upgrade) but when we finally publish the next update of this file I’ll make a note here in case anyone wants to try again and see if that one behaves…

Hello, I am curious: are you the recipient of my email to the Australian Museum?

I’m sorry, I don’t understand what you mean.

Autocorrect is very bad for me. My apologies :sweat_smile: I sent an email to the Australian Museum and asked them to reply on the EOL forum if they got it. Are you from there?

Ah, I get it. No, it’s completely unrelated, I’m not from Australia.

Ok. Thank you anyway.