25 March 2017

Finding Open Access moves on

Some time ago, in 2013, the OA Button arrived and I wrote a bit about it.

Back then it was a bookmarklet that sat with your browser bookmarks. You could use it to identify when and where you were trying to access a paywalled article.

Four years on, it is a browser extension (I'm using Chrome and have not checked Firefox or others) that you can use on the web to locate open access versions of articles or, if none is found, to request that an OA version be made available. The requests are forwarded to researchers, asking for legal open access copies to be archived in a repository. There is no guarantee that your request will be satisfied, but it helps to communicate to researchers the demand for, and importance of, OA.

I have used it from our Library discovery system (Primo) and it worked OK, presumably by using the DOI in the article record I was viewing. DOIs, PMIDs and some other identifiers may be used.
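How the extension actually finds the identifier on a page is not something this post can confirm. Purely as an illustration of the idea, here is a minimal Python sketch that picks a DOI-like string out of some text. The pattern is an approximation rather than a full validator, and the function name and the DOI in the usage note are made up for the example.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit prefix, a slash,
# then a suffix. This is an assumption for illustration, not a strict rule.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # A trailing full stop, comma or semicolon is usually sentence
    # punctuation rather than part of the identifier.
    return match.group(0).rstrip(".,;")
```

Calling `find_doi("See https://doi.org/10.1234/abcd.5678.")` with that invented DOI picks out the identifier without the sentence's final full stop, and text with no DOI in it gives `None`.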

There is also an Unpaywall extension. Its official launch is 4 April 2017, so it is a less mature product. This one is designed to automatically display an indicator of whether an OA version is available while you are viewing an article metadata page. It is not working with my Library discovery system, nor on ResearchGate, nor on the Australian Library Journal pages on Taylor and Francis, but it does work with some journal sites.

Ex Libris plans to incorporate oaDOI as an option in Alma's uresolver. This will provide a similar way to locate open access versions where a DOI is available in a citation in Primo. I'm looking forward to adding that one to our interface. It will be interesting to see what, if any, impact this has on our document delivery service.

For news about these extensions:
Follow @oaDOI_org
Follow Unpaywall

9 September 2016

Research Data Thing 23/23 - Making Connections

This is the last thing! Woot!

I have:
And for now I think that's enough. No doubt opportunities and ideas will arise from this experience.

Thank you ANDS and fellow thingers.

Research Data Thing 22/23 - What's in a name

The penultimate thing!

I've been listening to more podcasts lately, so instead of sharing videos as suggested in the thing, here are some podcasts that might be interesting on big data topics.

  • Data Skeptic - short episodes exploring data concepts and longer interviews with practitioners on data science.

4 September 2016

Research Data Thing 21/23 - Tools of the (dirty data) trade

Thing 21 is about dirty data and some strategies and tools for fixing data issues.

I have been involved in implementing data systems at work that required data migration and feeds from other systems with transformations, e.g. building an organisation code structure in a new system from partial strings in a payroll system, and sourcing person records from two separate systems and deduplicating them (people who were both staff and students). So the pitfalls of dirty data are quite familiar. The problems soon started appearing during the testing phase, particularly as we looked at report generation and business processes that relied on choosing a specific record.

One of the difficulties was individuals who had name variations between the two systems but were in fact the same person. Sometimes the only way these were found was through someone knowing that a staff member had changed their name, or used a diminutive in their student record. This led to changing some business processes to help identify persons across the two systems.
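To give a feel for the problem, here is a minimal Python sketch of flagging possible staff/student duplicates by name similarity. It is not what we actually built. The record layout (`id`, `name`, `dob`), the use of date of birth as a second check and the 0.7 threshold are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name, drop commas and collapse repeated spaces."""
    return " ".join(name.lower().replace(",", " ").split())


def name_similarity(a, b):
    """Return a score between 0 and 1 for how alike two names look."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def possible_matches(staff, students, threshold=0.7):
    """Pair records whose birth dates agree and whose names are similar.

    Each record is a dict with hypothetical keys "id", "name" and "dob".
    """
    pairs = []
    for s in staff:
        for t in students:
            if s["dob"] != t["dob"]:
                continue  # different birth date: treat as different people
            score = name_similarity(s["name"], t["name"])
            if score >= threshold:
                pairs.append((s["id"], t["id"], round(score, 2)))
    return pairs
```

Something like this would catch a diminutive such as "Kathy" for "Katherine", but not a complete change of surname, which is why the business processes had to change as well.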

This thing talks about using Google Spreadsheets and a scraping extension to gather tabular data from websites. In the past, when websites used <table> tags in the HTML, it was relatively easy to import tables directly into Excel using the method in this video. I was hoping to try it again, but could not find a suitable table to play with. (Sites mostly seem to use these tags for ads now, and alternative methods for tabular data!)
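For pages that do still mark up data with table tags, the same idea can be sketched in a few lines of Python using only the standard library. This is not the Excel or Google Spreadsheets method, just an illustration of pulling rows out of table markup. The class and function names are made up, and it makes no attempt to handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feeding it a small snippet of table markup gives back one list per row, with any formatting tags inside the cells (such as bold) dropped and only the text kept.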

The feature to do this is available in Excel 2016 in the data ribbon.

This is my first time at trying Google spreadsheets for scraping data. So here is a table from the Wikipedia page on Australia at the Olympics.

Medals by Summer Games

In the Wikipedia page the column "Totals" has bold text. In the scraped data the wiki encoding for bold has been captured as asterisks surrounding each value, which makes it a prime candidate for some cleansing.
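A minimal sketch of that cleansing step in Python, assuming the scraped totals arrive as text with one or more asterisks on either side. The function names are hypothetical and the values in the usage note are invented, not taken from the actual table.

```python
import re


def strip_bold_markers(value):
    """Remove asterisks wrapped around a scraped value, e.g. '*58*' -> '58'."""
    return re.sub(r"^\*+|\*+$", "", value.strip())


def clean_totals(values):
    """Strip the markers from each value and convert it to an integer."""
    return [int(strip_bold_markers(v)) for v in values]
```

For example, `clean_totals(["*5*", " *46* "])` gives `[5, 46]`. Values without asterisks pass through unchanged, so the same function can be run over the whole column.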

I was going to have a go with OpenRefine, but it was downloaded on a different computer and I can't be bothered shifting gears to finish this on the other one.