Last week I was lucky enough to spend a couple of days in Salzburg in order to attend the FutureTDM Symposium. Text and data mining is a hot topic in academic libraries at the moment and I was flattered to be asked to discuss the skill set that both researchers and librarians might need in order to make the most of it.
What is TDM?
Text and data mining (TDM) is the process of electronically analysing large amounts of text in order to identify trends. This process has traditionally been done by people sitting down and going through text, which is of course incredibly time consuming and laborious. There is also a limit to the amount of information which can be analysed and the number of trends which can be found in this way. Using electronic analysis means that vast volumes of information can be mined and patterns that might not be apparent to the human eye can be found. For example, researchers can discover all mentions of a disease in the literature they have access to and find connections to possible treatments.
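To make the disease and treatment example a little more concrete, here is a very small Python sketch of the idea. It simply counts how many documents mention a disease and a treatment together. The term lists and the sample sentences are invented for illustration, and a real TDM project would use a much larger corpus and far more sophisticated methods than word matching.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical term lists and sample documents, invented for illustration only.
DISEASES = ["malaria", "diabetes"]
TREATMENTS = ["artemisinin", "insulin", "metformin"]

DOCUMENTS = [
    "Artemisinin remains a first-line treatment for malaria.",
    "Patients with diabetes were given insulin or metformin.",
    "Metformin is widely prescribed for type 2 diabetes.",
    "This paper surveys library opening hours.",
]


def tokenize(text):
    """Lower-case the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def co_mentions(documents, diseases, treatments):
    """Count the documents that mention both a disease and a treatment."""
    counts = Counter()
    for doc in documents:
        words = tokenize(doc)
        for disease, treatment in product(diseases, treatments):
            if disease in words and treatment in words:
                counts[(disease, treatment)] += 1
    return counts


if __name__ == "__main__":
    results = co_mentions(DOCUMENTS, DISEASES, TREATMENTS)
    for (disease, treatment), n in results.most_common():
        print(f"{disease} + {treatment}: {n} document(s)")
```

A pair that turns up together in many documents is only a hint that there might be a connection worth a closer look, not evidence of one, which is where the researcher's own expertise comes back in.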
This obviously opens up a lot of exciting new opportunities for both researchers and librarians but they may need to develop their skill set to take full advantage.
Library staff have a wide variety of skills which fit in well with the TDM agenda, including technical, data and teaching skills. This makes them ideally placed to offer help to the research community with TDM. Of course, depending on their involvement, they may not need to be experts, but having a basic awareness of the concept of TDM and its associated areas is important in order to signpost those looking for help.
The actual skills needed will vary by discipline. What is crucial is to embed these skills within the areas researchers are already looking to develop, to avoid a situation where TDM is seen as “yet another thing” they have to learn about. This is especially true in light of the mobile nature of the career researcher, who moves between institutions and has to learn new technologies and systems every time.
Skills to develop
So which skills should librarians and researchers focus on?
- Copyright - as TDM involves accessing material which may be protected by copyright, a solid grasp of copyright laws and exemptions is important. There is still a perception amongst many that if you can access a resource online then it’s free to use (I’m sure many librarians will have had this conversation with their users!). An understanding of copyright is important for both librarians and users in order to make sure that TDM projects adhere to the law. A knowledge of the different licences available for material and how they operate is also important in order to get the best balance between the rights of the author and the work that researchers want to undertake.
- Data skills - solid research data management is the basis of the TDM of the future. If we can take care to manage, label and share the data that is currently being produced then it will be in good order for future researchers to mine. Librarians already have the skills needed to advise on preparing and managing data but perhaps need to be more proactive in offering this help to researchers. To be useful for TDM, data needs to have good metadata attached to it. Poor metadata reduces the visibility of the data and means that computers struggle to process it. Again, librarians are ideally placed to help advise researchers on the skills and schema they will need to use to work with metadata.
- Technical skills - technical skills are vital for this type of work but tend not to be present unless the individual works to develop them outside of their formal education. Both librarians and researchers need to have knowledge of the applications used to actually undertake this work, from a basic awareness of the tools available to being able to offer expert support. Again, this will likely vary by discipline. Knowledge of the different file formats available is also important as this can help to solve problems. For example, digitised books are often saved as image files of the pages, which are easily readable by the human eye but not by a machine. Skills in data analysis and programming are becoming more common in the library sector and these will also be important as we look to advise researchers.
- Negotiation - TDM is still relatively new to many people and there is still work to be done on making sure that the current exemptions to copyright law work for the majority. It’s important that researchers and librarians are able to negotiate licences where needed, especially if they don’t make explicit provision for TDM. A basic understanding of how to interpret existing contracts is also important. I would argue that these skills are particularly important for librarians. It’s likely to be too onerous for researchers to negotiate with all of the different rights holders they would need to contact and this provides an opportunity for librarians to act as intermediaries. In order to do this successfully they need to develop strong negotiating skills.
- Future planning and adaptability - TDM is a constantly changing landscape and everyone needs to be able to respond to these changes and plan for the future rather than taking a reactive approach when it’s too late. Being able to look at the current landscape and using knowledge to try and predict future trends will help to ensure that both librarians and the research community are well positioned for the future.
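As a small illustration of the point about metadata in the data skills item, here is a minimal Python sketch that flags records with missing descriptive fields before they are shared. The list of required fields and the sample records are my own assumptions rather than any official schema, so treat it as a starting point for discussion and not a standard.

```python
# Hypothetical set of required fields; a real project would follow whichever
# metadata schema its discipline or repository recommends.
REQUIRED_FIELDS = ("title", "creator", "date", "licence")

# Invented sample records, for illustration only.
RECORDS = [
    {"title": "Survey results 2016", "creator": "A. Researcher",
     "date": "2016-11-02", "licence": "CC BY 4.0"},
    {"title": "Interview transcripts", "creator": "", "date": "2017-01-15"},
]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record."""
    return [
        field for field in required
        if not str(record.get(field) or "").strip()
    ]


if __name__ == "__main__":
    for record in RECORDS:
        gaps = missing_fields(record)
        status = "complete" if not gaps else "missing " + ", ".join(gaps)
        print(f"{record.get('title', '(untitled)')}: {status}")
```

Even a simple check like this, run before data is deposited, can catch the gaps that later make a dataset hard to find or hard for a machine to process.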
The above is by no means an exhaustive list. I'm as new to TDM as many of my colleagues so am still learning as I go. If you want to explore TDM in more detail I would suggest following the reports from the FutureTDM project and checking out this blog post from CILIP for more information. It’s an exciting new area which is likely to feature heavily in the future of both the academic library and librarian.