29 June 2020
25 June 2020
23 June 2020
David's just done a guest post for the VirtualHumans.org site on "Intelligent Virtual Personas and Digital Immortality", pulling together some of our current work on Virtual Personas with David's writings on Digital Immortality and the site's interest in Virtual Influencers.
You can read the full article here: https://www.virtualhumans.org/article/intelligent-virtual-personas-and-digital-immortality
11 June 2020
- COVID19 and 3D Immersive Learning - With corporate training and academic syllabuses and delivery being revised to cope with the challenges of social distancing stretching out until at least mid-2021, to what extent will trainers and educators look again at the potential of 3D immersive learning and virtual reality - or will they fall back on the more "traditional" approaches of VLEs and Zoom?
- Virtual Conferences - It's not just in virtual training and learning that immersive 3D can help - several organisations are now using immersive 3D conference and meeting environments to give participants more sense of "being there" and encouraging more serendipitous networking than yet another Zoom webinar. David reports on two recent events he attended.
- Trainingscapes 2.0 Sneak Peek - We're getting close to the launch of version 2.0 of Trainingscapes - see some screenshots of the new-look application.
- Plus snippets of other things we've been up to in the last 6 months - like being named one of the West Midlands Top 50 most innovative companies.
8 June 2020
On a recent project we had difficulties in scraping the summary paragraph from Wikipedia article pages, and Beautiful Soup was suggested as a possible tool to help with this. The Beautiful Soup Python library has functions to iterate, search and update the elements in the parsed tree of an HTML (or XML) document.
So having downloaded and installed the library, a quick test was to fetch the URL of the web page we're interested in, using the 'requests' HTTP library to make things easy. The HTML document is then passed to Beautiful Soup to create a 'soup' object.
import requests
from bs4 import BeautifulSoup

result = requests.get("https://en.wikipedia.org/wiki/HMS_Sheffield_(D80)")
src = result.content
soup = BeautifulSoup(src, 'lxml')
The prettify() function makes the HTML more readable by indenting the parent and sibling structure.
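A minimal sketch of prettify() in action - a small inline snippet stands in for the fetched Wikipedia page (and the built-in html.parser is used so no extra parser is needed):

```python
from bs4 import BeautifulSoup

# A tiny inline snippet stands in for the fetched Wikipedia page
html = "<div><p>HMS Sheffield (D80)</p><a href='/wiki/Type_42_destroyer'>Type 42</a></div>"
soup = BeautifulSoup(html, "html.parser")

# prettify() re-indents the tree, one tag per line, so the
# parent and sibling structure is easy to see
pretty = soup.prettify()
print(pretty)
```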
Searching for tag types (such as 'a' for anchor links) is simple using 'find' (first instance) or 'find_all' (every instance); this shows all internal (Wikimedia) links and external links (starting "https://").
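For example, again with a small inline snippet standing in for the article page, internal links can be told apart from external ones by their href prefix:

```python
from bs4 import BeautifulSoup

html = """
<p><a href="/wiki/Type_42_destroyer">Type 42</a>
<a href="https://www.example.com/archive">External archive</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")          # just the first anchor in the document
all_links = soup.find_all("a")  # every anchor in the document

# Internal Wikipedia links start "/wiki/"; external ones start "https://"
internal = [a["href"] for a in all_links if a["href"].startswith("/wiki/")]
external = [a["href"] for a in all_links if a["href"].startswith("https://")]
```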
Let's just get the links that refer to "HMS …".
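One way to do this is to filter on each anchor's visible text - a sketch using sample links (the ship names here are illustrative):

```python
from bs4 import BeautifulSoup

html = """
<a href="/wiki/HMS_Coventry_(D118)">HMS Coventry</a>
<a href="/wiki/HMS_Sheffield_(D80)">HMS Sheffield</a>
<a href="/wiki/Falklands_War">Falklands War</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only anchors whose link text starts with "HMS"
hms_links = [a["href"] for a in soup.find_all("a")
             if a.get_text().startswith("HMS")]
```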
Now let's get the text paragraphs we're interested in; this can be done using the 'p' tag.
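A sketch of extracting paragraph text, with inline HTML standing in for the fetched article body:

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the article body fetched earlier
html = """
<p>HMS Sheffield was a Type 42 guided missile destroyer.</p>
<p>She was laid down in 1970.</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Each <p> tag holds one paragraph of article text
paragraphs = [p.get_text() for p in soup.find_all("p")]
summary = paragraphs[0]  # on Wikipedia the lead paragraph is the summary
```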
Dedicated Wikipedia Library
While Beautiful Soup is a good generic tool for parsing web pages, it turns out that for Wikipedia there are dedicated Python utilities for dealing with the content, such as the Wikipedia library (https://pypi.org/project/wikipedia/), which provides a simple wrapper around the Wikimedia API.
wp.search("HMS Sheffield") returns the Wikipedia pages for all incarnations of HMS Sheffield, and we can use wp.summary("HMS Sheffield (D80)") to get the summary element from the page we're interested in.
The wp.page("HMS Sheffield (D80)") call also gives the full text content in a readable form, with headings.
Again we can select the first paragraph for the summary (excluding URLs), and possibly use the other paragraphs, with the headings as index/topic markers.
Smart Quotes! While trying this out I also found a useful function to get rid of those pesky Microsoft smart quotes that were causing trouble in RDF definitions on the same task. Unicode, Dammit converts Microsoft smart quotes to HTML or XML entities:
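A minimal sketch, based on the example in the Beautiful Soup documentation - the Windows-1252 bytes below are Word's curly quote characters:

```python
from bs4 import UnicodeDammit

# Windows-1252 bytes \x93/\x94/\x92 are Word's curly double and single quotes
markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"

# smart_quotes_to="html" converts them to HTML entities rather than
# letting them leak through as raw Unicode punctuation
fixed = UnicodeDammit(markup, ["windows-1252"],
                      smart_quotes_to="html").unicode_markup
print(fixed)
```

Passing smart_quotes_to="xml" instead produces numeric XML entities, which may suit the RDF use case better.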