
Making sense out of the visual representation of transcription


Most people are familiar with the following diagram: some genomic DNA with a promoter region, exons, and introns. This is transcribed into RNA, which is then translated into a polypeptide.

When we look more closely at which strand is being transcribed, we can distinguish the two strands as the sense and antisense strands.

So the transcription factors and RNA polymerase bind, and the polymerase begins synthesizing mRNA in the 5' to 3' direction, reading the antisense strand in the 3' to 5' direction. The resulting mRNA therefore has the same sequence as the sense DNA strand, with U substituted for T.
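To make these strand relationships concrete, here is a tiny Python sketch (the sequence is made up for illustration):

```python
# Illustrative only: sense strand, template (antisense) strand, and mRNA.
sense = "ATGGCATTT"  # coding/sense strand, written 5'->3'

# The antisense/template strand, written 5'->3'; the polymerase reads it 3'->5'.
template = sense.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# The mRNA matches the sense strand, with U substituted for T.
mrna = sense.replace("T", "U")

print(template)  # AAATGCCAT
print(mrna)      # AUGGCAUUU
```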

My question is: shouldn't the exons be numbered in the reverse order from that shown in the first picture I provided? That is, instead of Promoter -> Exon 1 -> Intron -> Exon 2, should it be Promoter -> Exon N -> Intron -> Exon N-1?

Also, on bioinformatics sites, are gene sequences listed as the sense or the antisense strand? I have noticed that in some bioinformatics tools, to determine which polypeptide will result from a DNA sequence, one must input the sense strand in 5' to 3' orientation, not the antisense strand.


All visual representations and nearly all coordinate systems are based on the sense strand. The polymerase machinery has no notion of which strand is sense and which is antisense, because each is the antisense of the other.

For visual representation this makes much more sense and conveys the information more clearly, as it removes an extra, complicating layer. And in most cases gene structures are declared in the order of the reference genome, which is always given as the positive (sense) strand.

Coming to the bioinformatics part: most databases, such as UCSC, Ensembl, and NCBI, maintain gene coordinates on the reference genome. But there's a catch when the information is reported through a BED file:

Negative-stranded genes are reported as "chromosome stop start" by NCBI (the last time I used it was about one and a half years ago), while UCSC provides "chromosome start stop"; both report the strandedness. UCSC expects that you, the bioinformatician, will take the reverse complement when you see the strand information, while NCBI expects that your program will fail a sanity check, because "stop - start" will come out negative, implying that you cannot make a mistake while parsing NCBI BED files. Furthermore, UCSC indexes are maintained as 0-based, while NCBI's are 1-based.
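To make the bookkeeping concrete, here is a minimal Python sketch of strand-aware sequence extraction, assuming UCSC-style 0-based, half-open coordinates (the `genome` dictionary and the function names are illustrative, not any database's actual API):

```python
# Minimal, illustrative sketch: extract a gene sequence from a BED-like record.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement, e.g. ATG -> CAT."""
    return seq.translate(COMPLEMENT)[::-1]

def gene_sequence(genome: dict, chrom: str, start: int, stop: int, strand: str) -> str:
    """0-based, half-open (UCSC-style) interval; the reference holds the sense/positive strand."""
    if stop <= start:  # NCBI-style minus-strand records would fail this sanity check
        raise ValueError("stop <= start: check the coordinate convention")
    seq = genome[chrom][start:stop]
    # For minus-strand genes, take the reverse complement, as described above.
    return reverse_complement(seq) if strand == "-" else seq

genome = {"chr1": "ACGTACGTTAGCCGAT"}  # toy reference
print(gene_sequence(genome, "chr1", 2, 8, "+"))  # GTACGT
print(gene_sequence(genome, "chr1", 2, 8, "-"))  # ACGTAC
```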

I would urge you to validate this information yourself.

But why not keep a negative-strand reference as well, with the gene coordinates and elements for antisense genes in the format you just mentioned?

Because, from a computational point of view, the single-strand convention just makes more sense: a two-strand system would consume more storage (remember that the entire system was formulated before storage became as cheap as it is today) and more memory during tasks (exactly double what it consumes today). So it's simply better to have a positive-strand reference genome, with all genes and elements defined relative to it.

Just an example of how alignment of sequencing reads works (a toy code sketch follows the list):

  1. You align your read to the reference genome
  2. Aligns? If yes it has mapped to the positive strand
  3. No? Reverse complement the read and align back
  4. Aligns? If yes it has mapped to the negative strand
  5. No? Possibly an erroneous read or other artefacts.
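Here is that loop as a toy sketch; `align` is just an exact-substring search standing in for a real aligner such as BWA or Bowtie:

```python
# Toy illustration of the align-then-reverse-complement loop above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def align(read: str, reference: str):
    """Stand-in for a real aligner: exact substring search."""
    pos = reference.find(read)
    return pos if pos >= 0 else None

def map_read(read: str, reference: str):
    pos = align(read, reference)                      # steps 1-2
    if pos is not None:
        return pos, "+"                               # mapped to the positive strand
    pos = align(reverse_complement(read), reference)  # steps 3-4
    if pos is not None:
        return pos, "-"                               # mapped to the negative strand
    return None                                       # step 5: erroneous read or artefact

reference = "ACGTTAGCCGATTT"
print(map_read("TAGCC", reference))  # (4, '+')
print(map_read("GGCTA", reference))  # (4, '-'): its reverse complement is TAGCC
print(map_read("AAAAA", reference))  # None
```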

Gene sequences are annotated on the sense strand, so these diagrams are correct.


In the top diagram, the ATG start codon is on the left and the stop codon is on the right; exon 1 is the leftmost exon. That is all correctly illustrated. If you look up a transcript sequence in, say, Ensembl, it will be the same way: the start codon at the beginning, the stop codon at the end. That's the convention.


Making sense out of massive data by going beyond differential expression

With the rapid growth of publicly available high-throughput transcriptomic data, there is increasing recognition that large sets of such data can be mined to better understand disease states and mechanisms. Prior gene expression analyses, both large and small, have been dichotomous in nature, in which phenotypes are compared using clearly defined controls. Such approaches may require arbitrary decisions about what are considered “normal” phenotypes, and what each phenotype should be compared to. Instead, we adopt a holistic approach in which we characterize phenotypes in the context of a myriad of tissues and diseases. We introduce scalable methods that associate expression patterns to phenotypes in order both to assign phenotype labels to new expression samples and to select phenotypically meaningful gene signatures. By using a nonparametric statistical approach, we identify signatures that are more precise than those from existing approaches and accurately reveal biological processes that are hidden in case vs. control studies. Employing a comprehensive perspective on expression, we show how metastasized tumor samples localize in the vicinity of the primary site counterparts and are overenriched for those phenotype labels. We find that our approach provides insights into the biological processes that underlie differences between tissues and diseases beyond those identified by traditional differential expression analyses. Finally, we provide an online resource (http://concordia.csail.mit.edu) for mapping users’ gene expression samples onto the expression landscape of tissue and disease.

Although gene expression microarrays have been a standard, widely utilized biological assay for many years, we still lack a comprehensive understanding of the transcriptional relationships between various tissues and disease states. Even with the hundreds of thousands of expression array datasets available through public repositories such as National Center for Biotechnology Information’s (NCBI’s) Gene Expression Omnibus (1) (GEO), the lack of standardized nomenclature and annotation methods has made large-scale, multiphenotype analyses difficult. Thus, expression analyses have typically used the decade-old approach of comparing expression levels across two states (e.g., case vs. control) or a limited number of phenotype classes (2–4). Even recent large-scale gene expression investigations, whether they have attempted to elucidate phenotypic signals (5–7) or applied those signals for downstream analyses such as drug repurposing (8, 9), involve comparisons between two states or classes.

Comparative analyses, where transcriptional differences are directly measured between two phenotypes, inherently impose subjective decisions about what constitutes an appropriate control population. Importantly, such analyses are fundamentally limited in scope and cannot differentiate between biological processes that are unique to a particular phenotype or part of a larger process that is common to multiple phenotypes (e.g., a generic “cancer pathway”). Moreover, the results of such comparative analyses can be limited in generalizability as they make assumptions about the phenotypes being compared (10). Alternatively, in a data-rich environment, we can take a holistic view of gene expression analyses.

In this paper we introduce scalable and robust statistical approaches that leverage the full expression space of a large diverse set of tissue and disease phenotypes to accurately perform and glean biological insights from both sample- and gene-centric analyses. By viewing a given phenotype in the context of this comprehensive transcriptomic landscape, we circumvent the need for predefined control groups and presupposed relationships between phenotypes (Fig. 1A). We devise, implement and validate the accuracy of an enrichment statistic that provides detailed phenotypic information for new samples when they are mapped onto and compared with the transcriptomic landscape (http://concordia.csail.mit.edu).

Comprehensive view of gene expression. (A) A comprehensive perspective on expression analysis enables the elucidation of biological signals that are thematically coherent but provide an alternative view to traditional dichotomous approaches. For example, the gene signature for “breast cancer” is enriched for breast-specific development and carbohydrate and lipid metabolism in our comprehensive approach, as opposed to being dominated by a more general “cancer” signal. (B) The gene expression landscape, as represented by the first two principal components of the expression values of 20,252 genes from 3,030 microarray samples, separates into three distinct clusters: blood, brain, and soft tissue. The shading of the regions corresponds to the amount of data located in that particular region of the landscape, such that the darker the color, the more data exists at that location. Interestingly, the area where the soft tissue intersects the blood tissue corresponds to bone marrow samples, and where it intersects the brain tissue, mostly corresponds to spinal cord tissue samples. (C) There is a clear separation of reproductive and gastrointestinal tissue samples in the soft-tissue cluster.

Our perspective on interpreting gene expression space helps uncover phenotype-specific marker genes beyond those discovered by traditional dichotomous views of gene expression. We introduce a method based on a finite impulse response filter (11) used in signal processing to reveal, for instance, marker genes involved in carbohydrate and lipid metabolism as key processes in breast cancer. Such findings are in contrast to those of traditional over- and underexpression based analyses, which focus on generic cancer processes not specific to breast cancer such as cell cycle and cell adhesion (12). Capitalizing on the hierarchical nature of the phenotypic labels associated with our samples, we also demonstrate that genes previously linked to specific types of carcinomas may actually be part of a broader “carcinoma” process. Finally, we illustrate how metastasized tumor samples are transcriptomically more proximal to other cancer samples from their respective primary sites, as opposed to cancerous tissue from the metastasis sites from which the samples were resected.


European healthcare systems and the potential for big data

Medicine has traditionally been a science of observation and experience. For thousands of years, clinicians have integrated the knowledge of preceding generations with their own life-long experiences to treat patients according to the oath of Hippocrates mostly based on trial and error. Knowledge generation is changing dramatically. The digitalization of medicine allows the comparison of disease progression or treatment responses from patients worldwide. Whole-genome sequencing allows searching and comparing one’s own genome to millions and soon billions of other human genomes. Eventually, the entire world population could be used as a reference population in order to link genome information with many other types of physiological, clinical, environmental, and lifestyle data. For many, this is a vision full of opportunities, whereas for others it provides a wealth of technical challenges, unanticipated consequences, and loss of privacy and autonomy.

The quality of conclusions on the etiology of diseases follows a law of large numbers. Cross-sectional cohort studies of 30,000 to 50,000 or more cases are required to separate the signal from noise and to detect genomic regions associated with a given trait in which disease-related genes or susceptibility factors are located [1, 2]. Whole-genome sequencing studies often identify only a few genomic regions that contain elements with large effects on the penetrance or expressivity of gene products but hundreds of genomic regions that have small effects and are highly dependent on genetic background, environmental factors, or social and lifestyle determinants [3]. There is also a need to study disease pathogenesis on genome, epigenome, transcriptome, proteome, and metabolome levels and combine these dimensions through multi-omics research. Furthermore, individual variation responsible for normal and disease phenotypes is high as a result of somatic mutations or variation in transcription, splicing, or allele-specific gene expression between individuals [4–6].

Vast amounts of temporal and spatial parameter data are now available. But what are we going to do with the data? It takes hard work to condense useful information from big data and turn this information into knowledge and action. The challenge will be to make a smart choice between situations when less is more versus less is less but also when more is more versus more is less.

Here, we briefly describe the key challenges that result from making sense out of big data in health and using these data for the benefit of the patient and the healthcare system. We also highlight key technical, legal, and ethical issues that we face to develop evidence-based personalized medicine. Finally, we put forward five recommendations for the European Union (EU) and member states’ policy makers to serve as a framework for an EU action plan that could help to reach this ambitious goal.


Making Sense out of Large Graphs: Bridging HCI with Data Mining

PI: Christos Faloutsos, Carnegie Mellon University, phone 412-268.1457, FAX: 412-268.5576, email: christos AT cs.cmu.edu
Co-PI: Aniket Kittur, Carnegie Mellon University, 412-268.7505, nkittur AT cs.cmu.edu
Co-PI: Duen Horng (Polo) Chau, Georgia Tech, 404-385.7682, polo AT gatech.edu

1. GENERAL INFORMATION

1.1. Abstract

The goal of this research project is to help people make sense of large graphs, ranging from social networks to network traffic. The approach consists of combining two complementary fields that have historically had little interaction -- data mining and human-computer interaction -- to develop interactive algorithms and interfaces that help users gain insights from graphs with hundreds of thousands of nodes and edges. The goal of the project is to develop mixed-initiative machine learning, visualization, and interaction techniques in which computers do what they are best at (sifting through huge volumes of data and spotting outliers) while humans do what they are best at (recognizing patterns, testing hypotheses, and inducing schemas). This research addresses two classes of tasks: first, attention routing -- using machine learning to direct an analyst's attention to interesting nodes or subgraphs that do not conform to normal behavior; second, sensemaking -- helping analysts build in-depth representations and mental models of specific areas or aspects of a graph. Evaluation of the tools will involve both controlled laboratory studies and long-term field deployments.

As large graphs appear in many settings -- national security, intrusion detection, business intelligence (recommendation systems, fraud detection), biology (gene regulation), and academia (scientific literature) -- the potential benefits of new tools for making sense of graphs are far-reaching. Project results, including open-source software and annotated data sets, will be disseminated via the project web site (http://kittur.org/large_graphs/) and incorporated into educational activities.

1.2. Keywords

1.3. Funding agency

2. Research Highlights & Major Activities


Figure 1. The CrowdScape interface. (A) is a scatter plot of aggregate behavioral features. Brush on the plot to filter behavioral traces. (B) shows the distribution of each aggregate feature. Brush on the distribution to filter traces based on a range of values. (C) shows behavioral traces for each worker/output pair. Mouse over to explore a particular worker's products. (D) encodes the range of worker outputs. Brush on each axis to select a subset of entries. Next to (B) is a control panel where users can switch between parallel coordinates and a textual view of worker outputs (left buttons), put workers into groups (colored right buttons), or find points similar to their colored groups (two middle button sets).

2.1. Interactive Visualization

One area in which it is challenging to make sense of large data is understanding how people interact with an interface: each mouse movement, click, scroll, keypress, etc. generates potentially useful information but can quickly become overwhelming. Mining such data could be especially useful for the growing field of crowdsourcing, in which employers have little control over, or visibility into, how crowd workers are accomplishing tasks and often face quality control issues. We developed a novel system (CrowdScape) that helps researchers visualize the behavioral traces of crowd workers and interactively group them into clusters using machine learning. Each worker's behavior -- clicks, scrolls, typing, delays, etc. -- is summarized in a compact row, allowing many workers to be easily compared and made sense of, with dynamic queries providing an interactive overview and filtering mechanism. Furthermore, once a worker's behavior has been identified as high or low quality, CrowdScape uses machine learning models to propagate these labels to similar work. This research won the Best Paper award at UIST 2012.
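As a rough illustration of this label-propagation idea (this is not CrowdScape's actual model, which the report does not specify; the features and labels below are invented for the example):

```python
# Illustrative nearest-neighbour propagation of quality labels to similar workers.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rows are workers; columns are invented aggregate behavioural features,
# e.g. [number of clicks, number of scrolls, total delay in seconds].
labelled_X = np.array([[120, 30, 5.0], [4, 1, 0.2], [110, 25, 6.1]])
labelled_y = np.array([1, 0, 1])  # 1 = high-quality work, 0 = low-quality

model = KNeighborsClassifier(n_neighbors=1).fit(labelled_X, labelled_y)

# Propagate the labels to unlabelled workers with similar behaviour.
unlabelled_X = np.array([[100, 28, 5.5], [3, 0, 0.1]])
print(model.predict(unlabelled_X))  # [1 0]
```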

Another significant contribution is our novel decomposition of canonical graph visualization techniques (e.g., PivotGraph, semantic substrates) into reusable, atomic interactive operations/building blocks (e.g., ranking horizontally, grouping nodes into super nodes). The user can then flexibly combine these operations to summon graph visualization techniques on demand and potentially to create new ones. This investigation has led to an InfoVis'14 paper and formed the foundation of the thesis of CS PhD student Chad Stolper (Georgia Tech).

2.2. Multitouch visualization

In the case of highly multivariate data, it is difficult with desktop-based systems to examine more than two dimensions of variables due to their structured approach. Tablets provide a different set of affordances compared to desktops, and might allow us to interact with data in new ways. We have used multitouch gestures on a tablet, combined with physics-driven models, to create new interaction techniques that help users explore more dimensions and carry out deeper analyses. Figures 1 and 2 below demonstrate some of the utility of these techniques. This research was published as an extended abstract at CHI 2013, where it was also shown as a demo and as part of the video showcase.

2.3. Graph mining algorithms

We are also working on automatically determining important structures in a graph. The idea is to use the so-called Minimum Description Length (MDL) principle to describe 'important' subgraphs. A subgraph is 'important' if we can compress it easily: for example, a clique of n nodes can be compressed easily, and similarly for a chain or a star. Ms. Danai Koutra is working on the topic, developing scalable heuristics to solve the combinatorial problem of subgraph selection. Once such subgraphs are chosen, we plan to show them to the user in compressed form, for example using a 'box' glyph to represent a clique, or a 'star' glyph to represent a star.
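A back-of-the-envelope sketch of why a clique compresses well under MDL (the encoding below is invented for illustration and is not the project's actual scheme):

```python
# Rough MDL-style comparison: naming a clique's members once versus
# listing every one of its edges as a pair of node IDs.
import math

def bits_per_node(n_total: int) -> float:
    return math.log2(n_total)  # bits needed to name one node ID

def cost_edge_list(n_edges: int, n_total: int) -> float:
    return n_edges * 2 * bits_per_node(n_total)  # each edge names two nodes

def cost_clique_glyph(n_members: int, n_total: int) -> float:
    return n_members * bits_per_node(n_total)  # a 'clique' glyph names each member once

N = 100_000                # nodes in the whole graph
k = 50                     # clique size
edges = k * (k - 1) // 2   # 1,225 edges in a 50-clique
print(cost_edge_list(edges, N))   # ~40,700 bits as a raw edge list
print(cost_clique_glyph(k, N))    # ~830 bits as a 'clique' glyph
```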

Some of the research highlights include the TimeCrunch paper in KDD'15 for summarizing and understanding time-evolving graphs, and the 'Perseus' graph visualization demo in VLDB'15. The over-arching ideas are (a) the use of MDL (Minimum Description Length) to summarize and understand large static and time-evolving graphs, (b) to present only a few items to the user at a time ('Perseus'), and (c) to strive for attention routing: show the user a quick summary (through MDL), and draw attention to the most blatant outliers ('Perseus').

Another highlight is our 2015 survey paper that summarizes the latest sensemaking challenges and opportunities in scalable graph exploration and visualization [Pienta'15], pinpointing the state-of-the-art accomplishments (including some of our own) at the intersection of data mining and HCI, and the open problems that the data mining and HCI research communities may want to join forces to solve.

Through this project, we also discovered a new, simple, and practical way to scale up interactive analytics to billion-scale graphs on a single PC: it uses the universally available virtual memory / memory mapping (MMap) capability of all operating systems to store graph data that would otherwise be too big to fit in RAM. Our approach significantly outperforms other existing approaches. We published our initial results at IEEE BigData'14. It could become a promising way to scale up general machine learning and data mining techniques without requiring developers and users to "re-learn" or reimplement them using custom software frameworks.
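A minimal sketch of the idea, assuming the edge list is stored as a flat binary file of int32 (source, destination) pairs with node IDs below num_nodes (the file layout and function name are our own, not the paper's):

```python
# Illustrative memory-mapped degree computation: the OS pages the edge list
# in on demand, so the file can be far larger than RAM.
import numpy as np

def degrees_from_mmap(path: str, num_nodes: int, chunk: int = 10_000_000):
    edges = np.memmap(path, dtype=np.int32, mode="r").reshape(-1, 2)
    deg = np.zeros(num_nodes, dtype=np.int64)
    for lo in range(0, len(edges), chunk):  # stream in fixed-size chunks
        block = edges[lo:lo + chunk]
        deg += np.bincount(block[:, 0], minlength=num_nodes)  # out-degrees
        deg += np.bincount(block[:, 1], minlength=num_nodes)  # in-degrees
    return deg
```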


Making Sense of Sounds

Principal Investigator

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Professor of Signal Processing in the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey. He has investigated a wide range of audio signal processing methods, including automatic music transcription (Abdallah & Plumbley, 2006) and audio source separation (Nesbit et al, 2011), using techniques such as sparse representations (Plumbley et al, 2010) and high-resolution NMF (Badeau & Plumbley, 2014). He led the D-CASE data challenge on Detection and Classification of Acoustic Scenes and Events (Stowell et al, 2015), his work on bird sound classification (Stowell & Plumbley, 2014) was widely featured in news media, and his collaborative article on best practices for scientific computing (Wilson et al, 2014) was the most-read PLoS Biology article of 2014. He is PI on the EPSRC project Musical Audio Repurposing using Source Separation, is the lead academic on the Innovate UK projects Audio Data Exploration and Advanced Smart Microphone with the company Audio Analytic, and has been PI on several other EPSRC grants. Before joining Surrey (Jan 2015), Plumbley was Director of the world-leading Centre for Digital Music (C4DM) at Queen Mary University of London.

Co-Investigators

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Senior Lecturer at CVSSP, where he leads the Machine Audition Lab. His research in audio and speech processing has contributed to projects (e.g., Columbo, BALTHASAR, DANSA, QESTRAL, UDRC and POSZ) in acoustics of speech production (Jackson & Shadle, 2001), audio source separation (Liu et al, 2013; Alinaghi et al, 2014), audio-visual processing for speech enhancement and visual speech synthesis, and spatial aspects of subjective sound quality evaluation (Coleman et al, 2014; Conetta et al, 2014). He has over 100 academic publications. He is Surrey's technical lead on the EPSRC-funded S3A Programme Grant on spatial audio, responsible for object-based audio production.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Reader in Signal Processing at CVSSP and Co-Director of the Machine Audition Lab in CVSSP. His research interests include audio source separation (Liu et al, 2013; Alinaghi et al, 2014), blind dereverberation (Jan & Wang, 2012), sparse signal processing (Dai et al, 2012), machine learning (Barnard et al, 2014), and audio-visual signal processing (Kilic et al, 2015). He has over 150 publications in these areas, including two books: Machine Audition: Principles, Algorithms and Systems (2010) and Blind Source Separation: Advances in Theory, Algorithms and Applications (2014). He has been a PI and Co-I on several EPSRC projects, including Multimodal Blind Source Separation for Robot Audition (EPSRC EP/H012842/1) and Audio and Video Based Speech Separation of Multiple Moving Sources (EPSRC EP/H050000/1).

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Reader in Robot Vision in CVSSP. He has more than 80 publications in top-tier computer vision and machine learning forums, in areas such as feature extraction, object recognition, and tracking, including kernel-based methods (Yan et al, 2011), audio-visual annotation (Yan et al, 2014), and deep canonical correlation analysis (Yan & Mikolajczyk, 2015). He received the Longuet-Higgins Prize in 2014 for his contribution to invariant local image descriptors. He chaired BMVC 2012 and IEEE AVSS 2013. His team regularly comes top in retrieval challenges, including TRECVid 2008, the Visual Object Classes Challenge 2008 & 2010, and ImageCLEF 2010.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Director of the Digital World Research Centre (DWRC) at Surrey and Professor of Interaction Design. He joined the Centre in 2005 from HP Labs, to establish a new research agenda on user-centred innovation in digital media technology. His work explores a variety of new media futures relating to digital storytelling, personal media collections, and community news and arts. David has a longstanding interest in sound through his work on audiophotography as a new media form (Frohlich, 2004; Frohlich & Fennell, 2007): this led to the first published study of the domestic soundscape on the Sonic Interventions project (Oleksik et al, 2008), the Com-Me toolkit (Frohlich et al, 2012) for creating audiophoto narratives on the Community Generated Media project (EP/H042857/1), and the design of audiopaper documents on the Interactive Newsprint and Light Tags projects (EP/I032142/1, EP/K503939/1).

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Expert in soundscapes, room acoustics and perception. He has 27 years’ experience in acoustic and psychoacoustic research methods. He led the EPSRC Positive Soundscape Project (EP/E011624/1), a large consortium project to develop new ways of evaluating soundscapes, involving artists, social scientists and acousticians. Project outputs included new methods to measure soundscape perception, advice for urban planners, a major art exhibition, radio programmes, and a soundscape sequencer toy for people to gain a greater understanding of their aural environment (Davies et al, 2013). He edited a special edition of Applied Acoustics on soundscapes, and sits on ISO TC43/SC1/WG54 producing standards on soundscape assessment. He led the Defra project (NANR200) to advise on UK soundscape policy (Payne et al, 2009). Davies is currently Vice-President (International) of the Institute of Acoustics, and leads work on perception of complex auditory scenes on the EPSRC-funded S3A Programme Grant.

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Professor of Acoustic Engineering and a past President of the Institute of Acoustics (IOA). He was awarded the IOA’s Tyndall Medal in 2004. He has been an investigator on EPSRC projects on room acoustics, signal processing, perception and public engagement. These include EP/J013013/1 on the perception and automatic detection of audio recording errors using perceptual testing and blind signal processing methods (Jackson et al, 2014). Cox leads the qualitative and quantitative perceptual work on the EPSRC S3A Programme Grant. He was an EPSRC Senior Media Fellow and has presented 21 science documentaries on BBC Radio, authored articles for The Guardian, New Scientist and Sound on Sound. His popular science book Sonic Wonderland was published in 2014. He pioneered psychoacoustic testing as a method for engaging the public, working with BBC R&D and the British Science Association on theme tunes (Davies et al, 2011), a technique that will be exploited in this project.

Postdoctoral Researchers

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

A Research Fellow with research interests in sound perception and auditory science. He studied for his PhD at the University of Manchester, researching the neural basis for individual differences in the perception of musical consonance with Chris Plack. He then took a postdoctoral research position with Patrick Wong at the Chinese University of Hong Kong researching pitch perception in tone-language speakers, before joining the Acoustic Research Centre at the University of Salford as a Research Fellow working with Bill Davies and Trevor Cox.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

A Research Fellow now working on the Making Sense of Sounds project. He started his PhD research in speech recognition and natural language processing at the University of East Anglia. He has worked on several EPSRC projects on speech recognition, information retrieval, sports video analysis and multimodal dialogue systems. He is now focusing on multimodal information processing for sound understanding.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research Fellow working at the University of Surrey. He received his PhD degree from the University of Science and Technology of China (USTC) in 2015. He visited the Georgia Institute of Technology, USA, from September 2014 to May 2015, and completed a short internship at the Bosch research center, USA. He worked at the IFLYTEK company from April 2015 to April 2016. He serves as a reviewer for ICASSP, IJCNN, EUSIPCO, DSP, the Audio Engineering Society conference, IEEE/ACM Transactions on Audio, Speech and Language Processing, IEEE Signal Processing Letters, and Speech Communication. His papers have 455 citations.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research fellow at CVSSP of the University of Surrey. He received his PhD degree in 2013 from the Institute of Acoustics, Chinese Academy of Sciences, majoring in active noise control of sound and vibration. He then went to Brigham Young University, where he conducted post-doctoral research from 2013 to 2015. After that, he held a position at the Institute of Acoustics, Chinese Academy of Sciences, serving as an associate research fellow. He serves as a reviewer for the Journal of the Acoustical Society of America, Applied Acoustics, and the Noise Control Engineering Journal. His research interests range from air acoustics to signal processing.

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Research fellow at CVSSP of the University of Surrey. He received an MEng in 2009 and a PhD in 2014 in Electrical Engineering from the University of Birmingham, UK, where he also worked as a post-doctoral researcher for a year in 2015. He then joined the University of Hertfordshire, UK, from 2016 to 2017 as a research fellow on the EU H2020 project 'Objective Control for TAlker VErification'. After that project ended, he joined the University of Surrey in 2018 for another EU H2020 project, 'Audio Commons'. He developed a framework to automatically predict the perceived level of reverberation directly from audio files recorded in uncontrolled environments. He has published over 30 peer-reviewed papers in leading journals and international conferences in the areas of speech processing, machine learning, human perception, accent recognition and automatic speaker characterisation.

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Research Fellow with research interests in acoustics, particularly in speech acoustics, psychoacoustics, room acoustics, spatial audio, and signal processing of speech and audio. Previously, she was a Research Associate at the Acoustics Research Unit (ARU) at the University of Liverpool, where she worked with Carl Hopkins on speech security, speech intelligibility and speech enhancement. Prior to joining the ARU, she was a Research Associate in the Voice Biomechanics and Acoustics Laboratory at Michigan State University, where she worked on a National Institutes of Health (NIH) funded project concerning speech accommodation in occupational settings and acoustical environments. She is a member of the Institute of Acoustics (IOA) and the Acoustical Society of America (ASA), and is on the steering committee of the UK Acoustics Network. She was the recipient of a Young Investigator Award for promising young acousticians (Acoustical Society of America, 2015).

Research Software Developer

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

Cognitive scientist with a focus on algorithm development. He was awarded an MA and a PhD in Phonetics and Speech Communication, with Logic as a minor, by the Ludwig-Maximilian-Universität (Munich, Germany). Over the last two decades he has conducted research in Germany (Ludwig-Maximilian-Universität), Japan (ATR, Kyoto), the USA (Haskins Laboratories, New Haven, CT), Australia (Western Sydney University, Sydney & Curtin University, Perth) and the UK, spanning cognitive science, artificial intelligence, robotics and the arts.

Associated Researchers

School of Computing, Science & Engineering
University of Salford, Salford M5 4WT, UK

Technician supporting a team of research fellows on two large EPSRC-funded projects, Making Sense of Sounds and S3A, primarily responsible for gathering perceptual data in listening experiments. Lara has industry experience in R&D of electronic safety products, aeroacoustics, and automotive infotainment, but has always been most interested in audio technology. She gained a PhD from the Institute of Sound and Vibration Research, University of Southampton, researching the objective and perceptual assessment of bass reproduction accuracy in mix monitors. Lara subsequently contributed to a chapter on this subject in the second edition of the textbook Loudspeakers: For Music Recording and Reproduction (Newell and Holland, 2018, Focal Press).

Centre for Vision, Speech and Signal Processing (CVSSP)
University of Surrey, Guildford, Surrey, GU2 7XH, UK

PhD student working on non-speech audio processing using deep learning methods.


Readers interested in an example from physics might want to consider Hughes’ (1999) description of the Ising model, a model that is not faithful to reality but nevertheless has explanatory utility in physics.

See also Harre (1986) for his discussion of models that are used to explore “possibilities” and “impossibilities”.

Prediction in this context is used to refer to the practice of predicting future events with some accuracy. This is different from the reasoning strategy of imagining the implications if a given model were true, often also referred to as the predictions of a model. This type of reasoning is more closely related to exploring possibilities as discussed in Sect. 1.2.2.


Deep learning for plant genomics and crop improvement

Our era has witnessed tremendous advances in plant genomics, characterized by an explosion of high-throughput techniques to identify multi-dimensional genome-wide molecular phenotypes at low cost. More importantly, genomics is not merely acquiring molecular phenotypes, but also leveraging powerful data mining tools to predict and explain them. In recent years, deep learning has been found extremely effective in these tasks. This review highlights two prominent questions at the intersection of genomics and deep learning: (1) How can the flow of information from genomic DNA sequences to molecular phenotypes be modeled? (2) How can we identify functional variants in natural populations using deep learning models? Additionally, we discuss the possibility of unleashing the power of deep learning in synthetic biology to create novel genomic elements with desirable functions. Taken together, we propose a central role for deep learning in future plant genomics research and crop genetic improvement.


Students’ Communicative Resources in Relation to Their Conceptual Understanding—The Role of Non-Conventionalized Expressions in Making Sense of Visualizations of Protein Function

This study examines how students explain their conceptual understanding of protein function using visualizations. Thirteen upper secondary students, four tertiary students (studying chemical biology), and two experts were interviewed in semi-structured interviews. The interviews were structured around 2D illustrations of proteins and an animated representation of water transport through a channel in the cell membrane. In the analysis of the transcripts, a score based on the SOLO taxonomy was given to each student to indicate the conceptual depth achieved in their explanations. The use of scientific terms and non-conventionalized expressions in the students’ explanations was investigated based on a semiotic approach. The results indicated that there was a positive relationship between use of scientific terms and level of education. However, there was no correlation between students’ use of scientific terms and conceptual depth. In the interviews, we found that non-conventionalized expressions were used by several participants to express conceptual understanding and played a role in making sense of the visualizations of protein function. Interestingly, the experts also made use of non-conventionalized expressions. The results of our study imply that more attention should be paid to students’ use of scientific and non-conventionalized terms in relation to their conceptual understanding.



Direct Instruction - Dazzling Details

Students have already taken Lecture Notes and have been introduced to the processes of protein synthesis (transcription and translation) in a previous lesson. Due to the complexity of the process and the need for students to follow the transition from DNA to mRNA (transcription) to a protein chain (translation), it is vital to reinforce this content in multiple lessons.

The teacher will review the processes at the front of the room in a whole-group discussion with step-by-step details. Students are encouraged to ask for clarification at any time! If you have models of DNA to serve as manipulatives, they will support the students' ability to visualize this multi-step process.

Sample Analogy to Conceptualize Translation: My students were struggling to conceptualize how mRNA is translated into amino acids and how the amino acids join together to form a protein chain. An effective analogy for my students was to compare the ribosome (made of ribosomal RNA) to a conveyor belt at the grocery store. The mRNA arrives at the ribosome (the conveyor belt), and there is an imaginary "scanner" at one end, as at the grocery store. The mRNA sequence moves along the conveyor belt, and the "scanner" reads the mRNA codons in three-letter increments, just as the grocery store scanner reads the bar codes on the groceries. Then transfer RNA matches up its complementary anticodon with the messenger RNA's codon so that the correct amino acid is delivered to the "conveyor belt" (ribosome). Once the amino acid has been delivered, the mRNA strand moves forward so the next codon can be read by the "scanner". The process continues until the "scanner" reads a "stop" codon message on the mRNA. The covalently bonded amino acid chain has now created a protein, which is released so that it can complete its specific function in the cell.


Making connections among multiple visual representations: how do sense-making skills and perceptual fluency relate to learning of chemistry knowledge?

To learn content knowledge in science, technology, engineering, and math domains, students need to make connections among visual representations. This article considers two kinds of connection-making skills: (1) sense-making skills that allow students to verbally explain mappings among representations and (2) perceptual fluency in connection making that allows students to quickly and effortlessly use perceptual features to make connections among representations. These different connection-making skills are acquired via different types of learning processes. Therefore, they require different types of instructional support: sense-making activities and fluency-building activities. Because separate lines of research have focused either on sense-making skills or on perceptual fluency, we know little about how these connection-making skills interact when students learn domain knowledge. This article describes two experiments that address this question in the context of undergraduate chemistry learning. In Experiment 1, 95 students were randomly assigned to four conditions that varied whether or not students received sense-making activities and fluency-building activities. In Experiment 2, 101 students were randomly assigned to five conditions that varied whether or not, and in which sequence, students received sense-making and fluency-building activities. Results show advantages for sense-making and fluency-building activities compared to the control condition only for students with high prior chemistry knowledge. These findings provide new insights into potential boundary conditions for the effectiveness of different types of instructional activities that support students in making connections among multiple visual representations.


