Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 3rd Global Summit and Expo on Multimedia & Artificial Intelligence at the Holiday Inn Lisbon – Continental, Lisbon, Portugal.

Day 2:

Keynote Forum

Anton Nijholt

University of Twente, Netherlands

Keynote: Playful Multimedia in Smart and Playable Cities

Time : 09:00-09:40

Biography:

Anton Nijholt received his PhD in computer science from the Vrije Universiteit in Amsterdam. He has held positions at various universities, both inside and outside the Netherlands. In 1989 he was appointed full professor at the University of Twente in the Netherlands. His main research interests are human-computer interaction with a focus on playful interfaces, entertainment computing, and humor generation. He has edited various books, most recently on playful interfaces, entertainment computing and playable cities. Nijholt has acted as program chair and general chair of many large international conferences on affective computing, entertainment computing, virtual agents, and multimodal interaction. He is chief editor of the Human-Media Interaction section of the journals Frontiers in Psychology, Frontiers in Digital Humanities, and Frontiers in ICT, and co-editor of the Springer book series Gaming Media and Social Effects. Since 2015 he has also been a Global Research Fellow at the Imagineering Institute in Malaysia.

Abstract:

In research on smart cities, the emphasis is on sensors that collect information about a city's inhabitants' use of resources and their (real-time) behavior, and on actuators that provide feedback to citizens or city management and make changes to the environment, allowing for more efficient use of a city's resources. Management, efficiency and sustainability are keywords. Smartness in smart cities addresses ways to control energy consumption, increase safety, manage real-time traffic and public events, and otherwise make cities more efficient.

There is more to city life than efficiency. The sensors and actuators that make a city smart can also be used to introduce playful and humorous situations, urban games, and other games meant to provide playful experiences or playful participation in, and contribution to, urban design and development. Rather than introducing sensors and actuators only to make city life and management more efficient, we can introduce them to make city life more playful, adding playful experiences to citizens' daily activities. We can then speak of playful cities, and when citizens are given the opportunity to introduce and configure sensor and actuator networks themselves, we can also speak of playable cities.

Playable cities allow inhabitants to introduce their own playful applications. They need access to sensors, actuators and microprocessors. Introducing playfulness and humor in smart environments requires knowledge about humor theories. We discuss the theories and make a transition from the usual verbal humor theories to design principles that allow and stimulate the creation of humor in smart environments. We discuss accidental and intentional occurrences of humor and embed them in a framework of humor creation in smart and digitally enhanced physical environments.

Richard Jiang

University of Northumbria, Newcastle, UK
Biography:

I am currently a Senior Lecturer in Computer and Information Science at the Univ. of Northumbria, Newcastle. I received a BSc in Electronics from Huazhong Univ. of Science & Tech. in China and a PhD in Computer Science from Queen's Univ. Belfast, whose computer science department was built up by Turing Award laureate Sir Tony Hoare from the 1960s. After my PhD, I joined Brunel Univ. in July 2007 as an RA on an EU-FP6 project (RUSHES) on video indexing. Following this, I worked as an RA at Loughborough Univ. (TSB project CrimeVis, 03/2009~09/2010), then at Swansea Univ. (EPSRC project on sports visualization, 10/2010~09/2011), the Univ. of Bath (TSB project on video codecs, 10/2011~09/2012) and the Univ. of Sheffield (EPSRC project BIMPA, 10/2012~04/2013). I joined the Univ. of Northumbria as a Lecturer in May 2013. At Northumbria, I am currently leading a research team of 5 PhD students (as 1st supervisor) and 1 postdoc on biometrics, smart cities, medical diagnosis, and financial computing in the Dept. of Computer and Information Science. I have authored or co-authored 21 refereed journal papers and 24 conference papers/books/book chapters. I am a Fellow of the Higher Education Academy. I served as publication co-chair of EUVIP 2016 and the leading editor of a Springer book on biometric big data in 2016.

Abstract:

Biometrics in modern computer science is defined as the automated use of biological properties to identify individuals. The early use of biometrics dates back nearly 4000 years, to when the Babylonian Empire legislated the use of fingerprints to protect a legal contract against forgery and falsification by having the fingerprints impressed into the clay tablet on which the contract had been written. Nowadays, the wide use of the Internet and mobile devices has brought about a boom in biometric applications, and research on biometrics has expanded drastically into many new domains.

With the boom in internet and mobile applications, the rapid growth of biometric data from billions of users poses a big data challenge, especially as many new applications are linked to city-scale domains in smart cities. These new applications have created a billion-dollar market for biometric technologies, and industry needs in turn push the research further and more vigorously. In this talk we address the challenges and opportunities in the era of big data against the background of smart cities.

Break: Networking & Refreshments Break 10:20-10:40 @ Foyer
  • Workshop

Session Introduction

Mounîm A El-Yacoubi

University Paris Saclay, France

Title: Can handwriting analysis be helpful for Alzheimer detection?
Speaker
Biography:

Mounîm A. El-Yacoubi (PhD, University of Rennes, France, 1996) was with the Service de Recherche Technique de la Poste (SRTP) at Nantes, France, from 1992 to 1996, where he developed software for handwritten address recognition that is still running in automatic French mail-sorting machines. He was a visiting scientist for 18 months at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI) in Montréal, Canada, and then an associate professor (1998-2000) at the Catholic University of Parana (PUC-PR) in Curitiba, Brazil. From 2001 to 2008, he was a Senior Software Engineer at Parascript, Boulder (Colorado, USA), a world-leading company in automatic processing of handwritten and printed documents (mail, checks, forms). Since June 2008, he has been a Professor at Telecom SudParis, University of Paris Saclay. His main interests include machine learning, human gesture and activity recognition, human-robot interaction, video surveillance and biometrics, information retrieval, and handwriting analysis and recognition.

Abstract:

Handwriting recognition has become, in the last two decades, a mature technology with successful applications like automatic mail sorting, bank check processing and natural interaction with mobile devices (tablet PCs, smartphones, etc.). We propose a novel technique to characterize patients with early-stage Alzheimer's disease (ESAD) with respect to healthy controls (HC) by analyzing the kinematics of online handwriting on the 4l (llll) series task (Figure 1). Our approach addresses the limits of the current state of the art in several ways. Instead of comparing ESAD and HC based on global parameters (speed or acceleration averages, etc.), we perform such a comparison based on the full dynamics of these kinematic parameters. To do so, we first automatically segment the 4l series into individual loops (Figure 2). To characterize the variability of loops over the two classes, we define a dictionary of prototype loops (medoids, Figure 3) by a clustering scheme based on the k-medoids algorithm, with a DTW (Dynamic Time Warping) dissimilarity measure that accommodates the sequential aspect of the loops. Each cluster thus generated consists of a set of loops pertaining to the two classes in different proportions, reflecting the cluster's power in discriminating the two classes. To leverage all the loops generated by a given person in the test phase, we consider a Bayesian formalism that probabilistically aggregates the contribution of each loop before making a classification decision over the two classes (ESAD and HC). This formalism has the advantage of offering a sound mechanism for rejecting persons with ambiguous handwriting, for whom it is better not to make a hard automatic decision. We have tested our approach on a database acquired at Broca Hospital in Paris, from patients with ESAD and from HC, and we obtain promising results, reflected by encouraging classification performance using the leave-one-out validation scheme.

Figure 1: Segmentation of a 4l series into individual loops.

Figure 2: Evolution over an l loop of vertical speed, acceleration, and jerk, from blue (low) to red (high).

Figure 3: Medoids of the loops based on their speed dynamics.
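As a rough illustration of the loop-clustering step described in the abstract above, the sketch below pairs a textbook DTW dissimilarity with a plain k-medoids routine on synthetic speed profiles. It is a minimal sketch under assumed data shapes, not the authors' implementation.

    # Hedged sketch: DTW dissimilarity between variable-length loop
    # kinematics, then k-medoids clustering. Toy data, not the authors' code.
    import numpy as np

    def dtw(a, b):
        """Dynamic time warping distance between two 1-D kinematic sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def k_medoids(dist, k, iters=100, seed=0):
        """Plain k-medoids on a precomputed dissimilarity matrix."""
        rng = np.random.default_rng(seed)
        medoids = rng.choice(dist.shape[0], size=k, replace=False)
        for _ in range(iters):
            labels = np.argmin(dist[:, medoids], axis=1)
            new = []
            for c in range(k):
                members = np.where(labels == c)[0]
                if len(members) == 0:           # keep the old medoid if empty
                    new.append(medoids[c])
                else:                           # member minimizing total distance
                    sub = dist[np.ix_(members, members)]
                    new.append(members[np.argmin(sub.sum(axis=1))])
            new = np.array(new)
            if np.array_equal(new, medoids):
                break
            medoids = new
        return medoids, labels

    # Toy usage: 40 synthetic loops, each a variable-length speed profile.
    rng = np.random.default_rng(1)
    loops = [np.sin(np.linspace(0, np.pi, n)) * rng.uniform(0.5, 1.5)
             for n in rng.integers(20, 40, size=40)]
    D = np.array([[dtw(a, b) for b in loops] for a in loops])
    medoids, labels = k_medoids(D, k=5)
    print("medoid loop indices:", medoids)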

 

  • Multimedia applications and services | Multimedia communications and networking | Virtual Reality | Computer Games Design & Development | Visualization & Human Computer Interaction | Audio, Video, Speech & Signal Processing | Multimedia & AI in Healthcare

Session Introduction

Leonel Antonio Toledo Díaz

Barcelona Supercomputing Center, Spain

Title: Interactive complex virtual environments using XML configuration files

Time : 11:40-12:05

Speaker
Biography:

Leonel Toledo received his PhD from Instituto Tecnológico de Estudios Superiores de Monterrey, Campus Estado de México, in 2014, where he was a full-time professor from 2012 to 2014. He was an assistant professor and researcher and has devoted most of his research work to crowd simulation and visualization optimization. He has worked at the Barcelona Supercomputing Center using general-purpose graphics processors for high-performance graphics. His thesis work was on level of detail used to create varied animated crowds. Currently he is a researcher at the Barcelona Supercomputing Center.

Abstract:

The process of designing virtual environments is typically an expensive task in terms of both resources and processing power. Creating immersive experiences in simulations or video games is a complex process; even though hardware capabilities are constantly increasing and allow developers to create impressive scenes, sometimes this is not enough. Constant technological advances rely on heavy GPU computation so that developers can represent virtual environments composed of millions of polygons in highly realistic scenes; nevertheless, developers are sometimes faced with an important tradeoff between realism and performance. Recently there has been a remarkable increase in the number of middleware packages and frameworks that try to meet the technical requirements of complex 3D scenes. For instance, scenes with several thousand characters are computationally expensive as well as memory consuming. To attack this problem, several techniques must be implemented, such as level of detail, illumination, collision avoidance, animation transfer and audio management, to mention just a few. Most approximate rendering algorithms ignore perception, or use early-vision-based perceptual metrics to accelerate performance. Visual perception in computer graphics has received a lot of attention over the past few years. By understanding the limitations of the human visual system, rendering algorithms can be modified to eliminate unnecessary computations and produce images with no perceivable difference to the observer. For instance, it is known that observers do not require a physically accurate simulation of illumination to perceive a scene as realistic. Optimizing the rendering stage for any given simulation is a complex process, and there are many possible ways to reduce the detail of a geometric mesh, each with different advantages and drawbacks for its implementation on a GPU.
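The title's idea of driving environment construction from XML configuration files, combined with the level-of-detail technique mentioned in the abstract, might look roughly like the following sketch. The element names, attributes and distance thresholds are invented for illustration, not taken from the author's system.

    # Hedged sketch: selecting a level of detail (LOD) per entity from an
    # XML scene description. Tag and attribute names are hypothetical.
    import xml.etree.ElementTree as ET
    import math

    SCENE_XML = """
    <scene>
      <entity name="agent" position="0,0,0">
        <lod mesh="agent_high.mesh" maxDistance="20"/>
        <lod mesh="agent_med.mesh"  maxDistance="60"/>
        <lod mesh="agent_low.mesh"  maxDistance="1e9"/>
      </entity>
    </scene>
    """

    def load_entities(xml_text):
        root = ET.fromstring(xml_text)
        entities = []
        for e in root.iter("entity"):
            pos = tuple(float(v) for v in e.get("position").split(","))
            # Sort LOD levels by their distance budget, nearest first.
            lods = sorted((float(l.get("maxDistance")), l.get("mesh"))
                          for l in e.iter("lod"))
            entities.append({"name": e.get("name"), "pos": pos, "lods": lods})
        return entities

    def select_lod(entity, camera_pos):
        """Return the first mesh whose distance budget covers the camera range."""
        d = math.dist(entity["pos"], camera_pos)
        for max_d, mesh in entity["lods"]:
            if d <= max_d:
                return mesh
        return entity["lods"][-1][1]

    for ent in load_entities(SCENE_XML):
        print(ent["name"], select_lod(ent, camera_pos=(50.0, 0.0, 0.0)))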
Aykut Koc

ASELSAN Research Center, Turkey

Speaker
Biography:

Aykut Koc completed his BS in Electrical Engineering at Bilkent University in 2005, and his PhD in Electrical Engineering, MS in Electrical Engineering and MS in Management Science at Stanford University. Following his PhD, he worked briefly in Silicon Valley and then started to work for ASELSAN. He was on the founding team of the ASELSAN Research Center and worked on its initial founding process from the ground up. He is currently managing one of the research departments of the ASELSAN Research Center, which can be considered a pioneer among corporate research labs in Turkey. He also teaches a Fourier optics course part-time in the Electrical Engineering department at Middle East Technical University (METU). Throughout his career, he has worked on digital algorithms for optics and image processing, visual target tracking algorithms, and natural language processing.

Abstract:

The vast amount of user-uploaded visual content available online makes automated visual classification a critical research problem. While existing studies of visual classification mainly focus on recognition of generic objects such as vehicles, plants, food and animals, studies have recently also explored a more challenging problem, fine-grained object classification, which aims to distinguish fine subcategories within coarse object categories, such as types of vehicles, flowers and kinds of food. Here, another fine-grained categorization problem important for multimedia applications is attempted: categorizing in-building scenes and their architectural styles, which will benefit applications related to real estate and interior decoration. In-building scenes are divided into five coarse categories: kitchen, bathroom, living room, bedroom and dining room. As fine categories, each in-building scene is assigned an architectural style such as Asian, Contemporary, Victorian, Rustic or Scandinavian. On a database consisting of a large number of in-building images, descriptive patterns corresponding to types of scenes and specific architectural styles are learned globally by utilizing deep convolutional neural network based models that have proven success in visual categorization. Moreover, local scene elements and objects which provide further clues for identifying architectural styles are discovered: scene objects with unique architectural style characteristics carry more discriminative power, whereas co-existing objects visible across various types of scenes are less discriminative. As potential applications, several scenarios for classification and retrieval of in-building images are investigated. Experiments show that using only the learned deep representations is effective in identifying scene types, while they perform poorly for architectural styles. Nonetheless, revealing key local scene objects ameliorates their performance for both classification and retrieval of architectural styles.
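One plausible way to realize the described joint coarse/fine categorization is a shared pretrained backbone with two classification heads, as in this hedged sketch; the backbone choice, class counts and toy labels are assumptions, not the authors' setup.

    # Hedged sketch: one CNN backbone, two heads -- coarse scene type and
    # fine architectural style. Backbone and class counts are assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    class SceneStyleNet(nn.Module):
        def __init__(self, n_scenes=5, n_styles=5):
            super().__init__()
            backbone = models.resnet18(weights=None)  # pretrained in practice
            feat_dim = backbone.fc.in_features
            backbone.fc = nn.Identity()               # keep the feature vector
            self.backbone = backbone
            self.scene_head = nn.Linear(feat_dim, n_scenes)  # kitchen, bathroom, ...
            self.style_head = nn.Linear(feat_dim, n_styles)  # Asian, Victorian, ...

        def forward(self, x):
            f = self.backbone(x)
            return self.scene_head(f), self.style_head(f)

    model = SceneStyleNet()
    images = torch.randn(4, 3, 224, 224)              # toy batch
    scene_logits, style_logits = model(images)
    # Joint loss over both label sets (toy labels).
    loss = nn.functional.cross_entropy(scene_logits, torch.tensor([0, 1, 2, 3])) \
         + nn.functional.cross_entropy(style_logits, torch.tensor([1, 0, 4, 2]))
    loss.backward()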

Break: Lunch Break 12:30-13:… @ Restaurant
J J Joshua Davis

The Embassy of Peace, Whitianga, New Zealand

Speaker
Biography:

J J Joshua Davis is experienced as a Decision Analyst and Strategic Planner for banks, oil companies, consulting firms and family businesses. He lectured for several years in the fields of systems thinking, computer simulation, chaos theory, fractal geometry, decision making and system dynamics. From 1994 onwards, after a set of meaningful spiritual experiences, he spent many years travelling as an Ambassador of Peace around the world. Since 1998, he has worked in research concerning decision making and consciousness and published a thesis, "The Brain of Melchizedek: A Cognitive Neuroscience Approach to Spirituality". More recently, he has been researching in close collaboration with Grant Gillett, Robert Kozma, Walter Freeman and Paul Werbos in the areas of cognitive neuroscience, philosophy, quantum physics and the biophysics of peace.

Abstract:

This presentation describes the development and use of the art of encephalography in a new and more advanced way, whereby large quantities of brain data images are processed, converted into brain dynamics movies and then displayed for the purpose of visual discrimination between different brain cognitive states, as well as between the different stages of cognitive processes related to the cycle of creation of knowledge and meaning. The methodology we present is inspired by the art of encephalography, enhanced from the mere plotting of brain signals in the time domain to spatio-temporal frames that, when presented in sequence, produce a brain dynamics movie that allows us to visualize different patterns of behavior under different conditions produced by different stimuli, based on experimental data. By careful observation of each of these movies, we learn to identify different structures and visual patterns in which large-scale synchronizations and de-synchronizations are observed across different frequency bands. These movies also allow us to explore the temporal evolution of these spatial brain patterns, in which we can identify the different stages in the manifestation of the hypothesized cycle of creation of knowledge and meaning. We conjecture that movie viewing will allow a better understanding of learning and adaptation. In summary, viewing brain dynamics movies allows a significant impression of brain events for different measurements, brain events across bands, and the different stages of the cycle of creation of knowledge and meaning. The research team at The Embassy of Peace in Whitianga, New Zealand accomplished this work in close collaboration with Walter J. Freeman and Robert Kozma.
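A minimal sketch of the basic mechanism, rendering a stack of spatio-temporal activity frames as a movie, is given below; the synthetic array stands in for real recordings, and the grid size and frame rate are assumptions.

    # Hedged sketch: render a sequence of spatial brain-activity frames as a
    # movie, in the spirit of the described brain dynamics movies.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib import animation

    t, rows, cols = 200, 8, 8                  # e.g. an 8x8 electrode grid (assumed)
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((t, rows, cols)).cumsum(axis=0)  # smooth drift

    fig, ax = plt.subplots()
    im = ax.imshow(frames[0], cmap="jet", vmin=frames.min(), vmax=frames.max())
    ax.set_title("Brain dynamics movie (synthetic data)")

    def update(i):
        im.set_data(frames[i])                 # advance one spatio-temporal frame
        return (im,)

    anim = animation.FuncAnimation(fig, update, frames=t, interval=40, blit=True)
    anim.save("brain_dynamics.mp4")            # requires ffmpeg; or use plt.show()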

Takayoshi Iitsuka

Speaker
Biography:

Takayoshi Iitsuka completed his Master's degree in Science and Technology at the University of Tsukuba in Japan. From 1983 to 2003, he was a researcher and manager of optimizing and parallelizing compilers for supercomputers in the Central Research Laboratory and Systems Development Laboratory of Hitachi. From 2003 to 2015, he was in the strategy and planning departments of several IT divisions. He retired from Hitachi in October 2015 and started studying and researching Artificial Intelligence in May 2016. In October 2016, he achieved the top position on Montezuma's Revenge in OpenAI Gym. His current research interests include Deep Learning, Deep Reinforcement Learning and Artificial General Intelligence based on whole brain architecture.

Abstract:

Games with little chance of scoring, such as Montezuma's Revenge, are difficult for Deep Reinforcement Learning (DRL) because there is little chance to train the Neural Network (NN): no reward, no learning. DeepMind showed that a pseudo-count based pseudo-reward is effective for learning games with little chance of scoring; they achieved over 3000 points in Montezuma's Revenge in combination with Double-DQN. In contrast, the average score was only 273.70 points in combination with A3C (called A3C+). A3C is a very fast training method, and getting a high score with A3C+ is important. I propose new training methods, Training Long History on Real Reward (TLHoRR) and Diverse Hyper Parameters in Threads (DHPT), for combination with A3C+. TLHoRR trains the NN with a long history just before getting a score, only when the game environment returns a real reward; the training length for a real reward is over 10 times longer than that for a pseudo-reward. This is inspired by the reinforcement of learning with dopamine in the human brain: a real score is a very valuable reward, and TLHoRR strongly trains the NN the way dopamine does. DHPT changes the learning hyperparameters in each thread, creating diversity in thread actions. DHPT was very effective for training stability with A3C+; without DHPT, the average score does not recover once it drops to zero. With TLHoRR and DHPT in combination with A3C+, the average score on Montezuma's Revenge almost reached 2000 points. This combination explored the game state better than DeepMind's paper: five rooms in Montezuma's Revenge were newly visited with TLHoRR and DHPT that were not visited with DeepMind's pseudo-count based pseudo-reward alone. Furthermore, with TLHoRR and DHPT in combination with A3C+, I took and kept the top position on Montezuma's Revenge in the OpenAI Gym environment from October 2016 to March 2017.
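The two ideas can be illustrated apart from a full A3C+ implementation; in the hedged sketch below, the window lengths and hyperparameter ranges are illustrative only and do not reproduce the author's code.

    # Hedged sketch of the two proposed ideas, outside a full A3C+ stack.
    # TLHoRR: when a *real* reward arrives, train on a much longer history
    # window than for pseudo-rewards. DHPT: give each worker thread its own
    # hyperparameters. All constants below are illustrative assumptions.

    REAL_WINDOW = 200     # long history trained on real reward (>10x pseudo)
    PSEUDO_WINDOW = 20    # short history trained on pseudo-reward

    def training_segments(rewards):
        """rewards: list of (step, kind) with kind in {'real', 'pseudo'}.
        Returns (start, end) index ranges to push through the learner."""
        segments = []
        for step, kind in rewards:
            window = REAL_WINDOW if kind == "real" else PSEUDO_WINDOW
            segments.append((max(0, step - window), step))
        return segments

    def thread_hyperparams(thread_id, n_threads):
        """DHPT: spread learning rate and entropy cost across threads."""
        frac = thread_id / max(1, n_threads - 1)
        return {
            "lr": 10 ** (-4 + frac),           # 1e-4 ... 1e-3
            "entropy_beta": 0.01 * (1 + frac)  # 0.01 ... 0.02
        }

    # Toy usage
    print(training_segments([(150, "pseudo"), (480, "real")]))
    for tid in range(4):
        print(tid, thread_hyperparams(tid, 4))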

Mrouj M Almuhajri

Saudi Electronic University, KSA

Speaker
Biography:

Mrouj M Almuhajri is a Lecturer at Saudi Electronic University, KSA. She completed her Bachelor's degree in Computer Science at Umm Al-Qura University, Saudi Arabia, and her Master's degree in Computer Science at Concordia University, Montreal, Canada.

Abstract:

Social media play a significant role among younger generations and students. They use it to communicate with the public, spread news, and share their thoughts using different content forms like text, audio, image, and video. Multimedia makes the transfer of information much easier. This paper details the results of a semester-long experiment that examined the effects of integrating Twitter with e-learning tools on the education process. More specifically, the experiment studied the ability to enhance students' understanding of the taught material and improve communication between the students and the instructor. The study was done with the participation of sophomore SEU students taking CS141 (computer programming) and IT241 (operating systems) courses for computing and informatics majors. The study was conducted using the Twitter account @seugeeks. A total of 114 subscribers followed the account during the semester of the study. The account was used for many activities, such as announcements, video tutorials, questions, and discussions. To assess the impact of using Twitter in the teaching process, an online survey was published at the conclusion of the semester. A total of 39 students participated in the survey. The results showed that all participants have a Twitter account, and the majority of them (65%) had been using it for more than three years. Statistical analysis of the Likert-scale data revealed positive results of utilizing Twitter in the learning process. Both students and instructor were able to communicate with each other more easily, creating a collaborative environment. In fact, 96% of the participants supported utilizing the same methodology in other courses. In conclusion, this study provides evidence that Twitter is a useful tool in the educational process, especially when different forms of media are combined. The study demonstrates Twitter's ability to provide a collaborative platform for both faculty and students.

Masahiro Suzuki

Kanagawa Institute of Technology, Japan

Title: Technique of obtaining visually perceived positions using movements of users’ bodies

Time : 14:35-15:00

Speaker
Biography:

Masahiro Suzuki received his B.A., M.A., and Ph.D. degrees in psychology from Chukyo University in Nagoya, Aichi, Japan in 1994, 1996, and 2002, respectively. He joined the Imaging Science and Engineering Laboratory of Tokyo Institute of Technology in Yokohama, Kanagawa, Japan in 2003 as a postdoctoral researcher. He then joined the Human Media Research Center of Kanagawa Institute of Technology in Atsugi, Kanagawa, Japan in 2006 as a postdoctoral researcher. He will join the Department of Psychology of Tokiwa University in Mito, Ibaraki, Japan in April 2017 as an assistant professor. He is currently engaged in research on 3-D displays and augmented reality. Dr. Suzuki is a member of the Japan Society of Kansei Engineering, Japanese Cognitive Science Society, Japanese Psychological Association, Optical Society of Japan, and Vision Society of Japan.

Abstract:

We proposed a technique for obtaining the visually perceived positions of virtual objects presented in front of the screens of 3-D displays, and evaluated it. Applications in which users' own bodies, which are actually seen by users unlike video-captured images, interact with virtual objects are attractive applications of 3-D displays. Users expect interactions to be executed when their bodies are seen at the same positions as virtual objects because this is natural for them. Executing interactions when users' bodies are at the visually perceived positions of virtual objects is therefore the crucial requirement for interaction between bodies and objects. Conventional techniques execute interaction when users' bodies are at positions calculated from the binocular disparity of virtual objects. However, visually perceived positions often differ from positions calculated from binocular disparity, so conventional techniques make it difficult to meet the requirement. In contrast, the proposed technique meets the requirement by obtaining the visually perceived positions of virtual objects from body movements. According to previous studies on body movements, the velocity of reaching movements as a function of time follows a bell curve. In the proposed technique, the velocity of a reaching movement when a user reaches out to a virtual object is first fitted to a Gaussian function. The final position of the reaching movement is then obtained from the fitted function before the movement is finished, because the virtual object is seen there. The requirement is thus fulfilled by executing interactions when users' bodies are at the positions obtained in the last step. In the evaluation, we demonstrated the feasibility of the proposed technique by examining the accuracy and precision of the positions it obtains, and its usefulness by examining the exactness of the interactions it executes.
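The core step, fitting the observed part of a reaching-movement speed profile to a Gaussian and predicting the endpoint before the movement ends, might be sketched as follows; the sampling rate, noise level and initial guess are assumptions, and the data are synthetic.

    # Hedged sketch: fit the observed part of a reaching movement's speed
    # profile to a Gaussian, then predict total displacement by integrating
    # the fitted curve. Synthetic data; not the authors' implementation.
    import numpy as np
    from scipy.optimize import curve_fit

    def gaussian(t, a, mu, sigma):
        return a * np.exp(-(t - mu) ** 2 / (2 * sigma ** 2))

    dt = 0.01                                    # 100 Hz sampling (assumed)
    t_full = np.arange(0, 1.0, dt)
    true = gaussian(t_full, a=0.8, mu=0.5, sigma=0.12)   # bell-shaped speed (m/s)

    observed = slice(0, 60)                      # only the first 0.6 s seen so far
    noisy = true[observed] + 0.02 * np.random.default_rng(1).standard_normal(60)

    params, _ = curve_fit(gaussian, t_full[observed], noisy,
                          p0=(0.5, 0.4, 0.1))    # rough initial guess
    # Predicted reach distance = integral of the fitted speed over the movement.
    predicted_distance = gaussian(t_full, *params).sum() * dt
    print(f"predicted reach distance: {predicted_distance:.3f} m")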

Yufang Tang

Shandong Normal University, China

Title: Sparse representation for image classification

Time : 15:00-15:25

Speaker
Biography:

Yufang Tang has been a Lecturer at the School of Communication of Shandong Normal University in China since 2015. He obtained his Bachelor's degree in Computer Science and Technology (2007) and Master's degree in Computer Application Technology (2010) at Shandong Normal University, and received his Doctorate degree in Signal and Information Processing at Beijing University of Posts and Telecommunications (2015). He is engaged in research on Computer Vision, Machine Learning, Artificial Intelligence and Data Mining, etc.

Abstract:

As a new theory of signal sampling, sparse representation derives from compressed sensing and differs fundamentally from Nyquist sampling theory. More and more image classification methods based on sparse representation have proved effective in different fields, such as face recognition, hyperspectral image classification, handwriting recognition, medical image processing, etc. Image classification based on sparse representation has become a hot research topic in recent years; not only research institutes, but also governments and militaries have invested a great deal of energy and funding in this attractive task. In this presentation, we review its history and development tendencies, and present our latest research progress on sparse representation for image classification.
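A common formulation of sparse-representation-based classification (SRC), sketched below on random stand-in data, codes a test sample over a dictionary of training samples and assigns the class with the smallest reconstruction residual; the sparsity level and data shapes are assumptions.

    # Hedged sketch of sparse-representation-based classification (SRC):
    # sparse-code a test vector over a dictionary of training samples and
    # pick the class whose atoms reconstruct it best. Random toy data.
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(0)
    n_classes, per_class, dim = 3, 20, 64
    X = rng.standard_normal((dim, n_classes * per_class))   # columns = train samples
    X /= np.linalg.norm(X, axis=0)                          # unit-norm atoms
    labels = np.repeat(np.arange(n_classes), per_class)

    y = X[:, 5] + 0.05 * rng.standard_normal(dim)           # noisy class-0 sample

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False)
    omp.fit(X, y)                                           # y ~ X @ alpha, sparse
    alpha = omp.coef_

    residuals = []
    for c in range(n_classes):
        a_c = np.where(labels == c, alpha, 0.0)             # keep class-c coefficients
        residuals.append(np.linalg.norm(y - X @ a_c))
    print("predicted class:", int(np.argmin(residuals)))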

Li Liu

University of Shanghai for Science and Technology, China

Title: Generating graphs from key points for near-duplicate document image matching

Time : 15:25-15:50

Speaker
Biography:

Li Liu is a lecturer at the University of Shanghai for Science and Technology. She received her Ph.D. degree in pattern recognition and intelligent systems from East China Normal University, Shanghai, China, in 2015. She was with the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Montreal, Quebec, Canada, from 2013 to 2014 as a visiting doctoral student. Her research interests include pattern recognition, machine learning and image analysis.

Abstract:

We propose a novel near-duplicate document image matching approach. Some keypoints are first detected from the image using the difference-of-Gaussian function. We then present a clustering method, based on which the keypoints are clustered into several groups. The number of clusters is determined automatically according to the distributions of the keypoints.  Afterwards, a graph is generated whose nodes correspond to the obtained clusters and the edges describe the relationships between two clusters. Consequently, the problem of image matching is transformed to graph matching. To compute the similarity between two graphs, we build their association graph and then find the maximum weight clique. A thorough evaluation of the performance of the proposed approach is conducted on two different datasets. Promising experimental results demonstrate the effectiveness and validity of this method.
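A hedged sketch of the described pipeline follows: keypoints are clustered, a cluster graph is built per image, and matching is reduced to a maximum weight clique in the association graph. The fixed cluster count, compatibility tolerance and input paths are simplifications; the paper determines the number of clusters automatically.

    # Hedged sketch: DoG keypoints -> clusters -> cluster graph ->
    # association graph -> maximum weight clique. Hypothetical inputs.
    import cv2
    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans

    def cluster_graph(image_path, k=6):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        kps = cv2.SIFT_create().detect(img)          # DoG keypoint detection
        pts = np.array([kp.pt for kp in kps])
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(pts)  # fixed k here
        centers = np.array([pts[labels == c].mean(axis=0) for c in range(k)])
        g = nx.complete_graph(k)                     # edges relate cluster pairs
        for i, j in g.edges:
            g.edges[i, j]["len"] = float(np.linalg.norm(centers[i] - centers[j]))
        return g

    def association_graph(g1, g2, tol=0.15):
        """Nodes are candidate cluster pairings; edges join compatible ones."""
        a = nx.Graph()
        for i in g1.nodes:
            for j in g2.nodes:
                a.add_node((i, j), weight=1)         # integer weights for networkx
        nodes = list(a.nodes)
        for (i, j) in nodes:
            for (p, q) in nodes:
                if i != p and j != q:
                    d1, d2 = g1.edges[i, p]["len"], g2.edges[j, q]["len"]
                    if abs(d1 - d2) <= tol * max(d1, d2):  # consistent geometry
                        a.add_edge((i, j), (p, q))
        return a

    g1, g2 = cluster_graph("doc1.png"), cluster_graph("doc2.png")
    clique, weight = nx.max_weight_clique(association_graph(g1, g2), weight="weight")
    print("similarity:", weight / max(len(g1), len(g2)))   # normalized match score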

Break: Networking & Refreshments Break 15:50-16:10 @ Foyer
  • Young Researcher Forum

Session Introduction

Metehan Unal

Ankara University, Turkey

Title: A distant augmented reality system for cultural heritage sites using drones

Time : 16:10-16:30

Speaker
Biography:

Metehan Unal holds a B.Sc. degree (honours) from the Computer Engineering Department of Ankara University and is now pursuing an M.Sc. degree. He worked as a trainee at Turkish Aerospace Industries in 2013. He has been working as a Research Assistant at Ankara University since 2015. His research interests include Augmented Reality, Computer Graphics and Artificial Intelligence. He is also an enthusiastic Android developer.

Abstract:

Statement of the Problem: Augmented Reality (AR) is a view that integrates real-world imagery with computer-generated sounds, images or 3D objects. AR has made it possible to place 3D reconstructions of buildings that have been subject to the wear and tear of thousands of years on a historic site. In this way, cultural heritage sites can be better explained and handed on to future generations. Physical reconstruction of ruined cultural heritage sites can be financially costly and time consuming; in addition, the site can be damaged during physical reconstruction. With state-of-the-art AR technology, 3D models can be placed in situ without any damage, while increasing the appeal of the area for tourists and enthusiastic students.

The aim of this study is to augment the video images received from mobile devices or drones with 3D models of the Roman Bath, one of the important cultural heritage sites in Ankara, Turkey.

Methodology & Theoretical Orientation: The 3D model of the Roman Bath was generated from reconstruction images drawn by expert archaeologists. Using the Unity 3D game engine, this model was overlaid onto the camera stream received from mobile devices such as phones and tablets. The location services provided by these devices were also used to place the model at its actual GPS location. Furthermore, an AR application was developed for drones to augment camera streams from a top view (Figure 1).

Findings: The developed application allows users to view the models augmented on the camera view. The use of drones in this study brings a new dimension to Augmented Reality by adding a third eye for the user. We name this approach 'Distant Augmented Reality'.

Conclusion & Significance: The authors expect that such applications not only provide an entertaining way to learn about history but also help preserve cultural heritage sites.
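As a rough illustration of the GPS-based placement step in the methodology above, the sketch below converts the latitude/longitude offset between the device and the site into local east/north metres; the coordinates and the equirectangular approximation are illustrative assumptions, not the authors' Unity code.

    # Hedged sketch: place a 3D model at a real GPS location by converting
    # the device-to-target latitude/longitude offset into a local
    # east/north offset in metres (equirectangular approximation).
    import math

    EARTH_RADIUS = 6378137.0   # metres (WGS84 equatorial radius)

    def gps_to_local(device_lat, device_lon, target_lat, target_lon):
        """Return (east, north) offset in metres from device to target."""
        d_lat = math.radians(target_lat - device_lat)
        d_lon = math.radians(target_lon - device_lon)
        north = d_lat * EARTH_RADIUS
        east = d_lon * EARTH_RADIUS * math.cos(math.radians(device_lat))
        return east, north

    # Hypothetical coordinates near the Roman Bath site in Ankara.
    east, north = gps_to_local(39.9439, 32.8540, 39.9443, 32.8546)
    print(f"place model {east:.1f} m east, {north:.1f} m north of the camera")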

Yoshikatsu Nakajima

Keio University, Japan

Speaker
Biography:

Yoshikatsu Nakajima received his B.E. degree in information and computer science from Keio University, Japan, in 2016. Since 2016, he has been a Master's student in the Department of Science and Technology and has worked as a research assistant of the Keio Program for Leading Graduate School at Keio University, Japan. He has attended many international and domestic conferences and won five awards in the two years since he started his research. In 2014, he joined the start-up company Home-tudor Tomonokai as a developer and built its online tutoring system by himself using Ruby on Rails. His research interests include augmented reality, SLAM, object recognition, and computer vision.

Abstract:

Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality. However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation using multiple feature descriptor databases generated for each partitioned viewpoint, in which the feature descriptor of each keypoint can be almost invariant. Our method estimates the viewpoint class of each input image using deep learning, based on a set of training images prepared for each viewpoint class. We introduce two ways of preparing those images and generating the databases. In the first method, images are generated by a projection matrix, with varied backgrounds, to make learning more robust to the environment. The second method uses real images to learn the entire environment around the planar pattern. The evaluation confirmed that the number of correct matches increased and the accuracy of camera pose estimation improved compared to the conventional method.

Furthermore, we have recently been applying the concept of viewpoint classes to the field of object recognition. Object recognition is one of the major research fields in computer vision and has been applied to various fields. In general, conventional methods are not robust to obstacles and suffer reduced accuracy when the camera stagnates at a poor position relative to the target object. We propose a novel method of object recognition that can run in real time by equally dividing the viewpoints around each object in the scene and impartially integrating the Convolutional Neural Network (CNN) outputs from each viewpoint class (see image). We confirmed its effectiveness through experiments.
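The impartial integration of CNN outputs across viewpoint classes might be realized as below: outputs are averaged within each viewpoint class first, then across classes, so frequently observed viewpoints do not dominate. The shapes and toy data are assumptions, not the authors' implementation.

    # Hedged sketch: integrate per-frame CNN softmax outputs impartially
    # across viewpoint classes (average within a class, then across classes).
    import numpy as np

    def integrate(outputs, viewpoint_classes):
        """outputs: (n_frames, n_object_classes) softmax rows.
        viewpoint_classes: viewpoint class id per frame."""
        outputs = np.asarray(outputs)
        vps = np.asarray(viewpoint_classes)
        per_class = [outputs[vps == v].mean(axis=0) for v in np.unique(vps)]
        return np.mean(per_class, axis=0)      # equal weight per viewpoint class

    # Toy usage: 5 frames, 3 object classes, frames from viewpoints {0, 1}.
    probs = [[.6, .3, .1], [.7, .2, .1], [.5, .4, .1], [.2, .6, .2], [.1, .8, .1]]
    print(integrate(probs, [0, 0, 0, 1, 1]))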

Yibin Hou & Jin Wang

Beijing University of Technology, China

Title: Packet loss rate mapped to the quality of experience in the IoT network

Time : 16:50-17:10

Speaker
Biography:

Jin Wang received a Bachelor's degree in Software Engineering from Beijing University of Chemical Technology, Beijing, China, in June 2012; she won the National Scholarship in 2010 and the National Endeavor Fellowship in 2009. She received her Master's degree in Computer Application Technology from Shijiazhuang Tiedao University in January 2015 and has published many papers indexed by ISTP, EI and SCI. She participated in three National Natural Science Foundation of China projects (Nos. 61203377, 60963011, 61162009), a Jiangxi Natural Science Foundation project (No. 2009GZS0022), and Special Research Foundation projects of Shijiazhuang Tiedao University (Nos. Z9901501, 20133007). She worked at the computer center of the Navy General Hospital from April to July 2015 as an intern technician, participating in a Naval Logistics Project and an anesthesia program (CHJ13L012). Since April 2015 she has been pursuing her PhD in the School of Software Engineering, Department of Information, Beijing University of Technology. Her research interests are the Internet of Things, software engineering, embedded systems, and image and video quality assessment in distorting networks.

Yibin Hou graduated from the Computer Science Department of Xi'an Jiaotong University with a Master's degree in Engineering, and received a Doctor's degree in Engineering from Eindhoven University of Technology in the Netherlands. From 2002 to 2013 he was Vice President of Beijing University of Technology. He is a professor and doctoral supervisor at Beijing University of Technology, dean of the School of Software, director of the Embedded Computing Institute, deputy director and secretary-general of the university's academic committee, and director of the Beijing Internet Software and Systems Engineering Technology Research Center. His research interest is the Internet of Things.

 

Abstract:

The Internet of Things encompasses Internet technology over both wired and wireless networks. In this paper, we investigate the quality of experience (QoE) and the packet loss rate of the network, since QoE is important and packet loss rate is a key factor in many studies. In order to study the influence of packet loss on users' QoE when video is transmitted over the network, and to establish a mapping model between the two, we built an NS2 + MyEvalvid simulation platform and simulated different degrees of packet loss by modifying QoS parameters, focusing on the influence of packet loss on QoE and establishing the mapping model between them. Experimental results show that packet loss has a significant influence on quality of experience, and that packet loss rate and QoE exhibit a nonlinear relationship. We used Matlab to establish the mapping model; the model's accuracy is high, it is easy to operate, and it can detect in real time the influence of packet loss on users' QoE. The contribution of this paper is, first, the finding that packet loss has a significant effect on video, and second, building on that finding, the establishment of a mapping model between packet loss rate and users' QoE. The next step is to set up a video quality evaluation model that considers network packet loss, taking into account the effects on QoE of different packet loss rates and different content complexities, and combining other factors such as different packet loss models; more accurate prediction of QoE is future work. E-commerce applications, such as Jingdong and Taobao's free trial centers, have become a hot topic.

Fig. 1 MyEvalvid system structure

Fig. 2 Src13 fitting curve (PSNR)
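As a rough illustration of the nonlinear packet-loss-to-QoE mapping that the abstract above reports fitting in Matlab, the sketch below fits an assumed exponential form to toy measurements with SciPy; the functional form and data points are illustrative, not the paper's results.

    # Hedged sketch: fit a nonlinear mapping from packet loss rate to a
    # QoE score (PSNR-like). The exponential form and data are assumptions.
    import numpy as np
    from scipy.optimize import curve_fit

    def qoe_model(loss_rate, a, b, c):
        return a * np.exp(-b * loss_rate) + c       # assumed nonlinear form

    loss = np.array([0.0, 0.01, 0.02, 0.05, 0.10, 0.20])   # packet loss rate
    qoe  = np.array([38.0, 35.1, 33.0, 29.2, 26.0, 23.5])  # measured PSNR (toy)

    params, _ = curve_fit(qoe_model, loss, qoe, p0=(15.0, 20.0, 23.0))
    print("predicted QoE at 3% loss:", qoe_model(0.03, *params))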