Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 3rd Global Summit and Expo on Multimedia & Artificial Intelligence at the Holiday Inn Lisbon – Continental, Lisbon, Portugal.


Day 1:

Keynote Forum

Ching Y Suen

Concordia University, Canada

Keynote: How well can computers recognize handwriting?

Time : 10:05-10:35

Multimedia 2017 International Conference Keynote Speaker Ching Y Suen photo
Biography:

Ching Y. Suen is the Director of CENPARMI and the Concordia Honorary Chair on AI & Pattern Recognition. He received his Ph.D. degree from UBC (Vancouver) and his Master's degree from the University of Hong Kong. He has served as the Chairman of the Department of Computer Science and as the Associate Dean (Research) of the Faculty of Engineering and Computer Science of Concordia University. Prof. Suen has served numerous national and international professional societies as President, Vice-President, Governor, and Director. He has given 45 invited/keynote papers at conferences and 200 invited talks at various industries and academic institutions around the world. He has been the Principal Investigator or Consultant of 30 industrial projects. His research projects have been funded by the ENCS Faculty and the Distinguished Chair Programs at Concordia University, FCAR (Quebec), NSERC (Canada), the National Networks of Centres of Excellence (Canada), the Canadian Foundation for Innovation, and the industrial sectors in various countries, including Canada, France, Japan, Italy, and the United States. Currently, he is the Editor-in-Chief of the journal Pattern Recognition, an Adviser or Associate Editor of 5 journals, and Editor of a new book series on Language Processing and Pattern Recognition. He has previously served as Editor-in-Chief, Associate Editor, or Adviser of 5 other journals. He is the founder of three conferences (ICDAR, IWFHR/ICFHR, and VI), has organized numerous international conferences including ICPR, ICDAR, ICFHR, and ICCPOL, and has served as Honorary Chair of numerous international conferences.

Abstract:

Handwriting is one of the most important media of human communication. We write and read every day. Though handwriting can vary considerably in style and neatness, we recognize handwritten materials easily. Humans develop their writing skill in childhood and gradually refine it throughout their lives. This paper examines the ways humans write (from primary school to adult writing) and ways of teaching the computer to recognize what they produce (handwriting technology), from ancient times (such as carved scripts, old books and documents) to modern times (such as immigration port-of-entry forms, cheques, payment slips, envelopes, and different kinds of notes and messages). Methods such as machine learning and deep classifier structures, and the extraction of space and margins, slant and line direction, width and narrowness, and stroke connections and disconnections, will be analyzed with large quantities of data. Both training procedures and learning principles will be presented to illustrate methodologies that enable computers to produce robust recognition rates for practical applications in the office and on mobile phones. In addition, the art and science of graphology will be reviewed, and techniques of computerizing graphology will be illustrated with interesting examples.
 

Jean-Marc Ogier
Biography:

Jean-Marc Ogier received his PhD degree in computer science from the University of Rouen, France, in 1994. During this period (1991-1994), he worked on graphic recognition for Matra Ms&I Company. From 1994 to 2000, he was an associate professor, first at the University of Rennes 1 (1994-1998) and then at the University of Rouen (from 1998 to 2001). Now a full professor at the University of La Rochelle, Prof. Ogier was the head of the L3I laboratory (the computer science research lab of the University of La Rochelle), which gathers more than 120 members and works mainly on Document Analysis and Content Management. Author of more than 200 publications and communications, he has managed several French and European projects dealing with document analysis, with both public institutions and private companies. Prof. Ogier was Deputy Director of the GDR I3 of the French National Research Centre (CNRS) between 2005 and 2013. He was also Chair of the Technical Committee 10 (Graphic Recognition) of the International Association for Pattern Recognition (IAPR) from 2010 to 2015, and is the representative member of France on the governing board of the IAPR. He is now the general chair of TC6 of the IAPR, which deals with computational forensics. Jean-Marc Ogier has been the general chair or the program chair of several international scientific events dealing with document analysis (DAS, ICDAR, GREC, …). He was also Vice Rector of the University of La Rochelle from 2005 to 2016, and president of the VALCONUM association, which aims at fostering relations between industries and research organizations. He is now the president of the University of La Rochelle.

Abstract:

Document engineering is the area of knowledge concerned with principles, tools and processes that improve our ability to create, manage, store, compact, access, and maintain documents. The fields of document recognition and retrieval have grown rapidly in recent years. Such development has been fueled by the emergence of new application areas such as the World Wide Web (WWW), digital libraries, computational forensics for document processing, and video- and camera-based OCR. This talk will address some recent developments in the area of Document Processing.

Keynote Forum

Jiankun Hu

University of New South Wales

Keynote: Multimedia security in big data era: Advances and open questions

Time : 10:00-10:40

Biography:

Jiankun Hu is a full Professor at the School of Engineering and IT, University of New South Wales, Canberra, Australia. He worked at Ruhr University, Germany, on a German Alexander von Humboldt Fellowship (1995-1996), as a research fellow at Delft University, the Netherlands (1997-1998), and as a research fellow at Melbourne University, Australia (1998-1999). Jiankun's main research interest is in the field of cyber security, including image processing/forensics and machine learning, where he has published many papers in high-quality journals. He has served on the editorial boards of up to 7 international journals, including the top venue IEEE Transactions on Information Forensics and Security, and served as Security Symposium Chair of the IEEE flagship conferences IEEE ICC and IEEE Globecom. He has served on the prestigious Panel of Mathematics, Information and Computing Sciences (MIC) of the ARC ERA (Excellence in Research for Australia) Evaluation Committee.

Abstract:

Multimedia audio, video, and images dominate Internet traffic and constitute the major applications in our daily life. Multimedia security has always been a concern in the community due to the issues of movie and music piracy and the privacy of medical images. Conventional cryptography can be applied to multimedia, but additional challenges, such as real-time operation and efficiency, must be addressed. With emerging multimedia applications such as peer-to-peer streaming and 3D data, multimedia is entering a big data era in which security and privacy face new challenges. Most multimedia security surveys have focused on issues related to data communication. In the big data era, cloud computing is becoming a major platform that deserves due attention in the multimedia community. This keynote speech will report on the advances in the field and discuss open research questions. The focus will be placed on efficient algorithms for emerging multimedia applications such as 3D imaging, and on cloud-based applications including digital rights management.

  • Virtual Reality | Animation and Simulations | Computer Vision & Pattern Recognition | Computer Graphics & Applications | Image Processing | Artificial Intelligence | 3D analysis, representation and printing

Session Introduction

David Xu

Regent University, USA

Title: Maya blend shapes for 3D Facial Animation
Speaker
Biography:

David Xu is a tenured associate professor at Regent University, specializing in computer 3D animation and movie special effects. He received an MFA in Computer Graphics (3D Animation) from Pratt Institute in NY. He has served as a senior 3D animator at Sega, Japan; as a senior CG special effects artist at Pacific Digital Image Inc., Hollywood; and as a professor of animation at several colleges and universities, where he developed the 3D animation program and curriculum. He has been a committee member of the computer graphics organization Siggraph Electronic Theater, where he was recognized with an award for his work.

Abstract:

Blend shapes, also known as morph target animation, are a powerful way of deforming geometry such as a human face to create various facial expressions, from happy to sad. In this presentation, after an overview of Autodesk Maya blend shapes, their features and workflow, Professor Xu will demonstrate and discuss how to create a more efficient workflow by combining blend shapes with Maya’s Set Driven Key features, and how to create more complex facial animation with advanced blend shape techniques and features. Concepts and techniques such as the blend shape deformer, Blend Shape node, Tweak node, Set Driven Key, morph target animation, vertex position, and more will be introduced and demonstrated. Finally, Professor Xu will discuss the advantages and disadvantages of using morph target animation over skeletal animation in 3D facial animation.
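
The morph target idea in this abstract can be illustrated with a short sketch. This is not Maya's API: the function name `blend`, the target names, and the two-vertex mesh are hypothetical and chosen only to show the usual formulation, in which each vertex is the base position plus a weighted sum of per-target offsets.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def blend(base: List[Vec3],
          targets: Dict[str, List[Vec3]],
          weights: Dict[str, float]) -> List[Vec3]:
    """Return base + sum_i w_i * (target_i - base), computed per vertex."""
    result = []
    for idx, (bx, by, bz) in enumerate(base):
        dx = dy = dz = 0.0
        for name, w in weights.items():
            tx, ty, tz = targets[name][idx]
            # Each target contributes its offset from the base, scaled by its weight.
            dx += w * (tx - bx)
            dy += w * (ty - by)
            dz += w * (tz - bz)
        result.append((bx + dx, by + dy, bz + dz))
    return result


# Hypothetical two-vertex mesh with two sculpted targets.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "frown": [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)],
}
half_smile = blend(base, targets, {"smile": 0.5})
```

Because the offsets are simply summed, opposite targets at equal weight cancel out, which is one reason drivers such as Set Driven Key are often used to keep weights in a sensible range.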

Speaker
Biography:

Mort Naraghi-Pour received his Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1987. Since August 1987, he has been with the School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, where he is currently the Michel B. Voorhies Distinguished Professor of Electrical Engineering. From June 2000 to January 2002, he was a Senior Member of Technical Staff at Celox Networks, Inc., a network equipment manufacturer in St. Louis, MO. Dr. Naraghi-Pour received the best paper award from WINSYS 2007 for a paper co-authored with his student, Dr. X. Gao. Dr. Naraghi-Pour’s research and teaching interests include wireless communications, broadband networks, information theory, and coding. He has served as a Session Organizer, Session Chair, and member of the Technical Program Committee for numerous national and international conferences.

Abstract:

In ensemble systems, several experts, which may have access to different data, make decisions which are then fused by a combiner (meta-learner) to obtain a final result. Such ensemble-based systems are well-suited for processing big data from sources such as social media, in-stream monitoring systems, networks, and markets, and provide more accurate results than single-expert systems. However, most existing ensemble learning techniques have two limitations: i) they are supervised, and hence they require access to the true label, which is often unknown in practice, and ii) they are not able to evaluate the impact of the various data features/contexts on the final decision, and hence they do not learn which data is required. In this paper we propose a joint estimation-detection method for evaluating the accuracy of each expert as a function of the data features/context and for fusing the experts’ decisions. The proposed method is unsupervised: the true labels are not available and no prior information is assumed regarding the performance of each expert. Extensive simulation results show the improvement of the proposed method as compared to the state-of-the-art approaches. We also provide a systematic, unsupervised method for ranking the informativeness of each feature on the decision making process.
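
To make the idea of unsupervised fusion concrete, here is a minimal sketch. It is not the authors' joint estimation-detection method: it swaps in a simple EM-style scheme for binary decisions with one accuracy value per expert, and it ignores the feature/context dependence that the abstract emphasizes. The function name `fuse_unsupervised` and the toy votes are hypothetical.

```python
import math
from typing import List, Tuple


def fuse_unsupervised(votes: List[List[int]],
                      iters: int = 20) -> Tuple[List[int], List[float]]:
    """votes[n][k] in {0, 1} is the decision of expert k on item n.

    Returns (fused labels, estimated accuracy of each expert) without
    using any true labels.
    """
    n_items, n_experts = len(votes), len(votes[0])
    # Start from the plain vote fraction as a soft estimate of P(label = 1).
    post = [sum(row) / n_experts for row in votes]
    acc = [0.5] * n_experts
    for _ in range(iters):
        # Estimate each expert's accuracy as its expected agreement with the soft labels.
        for k in range(n_experts):
            agree = sum(p if row[k] == 1 else 1.0 - p
                        for row, p in zip(votes, post))
            acc[k] = min(max(agree / n_items, 1e-3), 1.0 - 1e-3)
        # Re-estimate the labels: accuracy-weighted log-odds vote, uniform prior.
        for n, row in enumerate(votes):
            log_odds = sum((1 if v == 1 else -1) * math.log(acc[k] / (1.0 - acc[k]))
                           for k, v in enumerate(row))
            post[n] = 1.0 / (1.0 + math.exp(-log_odds))
    return [int(p >= 0.5) for p in post], acc


# Toy data: experts 0 and 1 agree with each other, expert 2 is noisy.
votes = [[1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
labels, accuracy = fuse_unsupervised(votes)
```

In this toy run the noisy third expert ends up with a lower estimated accuracy and therefore less influence on the fused decision.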

Speaker
Biography:

Ghyslain Gagnon received the Ph.D. degree in electrical engineering from Carleton University, Canada in 2008. He is now an Associate Professor at École de technologie supérieure, Montreal, Canada. He is an executive committee member of ReSMiQ and Director of research laboratory LACIME, a group of 10 Professors and nearly 100 highly-dedicated students and researchers in microelectronics, digital signal processing and wireless communications. Highly inclined towards research partnerships with industry, his research aims at digital signal processing and machine learning with various applications, from media art to building energy management.

Abstract:

This peculiar combination of illuminated electro-technical elements honors the intellectual journey of philosopher Gaston Bachelard (1884-1962), who interlaced forward-thinking ideas underlying the complex interaction of reason and imagination, an important contribution to inspire us as a society deeply marked by scientific and artistic creativity. Permanently installed in the main tunnel at École de technologie supérieure, Montreal, Canada, this interactive artwork reminds future engineers of the importance of the rational-intuitive bilaterality in any technological innovation.

The animation of lighting creates routes of running light blobs through the tunnel. Since the lighted tubes share the space with actual electrical and HVAC pipes, the lighting dynamics give the impression of a flow of useful elements (electricity, network data, air) in the building. A microphone is hidden in an electrical box at the center of the tunnel to allow interactive control. A sound recognition algorithm is used to identify blowing sounds: when users blow into an opening in this electrical box, the flow of light is accelerated, a symbol of the contribution of engineers to such technical systems.

The artwork was designed as an innovation platform, for students to add elements to the installation in the future, allowing increased interactivity. This platform was successfully tested in 2015 by a team who created a luminous tug of war game in the tunnel, with players using their mobile phones as a controlling device.

The installation was nominated at the Media Architecture Biennale awards ceremony, Sydney, 2016.

Nuria Medina-Medina

University of Granada, CITIC-UGR, Spain

Title: Designing video games
Speaker
Biography:

Nuria Medina received her Ph.D. in computer science from the University of Granada (UGR) in 2004, proposing an adaptive and evolutionary model for hypermedia systems. She is currently a member of the management team of the Research Centre for Information and Communications Technologies (CITIC-UGR) and a professor at the Department of Computer Languages and Systems at this Spanish university, where she directs a project that implements educational games in Andalusian school classrooms (P11-TIC7486).

Abstract:

The video game industry continues its high rate of growth and, correspondingly, video games are present in most first-world households. Thousands of people around the planet work in the development of games and billions of players enjoy these multimedia creations; however, from an engineering perspective, critical issues in the design of these video games are not being sufficiently considered. Therefore, an academic effort is needed to identify and analyze the keys to good design and the possible design solutions in each particular context of the productive and unstoppable world of video games. With this aim, we have been working on taxonomies, guidelines and design patterns as different approaches. In addition, serious games deserve special attention, since the serious purpose of the game implies the need to design and integrate non-ludic contents and to collaborate with non-technical professionals throughout the process. In the first case, it is essential to introduce the serious elements in the game so that they remain hidden within the ludic contents. In the second case, an adequate language is crucial to facilitate communication between the technical team and the subject-domain experts (educators, doctors, etc.). In particular, our group has been researching how to achieve the indispensable balance between the ludic component and the instructive component in educational video games. As a result, our design methodology establishes a ‘divide and conquer' approach in which the game challenges and the educational goals are designed and interrelated using graphical notations, which allow the artefacts of the educational video game to be modeled in a form comprehensible to all the stakeholders. As a case study, an educational adventure to promote reading comprehension has been developed and is being evaluated.

Speaker
Biography:

Mounîm A. El-Yacoubi (PhD, University of Rennes, France, 1996) was with the Service de Recherche Technique de la Poste (SRTP) at Nantes, France, from 1992 to 1996, where he developed software for Handwritten Address Recognition that is still running in automatic French mail sorting machines. He was a visiting scientist for 18 months at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI) in Montréal, Canada, and then an associate professor (1998-2000) at the Catholic University of Parana (PUC-PR) in Curitiba, Brazil. From 2001 to 2008, he was a Senior Software Engineer at Parascript, Boulder (Colorado, USA), a world-leading company in automatic processing of handwritten and printed documents (mail, checks, forms). Since June 2008, he has been a Professor at Telecom SudParis, University of Paris Saclay. His main interests include Machine Learning, Human Gesture and Activity recognition, Human Robot Interaction, Video Surveillance and Biometrics, Information Retrieval, and Handwriting Analysis and Recognition.

Abstract:

Human action recognition (HAR) is an active research field, driven by the dramatic price decrease of powerful digital cameras, storage and computing machines, and by the potential of HAR for designing smart engines that make sense of today's ubiquitous video streams. In video surveillance, for instance, HAR can help detect abnormal events in public facilities. For e-health, HAR may be harnessed as an assistive technology for monitoring people with autonomy loss. Recognizing human actions, however, is challenging. Known variability factors, such as intra-class and inter-class variability, are much more adverse here, as they involve additional structural variability that is hard to cope with. Besides, actions are not communicative in general, which hinders relevant information acquisition and tracking. Other variability sources include viewing direction and distance with respect to the acquisition sensor, lighting conditions, etc. In this talk, we review the problem of human action and gesture recognition in general. After discussing the challenges above, we propose a new approach that focuses on modeling the sequential video input. The modeling is based on a two-layer SVM and Hidden Conditional Random Field (SVM-HCRF) model in which the SVM acts as a discriminative front-end feature extractor. First, a sliding window technique segments the video sequence into short overlapping segments, each described by a local Bag-of-Words (BOW) of interest points. A first-layer SVM classifier converts each BOW into a vector of class conditional probabilities. The sequence of these vectors serves as the input observation sequence to the HCRF for actual human action recognition. We show how this hierarchical modeling optimally combines two different sources of information characterizing motion actions: local motion semantics inferred by the SVM, and long-range motion feature dependencies modeled by HCRFs at a higher level. Finally, we show how these models can be extended to the problem of conjoint segmentation and recognition of a sequence of actions within a continuous video stream.
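
The front end of this pipeline (overlapping segments, each summarized as a Bag-of-Words) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `sliding_windows` and `bow_histogram`, the window size, the stride, and the idea that each frame has already been reduced to one visual-word id are all hypothetical. The SVM and HCRF stages are not shown.

```python
from typing import List, Sequence


def sliding_windows(frames: Sequence[int], size: int, stride: int) -> List[Sequence[int]]:
    """Cut a sequence into fixed-size segments; they overlap when stride < size."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [frames[start:start + size]
            for start in range(0, len(frames) - size + 1, stride)]


def bow_histogram(word_ids: Sequence[int], vocab_size: int) -> List[float]:
    """Normalized histogram of visual-word ids for one segment."""
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical input: one quantized interest-point id (0..3) per frame.
frame_words = [0, 0, 1, 3, 3, 2, 2, 2, 1, 0]
segments = sliding_windows(frame_words, size=4, stride=2)
observations = [bow_histogram(seg, vocab_size=4) for seg in segments]
```

In the approach described in the talk, each such histogram would then be passed to the first-layer SVM, and the resulting sequence of probability vectors would form the HCRF input.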

 

Leonel Toledo

Barcelona Supercomputing Center, Spain

Title: Virtual Environments
Speaker
Biography:

Leonel Toledo received his Ph.D. from Instituto Tecnológico de Estudios Superiores de Monterrey Campus Estado de México in 2014, where he was a full-time professor from 2012 to 2014. He was an assistant professor and researcher and has devoted most of his research work to crowd simulation and visualization optimization. He has worked at the Barcelona Supercomputing Center using general purpose graphics processors for high performance graphics. His thesis work was on level of detail used to create varied animated crowds. Currently he is a researcher at the Barcelona Supercomputing Center.

Abstract:

The process of designing virtual environments is typically an expensive task in terms of both resources and processing power. Creating immersive experiences in simulations or video games is a complex process; even though hardware capabilities are constantly increasing, allowing developers to create impressive scenes, sometimes this is not enough. Constant technological advances rely on heavy GPU computations so that developers can represent virtual environments composed of millions of polygons in highly realistic scenes; nevertheless, developers sometimes face an important tradeoff between realism and performance. Recently there has been a remarkable increase in the number of middleware packages and frameworks that try to meet the technical requirements of complex 3D scenes. For instance, scenes that have several thousands of characters are computationally expensive as well as memory consuming. To attempt to solve this problem, several techniques must be implemented, such as level of detail, illumination, collision avoidance, animation transfer, and audio management, just to mention a few. Most approximate rendering algorithms ignore perception, or use early vision based perceptual metrics to accelerate performance.
 

Visual perception in computer graphics has received a lot of attention over the past few years. By understanding the limitations of the human visual system, rendering algorithms can be modified to eliminate unnecessary computations, producing images with no perceivable difference to the observer. For instance, it is known that observers do not require a physically accurate simulation of the illumination in order to perceive a scene as realistic. Optimizing the rendering stage for any given simulation is a complex process, and there are many possible ways to reduce the detail of a geometric mesh, each with different advantages and drawbacks for its implementation on a GPU.

Li Liu

University of Shanghai for Science and Technology, China

Title: Generating graphs from key points for near-duplicate document image matching
Speaker
Biography:

Li Liu is a lecturer at the University of Shanghai for Science and Technology. She received the Ph.D. degree in pattern recognition and intelligent systems from East China Normal University, Shanghai, China, in 2015. She was with the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Montreal, Quebec, Canada, from 2013 to 2014 as a visiting doctoral student. Her research interests include pattern recognition, machine learning and image analysis.

Abstract:

We propose a novel near-duplicate document image matching approach. Keypoints are first detected from the image using the difference-of-Gaussian function. We then present a clustering method, based on which the keypoints are clustered into several groups. The number of clusters is determined automatically according to the distribution of the keypoints. Afterwards, a graph is generated whose nodes correspond to the obtained clusters and whose edges describe the relationships between pairs of clusters. Consequently, the problem of image matching is transformed into graph matching. To compute the similarity between two graphs, we build their association graph and then find the maximum weight clique. A thorough evaluation of the performance of the proposed approach is conducted on two different datasets. Promising experimental results demonstrate the effectiveness and validity of this method.
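
The last step of this abstract, matching two graphs through their association graph and a maximum weight clique, can be sketched on toy data. This is a generic illustration, not the authors' implementation: the node similarities, the edge-consistency rule (both pairs adjacent or both non-adjacent), and the exhaustive search are assumptions made for the example, and the names `association_graph` and `max_weight_clique` are hypothetical. The exhaustive search is only practical for small graphs.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]   # adjacency sets
Pair = Tuple[int, int]        # (node in g1, node in g2)


def association_graph(g1: Graph, g2: Graph,
                      sim: Dict[Pair, float]) -> Dict[Pair, Set[Pair]]:
    """Nodes are candidate matches (i, j); edges link mutually consistent matches."""
    nodes = [pair for pair, s in sim.items() if s > 0]
    adj: Dict[Pair, Set[Pair]] = {p: set() for p in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        # Consistent: distinct nodes on both sides and the same edge relation.
        if i != k and j != l and ((k in g1[i]) == (l in g2[j])):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_weight_clique(adj: Dict[Pair, Set[Pair]],
                      weight: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Exhaustive search for the clique with the largest total node weight."""
    best: List[Pair] = []
    best_w = 0.0

    def extend(clique: List[Pair], cand: List[Pair], w: float) -> None:
        nonlocal best, best_w
        if w > best_w:
            best, best_w = list(clique), w
        for idx, v in enumerate(cand):
            # Keep only later candidates that are adjacent to v.
            extend(clique + [v],
                   [u for u in cand[idx + 1:] if u in adj[v]],
                   w + weight[v])

    extend([], sorted(adj), 0.0)
    return best, best_w


# Toy example: two 3-node path graphs (0-1-2) with hypothetical similarities.
g1 = {0: {1}, 1: {0, 2}, 2: {1}}
g2 = {0: {1}, 1: {0, 2}, 2: {1}}
sim = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0, (0, 2): 0.5, (2, 0): 0.5}
adj = association_graph(g1, g2, sim)
matching, score = max_weight_clique(adj, sim)
```

The clique weight can then serve as a similarity score between the two document images; how it is normalized is left open here.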

Yan Xu

Shandong Normal University, P. R. China

Title: Image super-resolution reconstruction
Speaker
Biography:

Dr. Yan Xu has been a lecturer at the School of Communication of Shandong Normal University in China since 2013. She obtained her bachelor's degree in computer science and technology (2005) and master's degree in computer software and theory (2008) from Shandong Normal University, and received her doctorate degree in signal and information processing from Beijing University of Posts and Telecommunications (2013). Dr. Xu is engaged in the research on image processing, machine learning, artificial intelligence, data mining, etc. She has published more than 17 papers, and attended many international conferences and academic exchanges in South Korea, Spain, Canada, Turkey and other countries.

Abstract:

Image super-resolution reconstruction restores (or reconstructs) a high-resolution (HR) image (or image sequence) from a series of low-resolution (LR) images. It restores the high-frequency information of images and increases the pixel density to enhance the spatial resolution. Super-resolution has become a hot research topic in recent years. Not only individual researchers and institutes, but also governments and the military have invested considerable effort and funding in this attractive task. In this presentation, we review the history and development trends of super-resolution technology, summarize its common and classical ideas and methods, compare it with deblurring and enhancement methods, recommend some promising techniques, and propose some interesting applications, especially in everyday life.

Speaker
Biography:

Yea-Shuan Huang graduated from the Computer Science Department of Concordia University, Canada, in 1994. He retired from the Industrial Technology Research Institute (Taiwan) in 2006 and in the same year became an associate professor in the Computer Science & Information Engineering Department of Chung-Hua University. His research interests are in the areas of face recognition, gesture analysis, biometrics authentication, OCR, image analysis, computer vision, and pattern recognition. In 2010 and 2011, his lab won third place and second place, respectively, in the Utechzone Machine Vision Prize for face detection, face recognition, gender recognition and age recognition. He has received about 50 patents and performed more than 10 technology transfers to industrial companies and research institutes.

Abstract:

Face recognition is essential for humans to communicate with each other, and due to its importance and applicability, many methods have been proposed in the last three decades. Among various research directions, feature extraction plays an important role because it correlates considerably with recognition accuracy. However, only a few efforts have been devoted to extracting personalized landmark facial points and using their geometrical information to perform recognition. In this paper, a novel feature-point bilateral recognition (FPBR) method for recognizing human faces is proposed. First, a set of distinct feature points is extracted from a test image. Then, in every training face image, each detected feature point finds its best matched position through a block matching operation. Further, from the detected feature points and their matched ones, two geometrical models describing their structural relationships are constructed respectively. With a geometrical model comparison design, the difference between the two geometric models is computed. Then, by associating the average matching strength and the difference between geometric models, the score of forward recognition is produced. Similarly, the score of backward recognition can be produced by detecting feature points from a training image and locating their individual matched ones in the test image. By summing the scores of both forward and backward recognition, a bilateral recognition score is obtained and used to produce the final recognition result. Besides the bilateral recognition, the feature used, called local vector pattern (LVP), will also be introduced; it encodes various pairwise directions of vectors as a facial descriptor to strengthen the structure of micropatterns. Experiments on the well-known FERET face database show that the proposed algorithm produces an excellent recognition result and performs much better than two other well-known face recognition methods.
 

Speaker
Biography:

Jehan Janbi is an assistant professor in the Computer Science and Information Technology College at Taif University. She received her bachelor's degree in Computer Science from King Abdul-Aziz University, Jeddah, Saudi Arabia. She started her academic career as a TA, lab supervisor and research assistant in the Computer Science department at Qassim University. She then obtained her Master's and PhD degrees from Concordia University, Montreal, Canada. Her research area is text and font recognition, mainly for Arabic script. She worked on encoding the design characteristics of Arabic digital fonts into a number composed of several digits, where each digit represents a specific design characteristic. This facilitates manipulating and searching fonts based on their appearance.

Abstract:

In the digital world, there are thousands of digital fonts, which makes selecting an appropriate font far from intuitive. Designers can search for a font like any other file using general information such as name and file format. But for document design purposes, the design features or visual characteristics of fonts are more meaningful for designers than font file information. Therefore, representing fonts’ design features by searchable and comparable data would facilitate searching for and selecting a desirable font. One solution is to represent a font’s design features by a code composed of several digits. This solution has been implemented as a computerized system called PANOSE-1 for Latin script fonts. It is used within several font management tools as an option for ordering and searching fonts based on their design features. It is also used in font replacement processes when an application or an operating system detects a missing font in an immigrant document or website. This research defined a new model, PANOSE-A, to extend PANOSE-1 coverage to support Arabic characters. The model defines eight digits in addition to the first digit of PANOSE-1, which indicates the font script and family type. Each digit takes a value from 0 to 15, where each value indicates a specific variation of the feature it represents. Two digits of the model describe the common variations of the weight and contrast features, which are two essential features in any font design. Another four digits describe the shape of some strokes that usually vary in their design between fonts, such as the end shape of terminal strokes, the shape of the bowl stroke, the shape of the curved stroke and the shape of rounded strokes with an enclosed counter. The last two digits describe the characteristics of two important vertical references of Arabic font design, which are tooth and loop heights.
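
A digit code of this shape is straightforward to store and compare. The sketch below assumes only what the abstract states (nine digits in total, each from 0 to 15); the digit values, the font names, the hexadecimal packing, and the mismatch-count distance are hypothetical choices for illustration and are not taken from the PANOSE-A model.

```python
from typing import Dict, Sequence

NUM_DIGITS = 9  # first digit (script and family type) plus eight design digits


def validate(code: Sequence[int]) -> None:
    """Check the shape described in the abstract: nine digits, each 0..15."""
    if len(code) != NUM_DIGITS:
        raise ValueError(f"expected {NUM_DIGITS} digits, got {len(code)}")
    if any(not 0 <= d <= 15 for d in code):
        raise ValueError("each digit must be between 0 and 15")


def to_hex(code: Sequence[int]) -> str:
    """Pack the code as one hexadecimal character per digit (an assumed encoding)."""
    validate(code)
    return "".join(format(d, "X") for d in code)


def mismatch_count(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of digits on which two codes differ (an assumed, naive distance)."""
    validate(a)
    validate(b)
    return sum(1 for x, y in zip(a, b) if x != y)


def closest(query: Sequence[int], catalog: Dict[str, Sequence[int]]) -> str:
    """Name of the catalog font whose code differs from the query in the fewest digits."""
    return min(catalog, key=lambda name: mismatch_count(query, catalog[name]))


# Hypothetical codes for illustration only.
catalog = {
    "FontA": [2, 10, 3, 15, 0, 1, 4, 5, 6],
    "FontB": [2, 4, 7, 2, 9, 1, 4, 8, 3],
}
query = [2, 10, 3, 14, 0, 1, 4, 5, 6]
replacement = closest(query, catalog)
```

A font replacement step could use such a lookup to pick a substitute for a missing font, although a practical distance would likely weight the digits differently.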

  • Young Researchers Forum

Session Introduction

Jin Wang

Beijing University of Technology, China

Title: Packet loss rate mapped to the quality of experience in the IOT network
Speaker
Biography:

Jin Wang received a Bachelor’s degree in Software Engineering from Beijing University of Chemical Technology, Beijing, China, in June 2012. She won the National Endeavor Fellowship in 2009 and the National Scholarship in 2010. She received a Master's degree in Computer Application Technology from Shijiazhuang Tiedao University in January 2015. She has published many papers indexed by ISTP, EI and SCI, and has participated in a National Natural Science Fund project. From April to July 2015, she worked at the computer center of Navy General Hospital as an intern technician. She participated in the Naval Logistics Project and an anesthesia program. Since April 2015, she has been pursuing her Ph.D. at the School of Software Engineering, Department of Information, Beijing University of Technology. Her research interests are the Internet of Things, software engineering, embedded systems, and image and video quality assessment in distorting networks.

Abstract:

The Internet of Things builds on Internet technology, including wired and wireless networks. In this paper, we investigate the quality of experience (QoE) and the packet loss rate of the network, because QoE is important in the network and packet loss rate is a key factor in many papers. To study the influence of packet loss on users’ QoE when video is transmitted over the network, and to establish a mapping model between the two, we built an NS2 + MyEvalvid simulation platform and modified QoS parameters to simulate different degrees of packet loss. Experimental results show that packet loss has a significant influence on QoE, and that the relationship between packet loss rate and QoE is nonlinear. We used Matlab to establish the mapping model; the model is accurate and easy to operate, and it can detect in real time the influence of packet loss on the user’s QoE. The contributions of this paper are twofold. First, we show that packet loss has a significant effect on video. Second, building on that result, we establish a mapping model between packet loss rate and the user’s QoE. The next step is to build a video quality evaluation model that accounts for network packet loss. Starting from the effects of different packet loss rates and different content complexity on QoE, future work will incorporate other factors, such as different packet loss models, to predict QoE more accurately.
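As an illustration of what a nonlinear mapping between packet loss rate and a quality score might look like, here is a minimal sketch in Python (the authors used Matlab). The exponential form `score = a * exp(-b * loss_rate)` is an assumption, since the abstract only states that the relationship is nonlinear. The data points are synthetic, generated from a known curve, and are not experimental results.

```python
import math


def fit_exponential(loss_rates, scores):
    """Least-squares fit of score = a * exp(-b * loss_rate).

    Taking logarithms gives ln(score) = ln(a) - b * loss_rate, which is an
    ordinary straight-line fit. All scores must be positive.
    """
    n = len(loss_rates)
    log_scores = [math.log(s) for s in scores]
    mean_x = sum(loss_rates) / n
    mean_y = sum(log_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in loss_rates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loss_rates, log_scores))
    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def predict(a, b, loss_rate):
    """Evaluate the fitted mapping at a given packet loss rate."""
    return a * math.exp(-b * loss_rate)


# Synthetic, noise-free points from an assumed curve (a = 40, b = 8).
loss_rates = [0.0, 0.01, 0.02, 0.05, 0.10]
scores = [40.0 * math.exp(-8.0 * p) for p in loss_rates]

a, b = fit_exponential(loss_rates, scores)
estimate = predict(a, b, 0.03)
```

With measured data, the quality of the fit would need to be checked before the mapping is used for prediction. Other nonlinear forms, such as polynomial or logistic curves, may suit a given video better.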

Fig. 1 MyEvalvid system structure

Fig. 23  Src13 fitting curve (PSNR) 

Speaker
Biography:

Yoshikatsu Nakajima received his B.E. degree in information and computer science from Keio University, Japan, in 2016. Since 2016, he has been a master’s student in the Department of Science and Technology and has worked as a research assistant in the Keio Program for Leading Graduate School at Keio University, Japan. He has attended many international and domestic conferences and won five awards in the two years since he started his research. In 2014, he joined the start-up company Home-tudor Tomonokai as a developer and developed its online tutoring system by himself using Ruby on Rails. His research interests include augmented reality, SLAM, object recognition, and computer vision.

Abstract:

Camera pose estimation with respect to target scenes is an important technology for superimposing virtual information in augmented reality. However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method of robust camera pose estimation that uses multiple feature descriptor databases, one generated for each partitioned viewpoint, within which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class of each input image using deep learning, based on a set of training images prepared for each viewpoint class. We introduce two ways of preparing those images for deep learning and generating the databases. In the first method, images are generated with a projection matrix, and their backgrounds are varied so that learning is more robust to the environment. The second method uses real images to learn the entire environment around the planar pattern. Our evaluation confirmed that the number of correct matches increased and the accuracy of camera pose estimation improved compared to the conventional method.
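To make the idea of partitioned viewpoints concrete, the sketch below assigns a camera position to a viewpoint class by binning its azimuth and elevation around a planar target. This is only one possible partition, assumed here for illustration: the bin counts, the coordinate frame and the function name are hypothetical. In the method described above, the class of an input image is estimated by deep learning, so a function like this would at most label training images or index the per-class descriptor databases.

```python
import math


def viewpoint_class(camera_pos, azimuth_bins=8, elevation_bins=2):
    """Map a camera position to a viewpoint class index.

    Assumed frame: the planar target lies in the z = 0 plane, centred at the
    origin, and the camera is above it (z > 0).
    """
    x, y, z = camera_pos
    if z <= 0:
        raise ValueError("camera must be in front of the target plane (z > 0)")

    azimuth = math.atan2(y, x) % (2 * math.pi)     # in [0, 2*pi)
    elevation = math.atan2(z, math.hypot(x, y))    # in (0, pi/2]

    # Clamp so that the upper boundary falls into the last bin.
    a = min(int(azimuth / (2 * math.pi) * azimuth_bins), azimuth_bins - 1)
    e = min(int(elevation / (math.pi / 2) * elevation_bins), elevation_bins - 1)
    return e * azimuth_bins + a


low_view = viewpoint_class((-1.0, 0.2, 0.1))   # shallow angle
high_view = viewpoint_class((-1.0, 0.2, 5.0))  # same azimuth, steep angle
```

Each class index can then select its own feature descriptor database, so that matching only compares descriptors captured from similar viewpoints.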

Furthermore, we have recently been applying the concept of the Viewpoint Class to object recognition. Object recognition is one of the major research fields in computer vision and has been applied in various areas. In general, conventional methods are not robust to obstacles, and their accuracy drops when the camera stays at a poor position relative to the target object. We propose a novel object recognition method that can run in real time by dividing the viewpoints around each object in the scene equally and impartially integrating the Convolutional Neural Network (CNN) outputs from each Viewpoint Class (see image). We confirmed its effectiveness through experiments.
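One way to read "impartially integrating" is as an equal-weight average over viewpoint classes, so that a camera lingering at one poor position cannot dominate the result. The sketch below follows that reading, which is an assumption and not the authors' stated formula. The function name and the probability values are invented for illustration.

```python
def fuse_viewpoint_outputs(outputs_by_class):
    """Average CNN probability vectors with equal weight per viewpoint class.

    outputs_by_class maps a viewpoint class id to the list of probability
    vectors observed from that class. Frames are averaged within each class
    first, then the class means are averaged.
    """
    class_means = []
    for vectors in outputs_by_class.values():
        if not vectors:
            continue  # no observation from this viewpoint class yet
        n = len(vectors)
        class_means.append([sum(column) / n for column in zip(*vectors)])
    if not class_means:
        raise ValueError("no CNN outputs to integrate")
    k = len(class_means)
    return [sum(column) / k for column in zip(*class_means)]


# Invented outputs over two labels: the camera stayed in class 0 for six
# frames and visited classes 1 and 2 for one frame each.
outputs = {
    0: [[0.7, 0.3]] * 6,
    1: [[0.2, 0.8]],
    2: [[0.3, 0.7]],
}
fused = fuse_viewpoint_outputs(outputs)
label = max(range(len(fused)), key=fused.__getitem__)
```

In this example a plain average over all eight frames would favour the first label, because six of the frames come from a single viewpoint class. The per-class average favours the second.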

Speaker
Biography:

Metehan Unal holds a B.Sc. degree (honours) from the Computer Engineering Department of Ankara University and is now pursuing an M.Sc. degree. He worked as a trainee at Turkish Aerospace Industry in 2013. He has been working as a Research Assistant at Ankara University since 2015. His research interests include Augmented Reality, Computer Graphics and Artificial Intelligence. He is also an enthusiastic Android developer.

Abstract:

Statement of the Problem: Augmented Reality (AR) is a view that integrates real-world imagery with computer-generated sounds, images or 3D objects. AR makes it possible to place, on a historic site, 3D reconstructions of buildings that have been subject to thousands of years of wear and tear. In this way, cultural heritage sites can be better explained and handed on to future generations. Physical reconstruction of ruined cultural heritage sites can be costly and time consuming. In addition, the site can be damaged during physical reconstruction. With state-of-the-art AR technology, 3D models can be placed in situ without any damage, while increasing the appeal of the area for tourists and enthusiastic students.

The aim of this study is to augment the video images received from mobile devices or drones with 3D models of the Roman Bath, one of the important cultural heritage sites in Ankara, Turkey.

Methodology & Theoretical Orientation: A 3D model of the Roman Bath was generated from reconstruction images drawn by expert archaeologists. Using the Unity 3D Game Engine, this model was overlaid on the camera stream received from mobile devices such as mobile phones and tablets. The location services provided by these mobile devices were also used to place the model using actual GPS locations. Furthermore, an AR application was developed for drones to augment camera streams from a top view (Figure 1).
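Placing a model from GPS readings comes down to converting the difference between the device location and the model's anchor location into metres in a local frame. The sketch below shows one common approximation in Python; the application itself was built in Unity, and this is not its code. The function name, the spherical Earth radius, the equirectangular approximation and the sample coordinates are all assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def gps_to_local_offset(device_lat, device_lon, anchor_lat, anchor_lon):
    """Return (east, north) offset in metres from the device to the anchor.

    Uses an equirectangular approximation, which is reasonable for the
    short distances involved in viewing a single site.
    """
    d_lat = math.radians(anchor_lat - device_lat)
    d_lon = math.radians(anchor_lon - device_lon)
    mean_lat = math.radians((anchor_lat + device_lat) / 2.0)
    east = EARTH_RADIUS_M * d_lon * math.cos(mean_lat)
    north = EARTH_RADIUS_M * d_lat
    return east, north


# Hypothetical coordinates: the anchor is 0.001 degrees north of the device.
east, north = gps_to_local_offset(40.000, 33.000, 40.001, 33.000)
```

In a game engine with a Y-up frame, the east and north offsets would typically map to the two horizontal axes. Consumer GPS error of several metres means the result usually needs refinement from the camera image or other sensors.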

Findings: The developed application allows users to display the models augmented on the camera view. The use of drones in this study brings a new dimension to Augmented Reality by adding a third eye for the user. We call this approach ‘Distant Augmented Reality’.

Conclusion & Significance: The authors expect that such applications will not only provide an entertaining way to learn about history but also preserve cultural heritage sites.