• List of Articles: Seyed Hossein Khasteh

      • Open Access Article

        1 - Improving Efficiency of Finding Frequent Subgraphs in Graph Stream Using gMatrix Summarization
        Masoud Kazemi, Seyed Hossein Khasteh, Hamidreza Rokhsati
        In many real-world settings, dealing with huge domains of nodes and online streaming edges is unavoidable; transportation systems, IP networks, and large social media platforms are quintessential examples. One of the most important open problems in dealing with massive graph streams is finding frequent subgraphs. Approaches such as count-min can store frequent nodes, but applying them results in inaccurate modelling of the structure of the main graph. gMatrix is a recently developed approach that can largely preserve the important properties of the main graph: different hash functions are used to summarize the edge stream, so having the inverses of the hash functions is extremely useful for computing frequent subgraphs. However, gMatrix suffers from two main problems: first, it is not very accurate, due to the high compression rate of the main graph, and second, the complexity of answering a query is high. In this study, we present a new approach based on gMatrix that reduces memory usage and answers queries in less time. The main contribution of the introduced approach is to reduce the dependency among the hash functions, which results in fewer collisions when the gMatrix is later constructed. We use cosine similarity to estimate the degree of dependency and similarity among the hash functions. Our experimental results demonstrate the higher performance of the proposed approach in terms of both accuracy and time complexity.
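
The idea of measuring dependency among hash functions with cosine similarity can be illustrated with a small sketch. The snippet below is not the authors' implementation: it assumes each hash function is represented by its one-hot bucket assignments over a sample of keys (so the cosine similarity reduces to the fraction of keys on which two functions collide) and uses a simple greedy selection rule. The names `assignment_matrix`, `select_low_dependency_hashes`, and the `threshold` parameter are hypothetical.

```python
import numpy as np

def assignment_matrix(hash_fn, sample_keys, num_buckets):
    """One-hot bucket-assignment matrix of a hash function over a key sample:
    row i is the indicator vector of the bucket that key i maps to."""
    mat = np.zeros((len(sample_keys), num_buckets))
    for i, key in enumerate(sample_keys):
        mat[i, hash_fn(key) % num_buckets] = 1.0
    return mat

def hash_cosine_similarity(h1, h2, sample_keys, num_buckets):
    """Cosine similarity between the flattened assignment vectors of two hash
    functions. With one-hot rows this equals the fraction of sample keys on
    which the two functions collide, so larger values mean more dependency."""
    a = assignment_matrix(h1, sample_keys, num_buckets).ravel()
    b = assignment_matrix(h2, sample_keys, num_buckets).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_low_dependency_hashes(hash_fns, sample_keys, num_buckets, threshold=0.2):
    """Greedily keep hash functions whose similarity to every already selected
    function stays below the threshold (illustrative selection rule)."""
    selected = []
    for fn in hash_fns:
        if all(hash_cosine_similarity(fn, s, sample_keys, num_buckets) < threshold
               for s in selected):
            selected.append(fn)
    return selected

# Example: seeded hash functions over integer keys.
if __name__ == "__main__":
    hashes = [lambda k, s=seed: hash((k, s)) for seed in range(4)]
    keys = list(range(1000))
    kept = select_low_dependency_hashes(hashes, keys, num_buckets=64)
    print(f"kept {len(kept)} of {len(hashes)} hash functions")
```

In this sketch, hash functions that frequently send the same keys to the same buckets score a high similarity and are discarded, which is one way to realize the abstract's goal of lowering dependency (and hence collisions) among the sketch's hash functions.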
      • Open Access Article

        2 - Fuzzy Multicore Clustering of Big Data in the Hadoop MapReduce Framework
        Seyed Omid Azarkasb, Seyed Hossein Khasteh, Mostafa Amiri
        A logical way to account for the overlap of clusters is to assign a set of membership degrees to each data point. Fuzzy clustering, due to its reduced partitions and decreased search space, generally incurs lower computational overhead and easily handles ambiguous, noisy, and outlier data, and is therefore considered an advanced clustering method. However, fuzzy clustering methods often struggle with non-linear data relationships. This paper proposes a practical method that uses multicore learning within the Hadoop MapReduce framework to identify linearly non-separable clusters in complex big data structures. The multicore learning model can capture complex relationships among data, while Hadoop lets us interact with a logical cluster of processing and storage nodes instead of individual operating systems and processors. In summary, the paper models non-linear data relationships using multicore learning, determines appropriate values for fuzzy parameterization and feasibility, and provides an algorithm within the Hadoop MapReduce model. The experiments were conducted on one of the commonly used datasets from the UCI Machine Learning Repository, as well as on a dataset generated with the CloudSim simulator, and satisfactory results were obtained. According to published studies, the UCI Machine Learning Repository is suitable for regression and clustering tasks on large-scale datasets, while the CloudSim dataset is specifically designed for simulating cloud computing scenarios, calculating time delays, and scheduling tasks.
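
As a rough illustration of how fuzzy membership degrees combine with a map/reduce decomposition, the sketch below runs plain fuzzy c-means in a map/reduce pattern. It is not the paper's multicore-learning method and does not run on Hadoop; the chunked map/reduce split, the fuzzifier `m`, and the names `map_phase` and `reduce_phase` are illustrative assumptions.

```python
import numpy as np

def fuzzy_memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means membership degrees: u[i, j] says how strongly point i
    belongs to cluster j; the fuzzifier m controls how much clusters overlap."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + eps
    ratios = (dists[:, :, None] / dists[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=2)

def map_phase(chunk, centers, m=2.0):
    """Mapper-style step on one data chunk: emit the weighted point sums and
    total weights needed for the center update."""
    u = fuzzy_memberships(chunk, centers, m) ** m
    return u.T @ chunk, u.sum(axis=0)   # (c, d) weighted sums, (c,) weights

def reduce_phase(partials):
    """Reducer-style step: aggregate the partial results of all chunks into
    new cluster centers."""
    sums = sum(p[0] for p in partials)
    weights = sum(p[1] for p in partials)
    return sums / weights[:, None]

def fuzzy_cmeans_mapreduce(chunks, init_centers, m=2.0, iters=20):
    """Run one map/reduce round per iteration over the list of data chunks."""
    centers = init_centers
    for _ in range(iters):
        centers = reduce_phase([map_phase(c, centers, m) for c in chunks])
    return centers

# Example: two chunks of 2-D points clustered into two overlapping fuzzy clusters.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chunks = [rng.normal(loc, 0.5, size=(200, 2)) for loc in (0.0, 3.0)]
    centers = fuzzy_cmeans_mapreduce(chunks,
                                     init_centers=np.array([[0.0, 0.0], [1.0, 1.0]]))
    print(centers)
```

The mapper only ships per-chunk sums and weights, so the reducer can rebuild the centers without ever holding the full dataset in one place, which is the property that makes this style of clustering fit a Hadoop-like framework.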