Copyright 2019 © SLIIT. All Rights Reserved. Concept designed and developed by Web Lankan


Contact

chamodi.a@sliit.lk

Ms. Chamodi Abisheka

Instructor

Faculty of Computing | Computer Systems Engineering

Career Summary

ACADEMIC QUALIFICATIONS

BSC (HONS) IN INFORMATION TECHNOLOGY SPECIALISING IN CYBER SECURITY- DEPARTMENT OF COMPUTER SYSTEMS AND ENGINEERING, SRI LANKA INSTITUTE OF INFORMATION TECHNOLOGY(SLIIT)-SRI LANKA

HONOURS AND AWARDS

  • Best Performance Award in BSc(Hons) in Information Technology Specializing in Cyber Security
  • Best Paper Award- Security and Technologies Track- ICAC 2021

Research Interests and Memberships

Research Interests

  • Cyber Security
  • Vulnerability Assessment
  • Reverse Engineering
  • Information Security Policies
  • Cryptography

Memberships

Ongoing Research

My Publications

Conference Proceedings

  • P. A. C Abisheka; M. A. F Azra; A. V Poobalan; Janaka Wijekoon; Kavinga Yapa; Mifraz Murthaja, An Automated Solution For Securing Confidential Documents in a BYOD Environment, in 2021 3rd International Conference on Advancements in Computing (ICAC), 2022, Colombo, Sri Lanka

SECTION I. Introduction

The Bring Your Own Device (BYOD) phenomenon refers to workers providing and using personal devices and applications for corporate as well as personal work. Unlike traditional top-down adoption, which is often enforced by higher management, the BYOD trend is seen as a bottom-up process, or reversed Information Technology (IT) adoption, which begins with workers [1]. The BYOD trend has rapidly gained popularity due to the increased use of mobile phones and other ultra-portable devices. BYOD is particularly important for productivity in organizations, as employees are more comfortable using their own devices, with which they are more familiar [2]. Other advantages of BYOD include increased work accessibility, reduced device costs for organizations, improved device mobility, and greater flexibility [3].

BYOD has experienced significant modifications and market development since its introduction 10 to 15 years ago, and the industry is projected to be worth $366.95 billion by 2022. In comparison, the amount spent in 2019 was a little over $186 billion [4]. According to a study conducted by Access Data/CCBJ, almost 70% of companies allow workers to use personal devices in the workplace, and this figure continues to grow [5].

In the context of BYOD, the most important consideration is security. While a BYOD strategy improves productivity and convenience for companies, it also introduces a variety of security threats to IT systems, infrastructure, and data.
With employees using their personal devices for both corporate and personal tasks, it is important that confidential documents on those devices are protected from unauthorized disclosure. This study proposes a BYOD solution, BYODENCE, for the Linux environment that involves invoking advanced scripts (a.k.a. iBots) once official documents are created and accessed. All actions performed on these official documents are tracked and logged to reduce the risk of insider threats and information theft, and unauthorized access is prevented. The solution contains an automatic document classification feature to classify the documents and to control accessibility based on their level of sensitivity: Restricted, High, Moderate, and Low. The low-security level indicates that a particular document does not contain any sensitive data and that no risk is associated with its disclosure. The remaining labels (Restricted, High, and Moderate) are used to define distinct levels of sensitive content and to limit the access controls, i.e. read, write, execute, and internal/external access.

The proposed solution focuses on Rich Text Format (RTF) files, and it is expected to be applicable to any file type. RTF is a type of text file that can have formatted text and graphics, unlike typical .txt files. The RTF file contains the document text followed by the control information that defines the formatting used in the document, written within curly braces ({ }). The primary goal of this study is to develop an automated technique for detecting and preventing unauthorized user activities in relation to sensitive organizational documents in industries such as education, finance, and technology.

This paper is structured as follows: a review of related work in BYOD technologies is presented in Sec. II, while Sec. III describes the proposed approach, followed by an evaluation of the system in Sec. IV. The paper concludes with Sec. V, which presents a summary and a discussion of future work and enhancements.

SECTION II. Background

BYOD adoption has several benefits for both individuals and businesses in general. Employees who bring their own devices to work can seamlessly integrate their personal and professional lives. This section describes the policy and technological management solutions that have been implemented to address the risks associated with BYOD.

A. Network Approach

Network access control (NAC) is a zero-trust network access solution that enables organizations to define policies for regulating access to corporate infrastructure by both user-oriented devices and Internet of Things (IoT) devices. NAC can differentiate devices using defined policies and provide access and service across the organization's network. NAC focuses on devices and ignores the data accessed or processed by the end nodes under its control [6].

B. EDRM Approach

Enterprise Digital Rights Management (EDRM) focuses on persistent protection of sensitive information or data by controlling and managing access and usage rights across all endpoints, cloud storage, and on-premise storage [7].

C. Virtualization Approach

Virtualization is becoming more popular as a means of handling an organization's data in BYOD environments due to its ability to retain a user's desktop environment across sessions and devices [8]. Using a virtualization technique as an endpoint, which may be completely isolated from the organization's network, has the added advantage of not posing a risk of data leakage or data storage on assets that are not owned by the firm. However, this is the most expensive solution to BYOD security issues, and it lacks flexibility.

D. Phone-centered Approach

This method protects the device itself, with organizations deploying Mobile Device Management (MDM) solutions. Authentication and containerization technologies help separate organizational data and personal data on mobile devices. However, these approaches are generally preferred on corporate-owned devices or in environments requiring high degrees of security [9].

E. Frameworks and Policy-Based Approach

BYOD policies consist of a comprehensive acceptable usage policy, a password policy, and an Information Security Management System (ISMS) policy. The BYOD policy is designed with the user and the environment in mind. When organizational data is stored on a device, the IT department must maintain a high level of visibility over the devices and their use [10]. Building a strong data loss prevention system is necessary to control or stop the flow of critical information such as Personally Identifiable Information (PII) and organizational information. As a result, organizations require a comprehensive data-privacy management system, framework, policy, or technology to protect sensitive data [11].

Although the aforementioned approaches provide some document protection mechanisms for BYOD environments, all of them apply protection to every document that exists on the user device. Documents that are not sensitive at all may exist in these systems alongside extremely sensitive documents, whose Confidentiality, Integrity, and Availability (CIA) triad must always be protected. Thus, a sensitivity classification of documents is crucial. When there is a clear-cut definition of document sensitivity, protection mechanisms can be applied only to the documents that need them, which ensures that resources are not wasted on securing non-sensitive documents. Moreover, data is ever-increasing in the corporate environment. Therefore, more data will gradually be added to each category in addition to the data already used for training. Thus, it is essential to utilize available resources to retrain the model on new data.
Accordingly, to fulfill the requirement of automated detection and prevention of unauthorized access to confidential documents, this paper introduces a comprehensive solution that includes the detection and classification of organizational documents, an incremental training process, a role-based access control mechanism, an auditing mechanism, centralized user management, and a session-based access control mechanism.

SECTION III. Methodology

BYODENCE consists of four main components: the agent, the iBot, the classification engine, and the central server (WSO2 Identity Server). The agent handles all the internal logic, such as user sessions, encryption/decryption of files, role validation, flag injection, and file classification (by invoking the classification engine). The iBot is triggered whenever a user action is performed on a file. It invokes the agent to grant or deny access to the file and to classify the file when it is created or updated. The agent, the iBot, and the classification engine are implemented so that they accept a file path rather than the file itself. This is because all three components are intended to run on the end user's device, and because of concerns such as internet/network outages, issues in uploading large files, and sharing the user-owned file over the internet. Nonetheless, the agent has to communicate with the centralized server once to authenticate the user, after which it is capable of running offline until the user session expires.

A. The Agent

As illustrated in Fig. 1, the agent comprises two endpoints: classify and open. The flow of the classify endpoint is illustrated in purple, whilst the flow of the open endpoint is illustrated in red. The classify endpoint accepts a file path, classifies the file, injects the flag into the file, and encrypts the file content if required. The agent validates the session prior to performing any actions by reading the session data file, which includes a self-contained JSON Web Token (JWT) access token.
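The session check performed by the agent can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the token is passed in as a string rather than read from the session data file, and the payload is decoded without verifying the signature. A real agent would have to verify the token signature against the identity server's key before trusting any claim. Only the standard `exp` claim of a JWT is used.

```python
import base64
import json
import time


def decode_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def session_is_valid(token, now=None):
    """Return True if a token is present, well formed, and not yet expired."""
    if not token:
        return False  # no session data: the user must authenticate
    now = time.time() if now is None else now
    try:
        claims = decode_claims(token)
    except (IndexError, ValueError):
        return False  # malformed token
    if not isinstance(claims, dict):
        return False
    # "exp" is the standard JWT expiry claim (seconds since the epoch).
    return claims.get("exp", 0) > now
```

Because the token is self-contained, a check of this kind needs no round trip to the server, which is consistent with the agent being able to run offline until the session expires.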
If the session data file does not exist or the access token has expired, the user is prompted to authenticate using the WSO2 Identity Server, and a new access token is persisted to the session data file (a file created in the home directory of the user, within a hidden directory that is used to store all the file-based configurations related to the system). As the session is stored as a self-contained access token that includes user information such as the role, agent-server communication is greatly reduced. Upon successful authentication, if the file is already classified, the admin is notified asynchronously on the condition that it is identified as an external file. Further, if the file is detected as an external file, sensitivity-based access control is applied to the file. The file is then classified by invoking the classification engine, and the flag is injected into the file.

The flag is the indicator of the file. It specifies the agent identifier, which is the hardware address of the device used to classify the document initially (the agent that owns the file), as well as the sensitivity level of the file content as classified by the classification engine, the role required to access the file, and the SHA256 hash of the document text, in the format {{AgentID;Sensitivity;Role;SHA256}}. The flag is aligned with the RTF file in a way that does not interfere with kernel-level labelling (such as SELinux).

Fig. 1. Agent implementation diagram illustrating the flow of the endpoints of the agent: classify and open.

The flag can be injected into the RTF file either at the beginning of the file, escaped with backslashes, or at the end of the file. In both cases the flag must be placed within curly braces, and escaping is not required in the latter method. Failure to follow the format results in the flag being displayed along with the contents of the RTF file.
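The flag format and end-of-file injection described above can be illustrated with a short sketch. The helper names are hypothetical, and the sketch assumes that the hash is computed over the file content without the flag; the paper only states that the SHA256 hash of the document text is stored.

```python
import hashlib
import re

SENSITIVITY_LEVELS = ("Restricted", "High", "Moderate", "Low")

# Matches a trailing {{AgentID;Sensitivity;Role;SHA256}} flag.
FLAG_RE = re.compile(
    r"\{\{([^;{}]*);([^;{}]*);([^;{}]*);([0-9a-f]{64})\}\}\s*$"
)


def build_flag(agent_id, sensitivity, role, document_text):
    """Build a flag string for the given document text."""
    if sensitivity not in SENSITIVITY_LEVELS:
        raise ValueError("unknown sensitivity level: %r" % sensitivity)
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    return "{{%s;%s;%s;%s}}" % (agent_id, sensitivity, role, digest)


def strip_flag(rtf):
    """Remove a trailing flag, if one is present."""
    return FLAG_RE.sub("", rtf)


def inject_flag(rtf, flag):
    """Append the flag to the end of the file, replacing any existing flag."""
    return strip_flag(rtf) + flag


def read_flag(rtf):
    """Return the flag fields as a dict, or None if the file has no flag."""
    match = FLAG_RE.search(rtf)
    if match is None:
        return None
    agent_id, sensitivity, role, digest = match.groups()
    return {"agent_id": agent_id, "sensitivity": sensitivity,
            "role": role, "sha256": digest}
```

For example, `inject_flag(rtf, build_flag("00:1a:2b:3c:4d:5e", "High", "manager", rtf))` appends a single flag, and calling it again on the result overwrites the existing flag instead of adding a second one.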
Considering file size and simplicity of processing, the latter method (injecting at the end of the file, which does not require escaping) was selected as the optimal way of flag injection.

Once the flag is injected into the document, the file is encrypted depending on the level of sensitivity. If the sensitivity level is low, the file does not need to be encrypted, as the document does not contain any sensitive data. The file is encrypted for all other sensitivity levels. Once the file content is encrypted, the document is signed using the Digital Signature Algorithm (DSA), which helps to preserve the integrity of the file content and the flag.

The open endpoint accepts a file path and grants access to encrypted files that were classified using the classify endpoint by decrypting the file content. Similar to the classify endpoint, it validates the user session prior to performing any tasks. Thereafter, the digital signature (which is at the end of the document) is verified by comparing a computed digital signature to the signature that is injected into the file. Upon successful verification, the agent checks whether the document is encrypted. If it is not encrypted, meaning that the sensitivity is low, the user is allowed to open the document. However, if the file is encrypted, meaning that the file is sensitive, the agent reads the injected flag. If the user roles returned by the JWT token upon session establishment contain the role injected into the file (that is, if the user has the role required by the file), the file is decrypted. The user is then granted access to open the file and perform read and write actions on it. Otherwise, the file is not decrypted. Similar to the classify endpoint, the admin is notified if access to an external file or unauthorized file access is attempted.

B. iBot

The iBot scripts utilize an open-source tool called incron, a daemon similar to cron that monitors file system events. Incron is provided with the path that specifies the directory to which it listens, the mask that specifies the file system event to be monitored, and the command that specifies the action or script to be executed. The iBot acts as the command for the incron tool and is triggered for IN_MODIFY and IN_OPEN events.

As depicted in Fig. 2, when a document is created or updated through the CLI or GUI, the IN_MODIFY action is triggered and the related command, which is the iBot, is executed. The iBot then makes a request to the agent to classify the file, providing the file path. The agent injects or updates the flag in the file and encrypts the file based on the sensitivity level, as described in Section III(A). Correspondingly, when a file is opened through the CLI or GUI, the IN_OPEN action is triggered, and the iBot makes a request to the agent to open the file, providing the file path. The iBot is triggered for IN_MODIFY and IN_OPEN events on all file types, including hidden files such as swap and lock files. However, after identifying the file type, the iBot initiates the request to the agent only for RTF files.

While the iBot triggers for IN_MODIFY and IN_OPEN events, it was observed that when the iBot is triggered once for a particular event, it initiates an endless loop due to several external factors, as depicted in Fig. 3. When a file is modified by the user, the IN_MODIFY event is triggered and the iBot initiates a request to the classify endpoint of the agent, providing the file path. The agent then reads the file content, injects the flag, and encrypts the file according to the sensitivity level. However, as the agent reads and updates the file at the same time, further IN_ACCESS and IN_MODIFY events are generated, hence an endless iteration.
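The event filtering that the iBot performs before contacting the agent can be sketched as below. This is an illustrative outline only: the function name is hypothetical, the file type is identified purely by the `.rtf` extension (an assumption, since the paper does not say how the type is detected), and the sketch returns the name of the agent endpoint instead of making a request.

```python
from pathlib import Path

# File system events the iBot reacts to, mapped to the agent endpoint
# that the text associates with each of them.
EVENT_TO_ENDPOINT = {"IN_MODIFY": "classify", "IN_OPEN": "open"}


def select_endpoint(file_path, event):
    """Return the agent endpoint to call, or None if the event is dropped."""
    # Swap and lock files carry a different suffix, so this check drops
    # them together with every other non-RTF file.
    if Path(file_path).suffix.lower() != ".rtf":
        return None
    return EVENT_TO_ENDPOINT.get(event)
```

With this filter, an IN_MODIFY event on `report.rtf` maps to the classify endpoint, while the same event on an editor swap file such as `.report.rtf.swp` is dropped without any request to the agent.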
Correspondingly, when a file is accessed by the user, the IN_ACCESS event is triggered and the iBot initiates a request to the open endpoint of the agent, providing the file path. The agent then reads the file content and decrypts the file if required. In the same way, as the agent reads and updates the file at the same time, further IN_ACCESS and IN_MODIFY events are generated, hence an endless loop.

Fig. 2. Bot implementation diagram illustrating the communication between the components.

As the incron tool is intended to trigger on any read or write action on the file, and as the read and write actions carried out within the agent are unavoidable, preventing the unexpected triggering of events or execution of the iBot is infeasible. Hence, in order to implement a loop-free methodology, two blocking states were defined on the iBot: the classify block state and the access block state. These prevent the initiation of the request to the agent in unexpected scenarios, so the iterations are avoided.

The classify block state establishes that an IN_MODIFY event triggered for an already classified file without any additional modification should be terminated immediately, without invoking the classify endpoint of the agent. If the IN_MODIFY event is triggered for a file where the hash value of the file content and the injected hash value available in the flag of the file are the same, the iBot identifies the event as the classify block state. If both hash values are the same, the file content has not been modified. Thus, by detecting the classify block state, the invocation of the classify endpoint of the agent is avoided, which in turn avoids the IN_MODIFY iteration.

The access block state determines that an IN_ACCESS event triggered for a file that is already being accessed should be terminated immediately, without invoking the open endpoint of the agent.
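The two checks can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the flag is assumed to sit at the very end of an unencrypted file in the {{AgentID;Sensitivity;Role;SHA256}} format, the hash is assumed to cover the content preceding the flag, and the set of files currently being accessed is passed in as a plain Python set.

```python
import hashlib
import re

# Trailing flag; only the SHA256 field is captured here.
FLAG_RE = re.compile(
    r"\{\{[^;{}]*;[^;{}]*;[^;{}]*;([0-9a-f]{64})\}\}\s*$"
)


def is_classify_block(file_content):
    """True if the content hash equals the hash stored in the flag."""
    match = FLAG_RE.search(file_content)
    if match is None:
        return False  # never classified, so classification must run
    body = file_content[:match.start()]
    current = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return current == match.group(1)


def is_access_block(file_path, files_being_accessed):
    """True if the file is already recorded as being accessed."""
    return file_path in files_being_accessed
```

When either function returns True, the iBot would stop without calling the agent, which is what breaks the event loop.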
The iBot identifies the access block state by maintaining a list of all the files being accessed (the access-list). An access block state is detected if the IN_ACCESS event is triggered for a file that is in this list. The file entry is added to the access-list when a file is opened for the first time and is removed after the file is saved. However, once the file is saved, an additional IN_MODIFY event is triggered due to the file modification made by the agent for flag injection, signature creation, and encryption, as depicted in Fig. 3. This is detected as the classify block state by the agent, by reading the injected hash and calculating the current hash of the file content. The detection of the classify block state by the iBot requires accessing the file, and this triggers another IN_ACCESS event, as depicted in Fig. 3. Therefore, if the file entry were removed from the access-list immediately after the file is saved, the additional IN_ACCESS event would not be detected as an access block state, and hence the open endpoint of the agent would be invoked and the file decrypted immediately. That is, a file encrypted by the agent would be decrypted straight away because of the unexpected IN_ACCESS event caused by the iBot. To avoid this, the file entry is removed from the access-list only after a one-second delay, during which the particular thread sleeps. Therefore, at the time the iBot is accessing the file content to compute the hash, the file is still in the access-list, and the additional IN_ACCESS event is detected as an access block state.

Fig. 3. Lock handling architecture of the iBot.

C. The Classification Engine

The data sensitivity levels are defined as Restricted, High, Moderate, and Low, according to the U-M institutional classification standard [12].
Disclosure of data relating to the Payment Card Industry (PCI) and the Federal Information Security Management Act (FISMA) can cause severe harm to individuals and/or organizations, so that data is categorized as Restricted. Data such as loan details, health plans, and social security numbers, where disclosure could cause significant harm, is categorized as High. Data such as intellectual property, contracts, and human resource details, where exposure could cause limited harm to individuals and/or organizations, is categorized as Moderate. Data where disclosure poses little to no risk to individuals and/or organizations is classified as Low [13].

The dataset was prepared by collecting data individually for each sensitivity level from Google Kaggle and DataWorld dataset searches. The text samples were arranged against the classification category based on their content. 500 text samples were obtained for each sensitivity category, for a total of 2000 samples.

The collected data may contain different types of content such as text, numbers, and special strings. Therefore, in an effort to improve the consistency of the data, all digits and characters are transformed into uniform keywords. That is, all credit card numbers (e.g., 1111 2222 3333 4444) are recorded as Credit Card. In a similar way, all acronyms and similar words are converted to related uniform words. For example, tel., tele, hotline, phone, and phone number are transformed into Phone Number. This was achieved by creating regular expressions for all generic digits and common words, and the finalized dataset was validated to be free of digits. This process was followed because, for example, if the dataset has multiple credit card numbers, the different credit card numbers would be treated as different tokens, even though they are expected to be identical tokens. Moreover, as the documents are created by end users, a single word could be written in multiple ways.
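The keyword normalization described above can be sketched with two regular expressions. This is a minimal illustration covering only the two examples given in the text (credit card numbers and phone-related words); the patterns and the function name are assumptions, not the authors' expressions.

```python
import re

# Sixteen digits in four groups, optionally separated by a space or hyphen.
CREDIT_CARD_RE = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")

# Variants listed in the text; longer alternatives come first so that
# "phone number" is not rewritten as "Phone Number number".
PHONE_WORD_RE = re.compile(
    r"\b(?:phone number|phone|hotline|tele|tel\.?)(?!\w)",
    re.IGNORECASE,
)


def normalize(text):
    """Replace credit card numbers and phone-related words with keywords."""
    text = CREDIT_CARD_RE.sub("Credit Card", text)
    return PHONE_WORD_RE.sub("Phone Number", text)
```

After this step, two samples that contain different card numbers produce the same Credit Card token, so the classifier sees one feature instead of many unrelated ones.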
However, assigning different tokens to the differently written words would reduce the consistency of the data. Thus, it was decided to replace such variants with a single suitable word so that they are converted to a single type of token. Further, the data samples were split using whitespace as a delimiter, and stop words were deleted using the Natural Language Toolkit (NLTK) English stop words library. Finally, each row in the data frame was arranged as a text-label pair.

The preprocessed data was sent for training using the "Creme" Python machine learning library, chosen for its rapid training and prediction speed [14]. It implements a number of popular algorithms for classification, regression, feature selection, and feature preprocessing, and also supports incremental learning [15].

Before training, features were extracted from the dataset. A pipeline was therefore created to build a model with two functions: feature extraction and training. In the feature extraction phase, each text-label pair was separated using commas as delimiters and provided to the training algorithm. The text-label pairs were divided into an 80% training portion and a 20% validation portion. The extracted features were then provided to the training model line by line, and the obtained feature set (label and text segment) was trained with Gaussian Naive Bayes (GNB). The remaining 20% of the data was used to calculate the accuracy of the trained model. The trained model was saved and re-accessed for the sensitivity prediction of new files.
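To make the line-by-line training concrete, the following is a small standard-library stand-in for an incremental Gaussian Naive Bayes text classifier. It is not the library used in the paper and does not reproduce its API: the class, its method names, the whitespace token-count features, and the variance smoothing value are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalGaussianNB:
    """Gaussian Naive Bayes over token counts, trained one sample at a time."""

    def __init__(self, var_smoothing=1e-2):
        self.var_smoothing = var_smoothing   # keeps every variance positive
        self.class_counts = Counter()        # label -> number of samples
        self.sums = defaultdict(Counter)     # label -> token -> sum of counts
        self.sq_sums = defaultdict(Counter)  # label -> token -> sum of squares
        self.vocabulary = set()

    @staticmethod
    def features(text):
        return Counter(text.lower().split())

    def learn_one(self, text, label):
        """Update the running statistics with a single text-label pair."""
        self.class_counts[label] += 1
        for token, count in self.features(text).items():
            self.vocabulary.add(token)
            self.sums[label][token] += count
            self.sq_sums[label][token] += count * count

    def _log_score(self, x, label):
        n = self.class_counts[label]
        score = math.log(n / sum(self.class_counts.values()))  # log prior
        for token in self.vocabulary:
            # Tokens absent from a sample count as zero, so the sums
            # divided by n give the per-class mean and mean of squares.
            mean = self.sums[label][token] / n
            var = self.sq_sums[label][token] / n - mean * mean
            var += self.var_smoothing
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (x[token] - mean) ** 2 / (2 * var)
        return score

    def predict_one(self, text):
        """Return the label with the highest log posterior score."""
        x = self.features(text)
        return max(self.class_counts, key=lambda lb: self._log_score(x, lb))
```

Because the model stores only running sums, a later call to `learn_one` with a new text-label pair updates it without revisiting the earlier samples, which is the property that incremental training relies on.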
Whenever a file is received by the classification API, the same steps used in the preprocessing phase of training are applied: numbers are replaced by the word that indicates their type using a regular expression match, acronyms and similar words are replaced with related uniform words, and stop words are removed. The filtered features then go through the prediction function, which uses the trained classification model, and the sensitivity is returned.

Fig. 4. The Classification Engine architectural diagram, illustrating the training, fine tuning, prediction and incremental training.

As depicted in Fig. 4, incremental learning, or retraining, is carried out based on user feedback. When a user is prompted about the accuracy of the sensitivity classification of a file, the user can rate it. Based on the user satisfaction rating, the file, along with the expected sensitivity level, is forwarded to the incremental learning API. The received data-label pair goes through the data preprocessing phase, and the existing model is retrained on that single sample, so the existing model is updated in place.

SECTION IV. Results and discussion

The agent takes an average of 717.6 milliseconds for the complete classification process, which includes validating the session, classifying the file, and injecting the flags. This also covers overwriting the flag for already classified files and recalculating the digital signature when decrypting the files. An average of 428.3 milliseconds is taken when a classified file is opened. This includes user authentication and authorization, verification of the digital signature, decrypting the file content, and displaying the decrypted content.

TABLE I. Sensitivity classification accuracy for different algorithms

From Table I, we can observe that the training accuracy of GNB is higher than that of LR (Logistic Regression), KNN (K Nearest Neighbors), RFC (Random Forest Classifier), and SVM (Support Vector Machine). Therefore, GNB was selected for the initial dataset training and the incremental training process. In the training process, the model achieved 79% accuracy and was able to train on the entire dataset within 43.52 milliseconds. When classifying a given document, the classification engine predicts the sensitivity label within 4 milliseconds, and incremental training was done in 44.25 milliseconds.

TABLE II. Sensitivity classification accuracy for different sensitivity levels

As shown in Table II, there is a high level of accuracy for the classification of Moderate and Restricted documents. It was also observed that the accuracy of flag injection was 100% for all four sensitivity categories. However, the accuracy of Low and High document classification needs to be increased, as incorrectly predicted labels can have a potentially dangerous impact: unencrypted, highly sensitive documents are vulnerable to data breaches.

The responsiveness of the iBot with regard to the file system events and the action carried out, derived from testing 40 documents consecutively, was recorded to be 100% accurate. Incron is intended to invoke the iBot for IN_MODIFY and IN_ACCESS file system events, and the iBot is intended to classify, encrypt, or decrypt the file with the help of the agent. The log files are used to validate the accuracy of incron and the iBot: incron uses the syslog, and the iBot uses its own log file to log all the actions carried out. Nevertheless, incron triggers for events on all files, including swap and lock files, and these events are dropped in the iBot without carrying out any further actions.
Moreover, because of the way incron is triggered for file system events, as mentioned in Section III(B), a single event would cause the iBot to execute indefinitely, resulting in an endless loop of execution. These loops are prevented in the iBot using the access and classify block states. However, since incron works in user space, the iBot is invoked asynchronously. Therefore, the decrypted content of the file is not displayed at the first attempt, and the file must be reopened to view the decrypted content.

SECTION V. Conclusion & future work

The growing popularity of BYOD makes it evident that the Bring Your Own Device concept is here to stay. Thus, effective mechanisms need to be in place to secure the BYOD framework. This paper demonstrates an automated system that can be used for the classification and securing of sensitive organizational documents by preserving the CIA triad of those documents.

BYODENCE uses a symmetric encryption algorithm for the encryption and decryption of sensitive documents, where each user is assigned a key. However, this approach has security issues, as the encryption keys are stored on the server. Further, this cannot be addressed with asymmetric cryptography, as multiple users must be able to decrypt a single file. Instead, it can be addressed using the Kerberos Authentication Protocol, which would act as a trusted third party that distributes keys to the agents.

As for complications that may arise when BYODENCE is installed on a machine with a virus guard: because virus guards work by comparing file signatures with known virus signatures, BYODENCE will not be detected as a virus. Furthermore, in order to maintain the integrity of the agent, the agent is authorized during session establishment using the MAC address of the machine that corresponds to the user credentials, which prevents the agent from being compromised and replaced with a rogue agent by an attacker.
Currently, only role-based file access is implemented. This can be further enhanced with sensitivity-based file access, which verifies the file's sensitivity before granting access such as read and write. Additionally, future developments include a malicious file check function that verifies the integrity and legitimacy of files before they are passed to the agent's endpoints. Above all, BYODENCE can be substantially improved by overriding the kernel-level methods instead of using incron, which would make it possible to decrypt the file before it is opened and to implement sensitivity-based access control in a much more efficient way. Other future improvements include broadening the supported operating systems to include Windows and Mac OS. The solution can also be expanded to support other common file types such as docx and pdf.

IEEE, 9-11 Dec. 2021, 8/3/2022
