"C:\\Users\\GAMER\\AppData\\Roaming\\Python\\Python38\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"Warming up PyWSD (takes ~10 secs)... took 4.643778324127197 secs.\n"
"[nltk_data] Package stopwords is already up-to-date!\n"
]
}
],
"source": [
"# Seed Python's random module, NumPy, and PyTorch (CPU and all CUDA devices) for more repeatable runs.\n",
"def set_seed(seed: int):\n",
"    random.seed(seed)\n",
"    np.random.seed(seed)\n",
"    torch.manual_seed(seed)\n",
"    torch.cuda.manual_seed_all(seed)\n",
"\n",
"# Download the required NLTK resources, suppress warnings, then fix the random seed.\n",
"nltk.download('punkt')\n",
"nltk.download('brown')\n",
"nltk.download('wordnet')\n",
"nltk.download('stopwords')\n",
"warnings.filterwarnings('ignore')\n",
"set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" A network is a large system consisting of many similar parts that are connected together to allow movement or communication along the parts, or\n",
"between the parts and a control centre. There are different types of networks available. Telecommunication networks , Television or radio network\n",
",Transport networks , Social networks. a digital telecommunications network, which allows nodes to share resources. In computer networks, computing\n",
"devices exchange data with each other using connections between nodes (data links). A network is a large system consisting of many similar parts that\n",
"are connected together to allow movement or communication along the parts, or between the parts and a control centre. There are different types of\n",
"networks available. Telecommunication networks , Television or radio network ,Transport networks , Social networks. a digital telecommunications\n",
"network, which allows nodes to share resources. In computer networks, computing devices exchange data with each other using connections between\n",
"nodes (data links). The Internet is the global system of interconnected computer networks that use the Internet protocol suite to link devices\n",
"worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of local to global scope. ▪Linked\n",
"by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more entities of a communications\n",
"system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions (Cambridge Extensible markup language\n",
"is Designed to store and transport data ,Both human- and machine-readable (self descriptive) , Often used for distributing data over networks ,Used by\n",
"may other tools like protocols. The main and the only component of XML is called an element . An element has 3 components 1. Start tag 2. Body 3.\n",
"End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or underscore . This is the XML\n",
"declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding attribute indicates the character set\n",
"◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have attribute(s) ◦ Attributes describe the\n",
"element .Attribute value is always quoted (either single or double quote). Computer based systems can be mainly divided into 2 types, according to the\n",
"distribution of the components. Standalone Computer System - All the components are executed within a single device, Do not need a network, Usually\n",
"one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The components are distributed and executed in\n",
"multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop (HTML+CSS+JS + PHP). Client-server architecture\n",
"(3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate the application logic from the data . This can\n",
"be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a need for further separation and distribution of\n",
"the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier architecture. A network is a large system\n",
"consisting of many similar parts that are connected together to allow movement or communication along the parts, or between the parts and a control\n",
"centre. There are different types of networks available. Telecommunication networks , Television or radio network ,Transport networks , Social\n",
"networks. a digital telecommunications network, which allows nodes to share resources. In computer networks, computing devices exchange data with\n",
"each other using connections between nodes (data links). The Internet is the global system of interconnected computer networks that use the Internet\n",
"protocol suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of\n",
"local to global scope. ▪Linked by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more\n",
"entities of a communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions\n",
"(Cambridge Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,Often used for\n",
"distributing data over networks ,Used by may other tools like protocols. The main and the only component of XML is called an element . An element has\n",
"3 components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or\n",
"underscore . This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding\n",
"attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have\n",
"attribute(s) ◦ Attributes describe the element .Attribute value is always quoted (either single or double quote). Computer based systems can be\n",
"mainly divided into 2 types, according to the distribution of the components. Standalone Computer System - All the components are executed within a\n",
"single device, Do not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The\n",
"components are distributed and executed in multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop\n",
"(HTML+CSS+JS + PHP). Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate\n",
"the application logic from the data . This can be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a\n",
"need for further separation and distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier\n",
"architecture. DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource\n",
"Identifier (URI) 18 . URI is a string of characters designed for unambiguous identification of resources. URI is extensible via the URI scheme.\n",
"Unified Resource Name(URN) is a persistent, location-independent identifier. Website can be seen as a collection of web pages with static content\n",
".Early websites were entirely developed only using HTML – Nowadays, some server-side application components and databases are used to dynamically\n",
"generate the content – However, still the content is not user tailored. Web application is a single page or a collection of web pages, with\n",
"interactive components to dynamically generate the content E-commerce is a large domain, which covers many related concepts like – Internet\n",
"marketing – Electronic fund transfer – Online transaction processing. E-commerce systems provide online buying and selling over the internet. There\n",
"is a large variety of types of ecommerce systems – Online goods/soft items(software, e-books, videos) – Retail services (travel, food, cloths) –\n",
"Marketing services (advertising, auctions) – Customer services (help centers, online banking). Advantages of e-commerce . To businesses – After the\n",
"capital cost, maintenance cost is low – Global customers – Increased market share. Disadvantages of e-commerce. To businesses – For physical items,\n",
"storing and distributing is needed – Need to update the system frequently – Depends on the power and the internet. DNS is a network, which consists\n",
"of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource Identifier (URI) 18 . URI is a string of characters\n",
"designed for unambiguous identification of resources. URI is extensible via the URI scheme. Unified Resource Name(URN) is a persistent, location-\n",
"independent identifier. Website can be seen as a collection of web pages with static content .Early websites were entirely developed only using HTML\n",
"– Nowadays, some server-side application components and databases are used to dynamically generate the content – However, still the content is not\n",
"user tailored. Web application is a single page or a collection of web pages, with interactive components to dynamically generate the content\n",
"E-commerce is a large domain, which covers many related concepts like – Internet marketing – Electronic fund transfer – Online transaction processing.\n",
"E-commerce systems provide online buying and selling over the internet. There is a large variety of types of ecommerce systems – Online goods/soft\n",
"centers, online banking). Advantages of e-commerce . To businesses – After the capital cost, maintenance cost is low – Global customers – Increased\n",
"market share. Disadvantages of e-commerce. To businesses – For physical items, storing and distributing is needed – Need to update the system\n",
"frequently – Depends on the power and the internet The Internet is the global system of interconnected computer networks that use the Internet\n",
"protocol suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of\n",
"local to global scope. ▪Linked by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more\n",
"entities of a communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions\n",
"(Cambridge Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,Often used for\n",
"distributing data over networks ,Used by may other tools like protocols. The main and the only component of XML is called an element . An element has\n",
"3 components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or\n",
"underscore . This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding\n",
"attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have\n",
"attribute(s) ◦ Attributes describe the element .Attribute value is always quoted (either single or double quote). Computer based systems can be\n",
"mainly divided into 2 types, according to the distribution of the components. Standalone Computer System - All the components are executed within a\n",
"single device, Do not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The\n",
"components are distributed and executed in multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop\n",
"(HTML+CSS+JS + PHP). Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate\n",
"the application logic from the data . This can be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a\n",
"need for further separation and distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier\n",
"architecture. DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource\n",
"Identifier (URI) 18 . URI is a string of characters designed for unambiguous identification of resources. URI is extensible via the URI scheme.\n",
"Unified Resource Name(URN) is a persistent, location-independent identifier. Website can be seen as a collection of web pages with static content\n",
".Early websites were entirely developed only using HTML – Nowadays, some server-side application components and databases are used to dynamically\n",
"generate the content – However, still the content is not user tailored. Web application is a single page or a collection of web pages, with\n",
"interactive components to dynamically generate the content E-commerce is a large domain, which covers many related concepts like – Internet\n",
"marketing – Electronic fund transfer – Online transaction processing. E-commerce systems provide online buying and selling over the internet. There\n",
"is a large variety of types of ecommerce systems – Online goods/soft items(software, e-books, videos) – Retail services (travel, food, cloths) –\n",
"Marketing services (advertising, auctions) – Customer services (help centers, online banking). Advantages of e-commerce . To businesses – After the\n",
"capital cost, maintenance cost is low – Global customers – Increased market share. Disadvantages of e-commerce. To businesses – For physical items,\n",
"storing and distributing is needed – Need to update the system frequently – Depends on the power and the internet\n",
"\n",
"\n"
]
}
],
"source": [
"text = \"\"\"\n",
"A network is a large system consisting of many similar parts that are connected together to allow \n",
"movement or communication along the parts, or between the parts and a control centre. There are \n",
"different types of networks available. Telecommunication networks , Television or radio network \n",
",Transport networks , Social networks. a digital telecommunications network, which allows nodes to \n",
"share resources. In computer networks, computing devices exchange data with each other using \n",
"connections between nodes (data links). A network is a large system consisting of many similar parts that are connected together to allow \n",
"movement or communication along the parts, or between the parts and a control centre. There are \n",
"different types of networks available. Telecommunication networks , Television or radio network \n",
",Transport networks , Social networks. a digital telecommunications network, which allows nodes to \n",
"share resources. In computer networks, computing devices exchange data with each other using \n",
"connections between nodes (data links). \n",
"The Internet is the global system of interconnected computer networks that use the Internet protocol \n",
"suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, \n",
"business, and government networks of local to global scope. ▪Linked by a broad array of electronic, \n",
"wireless, and optical networking technologies. system of rules that allow two or more entities of a \n",
"communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior \n",
"on official occasions (Cambridge\n",
"Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,\n",
"Often used for distributing data over networks ,Used by may other tools like \n",
"protocols. The main and the only component of XML is called an element . An element has 3 \n",
"components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive \n",
".Element names must start with a letter or underscore . \n",
"This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the \n",
"XML file ◦ Encoding attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with \n",
"8-bit blocks to represent a character) .An element may have attribute(s) ◦ Attributes describe the \n",
"element .Attribute value is always quoted (either single or double quote).\n",
"Computer based systems can be mainly divided into 2 types, according to the distribution of the \n",
"components. Standalone Computer System - All the components are executed within a single device, Do \n",
"not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET).\n",
"Distributed system- The components are distributed and executed in multiple devices, Need a network,\n",
"Multiple and loosely coupled set of technologies are used to develop (HTML+CSS+JS + PHP).\n",
"Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence \n",
"and also to separate the application logic from the data . This can be seen as an extension of 2-tier \n",
"architecture. Client-server architecture (n-tier)- When there is a need for further separation and \n",
"distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into \n",
"an n-tier architecture.\n",
"A network is a large system consisting of many similar parts that are connected together to allow \n",
"movement or communication along the parts, or between the parts and a control centre. There are \n",
"different types of networks available. Telecommunication networks , Television or radio network \n",
",Transport networks , Social networks. a digital telecommunications network, which allows nodes to \n",
"share resources. In computer networks, computing devices exchange data with each other using \n",
"connections between nodes (data links). \n",
"The Internet is the global system of interconnected computer networks that use the Internet protocol \n",
"suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, \n",
"business, and government networks of local to global scope. ▪Linked by a broad array of electronic, \n",
"wireless, and optical networking technologies. system of rules that allow two or more entities of a \n",
"communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior \n",
"on official occasions (Cambridge\n",
"Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,Often used for distributing data over networks ,Used by may other tools like \n",
"protocols. The main and the only component of XML is called an element . An element has 3 \n",
"components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive \n",
".Element names must start with a letter or underscore . \n",
"This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the \n",
"XML file ◦ Encoding attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with \n",
"8-bit blocks to represent a character) .An element may have attribute(s) ◦ Attributes describe the \n",
"element .Attribute value is always quoted (either single or double quote).\n",
"Computer based systems can be mainly divided into 2 types, according to the distribution of the \n",
"components. Standalone Computer System - All the components are executed within a single device, Do \n",
"not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET).\n",
"Distributed system- The components are distributed and executed in multiple devices, Need a network,\n",
"Multiple and loosely coupled set of technologies are used to develop (HTML+CSS+JS + PHP).\n",
"Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence \n",
"and also to separate the application logic from the data . This can be seen as an extension of 2-tier \n",
"architecture. Client-server architecture (n-tier)- When there is a need for further separation and \n",
"distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into \n",
"an n-tier architecture.\n",
"DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the \n",
"IP address. Unified Resource Identifier (URI) 18 . URI is a string of characters designed for unambiguous \n",
"identification of resources. URI is extensible via the URI scheme. Unified Resource Name(URN) is a \n",
"Advantages of e-commerce . To businesses – After the capital cost, maintenance cost is low – Global \n",
"customers – Increased market share. Disadvantages of e-commerce. To businesses – For physical items, \n",
"storing and distributing is needed – Need to update the system frequently – Depends on the power and \n",
"the internet\n",
"The Internet is the global system of interconnected computer networks that use the Internet protocol \n",
"suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, \n",
"business, and government networks of local to global scope. ▪Linked by a broad array of electronic, \n",
"wireless, and optical networking technologies. system of rules that allow two or more entities of a \n",
"communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior \n",
"on official occasions (Cambridge\n",
"Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,Often used for distributing data over networks ,Used by may other tools like \n",
"protocols. The main and the only component of XML is called an element . An element has 3 \n",
"components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive \n",
".Element names must start with a letter or underscore . \n",
"This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the \n",
"XML file ◦ Encoding attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with \n",
"8-bit blocks to represent a character) .An element may have attribute(s) ◦ Attributes describe the \n",
"element .Attribute value is always quoted (either single or double quote).\n",
"Computer based systems can be mainly divided into 2 types, according to the distribution of the \n",
"components. Standalone Computer System - All the components are executed within a single device, Do \n",
"not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET).\n",
"Distributed system- The components are distributed and executed in multiple devices, Need a network,\n",
"Multiple and loosely coupled set of technologies are used to develop (HTML+CSS+JS + PHP).\n",
"Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence \n",
"and also to separate the application logic from the data . This can be seen as an extension of 2-tier \n",
"architecture. Client-server architecture (n-tier)- When there is a need for further separation and \n",
"distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into \n",
"an n-tier architecture.\n",
"DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the \n",
"IP address. Unified Resource Identifier (URI) 18 . URI is a string of characters designed for unambiguous \n",
"identification of resources. URI is extensible via the URI scheme. Unified Resource Name(URN) is a \n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")  # Use the GPU when CUDA is available, otherwise the CPU\n",
"question_model = question_model.to(device)\n",
"summary_model = summary_model.to(device)\n",
"\n",
"print(\"device is\", device)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"original Text >>\n",
" A network is a large system consisting of many similar parts that are connected together to allow movement or communication along the parts, or\n",
"between the parts and a control centre. There are different types of networks available. Telecommunication networks , Television or radio network\n",
",Transport networks , Social networks. a digital telecommunications network, which allows nodes to share resources. In computer networks, computing\n",
"devices exchange data with each other using connections between nodes (data links). A network is a large system consisting of many similar parts that\n",
"are connected together to allow movement or communication along the parts, or between the parts and a control centre. There are different types of\n",
"networks available. Telecommunication networks , Television or radio network ,Transport networks , Social networks. a digital telecommunications\n",
"network, which allows nodes to share resources. In computer networks, computing devices exchange data with each other using connections between\n",
"nodes (data links). The Internet is the global system of interconnected computer networks that use the Internet protocol suite to link devices\n",
"worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of local to global scope. ▪Linked\n",
"by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more entities of a communications\n",
"system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions (Cambridge Extensible markup language\n",
"is Designed to store and transport data ,Both human- and machine-readable (self descriptive) , Often used for distributing data over networks ,Used by\n",
"may other tools like protocols. The main and the only component of XML is called an element . An element has 3 components 1. Start tag 2. Body 3.\n",
"End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or underscore . This is the XML\n",
"declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding attribute indicates the character set\n",
"◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have attribute(s) ◦ Attributes describe the\n",
"element .Attribute value is always quoted (either single or double quote). Computer based systems can be mainly divided into 2 types, according to the\n",
"distribution of the components. Standalone Computer System - All the components are executed within a single device, Do not need a network, Usually\n",
"one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The components are distributed and executed in\n",
"multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop (HTML+CSS+JS + PHP). Client-server architecture\n",
"(3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate the application logic from the data . This can\n",
"be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a need for further separation and distribution of\n",
"the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier architecture. A network is a large system\n",
"consisting of many similar parts that are connected together to allow movement or communication along the parts, or between the parts and a control\n",
"centre. There are different types of networks available. Telecommunication networks , Television or radio network ,Transport networks , Social\n",
"networks. a digital telecommunications network, which allows nodes to share resources. In computer networks, computing devices exchange data with\n",
"each other using connections between nodes (data links). The Internet is the global system of interconnected computer networks that use the Internet\n",
"protocol suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of\n",
"local to global scope. ▪Linked by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more\n",
"entities of a communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions\n",
"(Cambridge Extensible markup language is Designed to store and transport data ,Both human- and machine-readable (self descriptive) ,Often used for\n",
"distributing data over networks ,Used by may other tools like protocols. The main and the only component of XML is called an element . An element has\n",
"3 components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or\n",
"underscore . This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding\n",
"attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have\n",
"attribute(s) ◦ Attributes describe the element .Attribute value is always quoted (either single or double quote). Computer based systems can be\n",
"mainly divided into 2 types, according to the distribution of the components. Standalone Computer System - All the components are executed within a\n",
"single device, Do not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The\n",
"components are distributed and executed in multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop\n",
"(HTML+CSS+JS + PHP). Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate\n",
"the application logic from the data . This can be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a\n",
"need for further separation and distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier\n",
"architecture. DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource\n",
"Identifier (URI) 18 . URI is a string of characters designed for unambiguous identification of resources. URI is extensible via the URI scheme.\n",
"Unified Resource Name(URN) is a persistent, location-independent identifier. Website can be seen as a collection of web pages with static content\n",
".Early websites were entirely developed only using HTML – Nowadays, some server-side application components and databases are used to dynamically\n",
"generate the content – However, still the content is not user tailored. Web application is a single page or a collection of web pages, with\n",
"interactive components to dynamically generate the content E-commerce is a large domain, which covers many related concepts like – Internet\n",
"marketing – Electronic fund transfer – Online transaction processing. E-commerce systems provide online buying and selling over the internet. There\n",
"is a large variety of types of ecommerce systems – Online goods/soft items(software, e-books, videos) – Retail services (travel, food, cloths) –\n",
"Marketing services (advertising, auctions) – Customer services (help centers, online banking). Advantages of e-commerce . To businesses – After the\n",
"capital cost, maintenance cost is low – Global customers – Increased market share. Disadvantages of e-commerce. To businesses – For physical items,\n",
"storing and distributing is needed – Need to update the system frequently – Depends on the power and the internet. DNS is a network, which consists\n",
"of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource Identifier (URI) 18 . URI is a string of characters\n",
"designed for unambiguous identification of resources. URI is extensible via the URI scheme. Unified Resource Name(URN) is a persistent, location-\n",
"independent identifier. Website can be seen as a collection of web pages with static content .Early websites were entirely developed only using HTML\n",
"– Nowadays, some server-side application components and databases are used to dynamically generate the content – However, still the content is not\n",
"user tailored. Web application is a single page or a collection of web pages, with interactive components to dynamically generate the content\n",
"E-commerce is a large domain, which covers many related concepts like – Internet marketing – Electronic fund transfer – Online transaction processing.\n",
"E-commerce systems provide online buying and selling over the internet. There is a large variety of types of ecommerce systems – Online goods/soft\n",
"centers, online banking). Advantages of e-commerce . To businesses – After the capital cost, maintenance cost is low – Global customers – Increased\n",
"market share. Disadvantages of e-commerce. To businesses – For physical items, storing and distributing is needed – Need to update the system\n",
"frequently – Depends on the power and the internet The Internet is the global system of interconnected computer networks that use the Internet\n",
"protocol suite to link devices worldwide. ▪It is a network of networks ▪Consists of private, public, academic, business, and government networks of\n",
"local to global scope. ▪Linked by a broad array of electronic, wireless, and optical networking technologies. system of rules that allow two or more\n",
"entities of a communications system to transmit information (wiki) ▪ the formal system of rules for correct behavior on official occasions\n",
"(Cambridge Extensible markup language is Designed to store and transport data ,Both human- and machine\u0002readable (self descriptive) ,Often used for\n",
"distributing data over networks ,Used by may other tools like protocols. The main and the only component of XML is called an element . An element has\n",
"3 components 1. Start tag 2. Body 3. End tag. An element has a name . Element names are case-sensitive .Element names must start with a letter or\n",
"underscore . This is the XML declaration ◦ Provides the instructions for the processor to understand the details of the XML file ◦ Encoding\n",
"attribute indicates the character set ◦ UTF-8 = Unicode Transformation Format (with 8-bit blocks to represent a character) .An element may have\n",
"attribute(s) ◦ Attributes describe the element .Attribute value is always quoted (either single or double quote). Computer based systems can be\n",
"mainly divided into 2 types, according to the distribution of the components. Standalone Computer System - All the components are executed within a\n",
"single device, Do not need a network, Usually one or tightly coupled set of technologies are used to develop (JAVA, .NET). Distributed system- The\n",
"components are distributed and executed in multiple devices, Need a network, Multiple and loosely coupled set of technologies are used to develop\n",
"(HTML+CSS+JS + PHP). Client-server architecture (3-tier)- 3-tier architecture is used, when there is a need for data persistence and also to separate\n",
"the application logic from the data . This can be seen as an extension of 2-tier architecture. Client-server architecture (n-tier)- When there is a\n",
"need for further separation and distribution of the components, more tiers can be added and extend the 2-tier or 3-tier architecture into an n-tier\n",
"architecture. DNS is a network, which consists of Domain Name Servers . DNS helps to map the domain name to the IP address. Unified Resource\n",
"Identifier (URI) 18 . URI is a string of characters designed for unambiguous identification of resources. URI is extensible via the URI scheme.\n",
"Unified Resource Name(URN) is a persistent, location-independent identifier. Website can be seen as a collection of web pages with static content\n",
".Early websites were entirely developed only using HTML – Nowadays, some server-side application components and databases are used to dynamically\n",
"generate the content – However, still the content is not user tailored. Web application is a single page or a collection of web pages, with\n",
"interactive components to dynamically generate the content E-commerce is a large domain, which covers many related concepts like – Internet\n",
"marketing – Electronic fund transfer – Online transaction processing. E-commerce systems provide online buying and selling over the internet. There\n",
"is a large variety of types of ecommerce systems – Online goods/soft items(software, e-books, videos) – Retail services (travel, food, cloths) –\n",
"Marketing services (advertising, auctions) – Customer services (help centers, online banking). Advantages of e-commerce . To businesses – After the\n",
"capital cost, maintenance cost is low – Global customers – Increased market share. Disadvantages of e-commerce. To businesses – For physical items,\n",
"storing and distributing is needed – Need to update the system frequently – Depends on the power and the internet\n",
"\n",
"\n",
"Summarized Text >>\n",
"Network is a large system consisting of many similar parts that are connected together to allow movement or communication along the parts. In computer networks, computing devices exchange data with each other using connections between nodes (data links) the main and the only component of xml is\n",
"called an element - it has three components: start tag 2. Body 3. End tag.\n",
"\n",
"\n"
]
}
],
"source": [
"def postprocesstext (content): # takes a string content as input\n",
" final=\"\"\n",
" for sent in sent_tokenize(content):\n",
" sent = sent.capitalize()\n",
" final = final +\" \"+sent\n",
" return final\n",
"\n",
"def summarizer(\n",
" text,\n",
" model,\n",
" tokenizer,\n",
" max_len = 512\n",
" ):\n",
" text = text.strip().replace(\"\\n\",\" \")#removes leading and trailing whitespace and replaces newline characters in the input\n",
" text = \"summarize: \"+text\n",
" #encodes the modified text using the tokenizer.\n",
" #input_ids contains the numerical representations of the tokens, while attention_mask marks which tokens should be attended to by the model and which should be ignored.\n",
"#Generates a list of distractors (alternative options) for a given word using WordNet-finds hyponyms (related concepts) for a given synonym set (syn) \n",
"def get_distractors_wordnet(syn,word):\n",
" distractors=[]\n",
" word= word.lower()\n",
" orig_word = word\n",
" if len(word.split())>0:\n",
" word = word.replace(\" \",\"_\")\n",
"\n",
" hypernym = syn.hypernyms() #hypernyms- a more general concept\n",
" if len(hypernym) == 0: \n",
" return distractors\n",
" \n",
" for item in hypernym[0].hyponyms():\n",
" name = item.lemmas()[0].name()\n",
" if name == orig_word:\n",
" continue\n",
" name = name.replace(\"_\",\" \")\n",
" name = \" \".join(w.capitalize() for w in name.split())\n",
" if name is not None and name not in distractors:\n",
" distractors.append(name)\n",
" #list of formatted distractors.\n",
" return distractors\n",
"\n",
"#Similarity measures and the Lesk algorithm to choose the most relevant synset\n",
"def get_wordsense(sent,word):\n",
" word= word.lower()\n",
" \n",
" if len(word.split())>0:\n",
" word = word.replace(\" \",\"_\")\n",
" \n",
" \n",
" synsets = wn.synsets(word,'n')\n",
" if synsets:\n",
" wup = max_similarity(sent, word, 'wup', pos='n')\n",
" adapted_lesk_output = adapted_lesk(sent, word, pos='n')\n",
" lowest_index = min (synsets.index(wup),synsets.index(adapted_lesk_output))\n",
" return synsets[lowest_index]\n",
" else:\n",
" return None\n",
"\n",
"#Generates distractors using the ConceptNet API.\n",
"def get_distractors_conceptnet(word):\n",
" word = word.lower()\n",
" original_word= word\n",
" if (len(word.split())>0):\n",
" word = word.replace(\" \",\"_\")\n",
"\n",
" distractor_list = [] \n",
" #Construct a URL to query the ConceptNet API with the word as the start and end node.\n",
"d:\\Edu Me\\PP2\\files/lesson-summarization-qna is already a clone of https://huggingface.co/HiranyaDilukshi/lesson-summarization-qna. Make sure you pull the latest changes with `repo.git_pull()`.\n",
" 0%| | 0/640 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
" 4%|▍ | 27/640 [00:59<20:21, 1.99s/it]"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" 4%|▍ | 28/640 [01:01<20:55, 2.05s/it]"
]
}
],
"source": [
"# Setting up and initialization of the training of Seq2Seq model for text summarization\n",