22 Module 3 - Day 5

Topics
- Debugging Python programs
  - Tips for debugging: reading the traceback and error messages to find faults
  - Debugging with pdb: a sample session
- Distributing libraries and packages
  - One example of creating a Python package
  - A standalone script with its library dependencies, turned into a usable command-line tool

For today's practice, use the notebook module3-day5.ipynb created in your environment. Shut down the kernels of all previous notebooks (if they are still running) by right-clicking on each notebook in the file browser on the left-hand side.

22.1 Reading PDF files

You will need the pypdf library (install it with pip install pypdf).

import pypdf

with open("upload.pdf", "rb") as f:
    pdfreader = pypdf.PdfReader(f)
    page0 = pdfreader.pages[0].extract_text()
    page1 = pdfreader.pages[1].extract_text()

print(page0)

 
(Output abridged: the Devanagari lines, which repeat the English text in Hindi and come out garbled from pypdf's text extraction, are omitted.)
National Load Despatch Centre
POWER SYSTEM OPERATION CORPORATION LIMITED
 (Government of India Enterprise)
B-9, QUTUB INSTITUTIONAL AREA, KATWARIA SARAI, NEW DELHI -110016
_______________________________________________________________________________________________________________________________________
Ref: POSOCO/NLDC/SO/Daily PSP Report                    Date: 16th Jul 2020

To,

1. Executive Director, ERLDC, 14 Golf Club Road, Tollygunge, Kolkata, 700033
2. Executive Director, NRLDC, 18-A, Shaheed Jeet Singh Marg, Katwaria Sarai, New Delhi – 110016
3. Executive Director, WRLDC, F-3, M.I.D.C. Area, Marol, Andheri (East), Mumbai-400093
4. Executive Director, NERLDC, Dongteih, Lower Nongrah, Lapalang, Shillong - 793006, Meghalaya
5. Executive Director, SRLDC, 29, Race Course Cross Road, Bangalore-560009

Sub: Daily PSP Report for the date 15.07.2020.

Dear Sir,

As per article 5.5.1 of the Indian Electricity Grid Code, the daily report pertaining power supply position of All India
Power System for the date 15th July 2020, is available at the NLDC website.

Thank you,

print(page1)

NR WR SR ER NER TOTAL
59882 41115 34238 21526 2730 159491
1114 0 0 0 6 1120
1398 998 807 447 48 3698
355 33 77 149 29 643
11 49 128 - - 187
39.60 16.60 41.59 4.60 0.03 102
12.6 0.0 0.0 0.0 0.0 12.6
65470 43593 38117 21535 2827 160654
22:20 10:29 10:00 21:20 19:41 21:26
Region FVI < 49.7 49.7 - 49.8 49.8 - 49.9 < 49.9 49.9 - 50.05 > 50.05
All India 0.057 0.16 1.81 13.19 15.16 76.52 8.32
Max.Demand Shortage during Energy Met Drawal OD(+)/UD(-) Max OD Energy
Region States Met during the 
day(MW)
maximum 
Demand(MW) (MU) Schedule
(MU) (MU) (MW) Shortage 
(MU)
Punjab 11090 0 237.9 146.8 -1.8 49 0.0
Haryana 9388 0 209.4 152.8 0.7 325 1.9
Rajasthan 12087 0 262.4 119.7 5.4 809 0.0
Delhi 5726 0 118.6 102.8 -1.4 228 0.0
NR UP 22873 0 448.9 208.5 2.0 546 0.4
Uttarakhand 1899 0 42.8 20.7 0.8 111 0.0
HP 1366 0 28.6 -2.6 -0.2 91 0.0
J&K(UT) & Ladakh(UT) 2177 544 43.1 20.3 0.4 502 10.3
Chandigarh 295 0 6.0 5.9 0.2 61 0.0
Chhattisgarh 3685 0 86.9 36.8 0.8 468 0.0
Gujarat 13478 0 286.2 87.6 4.0 527 0.0
MP 9547 0 214.7 113.8 -3.8 198 0.0
WR Maharashtra 16964 0 365.1 138.1 -1.9 457 0.0
Goa 405 0 8.5 8.2 -0.2 33 0.0
DD 246 0 5.3 5.3 0.0 19 0.0
DNH 614 0 14.0 13.8 0.2 44 0.0
AMNSIL 777 0 17.1 4.2 0.7 272 0.0
Andhra Pradesh 6439 0 141.0 45.6 -1.3 607 0.0
Telangana 8614 0 167.3 81.6 -2.5 385 0.0
SR Karnataka 8486 0 155.1 51.1 -3.4 650 0.0
Kerala 3077 0 65.2 46.1 0.5 179 0.0
Tamil Nadu 12371 0 271.3 125.9 -3.7 573 0.0
Puducherry 349 0 7.5 7.5 -0.1 35 0.0
Bihar 5740 0 111.5 106.0 -0.3 386 0.0
DVC 2989 0 62.7 -42.6 -0.7 206 0.0
Jharkhand 1438 0 26.3 18.5 -1.0 124 0.0
ER Odisha 3983 0 82.2 -0.2 -0.2 325 0.0
West Bengal 7917 0 162.6 47.2 -0.8 303 0.0
Sikkim 100 0 1.4 1.5 -0.1 17 0.0
Arunachal Pradesh 120 3 2.0 1.8 0.2 40 0.0
Assam 1759 23 30.0 27.1 -0.1 135 0.0
Manipur 183 1 2.6 2.3 0.3 37 0.0
NER Meghalaya 307 2 5.3 -1.3 0.3 52 0.0
Mizoram 89 1 1.5 1.2 0.0 13 0.0
Nagaland 140 2 2.2 2.3 -0.2 23 0.0
Tripura 298 7 4.9 5.9 0.7 66 0.0
Bhutan Nepal Bangladesh
53.3 -1.5 -19.1
2337.0 -271.3 -1110.0
NR WR SR ER NER TOTAL
352.1 -295.4 95.0 -145.8 -6.0 0.0
359.2 -293.7 84.6 -152.6 -3.4 -6.0
7.1 1.6 -10.5 -6.9 2.6 -6.0
NR WR SR ER NER TOTAL
3838 14847 11792 3445 677 34598
9289 23225 14423 4892 47 51876
13127 38072 26215 8337 723 86473
NR WR SR ER NER All India
546 1080 370 482 7 2486
25 13 14 0 0 52
355 33 77 149 29 643
26 33 47 0 0 106
40 82 19 0 22 163
71 73 210 5 0 359
1063 1314 737 636 58 3809
6.71 5.54 28.51 0.73 0.05 9.43
42.55 10.54 45.35 24.19 49.63 29.09
1.068
1.102Based on State Max Demands
Diversity factor = Sum of regional or state maximum demands / All India maximum demand
*Source: RLDCs for solar connected to ISTS; SLDCs for embedded solar. Limited visibility of embedded solar data.
Executive Director-NLDC
Share of RES in total generation (%)
Share of Non-fossil fuel (Hydro,Nuclear and RES) in total generation(%)
H. All India Demand Diversity Factor
Based on Regional Max Demands
Lignite
Hydro
Nuclear
Gas, Naptha & Diesel
RES (Wind, Solar, Biomass & Others)
Total
State Sector
Total
G. Sourcewise generation (MU)
Coal
Actual(MU)
O/D/U/D(MU)
F. Generation Outage(MW)
Central Sector
Day Peak (MW)
E. Import/Export by Regions (in MU) - Import(+ve)/Export(-ve); OD(+)/UD(-)
Schedule(MU)
D. Transnational Exchanges (MU) - Import(+ve)/Export(-ve)   
Actual (MU)
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)
B. Frequency Profile (%)
C. Power Supply Position in States
Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Report for previous day Date of Reporting: 16-Jul-2020
A. Power Supply Position at All India and Regional level

import csv

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9] # header line + the 8 data rows of table A
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    # newline="" stops the csv module from writing blank lines between rows on Windows
    with open(filename, "w", newline="") as f:
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

with open("upload.pdf", "rb") as f:
    pdfreader = pypdf.PdfReader(f)
    page1 = pdfreader.pages[1].extract_text()
    headers, data = extract_table(page1)
    write_csv(headers, data, "tableA.csv")

!cat tableA.csv

NR,WR,SR,ER,NER,TOTAL
59882,41115,34238,21526,2730,159491
1114,0,0,0,6,1120
1398,998,807,447,48,3698
355,33,77,149,29,643
11,49,128,-,-,187
39.60,16.60,41.59,4.60,0.03,102
12.6,0.0,0.0,0.0,0.0,12.6
65470,43593,38117,21535,2827,160654
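The lines[:9] slice works because pypdf returns the numbers of table A first, one line per row under the header line, while the row labels only appear much later in the extracted text. If the report layout ever changes, the slice silently picks up the wrong lines. Below is a minimal sketch of a more defensive extract_table, assuming every data row must have as many fields as the header; the nrows parameter, the width check and the to_csv_text helper are illustrative additions, not part of the original code.

```python
import csv
import io

# First lines of page 1 as pypdf returned them (copied from the print(page1) output)
SAMPLE_PAGE = "\n".join([
    "NR WR SR ER NER TOTAL",
    "59882 41115 34238 21526 2730 159491",
    "1114 0 0 0 6 1120",
    "1398 998 807 447 48 3698",
    "355 33 77 149 29 643",
    "11 49 128 - - 187",
    "39.60 16.60 41.59 4.60 0.03 102",
    "12.6 0.0 0.0 0.0 0.0 12.6",
    "65470 43593 38117 21535 2827 160654",
    "22:20 10:29 10:00 21:20 19:41 21:26",
])


def extract_table(page, nrows=8):
    """Return (headers, rows) from the header line plus the next nrows lines.

    nrows=8 matches the lines[:9] slice used above. The width check is an
    illustrative addition: it fails loudly if a row does not line up with
    the header instead of writing a misaligned CSV.
    """
    lines = page.split("\n")[: nrows + 1]
    headers = lines[0].split()
    rows = [line.strip().split() for line in lines[1:]]
    for number, row in enumerate(rows, start=1):
        if len(row) != len(headers):
            raise ValueError(
                f"row {number} has {len(row)} fields, expected {len(headers)}: {row}"
            )
    return headers, rows


def to_csv_text(headers, rows):
    """Render the table as CSV text in memory, as write_csv would on disk."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


headers, rows = extract_table(SAMPLE_PAGE)
print(to_csv_text(headers, rows))
```

The check complements looking at the extracted text rather than replacing it: with nrows=9 the 22:20 time-of-day line also has six fields, so it would pass the check and end up in the table.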

import pandas as pd

df = pd.read_csv("tableA.csv")

df

	NR	WR	SR	ER	NER	TOTAL
0	59882.0	41115.0	34238.00	21526	2730	159491.0
1	1114.0	0.0	0.00	0	6	1120.0
2	1398.0	998.0	807.00	447	48	3698.0
3	355.0	33.0	77.00	149	29	643.0
4	11.0	49.0	128.00	-	-	187.0
5	39.6	16.6	41.59	4.60	0.03	102.0
6	12.6	0.0	0.00	0.0	0.0	12.6
7	65470.0	43593.0	38117.00	21535	2827	160654.0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   NR      8 non-null      float64
 1   WR      8 non-null      float64
 2   SR      8 non-null      float64
 3   ER      8 non-null      object 
 4   NER     8 non-null      object 
 5   TOTAL   8 non-null      float64
dtypes: float64(4), object(2)
memory usage: 516.0+ bytes

df.ER.str.replace("-","")

0    21526
1        0
2      447
3      149
4         
5     4.60
6      0.0
7    21535
Name: ER, dtype: object

df['ER'] = pd.to_numeric(df.ER.str.replace("-",""))

df['NER'] = pd.to_numeric(df.NER.str.replace("-",""))

df

	NR	WR	SR	ER	NER	TOTAL
0	59882.0	41115.0	34238.00	21526.0	2730.00	159491.0
1	1114.0	0.0	0.00	0.0	6.00	1120.0
2	1398.0	998.0	807.00	447.0	48.00	3698.0
3	355.0	33.0	77.00	149.0	29.00	643.0
4	11.0	49.0	128.00	NaN	NaN	187.0
5	39.6	16.6	41.59	4.6	0.03	102.0
6	12.6	0.0	0.00	0.0	0.00	12.6
7	65470.0	43593.0	38117.00	21535.0	2827.00	160654.0

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   NR      8 non-null      float64
 1   WR      8 non-null      float64
 2   SR      8 non-null      float64
 3   ER      7 non-null      float64
 4   NER     7 non-null      float64
 5   TOTAL   8 non-null      float64
dtypes: float64(6)
memory usage: 516.0 bytes

df.sum()

NR       128282.20
WR        85804.60
SR        73408.59
ER        43661.60
NER        5640.03
TOTAL    325907.60
dtype: float64
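str.replace("-", "") works on this table only because none of its values are negative. Other tables on the same page contain negative numbers (for example -1.8 in the OD(+)/UD(-) column for Punjab), and deleting every "-" would silently flip their sign. A sketch of two safer options, both standard pandas features: the na_values argument of read_csv, which treats "-" as missing while reading, and pd.to_numeric(..., errors="coerce"), which turns anything non-numeric into NaN. The abbreviated CSV text is copied from tableA.csv for illustration.

```python
import io

import pandas as pd

# Three rows of tableA.csv, enough to show the "-" placeholders
CSV_TEXT = """NR,WR,SR,ER,NER,TOTAL
59882,41115,34238,21526,2730,159491
11,49,128,-,-,187
39.60,16.60,41.59,4.60,0.03,102
"""

# Option 1: declare "-" as a missing-value marker while reading
df = pd.read_csv(io.StringIO(CSV_TEXT), na_values=["-"])
print(df.dtypes)  # ER and NER are now float64; the "-" cells are NaN

# Option 2: convert after reading; any value that is not a number becomes NaN
raw = pd.read_csv(io.StringIO(CSV_TEXT))
raw["ER"] = pd.to_numeric(raw["ER"], errors="coerce")
raw["NER"] = pd.to_numeric(raw["NER"], errors="coerce")

# Why not str.replace("-", "")? It also deletes the minus sign of negative values.
signed = pd.Series(["-1.8", "0.7", "-"])
print(pd.to_numeric(signed.str.replace("-", "")).tolist())  # -1.8 has become 1.8
print(pd.to_numeric(signed, errors="coerce").tolist())      # -1.8 kept, "-" is NaN
```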

22.1.1 When should you automate with a Python program?

When you know the same task will be repeated every day, or frequently.
When the data comes in the same format every time!
Otherwise, use your judgement to decide whether to write a program or extract the data manually.

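22.2 Debugging

Look carefully at the error traceback printed by the interpreter. Read it from the bottom up: the last line names the exception and its cause, and the lines above it show the chain of calls that led there. For example, evaluating a name that was never defined:

x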

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 x

NameError: name 'x' is not defined

def square(x):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 7
      4 def sumofsquares(a, b):
      5     return square(a) + square(b)
----> 7 sumofsquares(40, 50)

Cell In[26], line 5, in sumofsquares(a, b)
      4 def sumofsquares(a, b):
----> 5     return square(a) + square(b)

Cell In[26], line 2, in square(x)
      1 def square(x):
----> 2     return a*a

NameError: name 'a' is not defined
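Python prints tracebacks with the most recent call last, so the frame that actually failed is at the bottom. The standard traceback module exposes the same frames programmatically. A small sketch that re-runs the buggy square from above and lists its frames in order; the printing format is our own choice.

```python
import traceback


def square(x):
    return a * a  # the bug from above: uses a instead of x


def sumofsquares(a, b):
    return square(a) + square(b)


try:
    sumofsquares(40, 50)
except NameError as err:
    frames = traceback.extract_tb(err.__traceback__)
    # Frames come outermost first, so the last entry is where the error was raised
    for frame in frames:
        print(f"{frame.name}() at line {frame.lineno}: {frame.line}")
    print("raised in:", frames[-1].name, "|", type(err).__name__, "-", err)
```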

def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)

def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + squareb)

sumofsquares(40, 50)

  Cell In[28], line 5
    return square(a) + squareb)
                              ^
SyntaxError: unmatched ')'

def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)

def mysum(*nums): # the function expects separate numeric arguments
    s = 0
    for n in nums:
        s += n
    return s

mysum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # works: returns 55

mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[33], line 1
----> 1 mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Cell In[32], line 4, in mysum(*nums)
      2 s = 0
      3 for n in nums:
----> 4     s += n
      5 return s

TypeError: unsupported operand type(s) for +=: 'int' and 'list'

0 + [1, 2, 3, 4]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 0 + [1, 2, 3, 4]

TypeError: unsupported operand type(s) for +: 'int' and 'list'

def mysum(*nums): # the function expects separate numeric arguments
    s = 0
    for n in nums:
        print(s)
        print(n)
        s += n
    return s

mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

0
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Cell In[35], line 6, in mysum(*nums)
      4     print(s)
      5     print(n)
----> 6     s += n
      7 return s

TypeError: unsupported operand type(s) for +=: 'int' and 'list'

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mysum(*nums) # the * unpacks the list, passing each item as a separate argument
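Printing values, as above, shows what went wrong after the fact. Another option is to check the input at the start and raise an error whose message names the real problem. A minimal sketch; the isinstance check against numbers.Number and the message wording are our additions, not part of the original mysum.

```python
from numbers import Number


def mysum(*nums):
    """Add separate numeric arguments; fail early with a helpful message."""
    s = 0
    for n in nums:
        # Check the type before using it, so the error names the real problem
        if not isinstance(n, Number):
            raise TypeError(
                f"mysum expects separate numbers, got {type(n).__name__} {n!r}; "
                "to add a list, call mysum(*values)"
            )
        s += n
    return s


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(mysum(*values))  # 55
try:
    mysum(values)
except TypeError as err:
    print(err)
```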

sum = sumofsquares(50, 60) # overwrites (shadows) the built-in function sum

type(sum)

int

sum(nums)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 sum(nums)

TypeError: 'int' object is not callable

del sum

sum(nums)
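Even while a built-in name is shadowed, the original is still available from the standard builtins module, and dir(builtins) lists every built-in name, which is a quick way to check whether a variable name you are about to use would shadow one. A short sketch (6100 is simply the value of sumofsquares(50, 60)):

```python
import builtins

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sum = 6100  # shadows the built-in, like sum = sumofsquares(50, 60) did

# The original function is still reachable through the builtins module
print(builtins.sum(nums))  # 55

# dir(builtins) lists every built-in name; True here means "avoid this name"
print("sum" in dir(builtins))  # True

del sum  # remove the global variable; the name falls back to the built-in
print(sum(nums))  # 55
```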

22.2.1 Giving appropriate names to variables and functions

def function2(data, i):
    return [item[i] for item in data]


def function1(x):
    y = []
    n = len(x[0]) # str, dict, list, tuple
    for i in range(n):
        y.append(function2(x, i))
    return y

Questions
1. What is x in function1?
2. What is data in function2? (possibly a list or tuple)
3. What is function2 doing?
4. What is function1 doing?

matrix = [[11, 12, 13],
          [21, 22, 23],
          [31, 32, 33]]

function2(matrix, 0) # this is 0th column

[11, 21, 31]

function2(matrix, 1) # it returns 1st column

[12, 22, 32]

function2(matrix, 2)

[13, 23, 33]

def column(matrix, colnum):
    return [row[colnum] for row in matrix]

function1(matrix) # collection of columns

[[11, 21, 31], [12, 22, 32], [13, 23, 33]]

matrix # collection of rows

[[11, 12, 13], [21, 22, 23], [31, 32, 33]]

def transpose(matrix):
    cols = []
    colcount = len(matrix[0]) 
    for i in range(colcount):
        cols.append(column(matrix, i))
    return cols

def column(matrix, colnum):
    """finds nth column from the matrix
    assumes that all rows of matrix are of same size
    """
    return [row[colnum] for row in matrix]


def transpose(matrix):
    """finds transpose of a 2d matrix
    assumes that all rows of matrix are of same size
    """
    firstrow = matrix[0]
    colcount = len(firstrow) 
    return [column(matrix, c) for c in range(colcount)]

transpose(matrix)

[[11, 21, 31], [12, 22, 32], [13, 23, 33]]
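Python's built-in zip gives a shorter transpose: zip(*matrix) unpacks the rows as separate arguments (the same * unpacking used with mysum earlier) and yields one tuple per column. A sketch of that idiom; note that zip stops at the shortest row, while on Python 3.10+ zip(*matrix, strict=True) raises ValueError instead when rows differ in length.

```python
matrix = [[11, 12, 13],
          [21, 22, 23],
          [31, 32, 33]]

# zip(*matrix) is zip([11, 12, 13], [21, 22, 23], [31, 32, 33]):
# each tuple it yields takes one item from every row, i.e. one column
transposed = [list(col) for col in zip(*matrix)]
print(transposed)  # [[11, 21, 31], [12, 22, 32], [13, 23, 33]]
```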

Naming conventions

i, j, k are conventional names for iteration (loop) variables, preferably when they are integers.
x, y look like int/float values from algebra!
If you know exactly what the integer/float data represents, name the variable after it.
For example, instead of i, you can use col/c if i represents a column index.
For example, instead of i, you can use row/r if i represents a row index.
When iterating over 2D data, iterate as [row for row in data2D] instead of [item for item in data2D].
Give each function a name that clearly describes what it does.
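Applied to a small task, the conventions above look like this: enumerate yields an index and an item together, and naming them r/row and c/value says what each one is, where i, j and x would not. find_max_position is a hypothetical example function, not part of the course code.

```python
def find_max_position(matrix):
    """Return (row_index, col_index, value) of the largest entry in a 2D list."""
    best = None
    for r, row in enumerate(matrix):        # r: row index, row: one list of values
        for c, value in enumerate(row):     # c: column index, value: one entry
            if best is None or value > best[2]:
                best = (r, c, value)
    return best  # None for an empty matrix


matrix = [[11, 12, 13],
          [21, 22, 23],
          [31, 32, 33]]
print(find_max_position(matrix))  # (2, 2, 33)
```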

22.3 Python debugger

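Python comes with a built-in debugger, pdb. You can start a script under it from the command line, for example python -m pdb extract_table.py upload.pdf tableA.csv, or put a call to breakpoint() in your code to stop there and open the debugger.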

%%file extract_table.py
import pypdf
import csv
import sys

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9] # take first 9 lines
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    # newline="" stops the csv module from writing blank lines between rows on Windows
    with open(filename, "w", newline="") as f:
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

def extract_table_from_pdf(pdffile, csvfile):
    with open(pdffile, "rb") as f:
        pdfreader = pypdf.PdfReader(f)
        page1 = pdfreader.pages[1].extract_text()
        headers, data = extract_table(page1)
        write_csv(headers, data, csvfile)

if __name__ == "__main__":
    pdf = sys.argv[1]
    csvfile = sys.argv[2]
    extract_table_from_pdf(pdf, csvfile)

Overwriting extract_table.py
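Reading sys.argv[1] and sys.argv[2] directly works, but running the script without arguments fails with an unhelpful IndexError. The standard argparse module adds a usage message, a --help option and a clear error for missing arguments. A sketch of an argparse front end for extract_table.py; build_parser and main are hypothetical names, and in the real script main would go on to call extract_table_from_pdf.

```python
import argparse


def build_parser():
    """Describe the two positional arguments extract_table.py expects."""
    parser = argparse.ArgumentParser(
        description="Extract table A from the second page of the daily PSP report PDF into a CSV file."
    )
    parser.add_argument("pdffile", help="input PDF, for example upload.pdf")
    parser.add_argument("csvfile", help="output CSV file, for example tableA.csv")
    return parser


def main(argv=None):
    """Parse argv (None means sys.argv[1:], argparse's default) and return the result."""
    args = build_parser().parse_args(argv)
    # In extract_table.py this is where you would call:
    # extract_table_from_pdf(args.pdffile, args.csvfile)
    return args


args = main(["upload.pdf", "tableA.csv"])
print(args.pdffile, args.csvfile)
```

In the script, the if __name__ == "__main__": block would parse the arguments this way and pass args.pdffile and args.csvfile to extract_table_from_pdf. With no arguments, argparse prints a usage line and exits with status 2.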

Some useful commands you will need inside the debugger:
- l  # list the source around the current line, with line numbers
- h  # show help; lists all available commands
- h b  # show help for command b
- b 34  # set a breakpoint at line 34
- c  # continue until the next breakpoint
- n  # execute the current line and stop at the next one
- p var  # print the value of var in the current context
- q  # quit the debugger
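A sample session can also be replayed without typing, by handing pdb a prepared list of commands. This sketch uses documented pdb.Pdb parameters (stdin, stdout, nosigint, readrc): runcall() enters the debugger as soon as mysum is called, the commands p, n and c are fed from a string, and everything pdb prints is captured so you can read the transcript afterwards. In everyday use you would simply type these commands at the (Pdb) prompt.

```python
import io
import pdb


def mysum(*nums):
    s = 0
    for n in nums:
        s += n
    return s


# What you would type at the (Pdb) prompt, one command per line:
#   p nums  print the arguments the function received
#   n       execute the current line and stop at the next one (used twice)
#   p s     print the running total so far
#   c       continue until the program finishes (no breakpoints are set)
commands = io.StringIO("p nums\nn\nn\np s\nc\n")
transcript = io.StringIO()

# nosigint=True and readrc=False keep the replay independent of signals and ~/.pdbrc
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
# runcall() enters the debugger when mysum is entered and returns its result
result = debugger.runcall(mysum, 1, 2, 3)

print(transcript.getvalue())
print("result:", result)
```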

22.4 Packaging your Python code

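22.4.1 Python script, instructions, requirements.txt

The simplest way to share a script is to ship it together with a README.md containing usage instructions and a requirements.txt listing the third-party libraries it needs, which users install with pip install -r requirements.txt:

extract_table.py
requirements.txt
README.md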

22.4.2 Creating executable using pyinstaller

You will need third party library pyinstaller

!pip install pyinstaller

Defaulting to user installation because normal site-packages is not writeable
Collecting pyinstaller
  Downloading pyinstaller-6.19.0-py3-none-manylinux2014_x86_64.whl.metadata (8.5 kB)
Collecting altgraph (from pyinstaller)
  Downloading altgraph-0.17.5-py2.py3-none-any.whl.metadata (7.5 kB)
Requirement already satisfied: packaging>=22.0 in /opt/tljh/user/lib/python3.12/site-packages (from pyinstaller) (24.1)
Collecting pyinstaller-hooks-contrib>=2026.0 (from pyinstaller)
  Downloading pyinstaller_hooks_contrib-2026.1-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: setuptools>=42.0.0 in /opt/tljh/user/lib/python3.12/site-packages (from pyinstaller) (74.1.2)
Downloading pyinstaller-6.19.0-py3-none-manylinux2014_x86_64.whl (741 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 741.4/741.4 kB 10.8 MB/s eta 0:00:00
Downloading pyinstaller_hooks_contrib-2026.1-py3-none-any.whl (452 kB)
Downloading altgraph-0.17.5-py2.py3-none-any.whl (21 kB)
Installing collected packages: altgraph, pyinstaller-hooks-contrib, pyinstaller
  WARNING: The scripts pyi-archive_viewer, pyi-bindepend, pyi-grab_version, pyi-makespec, pyi-set_version and pyinstaller are installed in '/home/jupyter-vikrant/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed altgraph-0.17.5 pyinstaller-6.19.0 pyinstaller-hooks-contrib-2026.1

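The following command creates an executable from your Python program (-F, short for --onefile, bundles everything into a single file):

pyinstaller -F extract_table.py

This creates an executable named extract_table inside the dist folder. You can distribute this executable, and users can run it directly as a command-line tool (for example ./dist/extract_table upload.pdf tableA.csv) without installing Python or pypdf. The executable only runs on the kind of operating system it was built on, so build it separately for Windows, Linux and macOS. If the pyinstaller command is not found (see the PATH warning in the install output above), run python -m PyInstaller -F extract_table.py instead.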

22.5 Python package

A Python package has to follow a particular folder structure. Here is one sample:

tableA
  |
  |--setup.py
  |--requirements.txt
  +-A
    |
    |--__init__.py
    |-- extract_tableA.py
    +-B
      |
      |-- __init__.py
      |-- stats.py

!mkdir tableA
!mkdir tableA/A
!mkdir tableA/A/B


!touch tableA/A/__init__.py
!touch tableA/A/B/__init__.py

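The ! shell commands above rely on mkdir and touch, and touch is not available in a plain Windows command prompt. A sketch of the same skeleton in pure Python with the standard pathlib module; make_package_skeleton is a hypothetical helper name, not part of the course code.

```python
import tempfile
from pathlib import Path


def make_package_skeleton(root):
    """Create tableA/A/B under root, with an empty __init__.py in A and in B."""
    project = Path(root) / "tableA"
    package_b = project / "A" / "B"
    package_b.mkdir(parents=True, exist_ok=True)  # like mkdir -p: creates tableA and A too
    (project / "A" / "__init__.py").touch()       # two underscores on each side of init
    (package_b / "__init__.py").touch()
    return project


# Try it in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    project = make_package_skeleton(tmp)
    for path in sorted(project.rglob("*")):
        print(path.relative_to(tmp))
```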

%%file tableA/A/extract_tableA.py
import pypdf
import csv
import sys

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9] # take first 9 lines
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    # newline="" stops the csv module from writing blank lines between rows on Windows
    with open(filename, "w", newline="") as f:
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

def extract_table_from_pdf(pdffile, csvfile):
    with open(pdffile, "rb") as f:
        pdfreader = pypdf.PdfReader(f)
        page1 = pdfreader.pages[1].extract_text()
        headers, data = extract_table(page1)
        write_csv(headers, data, csvfile)

if __name__ == "__main__":
    pdf = sys.argv[1]
    csvfile = sys.argv[2]
    extract_table_from_pdf(pdf, csvfile)

Writing tableA/A/extract_tableA.py

%%file tableA/A/B/stats.py

def mean(nums):
    pass

def std(nums):
    pass

Writing tableA/A/B/stats.py
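The mean and std functions in stats.py are left as pass stubs, so calling them returns None. One way to fill them in, assuming std means the population standard deviation (divide by n); for the sample standard deviation, divide by n - 1 instead. The standard statistics module already provides both as statistics.pstdev and statistics.stdev.

```python
import math


def mean(nums):
    """Arithmetic mean of a non-empty list of numbers."""
    if not nums:
        raise ValueError("mean() needs at least one number")
    return sum(nums) / len(nums)


def std(nums):
    """Population standard deviation: square root of the mean squared deviation."""
    center = mean(nums)
    return math.sqrt(sum((value - center) ** 2 for value in nums) / len(nums))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))  # 5.0
print(std(data))   # 2.0
```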

%%file tableA/setup.py
# distutils was removed in Python 3.12 and never understood install_requires; use setuptools
from setuptools import setup

setup(
    name="tableA",
    version='1.0',
    description="A sample package",
    author="Vikrant",
    author_email="sads@sdsa.com",
    url="https://somesample.web.com",
    packages=['A','A.B'],
    install_requires=[
        'pypdf',
        'pandas'
    ],
)

Overwriting tableA/setup.py

%%file tableA/requirements.txt
pypdf
pandas

Writing tableA/requirements.txt