22 Module 3 - Day 5
Topics
- Debugging Python programs
- Tips for debugging: reading the trace and error messages to find faults
- Debugging with pdb, with a sample session
- Distributing libraries and packages
- A sample example of creating a Python package
- A standalone script with its dependencies, as a library and as a usable script
For today's practice, use the notebook module3-day5.ipynb created in your environment. Shut down the kernels of all previous notebooks (if they are still running) by right-clicking the notebook in the file browser on the left-hand side.
22.1 Reading PDF files
You will need the pypdf library.
import pypdf
with open("upload.pdf", "rb") as f:
    pdfreader = pypdf.PdfReader(f)
    page0 = pdfreader.pages[0].extract_text()
    page1 = pdfreader.pages[1].extract_text()
print(page0)
National Load Despatch Centre
राष्ट्र ीय भार प्रेषण कें द्र
POWER SYSTEM OPERATION CORPORATION LIMITED
पॉवर सिस्टम ऑपरेशन कारपोरेशन सिसमटेड
(Government of India Enterprise/ भारत िरकार का उद्यम)
B-9, QUTUB INSTITUTIONAL AREA, KATWARIA SARAI, NEW DELHI -110016
बी-9, क़ु तुब इन्स्टीट्यूशनि एररया, कटवाररया िराये, न्यू सिल्ली-110016
_______________________________________________________________________________________________________________________________________
Ref: POSOCO/NLDC/SO/Daily PSP Report दिन ांक: 16th Jul 2020
To,
1. क र्यक री दनिेशक, पू .क्षे .भ .प्रे .के., 14 , गोल्फ क्लब रोड , कोलक त - 700033
Executive Director, ERLDC, 14 Golf Club Road, Tollygunge, Kolkata, 700033
2. क र्यक री दनिेशक, ऊ. क्षे. भ . प्रे. के., 18/ ए , शहीि जीत दसांह सनसनव ल म गय, नई दिल्ली – 110016
Executive Director, NRLDC, 18-A, Shaheed Jeet Singh Marg, Katwaria Sarai, New Delhi – 110016
3. क र्यक री दनिेशक, प .क्षे .भ .प्रे .के., एफ -3 , एम आई डी सी क्षेत्र , अांधेरी, म ांबई – 400093
Executive Director, WRLDC, F-3, M.I.D.C. Area, Marol, Andheri (East), Mumbai-400093
4. क र्यक री दनिेशक, ऊ. पू. क्षे. भ . प्रे. के., डोांगदतएह, लोअर नोांग्रह , ल पलांग, दशलोांग – 793006
Executive Director, NERLDC, Dongteih, Lower Nongrah, Lapalang, Shillong - 793006, Meghalaya
5. क र्यक री दनिेशक , ि .क्षे .भ .प्रे .के., 29 , रेस कोसय क्रॉस रोड, बांगल रु – 560009
Executive Director, SRLDC, 29, Race Course Cross Road, Bangalore-560009
Sub: Daily PSP Report for the date 15.07.2020.
महोिर्/Dear Sir,
आई०ई०जी०सी०-2010 की ध र स.-5.5.1 के प्र वध न के अन स र, दिन ांक 15-ज ल ई-2020 की अखिल
भ रतीर् प्रण ली की िैदनक दग्रड दनष्प िन ररपोर्य र ०भ ०प्रे०के० की वेबस इर् पर उप्लब्ध है |
As per article 5.5.1 of the Indian Electricity Grid Code, the daily report pertaining power supply position of All India
Power System for the date 15th July 2020, is available at the NLDC website.
धन्यव ि,
print(page1)
NR WR SR ER NER TOTAL
59882 41115 34238 21526 2730 159491
1114 0 0 0 6 1120
1398 998 807 447 48 3698
355 33 77 149 29 643
11 49 128 - - 187
39.60 16.60 41.59 4.60 0.03 102
12.6 0.0 0.0 0.0 0.0 12.6
65470 43593 38117 21535 2827 160654
22:20 10:29 10:00 21:20 19:41 21:26
Region FVI < 49.7 49.7 - 49.8 49.8 - 49.9 < 49.9 49.9 - 50.05 > 50.05
All India 0.057 0.16 1.81 13.19 15.16 76.52 8.32
Max.Demand Shortage during Energy Met Drawal OD(+)/UD(-) Max OD Energy
Region States Met during the
day(MW)
maximum
Demand(MW) (MU) Schedule
(MU) (MU) (MW) Shortage
(MU)
Punjab 11090 0 237.9 146.8 -1.8 49 0.0
Haryana 9388 0 209.4 152.8 0.7 325 1.9
Rajasthan 12087 0 262.4 119.7 5.4 809 0.0
Delhi 5726 0 118.6 102.8 -1.4 228 0.0
NR UP 22873 0 448.9 208.5 2.0 546 0.4
Uttarakhand 1899 0 42.8 20.7 0.8 111 0.0
HP 1366 0 28.6 -2.6 -0.2 91 0.0
J&K(UT) & Ladakh(UT) 2177 544 43.1 20.3 0.4 502 10.3
Chandigarh 295 0 6.0 5.9 0.2 61 0.0
Chhattisgarh 3685 0 86.9 36.8 0.8 468 0.0
Gujarat 13478 0 286.2 87.6 4.0 527 0.0
MP 9547 0 214.7 113.8 -3.8 198 0.0
WR Maharashtra 16964 0 365.1 138.1 -1.9 457 0.0
Goa 405 0 8.5 8.2 -0.2 33 0.0
DD 246 0 5.3 5.3 0.0 19 0.0
DNH 614 0 14.0 13.8 0.2 44 0.0
AMNSIL 777 0 17.1 4.2 0.7 272 0.0
Andhra Pradesh 6439 0 141.0 45.6 -1.3 607 0.0
Telangana 8614 0 167.3 81.6 -2.5 385 0.0
SR Karnataka 8486 0 155.1 51.1 -3.4 650 0.0
Kerala 3077 0 65.2 46.1 0.5 179 0.0
Tamil Nadu 12371 0 271.3 125.9 -3.7 573 0.0
Puducherry 349 0 7.5 7.5 -0.1 35 0.0
Bihar 5740 0 111.5 106.0 -0.3 386 0.0
DVC 2989 0 62.7 -42.6 -0.7 206 0.0
Jharkhand 1438 0 26.3 18.5 -1.0 124 0.0
ER Odisha 3983 0 82.2 -0.2 -0.2 325 0.0
West Bengal 7917 0 162.6 47.2 -0.8 303 0.0
Sikkim 100 0 1.4 1.5 -0.1 17 0.0
Arunachal Pradesh 120 3 2.0 1.8 0.2 40 0.0
Assam 1759 23 30.0 27.1 -0.1 135 0.0
Manipur 183 1 2.6 2.3 0.3 37 0.0
NER Meghalaya 307 2 5.3 -1.3 0.3 52 0.0
Mizoram 89 1 1.5 1.2 0.0 13 0.0
Nagaland 140 2 2.2 2.3 -0.2 23 0.0
Tripura 298 7 4.9 5.9 0.7 66 0.0
Bhutan Nepal Bangladesh
53.3 -1.5 -19.1
2337.0 -271.3 -1110.0
NR WR SR ER NER TOTAL
352.1 -295.4 95.0 -145.8 -6.0 0.0
359.2 -293.7 84.6 -152.6 -3.4 -6.0
7.1 1.6 -10.5 -6.9 2.6 -6.0
NR WR SR ER NER TOTAL
3838 14847 11792 3445 677 34598
9289 23225 14423 4892 47 51876
13127 38072 26215 8337 723 86473
NR WR SR ER NER All India
546 1080 370 482 7 2486
25 13 14 0 0 52
355 33 77 149 29 643
26 33 47 0 0 106
40 82 19 0 22 163
71 73 210 5 0 359
1063 1314 737 636 58 3809
6.71 5.54 28.51 0.73 0.05 9.43
42.55 10.54 45.35 24.19 49.63 29.09
1.068
1.102Based on State Max Demands
Diversity factor = Sum of regional or state maximum demands / All India maximum demand
*Source: RLDCs for solar connected to ISTS; SLDCs for embedded solar. Limited visibility of embedded solar data.
Executive Director-NLDC
Share of RES in total generation (%)
Share of Non-fossil fuel (Hydro,Nuclear and RES) in total generation(%)
H. All India Demand Diversity Factor
Based on Regional Max Demands
Lignite
Hydro
Nuclear
Gas, Naptha & Diesel
RES (Wind, Solar, Biomass & Others)
Total
State Sector
Total
G. Sourcewise generation (MU)
Coal
Actual(MU)
O/D/U/D(MU)
F. Generation Outage(MW)
Central Sector
Day Peak (MW)
E. Import/Export by Regions (in MU) - Import(+ve)/Export(-ve); OD(+)/UD(-)
Schedule(MU)
D. Transnational Exchanges (MU) - Import(+ve)/Export(-ve)
Actual (MU)
Energy Shortage (MU)
Maximum Demand Met During the Day (MW) (From NLDC SCADA)
Time Of Maximum Demand Met (From NLDC SCADA)
B. Frequency Profile (%)
C. Power Supply Position in States
Demand Met during Evening Peak hrs(MW) (at 2000 hrs; from RLDCs)
Peak Shortage (MW)
Energy Met (MU)
Hydro Gen (MU)
Wind Gen (MU)
Solar Gen (MU)*
Report for previous day Date of Reporting: 16-Jul-2020
A. Power Supply Position at All India and Regional level
import csv

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9]  # take first 9 lines
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    # newline="" lets the csv module control line endings on every platform
    with open(filename, "w", newline="") as f:
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

with open("upload.pdf", "rb") as f:
    pdfreader = pypdf.PdfReader(f)
    page1 = pdfreader.pages[1].extract_text()

headers, data = extract_table(page1)
write_csv(headers, data, "tableA.csv")
!cat tableA.csv
NR,WR,SR,ER,NER,TOTAL
59882,41115,34238,21526,2730,159491
1114,0,0,0,6,1120
1398,998,807,447,48,3698
355,33,77,149,29,643
11,49,128,-,-,187
39.60,16.60,41.59,4.60,0.03,102
12.6,0.0,0.0,0.0,0.0,12.6
65470,43593,38117,21535,2827,160654
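The extraction logic can be tried without a PDF at hand. The following is a minimal sketch, not the notebook's own code: `sample_page` is a shortened stand-in for the text returned by `extract_text()`, the `nlines` parameter is an assumption added for illustration, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io

def extract_table(page, nlines=9):
    """Split the first `nlines` lines of page text into a header row and data rows."""
    lines = page.split("\n")[:nlines]
    headers = lines[0].split()
    data = [line.strip().split() for line in lines[1:]]
    return headers, data

# Shortened stand-in for the text of page 1 (first three columns only)
sample_page = "NR WR SR\n59882 41115 34238\n1114 0 0\n11 49 -"

headers, data = extract_table(sample_page)

# Write to an in-memory buffer so the sketch leaves no file behind
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(data)
csv_text = buffer.getvalue()
print(csv_text)
```

Splitting on whitespace works here only because no cell contains a space; a table with multi-word cells would need a different rule.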
import pandas as pd
df = pd.read_csv("tableA.csv")
df
| | NR | WR | SR | ER | NER | TOTAL |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526 | 2730 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0 | 6 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447 | 48 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149 | 29 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | - | - | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.60 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.0 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535 | 2827 | 160654.0 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NR 8 non-null float64
1 WR 8 non-null float64
2 SR 8 non-null float64
3 ER 8 non-null object
4 NER 8 non-null object
5 TOTAL 8 non-null float64
dtypes: float64(4), object(2)
memory usage: 516.0+ bytes
df.ER.str.replace("-","")
0 21526
1 0
2 447
3 149
4
5 4.60
6 0.0
7 21535
Name: ER, dtype: object
df['ER'] = pd.to_numeric(df.ER.str.replace("-",""))
df['NER'] = pd.to_numeric(df.NER.str.replace("-",""))
df
| | NR | WR | SR | ER | NER | TOTAL |
|---|---|---|---|---|---|---|
| 0 | 59882.0 | 41115.0 | 34238.00 | 21526.0 | 2730.00 | 159491.0 |
| 1 | 1114.0 | 0.0 | 0.00 | 0.0 | 6.00 | 1120.0 |
| 2 | 1398.0 | 998.0 | 807.00 | 447.0 | 48.00 | 3698.0 |
| 3 | 355.0 | 33.0 | 77.00 | 149.0 | 29.00 | 643.0 |
| 4 | 11.0 | 49.0 | 128.00 | NaN | NaN | 187.0 |
| 5 | 39.6 | 16.6 | 41.59 | 4.6 | 0.03 | 102.0 |
| 6 | 12.6 | 0.0 | 0.00 | 0.0 | 0.00 | 12.6 |
| 7 | 65470.0 | 43593.0 | 38117.00 | 21535.0 | 2827.00 | 160654.0 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 NR 8 non-null float64
1 WR 8 non-null float64
2 SR 8 non-null float64
3 ER 7 non-null float64
4 NER 7 non-null float64
5 TOTAL 8 non-null float64
dtypes: float64(6)
memory usage: 516.0 bytes
df.sum()
NR 128282.20
WR 85804.60
SR 73408.59
ER 43661.60
NER 5640.03
TOTAL 325907.60
dtype: float64
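The "-" placeholders can also be dealt with before the data reaches pandas. Below is a minimal pure-Python sketch under that assumption; `to_number` is a hypothetical helper, not part of the notebook. (pandas' `read_csv` also accepts an `na_values` argument that serves a similar purpose at load time.)

```python
import math

def to_number(cell):
    """Convert a table cell to float; treat '-' or empty text as missing (NaN)."""
    text = cell.strip()
    if text in ("", "-"):
        return math.nan
    return float(text)

row = ["11", "49", "128", "-", "-", "187"]  # the row that contains placeholders
values = [to_number(cell) for cell in row]

# Skip the missing cells when adding up the row
present = [v for v in values if not math.isnan(v)]
print(values)
print(sum(present))
```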
22.1.1 When should you automate with a Python program?
- When you know the task will be repeated every time or frequently.
- When the data comes in the same format every time.
- Use your judgement to decide whether to write a program or extract the data manually.
22.2 Debugging
Look carefully at the error trace given by the interpreter.
x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[25], line 1
----> 1 x

NameError: name 'x' is not defined
def square(x):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 7
      4 def sumofsquares(a, b):
      5     return square(a) + square(b)
----> 7 sumofsquares(40, 50)

Cell In[26], line 5, in sumofsquares(a, b)
      4 def sumofsquares(a, b):
----> 5     return square(a) + square(b)

Cell In[26], line 2, in square(x)
      1 def square(x):
----> 2     return a*a

NameError: name 'a' is not defined
def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)
4100
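The same trace can be inspected from inside a program with the standard traceback module. This is a minimal sketch that reuses the buggy version of square from the earlier cell; the last line of the trace names the exception, and the frames list the chain of calls that led to it.

```python
import traceback

def square(x):
    return a * a  # deliberate bug: the parameter is named x, not a

def sumofsquares(a, b):
    return square(a) + square(b)

try:
    sumofsquares(40, 50)
except NameError as err:
    # Full trace as text, the same information the notebook displays
    trace_text = traceback.format_exc()
    # Each frame summary records where one call in the chain happened
    frames = traceback.extract_tb(err.__traceback__)
    call_chain = [frame.name for frame in frames]

print(trace_text.splitlines()[-1])  # the exception type and message
print(call_chain)                   # outermost call first, failing function last
```

Reading a trace bottom-up is usually quickest: the last line says what went wrong, and the entry just above it says where.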
def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + squareb)

sumofsquares(40, 50)
Cell In[28], line 5
    return square(a) + squareb)
                              ^
SyntaxError: unmatched ')'
def square(a):
    return a*a

def sumofsquares(a, b):
    return square(a) + square(b)

sumofsquares(40, 50)
4100
def mysum(*nums):  # the function expects separate numeric arguments
    s = 0
    for n in nums:
        s += n
    return s
mysum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) # 55
mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[33], line 1
----> 1 mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Cell In[32], line 4, in mysum(*nums)
      2     s = 0
      3     for n in nums:
----> 4         s += n
      5     return s

TypeError: unsupported operand type(s) for +=: 'int' and 'list'
0 + [1, 2, 3, 4]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 0 + [1, 2, 3, 4]

TypeError: unsupported operand type(s) for +: 'int' and 'list'
def mysum(*nums):  # the function expects separate numeric arguments
    s = 0
    for n in nums:
        print(s)
        print(n)
        s += n
    return s
mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
0
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[36], line 1
----> 1 mysum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Cell In[35], line 6, in mysum(*nums)
      4         print(s)
      5         print(n)
----> 6         s += n
      7     return s

TypeError: unsupported operand type(s) for +=: 'int' and 'list'
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mysum(*nums) # this unpacks the list so that its items are passed as separate arguments
0
1
1
2
3
3
6
4
10
5
15
6
21
7
28
8
36
9
45
10
55
sum = sumofsquares(50, 60) # overwriting the built-in function
type(sum)
int
sum(nums)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 sum(nums)

TypeError: 'int' object is not callable
del sum
sum(nums)
55
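When a built-in name has been overwritten, the original function is still available through the standard builtins module. A minimal sketch of that, with the value 6100 standing in for the result of sumofsquares(50, 60); it restores the name by rebinding it, whereas the session above removes the shadowing name with del.

```python
import builtins

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

sum = 6100  # accidentally shadows the built-in, as in the session above
shadowed = not callable(sum)

# The original function is still reachable through the builtins module
total_via_builtins = builtins.sum(nums)

# Point the name back at the built-in function
sum = builtins.sum
total_after_restore = sum(nums)

print(shadowed, total_via_builtins, total_after_restore)
```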
22.2.1 Giving appropriate names to variables and functions
def function2(data, i):
    return [item[i] for item in data]

def function1(x):
    y = []
    n = len(x[0]) # str, dict, list, tuple
    for i in range(n):
        y.append(function2(x, i))
    return y
Questions
1. What is x in function1?
2. What is data in function2? # possibly a list, tuple
3. What is function2 doing?
4. What is function1 doing?
matrix = [[11, 12, 13],
          [21, 22, 23],
          [31, 32, 33]]
function2(matrix, 0) # this is the 0th column
[11, 21, 31]
function2(matrix, 1) # it returns the 1st column
[12, 22, 32]
function2(matrix, 2)
[13, 23, 33]
def column(matrix, colnum):
    return [row[colnum] for row in matrix]
function1(matrix) # collection of columns
[[11, 21, 31], [12, 22, 32], [13, 23, 33]]
matrix # collection of rows
[[11, 12, 13], [21, 22, 23], [31, 32, 33]]
def transpose(matrix):
    cols = []
    colcount = len(matrix[0])
    for i in range(colcount):
        cols.append(column(matrix, i))
    return cols  # return the collected columns
def column(matrix, colnum):
    """finds nth column from the matrix
    assumes that all rows of matrix are of same size
    """
    return [row[colnum] for row in matrix]

def transpose(matrix):
    """finds transpose of a 2d matrix
    assumes that all rows of matrix are of same size
    """
    firstrow = matrix[0]
    colcount = len(firstrow)
    return [column(matrix, c) for c in range(colcount)]
transpose(matrix)
[[11, 21, 31], [12, 22, 32], [13, 23, 33]]
Naming conventions
- i, j, k are for iteration variables, preferably when they are integers.
- x, y, ... look like int/float values from algebra.
- If you know exactly what an integer or float represents, use a name that says so.
- For example, instead of i, use col or c if it represents a column index, and row or r if it represents a row index.
- When iterating over 2D data, name each item for what it is: [row for row in data2D] instead of [item for item in data2D].
- Give each function a name that clearly describes what it does.
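As an illustration of these conventions, here is a small sketch; `column_sums` is a hypothetical function that does not appear in the notebook. The names row, col and colcount tell the reader what each value is without tracing the code.

```python
def column_sums(matrix):
    """Return the sum of each column of a 2D list with equal-length rows."""
    colcount = len(matrix[0])
    return [sum(row[col] for row in matrix) for col in range(colcount)]

matrix = [[11, 12, 13],
          [21, 22, 23],
          [31, 32, 33]]
print(column_sums(matrix))  # [63, 66, 69]
```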
22.3 Python debugger
Python comes with a built-in debugger, pdb.
%%file extract_table.py
import pypdf
import csv
import sys

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9]  # take first 9 lines
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    with open(filename, "w", newline="") as f:  # newline="" as the csv module recommends
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

def extract_table_from_pdf(pdffile, csvfile):
    with open(pdffile, "rb") as f:
        pdfreader = pypdf.PdfReader(f)
        page1 = pdfreader.pages[1].extract_text()
    headers, data = extract_table(page1)
    write_csv(headers, data, csvfile)

if __name__ == "__main__":
    pdf = sys.argv[1]
    csvfile = sys.argv[2]
    extract_table_from_pdf(pdf, csvfile)
Overwriting extract_table.py
Some useful commands that you will need inside the debugger:
- l # list the program source with line numbers
- h # show help; it lists all available commands
- h b # show help for the command b
- b 34 # set a breakpoint at line 34
- c # continue until the next breakpoint
- n # execute the current line, then stop at the next line
- p var # print the value of var in the current context
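A debugger session is normally interactive, for example started with `python -m pdb extract_table.py upload.pdf tableA.csv`, after which the commands are typed at the (Pdb) prompt. To show the effect of p and c in a form that can be run as a script, the sketch below feeds the commands to pdb from a string and captures the session output. This scripted setup is an assumption made for illustration, not how the debugger is usually driven.

```python
import io
import pdb

def square(a):
    return a * a

def sumofsquares(a, b):
    return square(a) + square(b)

# Commands that would normally be typed at the (Pdb) prompt
commands = io.StringIO("p a\np b\nc\n")
session_output = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=session_output,
                   nosigint=True, readrc=False)
# runcall starts the function under the debugger and pauses at its first line
result = debugger.runcall(sumofsquares, 40, 50)

print(session_output.getvalue())  # prompts, the current line, and the printed values
print(result)
```

The p commands print the arguments as seen inside sumofsquares, and c lets the call run to completion.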
22.4 Packaging your Python code
22.4.1 Python script, instructions, requirements.txt
- extract_table.py
- requirements.txt
- README.md
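A script that is handed to other users benefits from a clear command-line interface, so that the instructions in README.md can stay short. The script above reads sys.argv directly, which raises an IndexError when an argument is missing. As one possible alternative, this sketch uses the standard argparse module; the argument names and help texts are assumptions chosen for illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse the two positional arguments the script expects."""
    parser = argparse.ArgumentParser(
        prog="extract_table.py",
        description="Extract table A from a PDF report and save it as CSV.",
    )
    parser.add_argument("pdffile", help="path of the input PDF report")
    parser.add_argument("csvfile", help="path of the CSV file to write")
    return parser.parse_args(argv)

# Passing a list simulates the command line: extract_table.py upload.pdf tableA.csv
args = parse_args(["upload.pdf", "tableA.csv"])
print(args.pdffile, args.csvfile)
```

With argparse, running the script without arguments prints a usage message instead of a traceback, and `-h` shows the help texts.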
22.4.2 Creating executable using pyinstaller
You will need the third-party library pyinstaller.
!pip install pyinstaller
Defaulting to user installation because normal site-packages is not writeable
Collecting pyinstaller
  Downloading pyinstaller-6.19.0-py3-none-manylinux2014_x86_64.whl.metadata (8.5 kB)
Collecting altgraph (from pyinstaller)
  Downloading altgraph-0.17.5-py2.py3-none-any.whl.metadata (7.5 kB)
Requirement already satisfied: packaging>=22.0 in /opt/tljh/user/lib/python3.12/site-packages (from pyinstaller) (24.1)
Collecting pyinstaller-hooks-contrib>=2026.0 (from pyinstaller)
  Downloading pyinstaller_hooks_contrib-2026.1-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: setuptools>=42.0.0 in /opt/tljh/user/lib/python3.12/site-packages (from pyinstaller) (74.1.2)
Downloading pyinstaller-6.19.0-py3-none-manylinux2014_x86_64.whl (741 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 741.4/741.4 kB 10.8 MB/s eta 0:00:00
Downloading pyinstaller_hooks_contrib-2026.1-py3-none-any.whl (452 kB)
Downloading altgraph-0.17.5-py2.py3-none-any.whl (21 kB)
Installing collected packages: altgraph, pyinstaller-hooks-contrib, pyinstaller
  WARNING: The scripts pyi-archive_viewer, pyi-bindepend, pyi-grab_version, pyi-makespec, pyi-set_version and pyinstaller are installed in '/home/jupyter-vikrant/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed altgraph-0.17.5 pyinstaller-6.19.0 pyinstaller-hooks-contrib-2026.1
The following command creates an executable for your Python program:
pyinstaller -F extract_table.py
This creates an executable named extract_table inside the dist folder. The executable can be distributed, and users can run it directly as a command-line tool.
22.5 Python package
A Python package follows a particular folder structure. Here is one sample layout:
tableA
|
|--setup.py
|--requirements.txt
+-A
  |
  |--__init__.py
  |--extract_table.py
  +-B
    |
    |--__init__.py
    |--stats.py
!mkdir tableA
!mkdir tableA/A
!mkdir tableA/A/B
!touch tableA/A/__init__.py
!touch tableA/A/B/__init__.py
%%file tableA/A/extract_tableA.py
import pypdf
import csv
import sys

def extract_table(page):
    lines = page.split("\n")
    tableA = lines[:9]  # take first 9 lines
    headers = tableA[0].split()
    data = [line.strip().split() for line in tableA[1:]]
    return headers, data

def write_csv(headers, data, filename):
    with open(filename, "w", newline="") as f:  # newline="" as the csv module recommends
        csvf = csv.writer(f)
        csvf.writerow(headers)
        for row in data:
            csvf.writerow(row)

def extract_table_from_pdf(pdffile, csvfile):
    with open(pdffile, "rb") as f:
        pdfreader = pypdf.PdfReader(f)
        page1 = pdfreader.pages[1].extract_text()
    headers, data = extract_table(page1)
    write_csv(headers, data, csvfile)

if __name__ == "__main__":
    pdf = sys.argv[1]
    csvfile = sys.argv[2]
    extract_table_from_pdf(pdf, csvfile)
Writing tableA/A/extract_tableA.py
%%file tableA/A/B/stats.py
def mean(nums):
    pass

def std(nums):
    pass
Writing tableA/A/B/stats.py
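The two functions in stats.py are left as stubs. One possible way to fill them in is sketched below; it assumes the population standard deviation (dividing by n) and non-empty input, neither of which the notebook specifies.

```python
import math

def mean(nums):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(nums) / len(nums)

def std(nums):
    """Population standard deviation (divides by n, not n - 1)."""
    m = mean(nums)
    variance = sum((n - m) ** 2 for n in nums) / len(nums)
    return math.sqrt(variance)

values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(values))  # 5.0
print(std(values))   # 2.0
```

The standard library's statistics module offers ready-made `mean` and `pstdev` functions if writing them by hand is not the point of the exercise.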
%%file tableA/setup.py
# setuptools replaces distutils, which was removed in Python 3.12
# and does not support install_requires
from setuptools import setup

setup(
    name="tableA",
    version='1.0',
    description="A sample package",
    author="Vikrant",
    author_email="sads@sdsa.com",
    url="https://somesample.web.com",
    packages=['A','A.B'],
    install_requires=[
        'pypdf',
        'pandas'
    ],
)
Overwriting tableA/setup.py
%%file tableA/requirements.txt
pypdf
pandas
Writing tableA/requirements.txt