I want to print each line starting with >YP from short.sequ.txt, then count the number of characters on the following lines until an empty line is reached.
I would like to repeat this for every entry and finally print the >YP line for the entries with the maximum and minimum number of characters.
I want to update max_len and max_protein whenever I find a new maximum (and likewise min_len and min_protein for the minimum).
I wrote it using global variables, but if possible I would like to write this kind of code without them.
In that case, I don't know what to replace the global variables with, so please let me know.
Thank you for your cooperation.
    with open("short.sequ.txt") as f:
        max_len = 0
        max_protein = ""
        min_len = 1000000000000000
        min_protein = ""

        def change(protein, seq):
            global max_len, max_protein, min_len, min_protein
            seq_len = len(seq)
            if seq_len == 0:
                return
            else:
                print([seq_len])
                if seq_len > max_len:
                    max_len, max_protein = seq_len, protein
                if seq_len < min_len:
                    min_len, min_protein = seq_len, protein

        a_line = ""
        a_seq = ""
        for line in f:
            strip_line = line.rstrip()
            if strip_line.startswith(">"):
                # Record the previous entry before switching to the new header
                change(a_line, a_seq)
                a_line = strip_line
                a_seq = ""
                print(strip_line + '\n')
            else:
                # A = [len(a_seq+strip_line)]
                a_seq += strip_line
        # Record the last entry, which has no header line after it
        change(a_line, a_seq)

    msg1 = "Maximum amino acid entry:"
    msg2 = "Length:"
    msg3 = "Minimum amino acid entry:"
    print(msg1, max_protein, msg2, max_len)
    print(msg3, min_protein, msg2, min_len)
short.sequ.txt

    >YP_009518834.1 passive uncharacterized protein YjiT [Escherichia coli str. K-12 substr. MG1655]
    MGQSEYISWVKCTSWLSNFVNLRGRQPDGRPLYYHATNDEYTQLTQLLRAVGQSQSNICNRDFAACFV
    LFCSEWYRRDYERQCGWTWDPIYKKIGISFTATELGTIVPKGMEDYWLRPIRFYESERRNFLGTLFSEGG
    LPFRLKESDSRFLAVFSRILGQYEQAKQSGFSALSLAVIEKSALPTVFSEDTSVELISHMADNLNSL
    VLTHNLINHKEPVQQLEKVHPTWRSEFPIPLDETGTHFLNGLLCAASVEAKPRLQKNKSTRCQFYWSEK
    HPDELRVIVSLPDEVSFPVTSEPSTTRFELAICEDGEEVSGLGPAYLENRQATVRLRKSEVRFGRQNP
    SAGLSLVARAGGMIVGSIKLDSEIAIGEVPLTFIVDADQWLLQGQASCSVRSSDVLIVLPRDNSNVAGF
    DGQSRAVNVLGLKALPVKGCQDVTVTANETYRRITGREQISIGRFAALNGKRASWVCHPDETFIGVPKVIS
    TLPDIQSIDVTRYTC

    >YP_009518833.1 uncharacterized protein YtiA [Escherichia coli str. K-12 substr. MG1655]
    MKEFLFLFHSTVGVIQTRKALQAAGMTFRVSDIPRDLRGGCGLCIWLTCPGEEIQWVIPGLTESIYCQQ
    DGVWRCIAHYGVSPR

    >YP_009518832.1 iraD leader peptide [Escherichia coli str. K-12 substr. MG1655]
    MENEHQYSGARCSGQAAYVAKRQECAK

    >YP_009518831.1 protein YtiD [Escherichia coli str. K-12 substr. MG1655]
    MADYAEINNPPELSSSGDKYFHLRNYSEYSEYTSGFFLSLMIFIKS

    >YP_009518830.1 protein YtiC [Escherichia coli str. K-12 substr. MG1655]
    MPVNGIFDVFDMLSIYIIYKLIVSNNNTWLIMRK

    >YP_009518829.1 passive YjfA [Escherichia coli str. K-12 substr. MG1655]
    MHMVTYPCLTSRRFQLALIHRVDKRTSMHSRTASESTGARIHRPWCARHQVRPAWRCQYDKLHRVPR
    SPELRLDSGPGYTTGSYRY
The code below splits the contents of short.sequ.txt on empty lines (\n\n) and sorts the entries by the length of each protein sequence.
    import re

    msg1 = "Maximum amino acid entry:"
    msg2 = "Length:"
    msg3 = "Minimum amino acid entry:"

    with open("short.sequ.txt") as f:
        assoc = {}
        for p in f.read().split("\n\n"):
            # Skip any block that is not a >YP_ entry
            if not re.match('^>YP_', p): continue
            arr = p.split("\n")
            # Map the header line to the total length of its sequence lines
            assoc[arr[0]] = sum(map(len, arr[1:]))

    # Sort the entries by sequence length, shortest first
    sa = sorted(assoc.items(), key=lambda x: x[1])
    print('{} {}\n{} {}'.format(msg1, sa[-1][0], msg2, sa[-1][1]))
    print('{} {}\n{} {}'.format(msg3, sa[0][0], msg2, sa[0][1]))
=>
    Maximum amino acid entry: >YP_009518834.1 passive uncharacterized protein YjiT [Escherichia coli str. K-12 substr. MG1655]
    Length: 505
    Minimum amino acid entry: >YP_009518832.1 iraD leader peptide [Escherichia coli str. K-12 substr. MG1655]
    Length: 27
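
As for the original question about avoiding global: one possible approach is to have a function return its results instead of updating outer variables, and let the caller pick the extremes with the built-in max() and min(). The following is only a minimal sketch of that idea. The function names read_records and longest_and_shortest are made up for illustration, and the sample entries are invented, not taken from short.sequ.txt.

```python
import io


def read_records(lines):
    """Yield (header, sequence_length) pairs from FASTA-style lines.

    Hypothetical helper: a line starting with ">" opens a new entry, and
    every other line (including empty ones) adds its length to that entry.
    """
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, length  # the previous entry is complete
            header, length = line, 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length  # the last entry has no header after it


def longest_and_shortest(lines):
    """Return the (header, length) pairs with the largest and smallest length."""
    records = list(read_records(lines))
    # max()/min() raise ValueError if no entry was found at all
    longest = max(records, key=lambda r: r[1])
    shortest = min(records, key=lambda r: r[1])
    return longest, shortest


# Invented sample data standing in for a real file
sample = io.StringIO(
    ">entry_A first made-up record\n"
    "MKV\n"
    "LLA\n"
    "\n"
    ">entry_B second made-up record\n"
    "MA\n"
    "\n"
    ">entry_C third made-up record\n"
    "MKVLLAGG\n"
)
longest, shortest = longest_and_shortest(sample)
print("Maximum amino acid entry:", longest[0], "Length:", longest[1])
print("Minimum amino acid entry:", shortest[0], "Length:", shortest[1])
```

With the real file, the same function could presumably be called as longest_and_shortest(f) inside a with open("short.sequ.txt") as f: block, since a file object is also an iterable of lines. All state lives in local variables and return values, so no global statement is needed.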