Comment: The singleton pattern means that a class can create only one unique instance. This question comes up very frequently in interviews because it tests not only the singleton pattern itself but also how well you have mastered the Python language. I suggest implementing the singleton with a decorator or a metaclass, because these two approaches are the most versatile, and they also let you show your understanding of two key topics: decorators and metaclasses.
Method 1: Use a decorator to implement the singleton pattern.
```python
from functools import wraps

def singleton(cls):
    """Singleton decorator"""
    instances = {}

    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper

@singleton
class President:
    pass
```
Extensions: Decorators are a distinctive piece of Python syntax: a function that decorates another function or class to add extra capabilities to it. The capabilities added through decoration are usually cross-cutting concerns, that is, functionality not directly tied to the normal business logic that can be attached or removed dynamically. Decorators can provide services such as caching, proxying, and context management, and they are one realization of the proxy pattern from design patterns. When writing a decorator, the function that carries the added behavior (the `wrapper` function in the code above) is usually itself decorated with `wraps` from the `functools` module. The most important effect of this is that the decorated class or function gains a `__wrapped__` attribute, which keeps a reference to the original undecorated class or function, so that the decoration can be removed when it is no longer needed. For example, `President = President.__wrapped__` cancels the singleton handling of the `President` class. One reminder: the singleton above is not thread-safe. To make it thread-safe, the code that creates the instance must be locked. In Python, the `RLock` object of the `threading` module provides a lock, and its `acquire` and `release` methods implement locking and unlocking. More conveniently, the lock object can be used with the `with` context syntax to lock and unlock implicitly, as shown in the sketch below.
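Here is a minimal thread-safe sketch, assuming the same decorator structure as above; the double check of `instances` avoids taking the lock once the instance already exists.

```python
from functools import wraps
from threading import RLock

def singleton(cls):
    """Thread-safe singleton decorator"""
    instances = {}
    lock = RLock()

    @wraps(cls)
    def wrapper(*args, **kwargs):
        if cls not in instances:
            with lock:
                # Check again inside the lock: another thread may have
                # created the instance while we were waiting.
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return wrapper
```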
Method 2: Use a metaclass to implement the singleton pattern.
```python
class SingletonMeta(type):
    """Custom singleton metaclass"""

    def __init__(cls, *args, **kwargs):
        cls.__instance = None
        super().__init__(*args, **kwargs)

    def __call__(cls, *args, **kwargs):
        if cls.__instance is None:
            cls.__instance = super().__call__(*args, **kwargs)
        return cls.__instance

class President(metaclass=SingletonMeta):
    pass
```
Extensions: Python is an object-oriented language, and in the object-oriented world everything is an object. Objects are created from classes, classes are themselves objects, and objects like classes are created from metaclasses. When we define a class, if no parent class is specified the default parent is `object`, and if no metaclass is specified the default metaclass is `type`. Through a custom metaclass we can change a class's default behavior, just as the code above changes how the `President` class is constructed via the metaclass's `__call__` magic method.
Supplement: Regarding the singleton pattern, interviewers may also ask about its application scenarios. When an object's state must be shared by other objects, it can be designed as a singleton. For example, the database connection pool and configuration objects in a project are usually singletons, which guarantees that the connections and configuration information obtained everywhere are consistent; and because only one instance exists, the time and space overhead of repeatedly creating the object, as well as multiple claims on the same resources, are avoided at the root. Likewise, logging in a project usually uses the singleton pattern: the shared log file stays open, and only a single instance may operate on it, otherwise concurrent writes would garble the log.
Remarks: A typical giveaway question. Usually an intermediate variable is needed to swap two variables; if an intermediate variable is not allowed, the exclusive-or operation can be used in other programming languages to swap two values, but Python offers a simpler and clearer way.
Method 1:

```python
a = a ^ b
b = a ^ b
a = a ^ b
```
Method 2:

```python
a, b = b, a
```
Extension: Note that `a, b = b, a` is not actually tuple packing and unpacking, although many people think so. Python bytecode has a `ROT_TWO` instruction to support this operation, and similarly `ROT_THREE` for three elements; only for more than three elements, such as `a, b, c, d = b, c, d, a`, does the interpreter actually build a tuple and unpack it. If you want to see the bytecode instructions behind your code, you can use the `dis` function of the `dis` module in the Python standard library to disassemble it, as shown below.
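A quick disassembly sketch (the exact instructions are version-specific: older CPython releases emit `ROT_TWO`, newer ones may emit a `SWAP` instruction instead):

```python
import dis

# Disassemble the two-variable swap and inspect the instructions.
dis.dis('a, b = b, a')
```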
Question 003: Write a function that removes duplicate elements from a list while keeping the relative order of the remaining elements unchanged.
Remarks: This question appears frequently in junior/intermediate Python interviews. It originates from recipe 1.10 in the first chapter of the book "Python Cookbook"; many interview questions actually come from that book, so I suggest studying it when you have time.
```python
def dedup(items):
    no_dup_items = []
    seen = set()
    for item in items:
        if item not in seen:
            no_dup_items.append(item)
            seen.add(item)
    return no_dup_items
```
If you prefer, you can also turn the above function into a generator, as shown below.
```python
def dedup(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)
```
Extension: Since sets in Python are backed by hash storage, the `in` and `not in` membership tests on a set far outperform those on a list, which is why the code above uses a set to record the elements already seen. The elements of a set must be hashable objects, so the code above breaks when the list elements are not hashable. To solve that, you can add a parameter to the function: a function that extracts a hash code or a hashable value from each element, as in the sketch below.
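A sketch of the keyed version, mirroring Python Cookbook recipe 1.10; the `key` parameter name is our own choice:

```python
def dedup(items, key=None):
    """Deduplicate while preserving order; key maps items to hashable values."""
    seen = set()
    for item in items:
        value = item if key is None else key(item)
        if value not in seen:
            seen.add(value)
            yield item

# Deduplicate dicts (which are unhashable) by one of their fields:
rows = [{'x': 1}, {'x': 2}, {'x': 1}]
print(list(dedup(rows, key=lambda d: d['x'])))  # [{'x': 1}, {'x': 2}]
```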
Question 004: Assuming you are using the official CPython interpreter, state the result of the following code.
Remarks: The following program is meaningless for real development, but it is a well-known pitfall of CPython. The question tests how well the candidate knows the official Python interpreter.
```python
a, b, c, d = 1, 1, 1000, 1000
print(a is b, c is d)

def foo():
    e = 1000
    f = 1000
    print(e is f, e is d)
    g = 1
    print(g is a)

foo()
```
Output:

```
True False
True False
True
```
In the above code, the result of `a is b` is `True` while the result of `c is d` is `False`, which is genuinely puzzling. For performance, the CPython interpreter caches frequently used integer objects in an object pool called `small_ints`. The cached integers cover the range `[-5, 256]`, which means that wherever these integers are referenced, no new `int` object needs to be created; the object in the pool is referenced directly. Integers outside this range, even when two of them have the same value, can be distinct objects.
To improve performance further, CPython adds another optimization: for integers outside the `small_ints` cache range, if an integer object with the same value already exists in the same code block, that object is referenced directly instead of creating a new `int` object. Note that this rule applies to numeric types; for strings, the content and length of the string also matter, which you can verify yourself.
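A minimal demonstration of these caching rules (the exact results are CPython implementation details and can vary between running a script and typing into the REPL):

```python
g = 100
h = 100
print(g is h)    # True everywhere: 100 falls in the [-5, 256] small-integer cache

e = 1000
f = 1000
# In one code block (a script or a function body) CPython reuses the same
# 1000 object, so this usually prints True; entered line by line in an
# interactive session it typically prints False.
print(e is f)
```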
Extension: If you run the code above with PyPy (another Python interpreter implementation that supports JIT compilation, addresses some of CPython's shortcomings, and outperforms it, though its third-party library support is somewhat weaker), you will find that every output is True.
Comment: This question mainly examines the application scenarios of Lambda functions. The subtext is to ask whether you have used Lambda functions in your projects, and in which scenarios, in order to judge your ability to write code. Lambda functions are usually used with higher-order functions: by passing a function into a function, or returning a function from a function, the code is ultimately decoupled.
A Lambda function is also called an anonymous function: a small function that fits on a single line of code. A Lambda function in Python may contain only one expression, and the value of that expression is the function's return value, with no `return` keyword needed. Because Lambda functions have no names, they cannot collide with other function names.
Extension: In interviews you may be asked to implement certain functionality with Lambda functions, that is, with a single line of code, for example: implement a factorial function in one line, or a greatest-common-divisor function in one line.
```python
fac = lambda x: __import__('functools').reduce(int.__mul__, range(1, x + 1), 1)
gcd = lambda x, y: y % x and gcd(y % x, x) or x
```
In fact, the main purpose of Lambda functions is to be passed into higher-order functions (such as Python's built-in `filter` and `map`) to decouple those functions and enhance their flexibility and generality. The following example uses the `filter` and `map` functions to pick the odd numbers out of a list and square them into a new list. Because higher-order functions are used, the rules for filtering and mapping the data are passed in by the caller through another function, so `filter` and `map` stay uncoupled from any specific filtering or mapping rule.
```python
items = [12, 5, 7, 10, 8, 19]
items = list(map(lambda x: x ** 2, filter(lambda x: x % 2, items)))
print(items)    # [25, 49, 361]
```
Extension: Implementing the above with a list comprehension is simpler and clearer, as shown below.
```python
items = [12, 5, 7, 10, 8, 19]
items = [x ** 2 for x in items if x % 2]
print(items)    # [25, 49, 361]
```
Comment: The question itself appears very frequently, although it carries little technical depth. For this kind of interview question, your answer has to exceed the interviewer's expectations to leave a good impression. So the point is not only to state the difference between shallow copy and deep copy and the two major problems deep copy can run into, but also to cover the Python standard library's support for shallow and deep copies, then how lists and dictionaries perform copy operations and how to implement a deep copy through serialization and deserialization, and finally to mention the prototype pattern from design patterns and its application in projects.
A shallow copy usually copies only the object itself, while a deep copy copies not just the object but, recursively, the objects it references. Deep copying can run into two problems: first, if an object directly or indirectly references itself, naive copying recurses endlessly; second, a deep copy may duplicate data that was designed to be shared among multiple objects. Python implements shallow and deep copy operations through the `copy` and `deepcopy` functions of the `copy` module. `deepcopy` keeps track of already-copied objects in a `memo` dictionary, thereby avoiding infinite recursion on self-references; in addition, you can customize the copy behavior of a given type of object through the `pickle` function of the `copyreg` module.
The essence of the `deepcopy` function is one serialization followed by one deserialization of the object. Interview questions also ask for a deep copy implemented with a custom function; obviously we can do it with the `dumps` and `loads` functions of the `pickle` module, as shown below.
```python
import pickle

my_deep_copy = lambda obj: pickle.loads(pickle.dumps(obj))
```
The slicing operation `[:]` on a list is equivalent to a shallow copy of the list object, and the `copy` method of a dictionary performs a shallow copy of the dictionary object. Copying an object is actually a faster way to create one. In Python, creating an object through its constructor is a two-phase construction: first memory is allocated, then initialization is performed. We can instead create new objects from a "prototype" object, completing creation and initialization by copying the prototype (copying memory), which is more efficient. This is the prototype pattern from design patterns. In Python we can implement the prototype pattern with a metaclass, as shown below.
```python
import copy

class PrototypeMeta(type):
    """Metaclass for the prototype pattern"""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Bind a clone method onto the class to enable object copying
        cls.clone = lambda self, is_deep=True: \
            copy.deepcopy(self) if is_deep else copy.copy(self)

class Person(metaclass=PrototypeMeta):
    pass

p1 = Person()
p2 = p1.clone()               # deep copy
p3 = p1.clone(is_deep=False)  # shallow copy
```
Comment: When the interviewer asks this question, an opportunity to show yourself is right in front of you. You should first ask back: "Are you referring to the official CPython interpreter?" This counter-question shows that you know there are different implementations of the Python interpreter and that the interviewer is really asking about CPython. Of course, many interviewers have no idea about the differences between the underlying implementations of different Python interpreters, so don't assume the interviewer necessarily knows more than you; with that confidence you can do better in the interview.
Python provides automatic memory management, meaning that the allocation and release of memory are performed automatically by the Python interpreter at runtime. Automatic memory management greatly reduces programmers' workload and helps, to some extent, with the problem of memory leaks. Taking the CPython interpreter as an example, its memory management rests on three pillars: reference counting, mark-and-sweep, and generational collection.
Reference counting: in the CPython interpreter, every Python object is really a `PyObject` structure with an internal reference-counter member named `ob_refcnt`. While the program runs, the value of `ob_refcnt` is updated to reflect how many variables refer to the object. When an object's reference count drops to 0, its memory is released.
```c
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;
```
The following situations increase the reference count by 1:

- The object is created
- The object is referenced
- The object is passed into a function as an argument
- The object is stored in a container as an element
The following situations decrease the reference count by 1:

- A `del` statement explicitly deletes a reference to the object
- A reference to the object is reassigned to another object
- A reference to the object leaves its scope
- The container holding the object is itself destroyed
- The container holding the object removes the object
The reference count of an object can be obtained through the `getrefcount` function of the `sys` module, as shown below. Reference counting as a memory-management scheme is fatally weak against circular references, so other garbage-collection algorithms are needed to supplement it.
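A quick sketch of inspecting reference counts (the exact numbers can vary by interpreter version):

```python
import sys

a = []
# getrefcount reports one extra reference because passing `a` as an
# argument temporarily creates another reference to the object.
print(sys.getrefcount(a))  # typically 2
b = a                      # one more variable now refers to the object
print(sys.getrefcount(a))  # typically 3
```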
Mark-and-sweep: CPython uses the "mark-and-sweep" algorithm to solve the circular references that container types may create. During garbage collection the algorithm has two phases: in the marking phase, all objects are traversed and every reachable object (one referenced by other live objects) is marked as reachable; in the sweeping phase, the objects are traversed again and any object not marked reachable is reclaimed. Under the hood, CPython maintains two doubly linked lists: one stores the container objects to be scanned (call it list A), and the other stores provisionally unreachable objects (call it list B). To implement mark-and-sweep, besides the `ref_count` field that records the current reference count, each node in the list carries a `gc_ref` field, which is a copy of `ref_count`, so its initial value equals `ref_count`. When garbage collection runs, it first traverses the nodes in list A and, for every object referenced by the current object, decrements that object's `gc_ref` by 1; the point of this step is to cancel the effect of circular references on the reference counts. Then list A is traversed again: if a node's `gc_ref` is 0, the object is marked "tentatively unreachable" (`GC_TENTATIVELY_UNREACHABLE`) and moved to list B; if a node's `gc_ref` is not 0, the object is marked "reachable" (`GC_REACHABLE`), and every node reachable from a reachable node is also recursively marked "reachable"; nodes in list B that get marked "reachable" are moved back to list A. After these two traversals, the nodes remaining in list B are the ones whose memory must be released.
Generational collection: while circularly referenced objects are being reclaimed, the entire application pauses. To shorten these pauses, Python improves garbage-collection efficiency through generational collection (trading space for time). The basic idea of generational collection is: the longer an object has existed, the less likely it is to be garbage, so such objects should be scanned as rarely as possible. CPython divides objects into three generations, numbered `0`, `1`, and `2`. Every new object starts in generation `0`; if it survives a round of garbage-collection scanning, it moves to generation `1`, where it is scanned less often; if it survives a scan of generation `1`, it moves to generation `2`, where it is scanned even less often. The thresholds that trigger generational scans can be obtained through the `get_threshold` function of the `gc` module. It returns a 3-tuple indicating after how many memory allocations a generation-`0` collection runs, after how many generation-`0` collections a generation-`1` collection runs, and after how many generation-`1` collections a generation-`2` collection runs. Note that when a generation-`2` collection is performed, the younger generations are collected as well. To modify these thresholds, use the `set_threshold` function of the `gc` module, as in the sketch below.
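A minimal sketch of inspecting and adjusting the thresholds (the new numbers below are purely illustrative):

```python
import gc

print(gc.get_threshold())       # CPython's defaults are typically (700, 10, 10)
gc.set_threshold(1000, 15, 15)  # illustrative new thresholds
print(gc.get_threshold())       # (1000, 15, 15)
```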
Remarks: Many interviewees can write iterators and generators but cannot explain exactly what they are. If you share that confusion, refer to the answer below.
An iterator is an object that implements the iterator protocol. Unlike other programming languages, Python has no keywords for defining protocols or expressing conventions: words such as `interface` and `protocol` are not in Python's keyword list. Python expresses protocols through magic methods, and the two magic methods `__next__` and `__iter__` together constitute the iterator protocol. You can fetch values from an iterator with a `for-in` loop, or fetch the next value with the built-in `next` function. A generator is a syntactically upgraded iterator that can be implemented with simpler code.
Extension: Interviewers often ask candidates to write an iterator that produces Fibonacci numbers. You can refer to the following code.
```python
class Fib(object):

    def __init__(self, num):
        self.num = num
        self.a, self.b = 0, 1
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx < self.num:
            self.a, self.b = self.b, self.a + self.b
            self.idx += 1
            return self.a
        raise StopIteration()
```

If you rewrite the above code with generator syntax, it becomes much simpler and more elegant.
```python
def fib(num):
    a, b = 0, 1
    for _ in range(num):
        a, b = b, a + b
        yield a
```
Question 009: What is the difference between the match method and the search method of regular expressions?
Comments: Regular expressions are an important tool for string processing, so they are a frequently examined knowledge point in interviews. In Python there are two ways to use regular expressions: one is to call the functions in the `re` module directly, passing in the regular expression and the string to process; the other is to first create a regular expression object with the `compile` function of the `re` module, then call methods on that object with the string to process. If a regular expression is used frequently, we recommend creating a regular expression object with `re.compile`, which reduces the overhead of repeatedly compiling the same pattern.
The `match` method attempts the regular expression match from the beginning of the string and returns a `Match` object or `None`. The `search` method scans the entire string looking for a match and likewise returns a `Match` object or `None`, as the sketch below shows.
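A minimal sketch of the difference (the pattern here is an arbitrary example):

```python
import re

pattern = re.compile(r'\d+')
print(pattern.match('abc123'))   # None: match only succeeds at the start
print(pattern.search('abc123'))  # <re.Match object; span=(3, 6), match='123'>
print(pattern.match('123abc'))   # <re.Match object; span=(0, 3), match='123'>
```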
```python
def multiply():
    return [lambda x: i * x for i in range(4)]

print([m(100) for m in multiply()])
```
Output:

```
[300, 300, 300, 300]
```
The result of the above code is easily misjudged as `[0, 100, 200, 300]`. First note that the `multiply` function uses a list comprehension to return a list holding 4 Lambda functions, each of which returns the passed-in argument multiplied by `i`. The key point is the closure at play here: the lifetime of the local variable `i` in `multiply` is extended, and since the final value of `i` is `3`, calling each Lambda in the list via `m(100)` returns `300`, and the same happens for all 4 calls.
If you want the result `[0, 100, 200, 300]`, you can modify the `multiply` function in the following ways.
Method 1: Use a generator, so that each Lambda is consumed (and called) while `i` still holds its current value.
```python
def multiply():
    return (lambda x: i * x for i in range(4))

print([m(100) for m in multiply()])
```
or
```python
def multiply():
    for i in range(4):
        yield lambda x: x * i

print([m(100) for m in multiply()])
```
Method 2: Use partial function application to avoid the closure entirely.
```python
from functools import partial
from operator import __mul__

def multiply():
    return [partial(__mul__, i) for i in range(4)]

print([m(100) for m in multiply()])
```
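Another widely used fix, offered here as a supplement to the two methods above: bind the current value of `i` as a default parameter value, since default values are evaluated at the moment each Lambda is defined.

```python
def multiply():
    # i=i captures the value of i at definition time, so each lambda
    # keeps its own copy instead of sharing the closed-over variable.
    return [lambda x, i=i: i * x for i in range(4)]

print([m(100) for m in multiply()])  # [0, 100, 200, 300]
```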
Review: C++, Java, C#, and many other programming languages support function overloading: multiple functions in the same scope that share a name but can be distinguished by their parameter lists (different parameter counts, different parameter types, or both). Overloading is a form of polymorphism; because the compiler usually decides at compile time, based on the number and types of the arguments, which overload to call, it is also called compile-time polymorphism or early binding. The subtext of this question is whether the candidate has experience with other programming languages, understands that Python is a dynamically typed language, and knows Python's variable and keyword function parameters.
First, Python is an interpreted language, and function overloading usually belongs to compiled languages. Second, Python is dynamically typed: function parameters carry no type constraints, so overloads cannot be distinguished by parameter type. Moreover, function parameters in Python can have default values and can be variable or keyword parameters, so even without function overloading, a function can behave differently according to the arguments the caller passes in, as the sketch below illustrates.
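A minimal sketch of how default values substitute for overloading (the function and its parameters are made up for illustration):

```python
def area(x, y=None):
    if y is None:
        return x * x   # one argument: treat it as a square
    return x * y       # two arguments: treat them as a rectangle

print(area(3))     # 9
print(area(3, 4))  # 12
```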
Remarks: This question looks simple but actually probes the candidate's skill quite thoroughly. Python's built-in `max` function not only accepts a single iterable to find its largest element, but also accepts two or more arguments and returns the largest; most importantly, the named keyword parameter `key` can supply the function used to extract the comparison value from each element, and the named keyword parameter `default` can specify the value to return when the iterable is empty.
The following code is for reference only:
```python
def my_max(*args, key=None, default=None):
    """
    Get the largest element of an iterable, or the largest of two or more arguments
    :param args: an iterable object or multiple elements
    :param key: function that extracts the comparison value from an element, defaults to None
    :param default: value returned if the iterable is empty; if no default is given, ValueError is raised
    :return: the largest element of the iterable or of the given elements
    """
    if len(args) == 1 and len(args[0]) == 0:
        if default is not None:
            return default
        raise ValueError('max() arg is an empty sequence')
    items = args[0] if len(args) == 1 else args
    max_elem, max_value = items[0], items[0]
    if key:
        max_value = key(max_value)
    for item in items:
        value = item
        if key:
            value = key(item)
        if value > max_value:
            max_elem, max_value = item, value
    return max_elem
```
Question 013: Write a function that counts how many times each number appears in the incoming list and returns the counts as a dictionary.
Comments: A giveaway question; no explanation needed.
```python
def count_letters(items):
    result = {}
    for item in items:
        if isinstance(item, (int, float)):
            result[item] = result.get(item, 0) + 1
    return result
```
You can also solve this directly with the `Counter` class of the `collections` module in the Python standard library. `Counter` is a subclass of `dict` that builds a dictionary using each element of the incoming sequence as a key and the number of times the element appears as the value.
```python
from collections import Counter

def count_letters(items):
    counter = Counter(items)
    return {key: value for key, value in counter.items()
            if isinstance(key, (int, float))}
```
Comments: Basically another giveaway question; anyone who has used the `os` module should know how to do this.
The `walk` function of the `os` module in the Python standard library traverses a folder tree, and it returns a generator.
```python
import os

g = os.walk('/Users/Hao/Downloads/')
for path, dir_list, file_list in g:
    for dir_name in dir_list:
        print(os.path.join(path, dir_name))
    for file_name in file_list:
        print(os.path.join(path, file_name))
```
Explanation: The `os.path` module provides many utility functions for path manipulation that are commonly used in project development. If the question explicitly forbids `os.walk`, you can use the `os.listdir` function to get the files and folders in a given directory, loop over them using the `os.path.isdir` function to determine which entries are folders, and recurse into each folder, which also accomplishes a full traversal; see the sketch after this paragraph.
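A minimal sketch of that recursive approach (the starting path reuses the example directory from above):

```python
import os

def walk_dir(path):
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        print(full_path)
        if os.path.isdir(full_path):
            # Recurse into sub-folders to traverse the whole tree
            walk_dir(full_path)

walk_dir('/Users/Hao/Downloads/')
```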
Question 015: There are coins in three denominations: 2 yuan, 3 yuan, and 5 yuan. How many ways are there to make change for 99 yuan?
Comment: A very similar question goes: "A child climbing stairs can take 1, 2, or 3 steps at a time. How many different ways are there to climb 10 steps?" Both questions share the same idea, and they are very simple to write with a recursive function.
```python
from functools import lru_cache

@lru_cache()
def change_money(total):
    if total == 0:
        return 1
    if total < 0:
        return 0
    return change_money(total - 2) + change_money(total - 3) + \
        change_money(total - 5)
```
Explanation: In the code above we decorate the recursive function `change_money` with the `lru_cache` decorator. Without this optimization, the asymptotic time complexity of the code is $O(3^N)$, and with the argument `total` equal to `99` the amount of computation is enormous. The `lru_cache` decorator caches the function's results, eliminating the overhead of repeated computation. This is a space-for-time strategy and also the programming idea behind dynamic programming.
For example, for n = 2, return:

```
1  2
4  3
```

For n = 3, return:

```
1  2  3
8  9  4
7  6  5
```

The question itself is not complicated; the code below is for reference only.
```python
def show_spiral_matrix(n):
    matrix = [[0] * n for _ in range(n)]
    row, col = 0, 0
    num, direction = 1, 0
    while num <= n ** 2:
        if matrix[row][col] == 0:
            matrix[row][col] = num
            num += 1
        if direction == 0:
            if col < n - 1 and matrix[row][col + 1] == 0:
                col += 1
            else:
                direction += 1
        elif direction == 1:
            if row < n - 1 and matrix[row + 1][col] == 0:
                row += 1
            else:
                direction += 1
        elif direction == 2:
            if col > 0 and matrix[row][col - 1] == 0:
                col -= 1
            else:
                direction += 1
        else:
            if row > 0 and matrix[row - 1][col] == 0:
                row -= 1
            else:
                direction += 1
        direction %= 4
    for x in matrix:
        for y in x:
            print(y, end='\t')
        print()
```
```python
items = [1, 2, 3, 4]
print([i for i in items if i > 2])
print([i for i in items if i % 2])
print([(x, y) for x, y in zip('abcd', (1, 2, 3, 4, 5))])
print({x: f'item{x ** 2}' for x in (2, 4, 6)})
print(len({x for x in 'hello world' if x not in 'abcdefg'}))
```
Review: Comprehensions are one of Python's signature syntax features and almost guaranteed interview material. Python can create lists, sets, and dictionaries through comprehension literal syntax.
Output:

```
[3, 4]
[1, 3]
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
{2: 'item4', 4: 'item16', 6: 'item36'}
6
```
```python
class Parent:
    x = 1

class Child1(Parent):
    pass

class Child2(Parent):
    pass

print(Parent.x, Child1.x, Child2.x)
Child1.x = 2
print(Parent.x, Child1.x, Child2.x)
Parent.x = 3
print(Parent.x, Child1.x, Child2.x)
```
Comment: Running the above code first outputs `1 1 1`, which should raise no doubts. Next, `Child1.x = 2` rebinds the attribute `x` on the class `Child1` with the value `2`, so `Child1.x` outputs `2` while `Parent` and `Child2` are unaffected. Executing `Parent.x = 3` rebinds the `x` attribute of the `Parent` class to `3`; since `Child2` inherits `x` from `Parent`, the value of `Child2.x` becomes `3` as well. But because we already rebound `x` on `Child1`, its `x` attribute is not affected by `Parent.x = 3` and keeps its previous value `2`.
```
1 1 1
1 2 1
3 2 3
```
Comment: There are very many modules in the Python standard library. I recommend introducing the standard and third-party libraries you have actually used in past projects, because those are the ones you know best and the ones that can withstand the interviewer's deeper probing.
| Module | Description |
|---|---|
| `sys` | Variables and functions related to the Python interpreter, e.g. `sys.version`, `sys.exit()` |
| `os` | Functions related to the operating system, e.g. `os.listdir()`, `os.remove()` |
| `re` | Functions related to regular expressions, e.g. `re.compile()`, `re.search()` |
| `math` | Functions related to mathematical operations, e.g. `math.pi`, `math.e`, `math.cos` |
| `logging` | Classes and functions related to logging, e.g. `logging.Logger`, `logging.Handler` |
| `json` / `pickle` | Modules implementing object serialization and deserialization, e.g. `json.loads`, `json.dumps` |
| `hashlib` | A module wrapping a variety of hash digest algorithms, e.g. `hashlib.md5`, `hashlib.sha1` |
| `urllib` | URL-related sub-modules, e.g. `urllib.request`, `urllib.parse` |
| `itertools` | A module providing various iterators, e.g. `itertools.cycle`, `itertools.product` |
| `functools` | Function-related utilities, e.g. `functools.partial`, `functools.lru_cache` |
| `collections` / `heapq` | Modules wrapping common data structures and algorithms, e.g. `collections.deque` |
| `threading` / `multiprocessing` | Multithreading/multiprocessing classes and functions, e.g. `threading.Thread` |
| `concurrent.futures` / `asyncio` | Concurrent/asynchronous programming classes and functions, e.g. `ThreadPoolExecutor` |
| `base64` | A module providing BASE-64 encoding functions, e.g. `base64.b64encode` |
| `csv` | A module for reading and writing CSV files, e.g. `csv.reader`, `csv.writer` |
| `profile` / `cProfile` / `pstats` | Modules for code performance profiling, e.g. `cProfile.run`, `pstats.Stats` |
| `unittest` | A module for unit testing, e.g. `unittest.TestCase` |
Calling a constructor to create an object in Python is a two-phase construction. First the `__new__` method executes to obtain the memory needed to hold the object, then `__init__` fills that memory with data (initializes the object's attributes). The return value of `__new__` is the created Python object (a reference to it), and the first parameter of `__init__` is that same object, which is why initialization can be completed inside `__init__`. `__new__` is a class method whose first parameter is the class; `__init__` is an instance method whose first parameter is the object. The sketch below makes the order visible.
Method 1: Do not use the modules and functions in the standard library.
```python
def is_leap_year(year):
    """Judge whether the given year is a leap year: False for common years, True for leap years"""
    return year % 4 == 0 and year % 100 != 0 or year % 400 == 0

def which_day(year, month, date):
    """Calculate which day of the year the given date is"""
    # Nested lists hold the days of each month for common years and leap years
    days_of_month = [
        [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],
        [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    ]
    days = days_of_month[is_leap_year(year)][:month - 1]
    return sum(days) + date
```
Method 2: Use the `datetime` module in the standard library.
```python
import datetime

def which_day(year, month, date):
    end = datetime.date(year, month, date)
    start = datetime.date(year, 1, 1)
    return (end - start).days + 1
```
Review: Static code analysis tools can extract various static attributes from code, giving developers a better grasp of its complexity, maintainability, and readability. The static attributes mentioned here include:
- Does the code comply with coding standards, for example: PEP-8.
- Potential problems in the code, including: syntax errors, indentation problems, missing imports, variable shadowing, etc.
- Bad smell in the code.
- The complexity of the code.
- The logic of the code.
At work, Pylint and Flake8 are the main tools for static code analysis. Pylint can detect code errors, code smells, and style violations; newer versions also provide code complexity statistics and can generate inspection reports. Flake8 wraps the Pyflakes (checks logic errors), McCabe (checks code complexity), and Pycodestyle (checks PEP-8 compliance) tools and can perform all the checks those three tools provide.
Comment: Magic methods, also called dunder (double-underscore) methods, are a characteristic piece of Python syntax and a high-frequency interview question.
| Magic methods | Purpose |
|---|---|
| `__new__`, `__init__`, `__del__` | Creating and destroying objects |
| `__add__`, `__sub__`, `__mul__`, `__div__`, `__floordiv__`, `__mod__` | Arithmetic operators |
| `__eq__`, `__ne__`, `__lt__`, `__gt__`, `__le__`, `__ge__` | Relational operators |
| `__pos__`, `__neg__`, `__invert__` | Unary operators |
| `__lshift__`, `__rshift__`, `__and__`, `__or__`, `__xor__` | Bitwise operations |
| `__enter__`, `__exit__` | Context manager protocol |
| `__iter__`, `__next__`, `__reversed__` | Iterator protocol |
| `__int__`, `__long__`, `__float__`, `__oct__`, `__hex__` | Type/base conversion |
| `__str__`, `__repr__`, `__hash__`, `__dir__` | Object representation |
| `__len__`, `__getitem__`, `__setitem__`, `__contains__`, `__missing__` | Sequence operations |
| `__copy__`, `__deepcopy__` | Object copying |
| `__call__`, `__setattr__`, `__getattr__`, `__delattr__` | Other magic methods |
In Python, function parameters are divided into positional parameters, variable parameters, keyword parameters, and named keyword parameters. `*args` denotes variable positional parameters, which can receive `0` or any number of arguments; when you are unsure how many positional arguments callers will pass, you can use variable parameters, which pack the incoming arguments into a tuple. `**kwargs` denotes keyword parameters, which receive arguments passed as `name=value` pairs and pack them into a dictionary. A function defined with both `*args` and `**kwargs` can therefore accept any arguments whatsoever, as the small sketch below shows.
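A minimal sketch (the function and argument values are arbitrary examples):

```python
def demo(*args, **kwargs):
    print(args)     # positional arguments packed into a tuple
    print(kwargs)   # keyword arguments packed into a dict

demo(1, 2, 3, name='Alice', age=30)
# (1, 2, 3)
# {'name': 'Alice', 'age': 30}
```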
Comments: A high-frequency interview question and the simplest kind of decorator, something every candidate must master.
Method 1: Use functions to implement decorators.
```python
from functools import wraps
from time import time

def record_time(func):

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time()
        result = func(*args, **kwargs)
        print(f'{func.__name__} execution time: {time() - start} seconds')
        return result

    return wrapper
```
Method 2: Implement the decorator with a class. A class that implements the `__call__` magic method produces callable objects, and such objects can be used as decorators.
. The object of this class is a callable object and can be used as a decorator.
from functools import wraps
from time import time
class Record:
def __call__(self, func):
@wraps(func)
def wrapper(*args, **kwargs):
start = time()
result = func(*args, **kwargs)
print(f'{func.__name__} execution time: (time()-start} seconds')
return result
return wrapper
Description: A decorator can decorate classes or functions to give them extra capabilities; it belongs to the proxy pattern among design patterns.
Extension: The decorator itself can also be parameterized. For example, if in the above example you don't want to print the execution time to the terminal but instead want the caller to decide how the execution time is reported, you can parameterize the decorator, as shown below.
```python
from functools import wraps
from time import time

def record_time(output):
    """A decorator that can be parameterized"""

    def decorate(func):

        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time()
            result = func(*args, **kwargs)
            output(func.__name__, time() - start)
            return result

        return wrapper

    return decorate
```
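A hypothetical usage sketch (the output callback and the decorated function are made up for illustration): the caller decides how the timing is reported.

```python
@record_time(lambda name, seconds: print(f'[timing] {name}: {seconds:.6f}s'))
def add(a, b):
    return a + b

add(1, 2)  # prints something like: [timing] add: 0.000001s
```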
Duck typing is the way dynamically typed languages determine whether an object can be treated as some type; it is also called the duck test. Simply put, to judge whether a bird is a duck, we only care whether it swims like a duck, quacks like a duck, and walks like a duck. In other words, if an object's behavior matches our expectations (it can respond to certain messages), we treat it as an object of that type.
In Python there are many bytes-like objects (e.g. `bytes`, `bytearray`, `array.array`, `memoryview`), file-like objects (e.g. `StringIO`, `BytesIO`, `GzipFile`, `socket`), and path-like objects (e.g. `str`, `bytes`). File-like objects, for instance, support `read` and `write` operations and can be read from and written to just like files; judging something a duck because it behaves like a duck is exactly the duck test. Another example is the `extend` method of Python lists: its argument need not be a list, any iterable object will do.
Explanation: Duck typing in dynamic languages greatly simplifies the application of design patterns.
There are four scopes in Python: the local scope (Local), the enclosing scope (Enclosing), the global scope (Global), and the built-in scope (Built-in). When looking up an identifier, the search proceeds in LEGB order, and if the identifier is not found in any scope, a `NameError` exception is raised.
A closure is the technique by which programming languages that support first-class functions (Python, JavaScript, etc.) implement lexical binding. When a closure is captured, its free variables (variables defined outside the function but used inside it) are fixed at capture time, so the function keeps working even outside the context in which it was captured. Simply put, a closure can be understood as a function that can read the internal variables of another function. Normally a function's local variables end their life cycle when the call returns, but a closure extends the life cycle of those local variables, as the sketch below shows. One caution when using closures: a closure keeps the objects created in the enclosing function from being garbage collected, which can incur significant memory overhead, so closures must not be abused.
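A minimal closure sketch (the counter functions are made up for illustration): the local variable `count` outlives each call to `make_counter`.

```python
def make_counter():
    count = 0                # a local variable captured by the closure

    def increment():
        nonlocal count       # rebind the free variable in the enclosing scope
        count += 1
        return count

    return increment

counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```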
Question 029: Talk about the application scenarios, advantages, and disadvantages of multithreading and multiprocessing in Python.
A thread is the basic unit to which the operating system allocates CPU time, and a process is the basic unit to which the operating system allocates memory. A running program usually consists of one or more processes, and each process contains one or more threads. The advantage of multithreading is that multiple threads share the process's memory space, so communication between threads is very easy; but under the official CPython interpreter, multithreading is constrained by the GIL (Global Interpreter Lock) and cannot exploit multiple CPU cores, which is a serious limitation. Multiprocessing can make full use of multi-core CPUs, but inter-process communication is comparatively cumbersome and requires IPC mechanisms (pipes, sockets, etc.).
Multithreading suits I/O-intensive applications that spend most of their time on I/O operations, have no great demand for parallel computation, and do not need much memory. Multiprocessing suits compute-intensive tasks (such as video encoding/decoding, data processing, and scientific computing), tasks that can be decomposed into multiple parallel subtasks whose results can be merged, and tasks with no memory restrictions and no strong dependence on I/O operations.
Extension: Concurrent programming in Python usually offers three choices: multithreading, multiprocessing, and asynchronous programming. Asynchronous programming achieves cooperative concurrency: multiple cooperating subroutines switch in user mode, making efficient use of the CPU, an approach that also suits I/O-intensive applications very well.
Comments: Don't memorize a so-called reference answer for this kind of question; it is enough to talk about the differences you are most familiar with.
- `print` and `exec` are keywords in Python 2 and became functions in Python 3.
- Python 3 has no `long` type; all integers are of type `int`.
- The inequality operator `<>` of Python 2 was dropped in Python 3; `!=` is used uniformly.
- The `xrange` function of Python 2 was replaced by the `range` function in Python 3.
- Python 3 fixed the unsafe `input` function of Python 2 and dropped the `raw_input` function.
- The `file` function of Python 2 was replaced by the `open` function in Python 3.
- In Python 2, `/` performs integer division on `int` operands; in Python 3, `/` is true division, and `//` is used for floor division.
- Python 3 improved the exception-catching syntax of Python 2; the Python 3 style is obviously more reasonable.
- In Python 3, the scope of loop variables in comprehensions is properly contained and no longer affects same-named variables outside the comprehension.
- The `round` function in Python 3 can return an `int` or a `float`, while in Python 2 it always returns a `float`.
- The `str` type of Python 3 is a Unicode string, while the `str` type of Python 2 is a byte string, equivalent to `bytes` in Python 3.
- The comparison operators in Python 3 require the compared objects to be of comparable types.
- All classes defined in Python 3 are new-style classes, whereas classes defined in Python 2 are divided into new-style classes (those explicitly inheriting from `object`) and old-style (classic) classes. New-style and old-style classes differ very significantly on the MRO problem; a new-style class can obtain its own type via the `__class__` attribute and can use the `__slots__` magic attribute.
- Python 3 is stricter about code indentation: mixing spaces and tabs raises a `TabError`.
- The `keys`, `values`, and `items` methods of dictionaries in Python 3 no longer return `list` objects but view objects, and built-ins such as `map` and `filter` no longer return `list` objects but iterator objects.
- Some standard library modules in Python 3 have different names from Python 2; among third-party libraries, some support only Python 2 and some only Python 3.
"Monkey patch" is a feature of dynamically typed languages. When the code is running, the methods, attributes, functions, etc. in the code are changed without modifying the source code to achieve the effect of a hot patch. Many system security patches are also implemented through monkey patches, but the use of monkey patches should be avoided in actual development to avoid inconsistent code behavior.
When using the `gevent` library, we execute `gevent.monkey.patch_all()` at the beginning of the code. This line replaces the `socket` module in the standard library, so that code using `socket` becomes coroutine-friendly without modifying a single line, thereby improving performance. This is an application of monkey patching.
Similarly, if you want to replace `json` in the standard library with the third-party `ujson` library, you can also take the monkey-patch approach, as shown below.
```python
import json, ujson

json.__name__ = 'ujson'
json.dumps = ujson.dumps
json.loads = ujson.loads
```
The `Mock` technique in unit testing is also an application of monkey patching. Python's `unittest.mock` module exists to substitute `Mock` objects for the objects that the code under test depends on.
```python
class A:
    def who(self):
        print('A', end='')

class B(A):
    def who(self):
        super(B, self).who()
        print('B', end='')

class C(A):
    def who(self):
        super(C, self).who()
        print('C', end='')

class D(B, C):
    def who(self):
        super(D, self).who()
        print('D', end='')

item = D()
item.who()
```
Comments: This question probes two knowledge points:

- MRO (Method Resolution Order) in Python. Without multiple inheritance, when a message is sent to an object and the object has no corresponding method, the upward (toward the parent) search order is completely clear; if the search reaches the `object` class (the parent of all classes) and still finds no corresponding method, an `AttributeError` exception is raised. With multiple inheritance, however, especially with diamond inheritance, the MRO must be traced to determine where a method will be found. Classes in Python 3, and new-style classes in Python 2, use the C3 algorithm to determine the MRO, which resembles a breadth-first search; old-style (classic) classes in Python 2 use a depth-first search. If you are unsure about the MRO, you can call the class's `mro` method or read its `__mro__` attribute to obtain the MRO list.
- The use of the `super()` function. When calling `super`, the form `super(type, object)` specifies from which class in which object's MRO the search for the parent-class method starts. So `super(B, self).who()` in class `B` above means: starting after class `B`, search upward through the MRO of `self` (an instance of class `D`) for a `who` method, which finds the `who` method of class `C`, because the MRO list of a `D` instance is `D --> B --> C --> A --> object`.
Output:

```
ACBD
```
Question 033: Write a function that evaluates a reverse Polish expression; you cannot use Python's built-in functions.
Review: Reverse Polish notation is also called "postfix notation". Compared with ordinary "infix notation", reverse Polish notation needs no parentheses to determine operation priority; for example, the reverse Polish form of `5 * (2 + 3)` is `5 2 3 + *`. Evaluating a reverse Polish expression requires a stack: scan the expression, pushing each operand onto the stack; when an operator is encountered, pop two elements, apply the operation, and push the result back. When the scan finishes, the single number remaining on the stack is the final result, which can be popped directly.
```python
import operator

class Stack:
    """Stack (FILO)"""

    def __init__(self):
        self.elems = []

    def push(self, elem):
        """Push onto the stack"""
        self.elems.append(elem)

    def pop(self):
        """Pop off the stack"""
        return self.elems.pop()

    @property
    def is_empty(self):
        """Check whether the stack is empty"""
        return len(self.elems) == 0

def eval_suffix(expr):
    """Evaluate a reverse Polish expression"""
    operators = {
        '+': operator.add,
        '-': operator.sub,
        '*': operator.mul,
        '/': operator.truediv
    }
    stack = Stack()
    for item in expr.split():
        if item.isdigit():
            stack.push(float(item))
        else:
            num2 = stack.pop()
            num1 = stack.pop()
            stack.push(operators[item](num1, num2))
    return stack.pop()
```
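A quick sanity check using the example expression from above:

```python
print(eval_suffix('5 2 3 + *'))  # 25.0, i.e. the value of 5 * (2 + 3)
```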
There are roughly two ways to perform string replacement in Python: the `replace` method of strings and the `sub` method of regular expressions.
Method 1: Use the `replace` method of strings.

```python
message = 'hello, world!'
print(message.replace('o', 'O').replace('l', 'L').replace('he', 'HE'))
```
Method 2: Use the `sub` method of regular expressions.

```python
import re

message = 'hello, world!'
pattern = re.compile('[aeiou]')
print(pattern.sub('#', message))
```
Extension: There is a related interview question: sort a list of file names, requiring that names be ordered by their alphabetic part and then by numeric size. For example, for the list `filenames = ['a12.txt', 'a8.txt', 'b10.txt', 'b2.txt', 'b19.txt', 'a3.txt']`, the sorted result is `['a3.txt', 'a8.txt', 'a12.txt', 'b2.txt', 'b10.txt', 'b19.txt']`. As a hint, you can pad the file names via string replacement so that the padded names sort correctly with the `sorted` function. Think about how you would solve this.
Profiling code performance can be done with the `cProfile` and `pstats` modules of the Python standard library. The `run` function of `cProfile` executes code while collecting statistics, creates a `Stats` object, and prints a simple profiling report. `Stats` is a class in the `pstats` module representing that statistics object. You can also use the third-party tools `line_profiler` and `memory_profiler` to analyze the time and memory consumed by each line of code; both present their profiling results in a very friendly way. If you use PyCharm, the "Profile" item of the "Run" menu performs performance analysis on the code, and PyCharm can display the results as a table or a call graph.
The following is an example of profiling code performance with `cProfile`.

`example.py`:
```python
import cProfile

def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True

class PrimeIter:

    def __init__(self, total):
        self.counter = 0
        self.current = 1
        self.total = total

    def __iter__(self):
        return self

    def __next__(self):
        if self.counter < self.total:
            self.current += 1
            while not is_prime(self.current):
                self.current += 1
            self.counter += 1
            return self.current
        raise StopIteration()

cProfile.run('list(PrimeIter(10000))')
```
If you use the third-party `line_profiler` tool, you can directly analyze the performance of each line of the `is_prime` function. You need to add a `profile` decorator to the `is_prime` function (the `profile` name is injected by the `kernprof` launcher), as shown below.
```python
@profile
def is_prime(num):
    for factor in range(2, int(num ** 0.5) + 1):
        if num % factor == 0:
            return False
    return True
```
Install `line_profiler`:

```
pip install line_profiler
```
Run it with `line_profiler`:

```
kernprof -lv example.py
```
The result looks like the following:

```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def is_prime(num):
     3     86624      48420.0      0.6     50.5      for factor in range(2, int(num ** 0.5) + 1):
     4     85624      44000.0      0.5     45.9          if num % factor == 0:
     5      6918       3080.0      0.4      3.2              return False
     6      1000        430.0      0.4      0.4      return True
```
Question 036: How do you use the `random` module to generate random numbers, shuffle a sequence randomly, and sample randomly?
Comment: A giveaway question; Python developers should be familiar with the commonly used modules of the standard library. If you cannot answer this one, the whole interview is basically ruined.
- The `random.random()` function generates a random floating-point number in `[0.0, 1.0)`.
- The `random.uniform(a, b)` function generates a random floating-point number in `[a, b]` or `[b, a]`.
- The `random.randint(a, b)` function generates a random integer in `[a, b]` or `[b, a]`.
- The `random.shuffle(x)` function shuffles the sequence `x` randomly in place.
- The `random.choice(seq)` function picks one random element from a non-empty sequence.
- The `random.choices(population, weights=None, *, cum_weights=None, k=1)` function samples from the population with replacement, producing a sample of size `k` returned as a list; individual weights can be given through the parameters, and if no weights are specified each individual is equally likely to be chosen.
- The `random.sample(population, k)` function samples from the population without replacement, producing a sample of size `k` returned as a list.
Extension: Besides uniformly distributed random numbers, the functions of the `random` module can generate random numbers under other distributions. For example, the `random.gauss(mu, sigma)` function generates Gaussian-distributed (normally distributed) random numbers; the `random.paretovariate(alpha)` function generates Pareto-distributed random numbers; and the `random.gammavariate(alpha, beta)` function generates gamma-distributed random numbers.
Review: Pooling is a typical space-for-time strategy; the database connection pools and thread pools we use are all applications of pooling. The `ThreadPoolExecutor` of the `concurrent.futures` module in the Python standard library is a thread pool implementation; if you want to understand how it works, you can refer to the following.
A thread pool is a technique for reducing the overhead of creating and destroying threads, a typical space-for-time trade. If an application must frequently dispatch tasks to threads for execution, a thread pool is a must, because thread creation and release involve a large number of low-level system operations and are quite expensive; if, over the application's lifetime, those creation and release operations can be replaced by pre-creation and borrowing, the low-level overhead drops substantially. After the application starts, the thread pool creates a certain number of threads and places them in an idle queue. These threads are initially blocked, consuming no CPU but occupying a small amount of memory. When a task arrives, an idle thread is taken from the queue, the task is dispatched to it, and the thread is marked as occupied. When every thread in the pool is busy, the pool may either automatically create some additional threads to handle more tasks or queue the tasks until a thread becomes free. When a task finishes, the thread does not exit; it stays in the pool waiting for the next task. When the system is relatively idle and most threads have been idle for a long time, the pool can automatically destroy some threads and reclaim system resources. Thanks to this pre-creation, the thread pool amortizes the cost of thread creation and destruction across the tasks it executes: the more tasks run, the smaller the per-task thread overhead.
Generally, thread pools must have the following components:
- Thread pool manager: creates and manages the thread pool.
- Worker threads and thread queue: the threads that actually run in the pool and the container that holds them.
- Task interface: abstracts the tasks executed by threads into a common interface so that the thread pool is independent of any specific task.
- Task queue: the container in the thread pool that holds tasks waiting to be executed.
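As a reference point, here is a minimal sketch of using `ThreadPoolExecutor`; the `download` function and its URLs are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def download(url):
    # stand-in for real I/O work such as fetching a URL
    return len(url)

urls = [f'https://example.com/page/{i}' for i in range(10)]

# the pool pre-creates up to 4 worker threads and reuses them for all tasks
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(download, url) for url in urls]
    for future in as_completed(futures):
        print(future.result())
```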
To give a simple example, suppose the variable `a` is a dictionary; performing the operation `int(a['x'])` may raise any of the three types of exceptions mentioned above. If there is no key `x` in the dictionary, a `KeyError` is raised; if the value corresponding to the key `x` is not of type `str`, `float`, `int`, `bool`, or a bytes-like type, calling the `int` function to construct an `int` object raises a `TypeError`; and if `a['x']` is a string or byte string whose content cannot be parsed as an `int`, a `ValueError` is raised.
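A small sketch that triggers each of the three cases in turn:

```python
a = {'y': 1}
try:
    int(a['x'])    # there is no key 'x' in the dictionary
except KeyError as e:
    print('KeyError:', e)

a = {'x': [1, 2, 3]}
try:
    int(a['x'])    # a list cannot be converted by int()
except TypeError as e:
    print('TypeError:', e)

a = {'x': 'hello'}
try:
    int(a['x'])    # the string's content cannot be parsed as an integer
except ValueError as e:
    print('ValueError:', e)
```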
```python
def extend_list(val, items=[]):
    items.append(val)
    return items

list1 = extend_list(10)
list2 = extend_list(123, [])
list3 = extend_list('a')
print(list1)
print(list2)
print(list3)
```
Review: The value of the default parameter `items`, namely `[]`, is evaluated once, when the function is defined. Because the default parameter `items` refers to that single `[]` object, any call that mutates the list referenced by `items` affects later calls too: the default parameter still refers to the same list rather than a freshly created `[]`, so elements added earlier are still there. If you rebind `items` by passing an argument, `items` refers to the new list object instead of the default one. This question comes up often in interviews. Using mutable container types as default parameter values is generally discouraged; code-inspection tools such as PyLint will also flag and warn about such code. A common fix is shown after the output below.
```
[10, 'a']
[123]
[10, 'a']
```
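The usual fix, sketched below, is to default to `None` and create the list inside the function:

```python
def extend_list(val, items=None):
    # a fresh list is created on every call that does not pass one in,
    # so calls no longer share state through the default value
    if items is None:
        items = []
    items.append(val)
    return items
```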
Question 40: How do you read a large file? For example, if the memory is only 4 GB, how do you read a file that is 8 GB in size?
Obviously, it is unrealistic to load an 8 GB file into 4 GB of memory all at once; in this case, you have to read the file in multiple passes and process it in batches. To read a file in Python, first obtain a file object through the `open` function. When reading, you can specify how much to read at a time through the `size` parameter of the `read` method, or specify the position to read from through the `offset` parameter of the `seek` method, so that you can control both the amount of data read at a time and the total amount read. In addition, you can use the built-in function `iter` to turn the file object into an iterator that reads only a small amount of data at a time. The code looks roughly as follows.
```python
with open('...', 'rb') as file:
    # iter() with a sentinel calls file.read(2097152) repeatedly (2 MB chunks)
    # until it returns the empty byte string b'', i.e. end of file
    for data in iter(lambda: file.read(2097152), b''):
        pass
```
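For text files processed line by line, the file object itself is a lazy iterator, so a plain loop (a sketch, assuming a text file) also avoids loading the whole file:

```python
with open('...', encoding='utf-8') as file:
    for line in file:    # lines are read on demand, one buffered chunk at a time
        pass
```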
On Linux systems, a large file can be cut into small pieces with the `split` command, and the data can then be processed by reading the resulting smaller files. For example, the following command cuts a large file named `filename` into multiple files of 512 MB each.
```
split -b 512m filename
```
If you prefer, you can also cut the file named `filename` into 10 files; the command is as follows.
```
split -n 10 filename
```
Extension: External sorting is very similar to the situation above, because the data to be processed cannot all be loaded into memory at once and must stay on slower external storage (usually a hard disk). The sort-merge algorithm is a commonly used external sorting strategy. In the sort phase, read in as much data as fits in memory, sort it, and write it out to a temporary file; repeating this organizes the data to be sorted into multiple ordered temporary files. In the merge phase, these temporary files are merged into one large ordered file, and that large ordered file is the sorting result.
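Below is a minimal sketch of this sort-merge strategy using `heapq.merge`, which lazily merges multiple sorted inputs; the file paths and chunk size are invented for illustration, and lines are assumed to end with newlines:

```python
import heapq
import tempfile

def external_sort(in_path, out_path, chunk_size=1_000_000):
    # sort phase: read chunks that fit in memory, sort each, spill to temp files
    chunks = []
    with open(in_path) as infile:
        while True:
            lines = infile.readlines(chunk_size)    # read roughly chunk_size bytes of lines
            if not lines:
                break
            tmp = tempfile.TemporaryFile(mode='w+')
            tmp.writelines(sorted(lines))
            tmp.seek(0)
            chunks.append(tmp)
    # merge phase: heapq.merge combines the sorted temp files into one ordered output
    with open(out_path, 'w') as outfile:
        outfile.writelines(heapq.merge(*chunks))
    for tmp in chunks:
        tmp.close()
```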
Each Python file is a module, and a folder holding such files is a package. Traditionally, a folder had to contain a file named `__init__.py` to be importable as a regular package (since Python 3.3, namespace packages can be imported without one, but regular packages conventionally still include it). A folder can also contain subfolders, which means a package can have subpackages. Modules and packages solve the problem of naming conflicts in Python: different packages can contain modules with the same name, and different modules can contain variables, functions, or classes with the same name. In Python, you can use `import` or `from ... import ...` to import packages and modules, and when importing you can use the `as` keyword to give packages, modules, classes, functions, variables, and so on an alias, thereby completely resolving naming conflicts, especially in multi-person collaborative development.
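For instance, with a hypothetical layout like the one sketched in the comments below, the imports might look as follows:

```python
# hypothetical layout:
#   mypackage/
#       __init__.py
#       utils.py          # defines helper()
#       sub/
#           __init__.py
#           models.py     # defines User

import mypackage.utils
from mypackage.sub.models import User
from mypackage.utils import helper as util_helper    # the alias avoids name clashes
```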
Review: Most companies' Python coding standards are based on PEP 8 or the Google Open Source Project Style Guide; the latter also mentions that Lint tools can be used to check how well code conforms to the standard. When this kind of question comes up in an interview, you can first mention these two reference standards and then focus on the key points of Python coding style.
- Use of spaces
  - Use spaces for indentation; do not use tabs.
  - Each syntax-related indentation level is represented by 4 spaces.
  - Each line should contain no more than 79 characters. If an expression is too long and spans multiple lines, every line other than the first should be indented 4 extra spaces beyond the normal indentation width.
  - Function and class definitions should be separated from surrounding code by two blank lines.
  - Within a class, methods should be separated from each other by one blank line.
  - A binary operator should have exactly one space on each side.
- Identifier naming
  - Variables, functions, and attributes should be spelled in lowercase, with multiple words joined by underscores.
  - Protected instance attributes of a class should start with one underscore.
  - Private instance attributes of a class should start with two underscores.
  - Names of classes and exceptions should be capitalized word by word (CapWords).
  - Module-level constants should be in all capital letters, with multiple words joined by underscores.
  - For instance methods of a class, the first parameter should be named `self` to represent the object itself; for class methods, the first parameter should be named `cls` to represent the class itself.
- Expressions and statements
  - Use the inline form of negation instead of putting the negation in front of the whole expression; for example, `if a is not b` is easier to understand than `if not a is b`.
  - Do not determine whether a string, list, etc. is `None` or empty by checking its length; use `if not x` instead.
  - Even if an `if` branch, `for` loop, or `except` clause contains only one line of code, do not write that line together with the `if`, `for`, or `except`; writing it on its own line is clearer.
  - `import` statements always go at the beginning of the file.
  - When importing a module, `from math import sqrt` is considered better than `import math`.
  - If there are multiple `import` statements, they should be divided into three groups, from top to bottom: Python standard modules, third-party modules, and custom modules, with each group sorted alphabetically by module name.
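A short illustrative snippet (all names invented for the example) that follows these naming conventions:

```python
MAX_RETRY_COUNT = 3    # module-level constant: all caps with underscores


class OrderProcessor:    # class name: capitalized words

    def __init__(self, order_id):
        self.order_id = order_id         # variable/attribute: lowercase with underscores
        self._status = 'new'             # protected attribute: one leading underscore
        self.__secret_token = None       # private attribute: two leading underscores

    def process(self):                   # instance method: first parameter is self
        if not self._status:             # prefer `if not x` over checking the length
            return None
        return self.order_id

    @classmethod
    def from_dict(cls, data):            # class method: first parameter is cls
        return cls(data['order_id'])
```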
Question 43: If you run the following code, will it report an error? If it does, explain what is wrong; if it does not, give the result of executing the code.
```python
class A:

    def __init__(self, value):
        self.__value = value

    @property
    def value(self):
        return self.__value

obj = A(1)
obj.__value = 2
print(obj.value)
print(obj.__value)
```
Comment: This question examines two points: one is the understanding of attribute access for names starting with `_` and `__` (name mangling) and of the `@property` decorator; the other is the understanding of dynamic languages, in which attributes can be added to an object at runtime. Inside the class, `self.__value` is mangled to `self._A__value`, so the assignment `obj.__value = 2` outside the class does not overwrite it; it simply adds a new instance attribute literally named `__value`. Hence `obj.value` still returns `1`, `obj.__value` returns `2`, and no error is reported.
```
1
2
```
Extension: If you do not want new attributes to be dynamically added to objects at runtime, you can use the `__slots__` magic attribute when defining the class. For example, we can add the line `__slots__ = ('__value', )` to class `A` above; running the code again then raises an `AttributeError` at the original line 10 (`obj.__value = 2`).
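For reference, the modified class would look like the sketch below; with the slot in place, the instance no longer has a `__dict__` in which new attributes could be stored:

```python
class A:
    __slots__ = ('__value', )    # instances may only have the listed attribute

    def __init__(self, value):
        self.__value = value

    @property
    def value(self):
        return self.__value

obj = A(1)
obj.__value = 2    # AttributeError: 'A' object has no attribute '__value'
```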
Requirement: Sort the keys of the dictionary below from high to low by the corresponding stock price.

```python
prices = {
    'AAPL': 191.88,
    'GOOG': 1186.96,
    'IBM': 149.24,
    'ORCL': 48.44,
    'ACN': 166.89,
    'FB': 208.09,
    'SYMC': 21.29
}
```
Review: Advanced usage of the `sorted` function appears frequently in interviews. The `key` parameter accepts a function name or a lambda, and that function's return value is used as the key by which elements are compared during sorting.
```python
sorted(prices, key=lambda x: prices[x], reverse=True)
```
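Run against the `prices` dictionary above, this returns the stock codes ordered from highest to lowest price: `['GOOG', 'FB', 'AAPL', 'ACN', 'IBM', 'ORCL', 'SYMC']`. If you want the key-value pairs rather than just the keys, you can sort `prices.items()` the same way:

```python
sorted(prices.items(), key=lambda kv: kv[1], reverse=True)
# [('GOOG', 1186.96), ('FB', 208.09), ('AAPL', 191.88), ('ACN', 166.89),
#  ('IBM', 149.24), ('ORCL', 48.44), ('SYMC', 21.29)]
```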
Remarks: The `collections` module of the Python standard library provides many useful data structures that not every developer knows well. For example, regarding the `namedtuple` in this question, 90% of the interviewees in the interviews I took part in could not accurately state its role and application scenarios. In addition, `deque` is a very useful but often overlooked class, as are `Counter`, `OrderedDict`, `defaultdict`, `UserDict`, and others. Do you know how to use them?
When using an object-oriented programming language, defining classes is the most common thing we do. Sometimes we use a class that has only attributes and no methods; objects of such a class are used purely to organize data and cannot receive messages, so we call it a data class or a degenerate class, much like a structure in the C language. Rather than using such a degenerate class, in Python you can replace it with a `namedtuple` (named tuple).
```python
from collections import namedtuple

Card = namedtuple('Card', ('suite', 'face'))
card1 = Card('heart', 13)
card2 = Card('club', 5)
print(f'{card1.suite}{card1.face}')    # heart13
print(f'{card2.suite}{card2.face}')    # club5
```
Named tuples, like ordinary tuples, are immutable containers. Once data is stored in the top-level attributes of a `namedtuple`, it can no longer be modified, which means every attribute on the object follows the principle of "write once, read many". Unlike an ordinary tuple, the data in a named tuple has access names, so saved values can be obtained by name rather than by index, which is not only easier to work with but also makes the code more readable.
The essence of a named tuple is a class, so it can also serve as a parent class for creating subclasses. In addition, named tuples provide a series of built-in methods; for example, the `_asdict` method converts a named tuple into a dictionary, and the `_replace` method creates a shallow copy of the named tuple with selected fields replaced.
```python
class MyCard(Card):

    def show(self):
        faces = ['', 'A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
        return f'{self.suite}{faces[self.face]}'

print(Card)    # <class '__main__.Card'>
card3 = MyCard('diamond', 12)
print(card3.show())    # diamondQ
print(dict(card1._asdict()))    # {'suite': 'heart', 'face': 13}
print(card2._replace(suite='diamond'))    # Card(suite='diamond', face=5)
```
All in all, named tuples organize data structures better and make code clearer and more readable. In many scenarios they are a substitute for tuples, dictionaries, and data classes; when you need to create immutable classes that occupy less space, named tuples are a good choice.
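A related detail worth knowing: since Python 3.6, the standard library also offers `typing.NamedTuple`, a class-based, type-annotated way to declare the same structure:

```python
from typing import NamedTuple

class Card(NamedTuple):
    suite: str
    face: int

card = Card('heart', 13)
print(card.suite, card.face)    # heart 13
```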
Requirement: Write a function that takes a list of integers and returns the element whose number of occurrences in the list exceeds 50%.
```python
def more_than_half(items):
    temp, times = None, 0
    for item in items:
        if times == 0:
            # earlier candidates cancelled out; take the current element as the new candidate
            temp = item
            times += 1
        else:
            if item == temp:
                times += 1
            else:
                times -= 1
    return temp
```
Comment: This is a LeetCode problem that has appeared in Python interviews. It exploits the fact that the target element occurs more than 50% of the time (the idea behind the Boyer-Moore majority vote algorithm): an element equal to `temp` increases the count by 1, while an element different from `temp` decreases it by 1. When the count drops to 0, the elements seen so far have no effect on the final result, so `temp` is set to the current element and the count is reset to 1. In the end, the element that occurs more than 50% of the time is the one left in the variable `temp`.
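A quick check of the function:

```python
print(more_than_half([1, 2, 1, 1, 3]))       # 1 (appears 3 times out of 5)
print(more_than_half([5, 5, 1, 5, 2, 5]))    # 5 (appears 4 times out of 6)
```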
Requirement: Write a function whose parameter is a list (whose elements may themselves be lists) and which returns the maximum nesting depth of the list. For example, the nesting depth of the list `[1, 2, 3]` is `1`, and the nesting depth of the list `[[1], [2, [3]]]` is `3`.
```python
def list_depth(items):
    if isinstance(items, list):
        max_depth = 1
        for item in items:
            # this branch is 1 (the current level) plus the depth of the nested item
            max_depth = max(list_depth(item) + 1, max_depth)
        return max_depth
    # non-list elements add no depth
    return 0
```
Review: On seeing this problem, one should naturally think of using recursion to examine every element of the list.
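Checking the function against the examples in the requirement:

```python
print(list_depth([1, 2, 3]))          # 1
print(list_depth([[1], [2, [3]]]))    # 3
print(list_depth('not a list'))       # 0 (non-list arguments have no depth)
```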
Requirement: There is a function that obtains data over the network (and may raise exceptions due to network problems). Write a decorator that lets this function retry a specified number of times when specified exceptions occur, with a random delay before each retry whose maximum duration can be controlled by a parameter.
Method One:
```python
from functools import wraps
from random import random
from time import sleep

def retry(*, retry_times=3, max_wait_secs=5, errors=(Exception, )):
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(retry_times):
                try:
                    return func(*args, **kwargs)
                except errors:
                    # wait a random time (at most max_wait_secs) before retrying
                    sleep(random() * max_wait_secs)
            return None
        return wrapper
    return decorate
```
Method Two:
```python
from functools import wraps
from random import random
from time import sleep

class Retry(object):

    def __init__(self, *, retry_times=3, max_wait_secs=5, errors=(Exception, )):
        self.retry_times = retry_times
        self.max_wait_secs = max_wait_secs
        self.errors = errors

    def __call__(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(self.retry_times):
                try:
                    return func(*args, **kwargs)
                except self.errors:
                    sleep(random() * self.max_wait_secs)
            return None
        return wrapper
```
Remarks: We have emphasized more than once that decorators are almost guaranteed to come up in Python interviews. This problem is slightly more complicated than the previous ones because what it requires is a parameterized decorator.
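Either version is applied the same way. Below is a usage sketch with a made-up `fetch_data` function, reusing the `random` function imported with Method One:

```python
@retry(retry_times=5, max_wait_secs=2, errors=(OSError, ValueError))
def fetch_data(url):
    # stand-in for a real network call that fails most of the time
    if random() < 0.7:
        raise OSError('simulated network failure')
    return f'data from {url}'

print(fetch_data('https://example.com/api'))    # retried up to 5 times, else None
```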
Comment: Reversing a string is such a commonplace interview question that it is basically a free point.
Method One: Reverse Slicing
```python
def reverse_string(content):
    return content[::-1]
```
Method Two: Reverse stitching
```python
def reverse_string(content):
    return ''.join(reversed(content))
```
Method Three: Recursive call
```python
def reverse_string(content):
    if len(content) <= 1:
        return content
    return reverse_string(content[1:]) + content[0]
```
Method Four: Deque
```python
from collections import deque

def reverse_string(content):
    q = deque()
    q.extendleft(content)    # extendleft inserts each character at the front, reversing the order
    return ''.join(q)
```
Method Five: Reverse assembly
```python
from io import StringIO

def reverse_string(content):
    buffer = StringIO()
    for i in range(len(content) - 1, -1, -1):
        buffer.write(content[i])
    return buffer.getvalue()
```
Method Six: Reverse comprehension
```python
def reverse_string(content):
    return ''.join([content[i] for i in range(len(content) - 1, -1, -1)])
```
Method Seven: Half Exchange
```python
def reverse_string(content):
    length, content = len(content), list(content)
    for i in range(length // 2):
        content[i], content[length - 1 - i] = content[length - 1 - i], content[i]
    return ''.join(content)
```
Method Eight: Paired-index swap
```python
def reverse_string(content):
    length, content = len(content), list(content)
    for i, j in zip(range(length // 2), range(length - 1, length // 2 - 1, -1)):
        content[i], content[j] = content[j], content[i]
    return ''.join(content)
```
Extension: Many of these methods are essentially the same idea, and in an interview it is enough to present a few representative ones. Here is a question to leave you with: which of the methods above performs best? We have covered ways of profiling code before; you can use them to check whether your answer is correct.
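For instance, a rough timing sketch with the standard `timeit` module (absolute numbers will vary with the machine and the string length):

```python
import timeit

implementations = [
    'content[::-1]',
    "''.join(reversed(content))",
    "''.join([content[i] for i in range(len(content) - 1, -1, -1)])",
]
for stmt in implementations:
    seconds = timeit.timeit(stmt, setup="content = 'hello, world!' * 100",
                            number=10_000)
    print(f'{stmt[:50]:50} {seconds:.4f}s')
```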
Requirement: A list contains `1000000` elements whose values are in the range `[1000, 10000)`. Design a function to find the duplicated elements in the list.
```python
def find_dup(items: list):
    dups = [0] * 9000
    for item in items:
        dups[item - 1000] += 1
    for idx, val in enumerate(dups):
        if val > 1:
            yield idx + 1000
```
Review: This solution works on the same principle as counting sort. Although the number of elements is very large, the value range `[1000, 10000)` is not: there are only 9000 possible values, so a `dups` list with 9000 slots can record how many times each value occurs. All elements of `dups` start at `0`; while traversing the `items` list, each occurrence of an element increments the value at the corresponding position in `dups` by 1. Finally, the positions in `dups` whose value is greater than 1 correspond to the duplicated elements in the `items` list.
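A small usage sketch; note that `find_dup` is a generator, so its results are materialized with `list`:

```python
from random import randrange

items = [randrange(1000, 10000) for _ in range(1000000)]
duplicates = list(find_dup(items))
print(len(duplicates))    # with a million draws from only 9000 possible values,
                          # almost every value ends up duplicated
```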
To see more Python interview questions, please visit my Zhihu column, "Python Interview Collection".
Original Credits: 骆昊.
English Credits: Jishan Shaikh