Studying the Django SessionMiddleware Source Code
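Let's start with a question: how does the backend get a session into the browser?
Django exposes the session as an attribute of the request object (request.session), and when it sends the response it adds Set-Cookie response headers. There are typically two of them: one for sessionid and one for csrftoken. Each is a key-value pair carrying some information.
Django's middleware wraps the Session object for us, and the defaults are very developer-friendly. So let's take a look at this layer of wrapping.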
In PyCharm, first jump to SessionMiddleware:
class SessionMiddleware(MiddlewareMixin):
    def __init__(self, get_response=None):
        xxx

    def process_request(self, request):
        xxx

    def process_response(self, request, response):
        xxx
Note: xxx marks omitted code. Two methods are implemented here: process_request and process_response. They are two of the five hook methods that a Django middleware class can implement.
# Initialization method
def __init__(self, get_response=None):
    self.get_response = get_response
    engine = import_module(settings.SESSION_ENGINE)
    self.SessionStore = engine.SessionStore
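The initializer first stores get_response (None by default) on the instance. engine is the module named by settings.SESSION_ENGINE, which by default is the db session backend. Stepping into settings shows that it is a LazySettings object. That class does not define SESSION_ENGINE, and neither does its parent class LazyObject, so let's look at LazySettings in more detail: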
class LazySettings(LazyObject):
    def _setup(self, name=None):
        settings_module = os.environ.get(ENVIRONMENT_VARIABLE)
        if not settings_module:
            desc = ("setting %s" % name) if name else "settings"
            raise ImproperlyConfigured(
                "Requested %s, but settings are not configured. "
                "You must either define the environment variable %s "
                "or call settings.configure() before accessing settings."
                % (desc, ENVIRONMENT_VARIABLE))
        self._wrapped = Settings(settings_module)
Now step into the Settings class:
class Settings(object):
    def __init__(self, settings_module):
        # update this dict from global settings (but only for ALL_CAPS settings)
        for setting in dir(global_settings):
            if setting.isupper():
                setattr(self, setting, getattr(global_settings, setting))

        # store the settings module in case someone later cares
        self.SETTINGS_MODULE = settings_module

        mod = importlib.import_module(self.SETTINGS_MODULE)

        tuple_settings = (
            "INSTALLED_APPS",
            "TEMPLATE_DIRS",
            "LOCALE_PATHS",
        )
        self._explicit_settings = set()
        for setting in dir(mod):
            if setting.isupper():
                setting_value = getattr(mod, setting)

                if (setting in tuple_settings and
                        not isinstance(setting_value, (list, tuple))):
                    raise ImproperlyConfigured("The %s setting must be a list or a tuple. " % setting)
                setattr(self, setting, setting_value)
                self._explicit_settings.add(setting)
        # xxxx more code follows
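Note the global_settings name. Tracing it shows that it is actually a .py file that defines all the default values, much like a project's own settings file. This is where SESSION_ENGINE finally turns up: SESSION_ENGINE = 'django.contrib.sessions.backends.db'. As you can see, it points to the module that stores sessions in the database.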
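As an aside, this code mainly copies every setting onto the Settings object so that it can later be read as an attribute of settings. You can see that it makes heavy use of getattr and setattr, that is, introspection.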
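The copy-the-defaults-then-override pattern and the lazy wrapper described above can be sketched in a few lines. This is only an illustration, not Django's implementation: DefaultSettings, SimpleSettings and LazySimpleSettings are hypothetical names, and the setting values are made up for the example.

```python
import types


class DefaultSettings:
    """Hypothetical stand-in for Django's global_settings module."""
    SESSION_ENGINE = "json"            # assumption: any importable module path
    SESSION_COOKIE_NAME = "sessionid"
    not_a_setting = "ignored"          # lowercase names are skipped


class SimpleSettings:
    """Copies ALL_CAPS names from the defaults, then from the overrides."""

    def __init__(self, defaults, overrides):
        for source in (defaults, overrides):
            for name in dir(source):
                if name.isupper():
                    setattr(self, name, getattr(source, name))


class LazySimpleSettings:
    """Builds the real settings object on first attribute access."""

    def __init__(self, factory):
        # Write through __dict__ so that __getattr__ is not triggered.
        self.__dict__["_factory"] = factory
        self.__dict__["_wrapped"] = None

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for setting names.
        if self._wrapped is None:
            self.__dict__["_wrapped"] = self._factory()
        return getattr(self._wrapped, name)


# Hypothetical project settings that override one default.
overrides = types.SimpleNamespace(SESSION_COOKIE_NAME="sid")
settings = LazySimpleSettings(lambda: SimpleSettings(DefaultSettings, overrides))
```

Nothing is copied until the first settings.SOMETHING lookup, which is the point of the lazy wrapper: importing the module stays cheap and configuration errors surface only when a setting is actually needed.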
Now back to SessionMiddleware. Let's look at import_module:
def import_module(name, package=None):
    if name.startswith('.'):
        if not package:
            raise TypeError("relative imports require the 'package' argument")
        level = 0
        for character in name:
            if character != '.':
                break
            level += 1
        name = _resolve_name(name[level:], package, level)
    __import__(name)
    return sys.modules[name]
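This function imports the module identified by a dotted-path string and returns it from sys.modules. A name with leading dots is first resolved relative to package.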
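A short usage sketch may help here. It uses the standard library's importlib.import_module, and a stdlib module path stands in for settings.SESSION_ENGINE (that substitution is an assumption made so the example runs without Django).

```python
import sys
from importlib import import_module

# Assumption: a stdlib module path stands in for settings.SESSION_ENGINE.
ENGINE_PATH = "json.decoder"

engine = import_module(ENGINE_PATH)
# Same pattern as `engine.SessionStore`: take a class off the imported module.
decoder_class = engine.JSONDecoder

# A relative name needs the anchoring package as the second argument.
same_engine = import_module(".decoder", package="json")

# The imported module is the very object registered in sys.modules.
registered = sys.modules[ENGINE_PATH]
```

Because the backend is named by a string, swapping the session storage only requires changing a setting, not the middleware code.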
There is also SessionStore, a class whose main job is to carry out session operations.
Now let's look at process_request:
def process_request(self, request):
    session_key = request.COOKIES.get(settings.SESSION_COOKIE_NAME)
    request.session = self.SessionStore(session_key)
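process_request runs after the request has been received and before the URL is resolved.
Here session_key is simply the sessionid value taken from the cookies. SESSION_COOKIE_NAME is defined in the global_settings.py file mentioned above: SESSION_COOKIE_NAME = 'sessionid'. The default for any setting we come across later can be found there too.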
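To make the cookie lookup concrete, here is a minimal sketch that pulls the session key out of a raw Cookie header with the standard library's http.cookies module. Django parses cookies itself and exposes them as request.COOKIES, so read_session_key is a hypothetical helper used only for illustration.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE_NAME = "sessionid"  # same default as in global_settings.py


def read_session_key(cookie_header):
    """Return the session key from a raw Cookie header, or None if absent."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(SESSION_COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

A first-time visitor sends no sessionid cookie, so the lookup yields None, and that None is what gets passed to SessionStore.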
request.session is a SessionStore object. Step into it:
class SessionStore(SessionBase):
    """
    Implements database session storage.
    """
    def __init__(self, session_key=None):
        super(SessionStore, self).__init__(session_key)
    # xxx more code follows
Next, step into SessionBase:
class SessionBase(object):
    """
    Base class for all Session classes.
    """
    TEST_COOKIE_NAME = 'testcookie'
    TEST_COOKIE_VALUE = 'worked'

    __not_given = object()

    def __init__(self, session_key=None):
        self._session_key = session_key
        self.accessed = False
        self.modified = False
        self.serializer = import_string(settings.SESSION_SERIALIZER)
This is mostly plain assignment. Note SESSION_SERIALIZER, which can be found in global_settings.py: SESSION_SERIALIZER = 'django.contrib.sessions.serializers.JSONSerializer'. If you have used DRF, this serializer will feel familiar: it handles serialization.
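As a rough idea of what such a serializer does, the sketch below turns a session dictionary into bytes and back using the standard json module. SketchJSONSerializer is a hypothetical class written for this post; it is modeled on a dumps/loads pair and should not be read as Django's exact code.

```python
import json


class SketchJSONSerializer:
    """Hypothetical serializer exposing a dumps/loads pair."""

    def dumps(self, obj):
        # Compact separators keep the stored payload small.
        return json.dumps(obj, separators=(",", ":")).encode("latin-1")

    def loads(self, data):
        return json.loads(data.decode("latin-1"))


serializer = SketchJSONSerializer()
payload = serializer.dumps({"user_id": 7, "cart": [1, 2]})
restored = serializer.loads(payload)
```

One consequence of a JSON-based serializer is that only JSON-friendly values (strings, numbers, lists, dicts, booleans, None) survive the round trip unchanged.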
When working with a session we usually treat it as a dictionary and assign to it like one. Assigning to a key calls __setitem__, and reading a key calls __getitem__.
def __getitem__(self, key):
    return self._session[key]

def _get_session(self, no_load=False):
    """
    Lazily loads session from storage (unless "no_load" is True, when only
    an empty dict is stored) and stores it in the current instance.
    """
    self.accessed = True
    try:
        return self._session_cache
    except AttributeError:
        if self.session_key is None or no_load:
            self._session_cache = {}
        else:
            self._session_cache = self.load()
    return self._session_cache

_session = property(_get_session)
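Note the use of property, which exposes a method as an attribute (here it is called as a plain function rather than applied with decorator syntax). On the first visit to the server, _session_cache does not exist yet, session_key is None, and no_load is False, so the code initializes self._session_cache = {}. In short, on a first visit an empty session dictionary is created.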
Now let's look at __setitem__:
def __setitem__(self, key, value):
    self._session[key] = value
    self.modified = True

def _get_session(self, no_load=False):
    """
    Lazily loads session from storage (unless "no_load" is True, when only
    an empty dict is stored) and stores it in the current instance.
    """
    self.accessed = True
    try:
        return self._session_cache
    except AttributeError:
        if self.session_key is None or no_load:
            self._session_cache = {}
        else:
            self._session_cache = self.load()
    return self._session_cache

_session = property(_get_session)
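This is much the same as above. If _session_cache already exists when a value is assigned, the value is written straight into it.
Note that nothing has been saved to the database at this point.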
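The lazy-loading behaviour of __getitem__ and __setitem__ can be reproduced in a small standalone class. This is a simplified sketch: LazySession and its storage argument (a plain dict playing the role of the database) are hypothetical and exist only for this example.

```python
class LazySession:
    """Dict-like object that loads its data only when first touched."""

    def __init__(self, session_key=None, storage=None):
        self.session_key = session_key
        # Hypothetical backend: a dict mapping session keys to session data.
        self.storage = storage if storage is not None else {}
        self.accessed = False
        self.modified = False

    def load(self):
        # Copy so that unsaved changes do not leak into the storage.
        return dict(self.storage.get(self.session_key, {}))

    def _get_session(self):
        self.accessed = True
        try:
            return self._session_cache
        except AttributeError:
            if self.session_key is None:
                self._session_cache = {}
            else:
                self._session_cache = self.load()
        return self._session_cache

    _session = property(_get_session)

    def __getitem__(self, key):
        return self._session[key]

    def __setitem__(self, key, value):
        self._session[key] = value
        self.modified = True


new_visitor = LazySession()
new_visitor["user"] = "alice"          # sets accessed and modified

returning = LazySession("k1", {"k1": {"cart": [1]}})
cart = returning["cart"]               # triggers load(), sets accessed only
```

Reading marks the session as accessed, writing additionally marks it as modified, and in both cases the backing storage is left untouched until something explicitly saves.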
Now let's look at process_response:
def process_response(self, request, response):
    """
    If request.session was modified, or if the configuration is to save the
    session every time, save the changes and set a session cookie or delete
    the session cookie if the session has been emptied.
    """
    try:
        accessed = request.session.accessed
        modified = request.session.modified
        empty = request.session.is_empty()
    except AttributeError:
        pass
    else:
        # First check if we need to delete this cookie.
        # The session should be deleted only if the session is entirely empty
        if settings.SESSION_COOKIE_NAME in request.COOKIES and empty:
            response.delete_cookie(
                settings.SESSION_COOKIE_NAME,
                path=settings.SESSION_COOKIE_PATH,
                domain=settings.SESSION_COOKIE_DOMAIN,
            )
        else:
            if accessed:
                patch_vary_headers(response, ('Cookie',))
            if (modified or settings.SESSION_SAVE_EVERY_REQUEST) and not empty:
                if request.session.get_expire_at_browser_close():
                    max_age = None
                    expires = None
                else:
                    max_age = request.session.get_expiry_age()
                    expires_time = time.time() + max_age
                    expires = cookie_date(expires_time)
                # Save the session data and refresh the client cookie.
                # Skip session save for 500 responses, refs #3881.
                if response.status_code != 500:
                    try:
                        request.session.save()
                    except UpdateError:
                        raise SuspiciousOperation(
                            "The request's session was deleted before the "
                            "request completed. The user may have logged "
                            "out in a concurrent request, for example."
                        )
                    response.set_cookie(
                        settings.SESSION_COOKIE_NAME,
                        request.session.session_key, max_age=max_age,
                        expires=expires, domain=settings.SESSION_COOKIE_DOMAIN,
                        path=settings.SESSION_COOKIE_PATH,
                        secure=settings.SESSION_COOKIE_SECURE or None,
                        httponly=settings.SESSION_COOKIE_HTTPONLY or None,
                    )
    return response
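process_response is called after the view function has run and returned a Response, and it takes care of sending the session back to the client.
It starts by reading accessed and modified. These flags are True only if the session was used earlier: _get_session sets accessed, and __setitem__ sets modified. Then comes empty: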
def is_empty(self):
    "Returns True when there is no session_key and the session is empty"
    try:
        return not bool(self._session_key) and not self._session_cache
    except AttributeError:
        return True
In other words, it returns True when there is no session_key and the session holds no data.
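The branch structure of process_response shown earlier can be condensed into a small pure function, which makes the outcomes easier to reason about. plan_session_response is a hypothetical helper, not part of Django; it only mirrors the conditions in the code and returns a description of what would happen.

```python
def plan_session_response(cookie_present, accessed, modified, empty,
                          save_every_request=False, status_code=200):
    """Describe the cookie handling for one response.

    Returns a dict with:
      action         -- "delete_cookie", "save_and_set_cookie" or "none"
      vary_on_cookie -- whether "Cookie" would be added to the Vary header
    """
    # An emptied session with a cookie still in the browser: drop the cookie.
    if cookie_present and empty:
        return {"action": "delete_cookie", "vary_on_cookie": False}

    action = "none"
    # Save only when there is something to persist, and never on a 500.
    if (modified or save_every_request) and not empty and status_code != 500:
        action = "save_and_set_cookie"
    return {"action": action, "vary_on_cookie": accessed}
```

For example, a request that only reads the session gets the Vary header but no new cookie, while a request that writes to the session gets both the database save and a refreshed cookie.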
After that, the method sets up the cookie and writes it to the response. Inside process_response there is one important call:
request.session.save()
We know that session is a SessionStore object, so that is where to look for save:
def save(self, must_create=False):
    """
    Saves the current session data to the database. If 'must_create' is
    True, a database error will be raised if the saving operation doesn't
    create a *new* entry (as opposed to possibly updating an existing
    entry).
    """
    if self.session_key is None:
        return self.create()
    data = self._get_session(no_load=must_create)
    obj = self.create_model_instance(data)
    using = router.db_for_write(self.model, instance=obj)
    try:
        with transaction.atomic(using=using):
            obj.save(force_insert=must_create, force_update=not must_create, using=using)
    except IntegrityError:
        if must_create:
            raise CreateError
        raise
    except DatabaseError:
        if not must_create:
            raise UpdateError
        raise
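This is where the session is persisted, that is, written to the database.
On its first visit a browser does not send a session cookie, so the first call takes the return self.create() branch. Step into create():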
def create(self):
    while True:
        self._session_key = self._get_new_session_key()
        try:
            # Save immediately to ensure we have a unique entry in the
            # database.
            self.save(must_create=True)
        except CreateError:
            # Key wasn't unique. Try again.
            continue
        self.modified = True
        return
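The loop enforces uniqueness, so session_key never repeats. Next, step into _get_new_session_key(), which builds the key with get_random_string from crypto.py: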
# crypto.py
def get_random_string(length=12,
                      allowed_chars='abcdefghijklmnopqrstuvwxyz'
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'):
    """
    Returns a securely generated random string.

    The default length of 12 with the a-z, A-Z, 0-9 character set returns
    a 71-bit value. log_2((26+26+10)^12) =~ 71 bits
    """
    if not using_sysrandom:
        # This is ugly, and a hack, but it makes things better than
        # the alternative of predictability. This re-seeds the PRNG
        # using a value that is hard for an attacker to predict, every
        # time a random string is required. This may change the
        # properties of the chosen random sequence slightly, but this
        # is better than absolute predictability.
        random.seed(
            hashlib.sha256(
                ("%s%s%s" % (
                    random.getstate(),
                    time.time(),
                    settings.SECRET_KEY)).encode('utf-8')
            ).digest())
    return ''.join(random.choice(allowed_chars) for i in range(length))
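Here the hashlib module's sha256 is used for hashing, not encryption. When no system random source is available, the PRNG is re-seeded with a digest of its current state, the current time, and the SECRET_KEY from the settings file. That is why SECRET_KEY matters so much and must never be leaked.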
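Key generation and the retry-on-collision loop in create() can be sketched together. The example below uses Python's secrets module instead of the seeded random approach shown above, and a plain dict stands in for the session table; new_session_key, create_session, the key length and the character set are all assumptions made for illustration.

```python
import secrets
import string

# Assumption: lowercase letters and digits as the allowed key characters.
VALID_KEY_CHARS = string.ascii_lowercase + string.digits


def new_session_key(length=32):
    """Build a random key from a cryptographically secure source."""
    return "".join(secrets.choice(VALID_KEY_CHARS) for _ in range(length))


def create_session(store, make_key=new_session_key):
    """Insert an empty session under a fresh key, retrying on collision."""
    while True:
        key = make_key()
        if key in store:
            # Key was not unique. Try again with a new one.
            continue
        store[key] = {}
        return key
```

With a 32-character key a collision is extremely unlikely, so in practice the loop runs once; the retry exists purely as a safety net.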
Once the key has been created, create() saves the session by calling save(), the same function as above:
def save(self, must_create=False):
    if self.session_key is None:
        return self.create()
    data = self._get_session(no_load=must_create)
    obj = self.create_model_instance(data)
    using = router.db_for_write(self.model, instance=obj)
    try:
        with transaction.atomic(using=using):
            obj.save(force_insert=must_create, force_update=not must_create, using=using)
    except IntegrityError:
        if must_create:
            raise CreateError
        raise
    except DatabaseError:
        if not must_create:
            raise UpdateError
        raise
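This time the if check is skipped and execution reaches _get_session. It checks whether the session data is already cached: if so, it returns the cached data; if not, it builds a new _session_cache dictionary.
Next, step into create_model_instance: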
def create_model_instance(self, data):
    """
    Return a new instance of the session model object, which represents the
    current session state. Intended to be used for saving the session data
    to the database.
    """
    return self.model(
        session_key=self._get_or_create_session_key(),
        session_data=self.encode(data),
        expire_date=self.get_expiry_date(),
    )
It returns a model instance. Following self.model shows that it imports the Session class:
class Session(AbstractBaseSession):
    """
    Django provides full support for anonymous sessions. The session
    framework lets you store and retrieve arbitrary data on a
    per-site-visitor basis. It stores data on the server side and
    abstracts the sending and receiving of cookies. Cookies contain a
    session ID -- not the data itself.

    The Django sessions framework is entirely cookie-based. It does
    not fall back to putting session IDs in URLs. This is an intentional
    design decision. Not only does that behavior make URLs ugly, it makes
    your site vulnerable to session-ID theft via the "Referer" header.

    For complete documentation on using Sessions in your code, consult
    the sessions documentation that is shipped with Django (also available
    on the Django Web site).
    """
    objects = SessionManager()

    @classmethod
    def get_session_store_class(cls):
        from django.contrib.sessions.backends.db import SessionStore
        return SessionStore

    class Meta(AbstractBaseSession.Meta):
        db_table = 'django_session'
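The docstring explains things clearly. Note db_table = 'django_session' at the bottom: that is the name of the table in the database. The class also inherits several things from its parent class.
db_for_write is the code that picks the database to write to. We will leave that part for now.
After that the database transaction begins, which ensures the session is stored in the database correctly.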
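To see how must_create separates "insert a new row" from "update an existing row", and how the transaction wraps the write, here is a sketch built on the standard library's sqlite3 module. It does not use the Django ORM: open_store and save_session are hypothetical helpers, the two-column table is a simplification, and CreateError and UpdateError are redefined locally so the example is self-contained.

```python
import sqlite3


class CreateError(Exception):
    """The row could not be inserted because the key already exists."""


class UpdateError(Exception):
    """The row could not be updated because the key does not exist."""


def open_store():
    """Create an in-memory database with a simplified session table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_session ("
        "session_key TEXT PRIMARY KEY, session_data TEXT NOT NULL)"
    )
    return conn


def save_session(conn, session_key, session_data, must_create=False):
    """Insert when must_create is True, otherwise update an existing row."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            if must_create:
                conn.execute(
                    "INSERT INTO django_session (session_key, session_data) "
                    "VALUES (?, ?)",
                    (session_key, session_data),
                )
            else:
                cursor = conn.execute(
                    "UPDATE django_session SET session_data = ? "
                    "WHERE session_key = ?",
                    (session_data, session_key),
                )
                if cursor.rowcount == 0:
                    raise UpdateError(session_key)
    except sqlite3.IntegrityError:
        if must_create:
            raise CreateError(session_key)
        raise
```

A duplicate key on insert surfaces as CreateError, which is what lets a create() style loop pick another key and try again, while an update that matches no row surfaces as UpdateError, the case process_response reports as a session deleted mid-request.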
Some parts above are still short on detail and will be filled in over time.